Experiment: return PyArrow Dataset from read_xarray #95
RohanDisa wants to merge 6 commits into alxmrs:main
Conversation
In my side experiments, I found that the default version of pyarrow that the project uses doesn't allow one to pass in a Thus, I recommend we add pyarrow as an explicit dependency of the project and, further, that we upgrade all of the project's minimum versions such that the default pyarrow version supports the above property. The current pyarrow version of 22.0.0 would fit the bill.
… order to have the most current pyarrow dependency version. This is necessary for lazy evaluation.
TODO:
Ideally, we could combine the two helper functions into one. I don't think they need to be in the closure of this function either.
I tested the current implementation in a colab notebook, and it is not lazy. I think we need to subtly change how the record batch reader is used to get it to be lazy. |
This PR experiments with returning a PyArrow Dataset from read_xarray instead of a RecordBatchReader.
The goal is to explore whether registering chunked InMemoryDatasets improves or enables more lazy behavior when integrating with Arrow/DataFusion, in light of the ongoing discussion around dataset registration and laziness.
Issue: #93