dask_awkward.report_necessary_buffers
- dask_awkward.report_necessary_buffers(*args, traverse=True)
Determine the buffer keys necessary to compute a collection.
- Parameters
*args (Dask collections or HighLevelGraphs) – The collection (or collection graph) of interest. These can be individual objects, lists, sets, or dictionaries.
traverse (bool, optional) – If True (default), built-in Python collections are traversed to find any Dask collections they contain.
- Returns
Mapping that pairs each input layer in the graph with an object describing the data and shape buffers that column optimisation has tagged as required for that layer.
- Return type
Examples
If we have a hypothetical parquet dataset (ds) with the fields

- "foo"
- "bar"
- "baz"

And the "baz" field has fields

- "x"
- "y"

The calculation of ds.bar + ds.baz.x will only require the bar and baz.x columns from the parquet file.

>>> import dask_awkward as dak
>>> ds = dak.from_parquet("some-dataset")
>>> ds.fields
["foo", "bar", "baz"]
>>> ds.baz.fields
["x", "y"]
>>> x = ds.bar + ds.baz.x
>>> dak.report_necessary_buffers(x)
{
    "from-parquet-abc123": NecessaryBuffers(
        data_and_shape=frozenset(...), shape_only=frozenset(...)
    )
}
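The returned mapping can be consumed like any other dictionary. The following is a minimal sketch of one way to summarise such a report. It runs without dask-awkward installed: the NecessaryBuffers class here is a hypothetical stand-in that only mirrors the two attributes visible in the example output, the summarize helper is not part of the library, and the buffer key strings are illustrative only (the real keys are elided in the example above). Skipping None entries is also an assumption, made in case a layer carries no report.

```python
from typing import NamedTuple


class NecessaryBuffers(NamedTuple):
    """Hypothetical stand-in mirroring the attributes shown in the example output."""

    data_and_shape: frozenset
    shape_only: frozenset


def summarize(report):
    """Collect the buffer keys from every input layer of a report mapping.

    `summarize` is a hypothetical helper, not a dask-awkward function.
    """
    data_keys, shape_keys = set(), set()
    for layer_name, buffers in report.items():
        if buffers is None:  # assumption: a layer may have nothing to report
            continue
        data_keys |= buffers.data_and_shape
        shape_keys |= buffers.shape_only
    return sorted(data_keys), sorted(shape_keys)


# Hand-built report; layer name taken from the example, key strings are made up.
report = {
    "from-parquet-abc123": NecessaryBuffers(
        data_and_shape=frozenset({"bar", "baz.x"}),
        shape_only=frozenset(),
    )
}

data_keys, shape_keys = summarize(report)
print("data and shape:", data_keys)
print("shape only:", shape_keys)
```

With a real collection, the dictionary returned by dak.report_necessary_buffers(x) would take the place of the hand-built report.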