dask_awkward.report_necessary_buffers
- dask_awkward.report_necessary_buffers(*args, traverse=True)
Determine the buffer keys necessary to compute a collection.
- Parameters:
*args (Dask collections or HighLevelGraphs) – The collection (or collection graph) of interest. These can be individual objects, lists, sets, or dictionaries.
traverse (bool, optional) – If True (default), builtin Python collections are traversed looking for any Dask collections they might contain.
- Returns:
A mapping from each input layer in the graph to an object describing the data and shape buffers that column optimisation has tagged as required for that layer.
- Return type:
dict[str, NecessaryBuffers]
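A minimal sketch of the call patterns described by the parameters above; the collections x and y are hypothetical placeholders for dask-awkward collections:

>>> import dask_awkward as dak
>>> report = dak.report_necessary_buffers(x, y)    # collections passed positionally
>>> report = dak.report_necessary_buffers([x, y])  # found by traversing the list (traverse=True by default)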
Examples
If we have a hypothetical parquet dataset (ds) with the fields

- "foo"
- "bar"
- "baz"

And the "baz" field has fields

- "x"
- "y"

The calculation of ds.bar + ds.baz.x will only require the bar and baz.x columns from the parquet file.

>>> import dask_awkward as dak
>>> ds = dak.from_parquet("some-dataset")
>>> ds.fields
["foo", "bar", "baz"]
>>> ds.baz.fields
["x", "y"]
>>> x = ds.bar + ds.baz.x
>>> dak.report_necessary_buffers(x)
{
    "from-parquet-abc123": NecessaryBuffers(
        data_and_shape=frozenset(...), shape_only=frozenset(...)
    )
}