dask_awkward.necessary_columns

dask_awkward.necessary_columns(*args, traverse=True)

Determine the columns necessary to compute a collection.

Parameters: *args (Dask collections or HighLevelGraphs) – The collection (or collection graph) of interest. These can be individual objects, lists, sets, or dictionaries.
Returns: Mapping that pairs each input layer in the graph with the columns determined to be necessary from that layer. The columns are not necessarily listed in the same order as in the original input.
Return type: dict[str, list[str]]

Examples

Suppose we have a hypothetical Parquet dataset (ds) with the fields “foo”, “bar”, and “baz”, where the “baz” field itself has the fields “x” and “y”. The calculation of ds.bar + ds.baz.x will then require only the bar and baz.x columns from the Parquet file.

>>> import dask_awkward as dak
>>> ds = dak.from_parquet("some-dataset")
>>> ds.fields
["foo", "bar", "baz"]
>>> ds.baz.fields
["x", "y"]
>>> x = ds.bar + ds.baz.x
>>> dak.necessary_columns(x)
{"from-parquet-abc123": ["bar", "baz.x"]}

Notice that foo and baz.y are not determined to be necessary.
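
The returned mapping can be post-processed with ordinary Python. The following is a minimal sketch, not part of dask-awkward: the helper name unused_columns and the hand-written all_columns list are assumptions made for illustration, and the necessary dictionary simply copies the hypothetical output shown above rather than calling the library.

```python
def unused_columns(necessary, all_columns):
    """Return, per input layer, the columns not listed as necessary.

    Hypothetical helper: `necessary` is a dict[str, list[str]] shaped like
    the return value of necessary_columns; `all_columns` is a list of every
    column name in the dataset, written out by hand here.
    """
    unused = {}
    for layer, needed in necessary.items():
        needed_set = set(needed)
        # Keep the dataset's column order; drop anything marked necessary.
        unused[layer] = [c for c in all_columns if c not in needed_set]
    return unused


# Mapping copied from the hypothetical example output above.
necessary = {"from-parquet-abc123": ["bar", "baz.x"]}
# Assumed flat list of the dataset's columns (not produced by the library).
all_columns = ["foo", "bar", "baz.x", "baz.y"]

result = unused_columns(necessary, all_columns)
print(result)
```

Under these assumptions, the result lists foo and baz.y for the single input layer, matching the observation above.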

dask-awkward 2023.5.0 documentation