Data IO
Creating dask-awkward collections typically begins with reading from either local disk or cloud storage. There is built-in support for datasets stored in Parquet or JSON format.
Take this code block, for example:
>>> import dask_awkward as dak
>>> ds1 = dak.from_parquet("s3://path/to/dataset")
>>> ds2 = dak.from_json("/path/to/json-files/*.json")
Both the from_parquet() and from_json() calls create new dask_awkward.Array instances. In the Parquet example we read data from Amazon S3; in the JSON example we read data from local disk. Notice the wildcard syntax: all JSON files in that directory will be discovered, and each file will become a partition in the collection.
Support for the ROOT file format is provided by the Uproot project.
It’s also possible to create dask_awkward.Array instances from other Dask collections (like dask.array.Array), or from concrete objects like existing awkward Array instances or Python lists.
See the IO API docs page for more information on the possible ways to instantiate a new dask-awkward Array.