dask_awkward.from_parquet

dask_awkward.from_parquet(path, storage_options=None, ignore_metadata=True, scan_files=False, columns=None, filters=None, split_row_groups=None)

Read a Parquet dataset into an Array collection.

Parameters
  • path (str) – Location of the data, including the protocol (e.g. s3://).

  • storage_options (dict) – Options used to create the filesystem (see the fsspec documentation).

  • ignore_metadata (bool) – Ignore parquet metadata associated with the input dataset (the _metadata file).

  • scan_files (bool) – TBD

  • columns (list[str], optional) – Select columns to load

  • filters (list[list[tuple]], optional) – Parquet-style filters for excluding row groups based on column statistics

  • split_row_groups (bool, optional) – If True, each row group becomes a partition. If False, each file becomes a partition. If None, the value defaults to True when a _metadata file exists and ignore_metadata=False, and to False otherwise.
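
The default rule for split_row_groups can be sketched in plain Python. The helper below is hypothetical and is not part of dask_awkward; the name resolve_split_row_groups and the has_metadata_file argument are assumptions made only to illustrate the documented behaviour.

```python
from typing import Optional


def resolve_split_row_groups(
    split_row_groups: Optional[bool],
    has_metadata_file: bool,
    ignore_metadata: bool,
) -> bool:
    """Hypothetical helper mirroring the documented default rule.

    Returns True when each row group should become a partition,
    and False when each file should become a partition.
    """
    # An explicit True or False from the caller always wins.
    if split_row_groups is not None:
        return split_row_groups
    # With None, split by row group only if a _metadata file exists
    # and the caller has not asked to ignore it.
    return has_metadata_file and not ignore_metadata


# With the signature defaults (ignore_metadata=True, split_row_groups=None),
# the rule resolves to one partition per file.
default_choice = resolve_split_row_groups(None, has_metadata_file=True, ignore_metadata=True)
```

Under this reading, leaving both arguments at their defaults yields one partition per file, even when a _metadata file is present.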

Returns

Array collection from the parquet dataset.

Return type

Array