dask_awkward.from_parquet

dask_awkward.from_parquet(path, *, columns=None, max_gap=64000, max_block=256000000, footer_sample_size=1000000, generate_bitmasks=False, highlevel=True, behavior=None, attrs=None, ignore_metadata=True, scan_files=False, split_row_groups=False, storage_options=None, report=False)

Create an Array collection from a Parquet dataset.

See ak.from_parquet() for more information.

Parameters:
  • path (str | list[str]) – Local directory containing Parquet files, remote URL of a directory containing Parquet files, or an explicit list of Parquet files. The value is passed to fsspec for resolution and may contain glob patterns.

  • columns (str | list[str] | None) – See ak.from_parquet()

  • max_gap (int) – See ak.from_parquet()

  • max_block (int) – See ak.from_parquet()

  • footer_sample_size (int) – See ak.from_parquet()

  • generate_bitmasks (bool) – See ak.from_parquet()

  • highlevel (bool) – Argument specific to awkward-array that is always True for dask-awkward.

  • behavior (Mapping | None) – See ak.from_parquet()

  • ignore_metadata (bool) – If True, ignore the Parquet metadata file, if one exists.

  • scan_files (bool) – Scan files when parsing metadata.

  • split_row_groups (bool | None) – If True, each row group becomes a partition. If False, each file becomes a partition. If None, behaves as True when a _metadata file exists and ignore_metadata=False, and as False otherwise.

  • storage_options (dict[str, Any] | None) – Storage options passed to fsspec.

  • attrs (Mapping[str, Any] | None)

  • report (bool)

Returns:

Collection represented by the Parquet data on disk.

Return type:

Array