dask_awkward.from_parquet
- dask_awkward.from_parquet(path, *, columns=None, max_gap=64000, max_block=256000000, footer_sample_size=1000000, generate_bitmasks=False, highlevel=True, behavior=None, attrs=None, ignore_metadata=True, scan_files=False, split_row_groups=False, storage_options=None, report=False)
Create an Array collection from a Parquet dataset.
See ak.from_parquet() for more information.
- Parameters:
path (str | list[str]) – Local directory containing Parquet files, remote URL of a directory containing Parquet files, or an explicit list of Parquet files; passed to fsspec for resolution. May contain glob patterns.
columns (str | list[str] | None) – See ak.from_parquet().
max_gap (int) – See ak.from_parquet().
max_block (int) – See ak.from_parquet().
footer_sample_size (int) – See ak.from_parquet().
generate_bitmasks (bool) – See ak.from_parquet().
highlevel (bool) – Argument specific to awkward-array that is always True for dask-awkward.
behavior (Mapping | None) – See ak.from_parquet().
ignore_metadata (bool) – If True, ignore the Parquet metadata file (if it exists).
scan_files (bool) – Scan files when parsing metadata.
split_row_groups (bool | None) – If True, each row group becomes a partition. If False, each file becomes a partition. If None, the existence of a _metadata file and ignore_metadata=False implies True, else False.
storage_options (dict[str, Any] | None) – Storage options passed to fsspec.
report (bool)
- Returns:
Collection represented by the Parquet data on disk.
- Return type: