dask_awkward.to_parquet

dask_awkward.to_parquet(array, destination, list_to32=False, string_to32=True, bytestring_to32=True, emptyarray_to=None, categorical_as_dictionary=False, extensionarray=False, count_nulls=True, compression='zstd', compression_level=None, row_group_size=67108864, data_page_size=None, parquet_flavor=None, parquet_version='2.4', parquet_page_version='1.0', parquet_metadata_statistics=True, parquet_dictionary_encoding=False, parquet_byte_stream_split=False, parquet_coerce_timestamps=None, parquet_old_int96_timestamps=None, parquet_compliant_nested=False, parquet_extra_options=None, storage_options=None, write_metadata=False, compute=True, prefix=None)
Write data to Parquet format.
This will create one output file per partition.
See the documentation for ak.to_parquet() for more information; most of the optional arguments below are described there.

Parameters:
- array (Array) – The dask_awkward.Array collection to write to disk.
- destination (str) – Where to store the output; this can be a local filesystem path or a remote filesystem path.
- list_to32 (bool) – See ak.to_parquet()
- string_to32 (bool) – See ak.to_parquet()
- bytestring_to32 (bool) – See ak.to_parquet()
- emptyarray_to (Any | None) – See ak.to_parquet()
- categorical_as_dictionary (bool) – See ak.to_parquet()
- extensionarray (bool) – See ak.to_parquet()
- count_nulls (bool) – See ak.to_parquet()
- compression (str | dict | None) – See ak.to_parquet()
- compression_level (int | dict | None) – See ak.to_parquet()
- row_group_size (int | None) – See ak.to_parquet()
- data_page_size (int | None) – See ak.to_parquet()
- parquet_flavor (Literal['spark'] | None) – See ak.to_parquet()
- parquet_version (Literal['1.0', '2.4', '2.6']) – See ak.to_parquet()
- parquet_page_version (Literal['1.0', '2.0']) – See ak.to_parquet()
- parquet_metadata_statistics (bool | dict) – See ak.to_parquet()
- parquet_dictionary_encoding (bool | dict) – See ak.to_parquet()
- parquet_byte_stream_split (bool | dict) – See ak.to_parquet()
- parquet_coerce_timestamps (Literal['ms'] | Literal['us'] | None) – See ak.to_parquet()
- parquet_old_int96_timestamps (bool | None) – See ak.to_parquet()
- parquet_compliant_nested (bool) – See ak.to_parquet()
- parquet_extra_options (dict | None) – See ak.to_parquet()
- storage_options (dict[str, Any] | None) – Storage options passed to fsspec (a remote-write sketch appears after the examples below).
- write_metadata (bool) – Write Parquet metadata.
- compute (bool) – If True, immediately compute the result (write data to disk). If False, a Scalar collection will be returned such that compute can be explicitly called later (see the deferred-write sketch below).
- prefix (str | None) – An additional prefix for output files. If None, all parts inside the destination directory will be named "partN.parquet"; if defined, the names will be f"{prefix}-partN.parquet".
Returns:
If compute is False, a dask_awkward.Scalar object is returned such that it can be computed later. If compute is True, the collection is immediately computed (and data will be written to disk) and None is returned.

Return type:
Scalar | None
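With compute=False, nothing is written until the returned Scalar is computed. A minimal sketch of a deferred write (the array contents and the /tmp/deferred-output path are illustrative only):

>>> import awkward as ak
>>> import dask_awkward as dak
>>> a = ak.Array([{"x": 1}, {"x": 2}, {"x": 3}, {"x": 4}])
>>> d = dak.from_awkward(a, npartitions=2)
>>> s = dak.to_parquet(d, "/tmp/deferred-output", compute=False)  # nothing on disk yet
>>> s.compute()  # the write happens here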
Examples
>>> import awkward as ak
>>> import dask_awkward as dak
>>> a = ak.Array([{"a": [1, 2, 3]}, {"a": [4, 5]}])
>>> d = dak.from_awkward(a, npartitions=2)
>>> d.npartitions
2
>>> dak.to_parquet(d, "/tmp/my-output", prefix="data")
>>> import os
>>> os.listdir("/tmp/my-output")
['data-part0.parquet', 'data-part1.parquet']
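Because destination may be a remote filesystem path and storage_options is handed to fsspec, the same call can target object storage. A hedged sketch, reusing d from above and assuming s3fs is installed (the bucket name and credential values are hypothetical, not part of this API's documentation):

>>> dak.to_parquet(
...     d,
...     "s3://my-bucket/my-output",  # hypothetical bucket
...     storage_options={"key": "<access-key>", "secret": "<secret-key>"},
... )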