dask_awkward.to_parquet

dask_awkward.to_parquet(array, destination, list_to32=False, string_to32=True, bytestring_to32=True, emptyarray_to=None, categorical_as_dictionary=False, extensionarray=False, count_nulls=True, compression='zstd', compression_level=None, row_group_size=67108864, data_page_size=None, parquet_flavor=None, parquet_version='2.4', parquet_page_version='1.0', parquet_metadata_statistics=True, parquet_dictionary_encoding=False, parquet_byte_stream_split=False, parquet_coerce_timestamps=None, parquet_old_int96_timestamps=None, parquet_compliant_nested=False, parquet_extra_options=None, storage_options=None, write_metadata=False, compute=True, prefix=None)

Write data to Parquet format.

This will create one output file per partition.

See the documentation for ak.to_parquet() for more information; the many optional arguments above are described there.
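For example, options for the Parquet writer are passed straight through from the signature above. A minimal sketch, assuming a dask-awkward collection d like the one in the Examples below (the codec and row-group size here are purely illustrative):

>>> dak.to_parquet(
...     d,
...     "/tmp/my-output",
...     compression="snappy",
...     row_group_size=1_000_000,
... )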

Parameters

array (dask_awkward.Array) – The collection to write to disk.

destination (str) – Where to store the output; one file is written per partition.

compute (bool) – Whether to write immediately; see Returns below.

prefix (str, optional) – A string prepended to each output file name, as in "<prefix>-part<N>.parquet".

storage_options (dict, optional) – Options passed to the underlying filesystem implementation (via fsspec).

The remaining arguments control details of the Parquet output and are described in the ak.to_parquet() documentation.

Returns

If compute is False, a dask_awkward.Scalar object is returned; the write only happens when that scalar is later computed (see the deferred-write example under Examples). If compute is True (the default), the write is performed immediately and None is returned.

Return type

Scalar | None

Examples

>>> import awkward as ak
>>> import dask_awkward as dak
>>> a = ak.Array([{"a": [1, 2, 3]}, {"a": [4, 5]}])
>>> d = dak.from_awkward(a, npartitions=2)
>>> d.npartitions
2
>>> dak.to_parquet(d, "/tmp/my-output", prefix="data")
>>> import os
>>> os.listdir("/tmp/my-output")
['data-part0.parquet', 'data-part1.parquet']
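
The deferred form returns a Scalar instead of writing immediately; a minimal sketch, continuing from the collection d above (the files are only written when the scalar is computed):

>>> result = dak.to_parquet(d, "/tmp/my-output", compute=False)
>>> result.compute()  # the write happens here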
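
Assuming a lossless round trip, the output directory can be read back into a new collection with dask_awkward.from_parquet:

>>> back = dak.from_parquet("/tmp/my-output")
>>> back.compute().to_list()
[{'a': [1, 2, 3]}, {'a': [4, 5]}]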