dask_awkward.to_parquet

dask_awkward.to_parquet(array, destination, list_to32=False, string_to32=True, bytestring_to32=True, emptyarray_to=None, categorical_as_dictionary=False, extensionarray=False, count_nulls=True, compression='zstd', compression_level=None, row_group_size=67108864, data_page_size=None, parquet_flavor=None, parquet_version='2.4', parquet_page_version='1.0', parquet_metadata_statistics=True, parquet_dictionary_encoding=False, parquet_byte_stream_split=False, parquet_coerce_timestamps=None, parquet_old_int96_timestamps=None, parquet_compliant_nested=False, parquet_extra_options=None, storage_options=None, write_metadata=False, compute=True, prefix=None)

Write data to Parquet format.

This will create one output file per partition.

See the documentation for ak.to_parquet() for more information; most of the optional arguments in the signature above are described there.
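For instance, serialization options from the signature above can be supplied directly. A minimal sketch; the destination path and option values here are illustrative only:

>>> import awkward as ak
>>> import dask_awkward as dak
>>> a = ak.Array([{"x": 1.1}, {"x": 2.2}, {"x": 3.3}])
>>> d = dak.from_awkward(a, npartitions=1)
>>> dak.to_parquet(d, "/tmp/options-output", compression="snappy", row_group_size=1024)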

Parameters:

array (dask_awkward.Array) – The collection to write to disk.

destination (str) – The directory where one output file per partition will be written.

compute (bool) – If True (the default), write the data immediately and return None; if False, return a dask_awkward.Scalar that performs the write when computed.

prefix (str, optional) – A prefix for the output file names.

The remaining keyword arguments control Parquet serialization; see the ak.to_parquet() documentation for their descriptions.

Returns:

If compute is False, a dask_awkward.Scalar object is returned so that the write can be computed later. If compute is True, the collection is immediately computed (writing the data to disk) and None is returned.

Return type:

Scalar | None
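A minimal sketch of the deferred case; the destination path is illustrative. With compute=False, the write happens only when the returned Scalar is computed:

>>> import awkward as ak
>>> import dask_awkward as dak
>>> a = ak.Array([{"a": [1, 2, 3]}, {"a": [4, 5]}])
>>> d = dak.from_awkward(a, npartitions=2)
>>> write = dak.to_parquet(d, "/tmp/deferred-output", compute=False)
>>> write.compute()  # the files are written to disk at this point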

Examples

>>> import awkward as ak
>>> import dask_awkward as dak
>>> a = ak.Array([{"a": [1, 2, 3]}, {"a": [4, 5]}])
>>> d = dak.from_awkward(a, npartitions=2)
>>> d.npartitions
2
>>> dak.to_parquet(d, "/tmp/my-output", prefix="data")
>>> import os
>>> os.listdir("/tmp/my-output")
['data-part0.parquet', 'data-part1.parquet']
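The files can be read back with dask_awkward.from_parquet; the round trip below assumes the directory written in the example above:

>>> back = dak.from_parquet("/tmp/my-output")
>>> back.compute().to_list()
[{'a': [1, 2, 3]}, {'a': [4, 5]}]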