dask_awkward.to_parquet
- dask_awkward.to_parquet(array: dask_awkward.lib.core.Array, destination: str, *, list_to32: bool, string_to32: bool, bytestring_to32: bool, emptyarray_to: Optional[Any], categorical_as_dictionary: bool, extensionarray: bool, count_nulls: bool, compression: str | dict | None, compression_level: int | dict | None, row_group_size: int | None, data_page_size: int | None, parquet_flavor: Optional[Literal['spark']], parquet_version: Union[Literal['1.0'], Literal['2.4'], Literal['2.6']], parquet_page_version: Union[Literal['1.0'], Literal['2.0']], parquet_metadata_statistics: bool | dict, parquet_dictionary_encoding: bool | dict, parquet_byte_stream_split: bool | dict, parquet_coerce_timestamps: Optional[Union[Literal['ms'], Literal['us']]], parquet_old_int96_timestamps: bool | None, parquet_compliant_nested: bool, parquet_extra_options: dict | None, storage_options: dict[str, Any] | None, write_metadata: bool, compute: Literal[True], prefix: str | None) -> None [source]
- dask_awkward.to_parquet(array: dask_awkward.lib.core.Array, destination: str, *, list_to32: bool, string_to32: bool, bytestring_to32: bool, emptyarray_to: Optional[Any], categorical_as_dictionary: bool, extensionarray: bool, count_nulls: bool, compression: str | dict | None, compression_level: int | dict | None, row_group_size: int | None, data_page_size: int | None, parquet_flavor: Optional[Literal['spark']], parquet_version: Union[Literal['1.0'], Literal['2.4'], Literal['2.6']], parquet_page_version: Union[Literal['1.0'], Literal['2.0']], parquet_metadata_statistics: bool | dict, parquet_dictionary_encoding: bool | dict, parquet_byte_stream_split: bool | dict, parquet_coerce_timestamps: Optional[Union[Literal['ms'], Literal['us']]], parquet_old_int96_timestamps: bool | None, parquet_compliant_nested: bool, parquet_extra_options: dict | None, storage_options: dict[str, Any] | None, write_metadata: bool, compute: Literal[False], prefix: str | None) -> dask_awkward.lib.core.Scalar
Write data to Parquet format.

This will create one output file per partition. See the documentation for ak.to_parquet() for more information; most of the optional arguments below are described there.

Parameters
array – The dask_awkward.Array collection to write to disk.
destination – Where to store the output; this can be a local or remote filesystem path.
list_to32 – See ak.to_parquet().
string_to32 – See ak.to_parquet().
bytestring_to32 – See ak.to_parquet().
emptyarray_to – See ak.to_parquet().
categorical_as_dictionary – See ak.to_parquet().
extensionarray – See ak.to_parquet().
count_nulls – See ak.to_parquet().
compression – See ak.to_parquet().
compression_level – See ak.to_parquet().
row_group_size – See ak.to_parquet().
data_page_size – See ak.to_parquet().
parquet_flavor – See ak.to_parquet().
parquet_version – See ak.to_parquet().
parquet_page_version – See ak.to_parquet().
parquet_metadata_statistics – See ak.to_parquet().
parquet_dictionary_encoding – See ak.to_parquet().
parquet_byte_stream_split – See ak.to_parquet().
parquet_coerce_timestamps – See ak.to_parquet().
parquet_old_int96_timestamps – See ak.to_parquet().
parquet_compliant_nested – See ak.to_parquet().
parquet_extra_options – See ak.to_parquet().
storage_options – Storage options passed to fsspec.
write_metadata – Write Parquet metadata.
compute – If True, immediately compute the result (write the data to disk). If False, a Scalar collection is returned so that compute can be called explicitly later.
prefix – An additional prefix for output files. If None, all parts inside the destination directory are named "partN.parquet"; if defined, the names are f"{prefix}-partN.parquet".
Returns
If compute is False, a dask_awkward.Scalar object is returned so that it can be computed later. If compute is True, the collection is computed immediately (writing the data to disk) and None is returned.

Return type
Scalar | None
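The two signatures above form an @overload pair: the declared return type depends on the literal value of compute. A minimal sketch of the same typing pattern, independent of dask_awkward (the names write and Scalar here are illustrative stand-ins, not the library's implementation):

```python
# Sketch of the Literal-typed overload pattern used by the two signatures
# above; write() and this Scalar class are hypothetical stand-ins.
from typing import Literal, Union, overload

class Scalar:
    """Stand-in for a lazy dask_awkward.Scalar result."""
    def compute(self) -> None:
        print("writing data to disk")

@overload
def write(compute: Literal[True]) -> None: ...
@overload
def write(compute: Literal[False]) -> Scalar: ...
def write(compute: bool = True) -> Union[Scalar, None]:
    if compute:
        return None      # work happens eagerly; nothing to return
    return Scalar()      # caller triggers the work later via .compute()

lazy = write(compute=False)
lazy.compute()  # type checkers know this is a Scalar, not None
```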
Examples
>>> import awkward as ak
>>> import dask_awkward as dak
>>> a = ak.Array([{"a": [1, 2, 3]}, {"a": [4, 5]}])
>>> d = dak.from_awkward(a, npartitions=2)
>>> d.npartitions
2
>>> dak.to_parquet(d, "/tmp/my-output", prefix="data")
>>> import os
>>> os.listdir("/tmp/my-output")
['data-part0.parquet', 'data-part1.parquet']