dask_awkward.to_parquet
- dask_awkward.to_parquet(array: dask_awkward.lib.core.Array, destination: str, *, list_to32: bool, string_to32: bool, bytestring_to32: bool, emptyarray_to: Optional[Any], categorical_as_dictionary: bool, extensionarray: bool, count_nulls: bool, compression: str | dict | None, compression_level: int | dict | None, row_group_size: int | None, data_page_size: int | None, parquet_flavor: Optional[Literal['spark']], parquet_version: Union[Literal['1.0'], Literal['2.4'], Literal['2.6']], parquet_page_version: Union[Literal['1.0'], Literal['2.0']], parquet_metadata_statistics: bool | dict, parquet_dictionary_encoding: bool | dict, parquet_byte_stream_split: bool | dict, parquet_coerce_timestamps: Optional[Union[Literal['ms'], Literal['us']]], parquet_old_int96_timestamps: bool | None, parquet_compliant_nested: bool, parquet_extra_options: dict | None, storage_options: dict[str, Any] | None, write_metadata: bool, compute: Literal[True], prefix: str | None) -> None
- dask_awkward.to_parquet(array: dask_awkward.lib.core.Array, destination: str, *, list_to32: bool, string_to32: bool, bytestring_to32: bool, emptyarray_to: Optional[Any], categorical_as_dictionary: bool, extensionarray: bool, count_nulls: bool, compression: str | dict | None, compression_level: int | dict | None, row_group_size: int | None, data_page_size: int | None, parquet_flavor: Optional[Literal['spark']], parquet_version: Union[Literal['1.0'], Literal['2.4'], Literal['2.6']], parquet_page_version: Union[Literal['1.0'], Literal['2.0']], parquet_metadata_statistics: bool | dict, parquet_dictionary_encoding: bool | dict, parquet_byte_stream_split: bool | dict, parquet_coerce_timestamps: Optional[Union[Literal['ms'], Literal['us']]], parquet_old_int96_timestamps: bool | None, parquet_compliant_nested: bool, parquet_extra_options: dict | None, storage_options: dict[str, Any] | None, write_metadata: bool, compute: Literal[False], prefix: str | None) -> dask_awkward.lib.core.Scalar
Write data to Parquet format.
This will create one output file per partition.
See the documentation for ak.to_parquet() for more information; many of the optional function arguments are described there.
- Parameters
array – The dask_awkward.Array collection to write to disk.
destination – Where to store the output; this can be a local filesystem path or a remote filesystem path.
list_to32 – See ak.to_parquet().
string_to32 – See ak.to_parquet().
bytestring_to32 – See ak.to_parquet().
emptyarray_to – See ak.to_parquet().
categorical_as_dictionary – See ak.to_parquet().
extensionarray – See ak.to_parquet().
count_nulls – See ak.to_parquet().
compression – See ak.to_parquet().
compression_level – See ak.to_parquet().
row_group_size – See ak.to_parquet().
data_page_size – See ak.to_parquet().
parquet_flavor – See ak.to_parquet().
parquet_version – See ak.to_parquet().
parquet_page_version – See ak.to_parquet().
parquet_metadata_statistics – See ak.to_parquet().
parquet_dictionary_encoding – See ak.to_parquet().
parquet_byte_stream_split – See ak.to_parquet().
parquet_coerce_timestamps – See ak.to_parquet().
parquet_old_int96_timestamps – See ak.to_parquet().
parquet_compliant_nested – See ak.to_parquet().
parquet_extra_options – See ak.to_parquet().
storage_options – Storage options passed to fsspec.
write_metadata – Write Parquet metadata.
compute – If True, immediately compute the result (write data to disk). If False, a Scalar collection is returned so that compute can be called explicitly.
prefix – An additional prefix for output files. If None, all parts inside the destination directory will be named "partN.parquet"; if defined, the names will be f"{prefix}-partN.parquet".
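The naming rule described for prefix can be sketched in plain Python. The helper below, expected_part_names, is hypothetical and not part of dask_awkward; it assumes that N is the zero-based partition index, which is what the example output on this page suggests.

```python
from typing import List, Optional


def expected_part_names(npartitions: int, prefix: Optional[str] = None) -> List[str]:
    """Hypothetical helper: predict output file names from the documented pattern.

    Assumes partition numbering starts at 0.
    """
    names = []
    for n in range(npartitions):
        base = f"part{n}.parquet"
        # Without a prefix the name is "partN.parquet";
        # with one it becomes f"{prefix}-partN.parquet".
        names.append(base if prefix is None else f"{prefix}-{base}")
    return names


print(expected_part_names(2))
print(expected_part_names(2, prefix="data"))
```

A helper like this may be handy for checking that a destination directory contains one file per partition after a write.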
- Returns
If compute is False, a dask_awkward.Scalar object is returned so that it can be computed later. If compute is True, the collection is immediately computed (and data will be written to disk) and None is returned.
- Return type
Scalar | None
Examples
>>> import awkward as ak
>>> import dask_awkward as dak
>>> a = ak.Array([{"a": [1, 2, 3]}, {"a": [4, 5]}])
>>> d = dak.from_awkward(a, npartitions=2)
>>> d.npartitions
2
>>> dak.to_parquet(d, "/tmp/my-output", prefix="data")
>>> import os
>>> os.listdir("/tmp/my-output")
['data-part0.parquet', 'data-part1.parquet']