IO

IO

from_awkward(source, npartitions[, label])

Create a Dask collection from a concrete awkward array.

from_dask_array(array)

Convert a Dask Array collection to a Dask Awkard Array collection.

from_delayed(source[, meta, divisions, prefix])

Create a Dask Awkward Array from Dask Delayed objects.

from_json(urlpath[, blocksize, delimiter, ...])

Create an Awkward Array collection from JSON data.

from_lists(source)

Create a Dask collection from a list of lists.

from_map(func, *iterables[, args, label, ...])

Create an Array collection from a custom mapping.

to_dask_array(array)

Convert awkward array collection to a Dask array collection.

to_delayed(array[, optimize_graph])

Convert the collection to a list of delayed objects.

dask_awkward.from_awkward(source, npartitions, label=None)[source]

Create a Dask collection from a concrete awkward array.

Parameters
  • source (ak.Array) – The concrete awkward array.

  • npartitions (int) – The total number of partitions for the collection.

  • label (str, optional) – Label for the task.

Returns

Resulting awkward array collection.

Return type

Array

Examples

>>> import dask_awkward as dak
>>> import awkward._v2 as ak
>>> a = ak.Array([[1, 2, 3], [4], [5, 6, 7, 8]])
>>> c = dak.from_awkward(a, npartitions=3)
>>> c.partitions[[0, 1]].compute()
<Array [[1, 2, 3], [4]] type='2 * var * int64'>
dask_awkward.from_dask_array(array)[source]

Convert a Dask Array collection to a Dask Awkard Array collection.

Parameters

array (dask.array.Array) – Array to convert.

Returns

The Awkward Array Dask collection.

Return type

Array

Examples

>>> import dask.array as da
>>> import dask_awkward as dak
>>> x = da.ones(1000, chunks=250)
>>> y = dak.from_dask_array(x)
>>> y
dask.awkward<from-dask-array, npartitions=4>
dask_awkward.from_delayed(source, meta=None, divisions=None, prefix='from-delayed')[source]

Create a Dask Awkward Array from Dask Delayed objects.

Parameters
  • source (list[dask.delayed.Delayed] | dask.delayed.Delayed) – List of Delayed objects (or a single object). Each Delayed object represents a single partition in the resulting awkward array.

  • meta (ak.Array, optional) – Metadata (typetracer array) if known, if None the first partition (first element of the list of Delayed objects) will be computed to determine the metadata.

  • divisions (tuple[int | None, ...], optional) – Partition boundaries (if known).

  • prefix (str) – Prefix for the keys in the task graph.

Returns

Resulting Array collection.

Return type

Array

dask_awkward.from_json(urlpath, blocksize=None, delimiter=None, one_obj_per_file=False, compression='infer', meta=None, derive_meta_kwargs=None, storage_options=None)[source]

Create an Awkward Array collection from JSON data.

There are three styles supported for reading JSON data:

  1. Line delimited style: file(s) with one JSON object per line. The function argument defaults are setup to handle this style. This method assumes newline characters are not embedded in JSON values.

  2. Single JSON object per file (this requires one_obj_per_file to be set to True.

  3. Reading some number of bytes at a time. If at least one of blocksize or delimiter are defined, Dask’s read_bytes() function will be used to lazily read bytes (blocksize bytes per partition) and split on delimiter). This method assumes line delimited JSON without newline characters embedded in JSON values.

Parameters
  • urlpath (str | list[str]) – The source of the JSON dataset.

  • blocksize (int | str, optional) – If defined, each partition will be created from a block of JSON bytes of this size. If delimiter is defined (not None) but this value remains None, a default value of 128 MiB will be used.

  • delimiter (bytes, optional) – If defined (not None), this will be the byte(s) to split on when reading blocksizes. If this is None but blocksize is defined (not None), the default byte charater will be the newline (b"\n").

  • one_obj_per_file (bool) – If True each file will be considered a single JSON object.

  • compression (str, optional) – Compression of the files in the dataset.

  • meta (Any, optional) – The metadata for the collection. If None (the default), them metadata will be determined by scanning the beginning of the dataset.

  • derive_meta_kwargs (dict[str, Any], optional) – Dictionary of arguments to be passed to derive_json_meta for determining the collection metadata if meta is None.

  • storage_options (dict[str, Any], optional) – Storage options passed to fsspec.

Returns

The resulting Dask Awkward Array collection.

Return type

Array

Examples

One partition per file:

>>> import dask_awkard as dak
>>> a = dak.from_json("dataset*.json")

One partition ber 200 MB of JSON data:

>>> a = dak.from_json("dataset*.json", blocksize="200 MB")

Same as previous call (explicit definition of the delimiter):

>>> a = dak.from_json(
...     "dataset*.json", blocksize="200 MB", delimiter=b"\n",
... )
dask_awkward.from_lists(source)[source]

Create a Dask collection from a list of lists.

Parameters

source (list[list[Any]]) – List of lists, each outer list will become a partition in the collection.

Returns

Resulting Array collection.

Return type

Array

Examples

>>> import dask_awkward as dak
>>> a = [[1, 2, 3], [4]]
>>> b = [[5], [6, 7, 8]]
>>> c = dak.from_lists([a, b])
>>> c
dask.awkward<from-lists, npartitions=2>
>>> c.compute()
<Array [[1, 2, 3], [4], [5], [6, 7, 8]] type='4 * var * int64'>
dask_awkward.from_map(func, *iterables, args=None, label=None, token=None, divisions=None, meta=None, **kwargs)[source]

Create an Array collection from a custom mapping.

Parameters
  • func (Callable) – Function used to create each partition.

  • *iterables (Iterable) – Iterable objects to map to each output partition. All iterables must be the same length. This length determines the number of partitions in the output collection (only one element of each iterable will be passed to func for each partition).

  • label (str, optional) – String to use as the function-name label in the output collection-key names.

  • token (str, optional) – String to use as the “token” in the output collection-key names.

  • divisions (tuple[int | None, ...], optional) – Partition boundaries (if known).

  • meta (Array, optional) – Collection metadata array, if known (the awkward-array type tracer)

  • **kwargs (Any) – Keyword arguments passed to func.

Returns

Array collection.

Return type

Array

dask_awkward.to_dask_array(array)[source]

Convert awkward array collection to a Dask array collection.

This conversion requires the awkward array to have a rectilinear shape (that is, no lists of variable length lists).

Parameters

array (Array) – The dask awkward array collection.

Returns

The new dask.array.Array collection.

Return type

dask.array.Array

dask_awkward.to_delayed(array, optimize_graph=True)[source]

Convert the collection to a list of delayed objects.

One dask.delayed.Delayed object per partition.

Parameters

optimize_graph (bool) – If True the task graph associated with the collection will be optimized before conversion to the list of Delayed objects.

Returns

List of delayed objects (one per partition).

Return type

list[dask.delayed.Delayed]