Array

Array

Array(dsk, name, meta, divisions)

Partitioned, lazy, and parallel Awkward Array Dask collection.

map_partitions(fn, *args[, label, meta, ...])

Map a callable across all partitions of any number of collections.

class dask_awkward.Array(dsk, name, meta, divisions)[source]

Partitioned, lazy, and parallel Awkward Array Dask collection.

The class constructor is not intended for users. Instead use factory functions like dask_awkward.from_parquet(), dask_awkward.from_json(), etc.

Within dask-awkward the new_array_object factory function is used for creating new instances.

clear_divisions()[source]

Clear the divisions of a Dask Awkward Collection.

property dask

High level task graph associated with the collection.

property divisions

Location of the collections partition boundaries.

eager_compute_divisions()[source]

Force a compute of the divisions.

property fields

Record field names (if any).

property keys

Task graph keys.

property keys_array

NumPy array of task graph keys.

property known_divisions

True of the divisions are known (absence of None in the tuple).

map_partitions(func, *args, **kwargs)[source]

Map a function across all partitions of the collection.

Parameters
  • func (Callable) – Function to call on all partitions.

  • *args (Collections and function arguments) – Additional arguments passed to func after the collection, if arguments are Array collections they must be compatibly partitioned with the object this method is being called from.

  • **kwargs (Any) – Additional keyword arguments passed to the func.

Returns

The new collection.

Return type

dask_awkward.Array

property name

Name of the collection.

property ndim

Number of dimensions.

property npartitions

Total number of partitions.

property partitions

Get a specific partition or slice of partitions.

Return type

dask.utils.IndexCallable

Examples

>>> import dask_awkward as dak
>>> import awkward._v2 as ak
>>> aa = ak.Array([[1, 2, 3], [], [2]])
>>> a = dak.from_awkward(aa, npartitions=3)
>>> a
dask.awkward<from-awkward, npartitions=3>
>>> a.partitions[0]
dask.awkward<partitions, npartitions=1>
>>> a.partitions[0:2]
dask.awkward<partitions, npartitions=2>
>>> a.partitions[2].compute()
<Array [[2]] type='1 * var * int64'>
reset_meta()[source]

Assign an empty typetracer array as the collection metadata.

to_delayed(optimize_graph=True)[source]

Convert the collection to a list of delayed objects.

One dask.delayed.Delayed object per partition.

Parameters

optimize_graph (bool) – If True the task graph associated with the collection will be optimized before conversion to the list of Delayed objects.

Returns

List of delayed objects (one per partition).

Return type

list[Delayed]

dask_awkward.map_partitions(fn, *args, label=None, meta=None, output_divisions=None, **kwargs)[source]

Map a callable across all partitions of any number of collections.

Parameters
  • fn (Callable) – Function to apply on all partitions.

  • *args (Collections and function arguments) – Arguments passed to the function. Partitioned arguments (i.e. Dask collections) will have fn applied to each partition. Array collection arguments they must be compatibly partitioned.

  • label (str, optional) – Label for the Dask graph layer; if left to None (default), the name of the function will be used.

  • meta (Any, optional) – Metadata (typetracer) array for the result (if known). If unknown, fn will be applied to the metadata of the args; if that call fails, the first partition of the new collection will be used to compute the new metadata if the awkward.compute-known-meta configuration setting is True. If the configuration setting is False, an empty typetracer will be assigned as the metadata.

  • output_divisions (int, optional) – If None (the default), the divisions of the output will be assumed unknown. If defined, the output divisions will be multiplied by a factor of output_divisions. A value of 1 means constant divisions (e.g. a string based slice). Any value greater than 1 means the divisions were expanded by some operation. This argument is mainly for internal library function implementations.

  • **kwargs (Any) – Additional keyword arguments passed to the fn.

Returns

The new collection.

Return type

dask_awkward.Array

Examples

>>> import dask_awkward as dak
>>> a = [[1, 2, 3], [4]]
>>> b = [[5, 6, 7], [8]]
>>> c = dak.from_lists([a, b])
>>> c
dask.awkward<from-lists, npartitions=2>
>>> c.compute()
<Array [[1, 2, 3], [4], [5, 6, 7], [8]] type='4 * var * int64'>
>>> c2 = dak.map_partitions(np.add, c, c)
>>> c2
dask.awkward<add, npartitions=2>
>>> c2.compute()
<Array [[2, 4, 6], [8], [10, 12, 14], [16]] type='4 * var * int64'>

Multiplying c (a Dask collection) with a (a regular Python list object) will multiply each partition of c by a:

>>> d = dak.map_partitions(np.multiply, c, a)
dask.awkward<multiply, npartitions=2>
>>> d.compute()
<Array [[1, 4, 9], [16], [5, 12, 21], [32]] type='4 * var * int64'>

This is effectively the same as d = c * a