dask_awkward.partition_compatibility#
- dask_awkward.partition_compatibility(*args)[source]#
Check if multiple collections have compatible partitions.
- Parameters:
*args (Array) – Any number of array collections to check.
- Returns:
Result of the check.
- Return type:
Examples
Starting with an absolutely compatible comparison:
>>> import dask_awkward as dak >>> import awkward as ak >>> concrete = ak.Array([[1, 2, 3], [4], [5, 6], [0, 0, 0, 0]]) >>> lazy = dak.from_awkward(concrete, npartitions=2) >>> selection = dak.sum(lazy, axis=1) == 0 >>> dak.partition_compatibility(lazy, selection) <PartitionCompatibility.YES: 0>
The selection doesn’t change the length of the arrays at each partition, so the divisions are known to be conserved for those operations (the sum on
axis=1
along with the equality comparison).In general we have no way of knowing what the resulting divisions will be after a boolean selection, but the total number of partitions will be conserved, so we have to report
MAYBE
:>>> selected_lazy = lazy[selection] >>> dak.partition_compatibility(lazy, lazy_selection) <PartitionCompatibility.MAYBE: 2>
Due the simple nature of this example we know that after the selection the partitions will not be compatible (because it’s clear only 1 element of the original array will survive the selection, so the divisions will change after that compute). Now we can eagerly compute what the divisions will be on the
lazy_selection
collection and get aNO
result:>>> lazy_selection.eager_compute_divisions() >>> dak.partition_compatibility(lazy, lazy_selection) <PartitionCompatibility.NO: 1>
Remember that
Array.eager_compute_divisions()
is going to trigger a compute to determine the divisions (to know divisions we need to know the length of each partition)