dask_awkward.sample

dask_awkward.sample(arr, factor=None, probability=None)

Decimate the data to a smaller number of rows.

Either factor or probability must be given.

Parameters:
  • arr (dask_awkward.Array) – Array collection to sample

  • factor (int, optional) – If given, every Nth row is kept, where N is factor. The counting restarts for each partition, so the total row count is not guaranteed to shrink by exactly this factor.

  • probability (float, optional) – A number between 0 and 1 giving the chance that any particular row survives. For instance, with probability=0.1, roughly 1 in 10 rows will remain.

Return type:

Array
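
Example

The per-partition behaviour described under factor can be illustrated with a small plain-Python sketch. This is not the dask_awkward implementation: sample_partition is a hypothetical helper, the partition layout is made up, and the sketch assumes that counting begins at the first row of each partition. It only shows why restarting the count in every partition can leave more rows than a single global every-Nth selection would.

```python
import random


def sample_partition(rows, factor=None, probability=None, rng=None):
    """Hypothetical stand-in for the decimation of ONE partition."""
    # This sketch insists on exactly one argument; the documentation
    # itself only states that one of the two must be given.
    if (factor is None) == (probability is None):
        raise ValueError("give exactly one of factor or probability")
    if factor is not None:
        # Assumption: counting starts at the first row of the partition.
        return rows[::factor]
    rng = rng or random.Random()
    # Each row survives independently with the given probability.
    return [row for row in rows if rng.random() < probability]


# Ten rows split into three partitions (layout chosen for illustration).
partitions = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# factor=3 applied partition by partition: the count restarts each time.
kept = [row for part in partitions for row in sample_partition(part, factor=3)]
print(kept)  # [0, 3, 4, 7, 8]

# The same selection applied once to all ten rows.
global_kept = list(range(10))[::3]
print(global_kept)  # [0, 3, 6, 9]
```

Here the partitioned selection keeps five rows while the global one keeps four, so the reduction is close to, but not exactly, a factor of 3. With probability, the number of surviving rows varies from run to run, so only the expected fraction is fixed.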