ak.run_lengths
Defined in awkward.operations.ak_run_lengths on line 18.
- ak.run_lengths(array, *, highlevel=True, behavior=None, attrs=None)
- Parameters:
  - array – Array-like data (anything ak.to_layout recognizes).
  - highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.contents.Content subclass.
  - behavior (None or dict) – Custom ak.behavior for the output array, if high-level.
  - attrs (None or dict) – Custom attributes for the output array, if high-level.
Computes the lengths of sequences of identical values at the deepest level of nesting, returning an array with the same structure but with int64 type.
For example,
>>> array = ak.Array([1.1, 1.1, 1.1, 2.2, 3.3, 3.3, 4.4, 4.4, 5.5])
>>> ak.run_lengths(array)
<Array [3, 1, 2, 2, 1] type='5 * int64'>
There are 3 instances of 1.1, followed by 1 instance of 2.2, 2 instances of 3.3, 2 instances of 4.4, and 1 instance of 5.5.
The order and uniqueness of the input data don't matter, only whether each value differs from its neighbors:
>>> array = ak.Array([1.1, 1.1, 1.1, 5.5, 4.4, 4.4, 1.1, 1.1, 5.5])
>>> ak.run_lengths(array)
<Array [3, 1, 2, 2, 1] type='5 * int64'>
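The neighbor-comparison rule can be illustrated in plain Python. This is only a sketch of the semantics for a flat list, not how Awkward Array implements the operation; the helper name run_lengths_1d is hypothetical and is not part of the ak namespace.

```python
from itertools import groupby


def run_lengths_1d(values):
    """Lengths of consecutive runs of equal values in a flat sequence.

    Hypothetical helper for illustration only; it is not an Awkward Array API.
    """
    # groupby starts a new group whenever the value changes, so a value that
    # reappears later (after a different value) starts a fresh run.
    return [sum(1 for _ in run) for _, run in groupby(values)]


# Sorted input with distinct runs.
print(run_lengths_1d([1.1, 1.1, 1.1, 2.2, 3.3, 3.3, 4.4, 4.4, 5.5]))  # [3, 1, 2, 2, 1]

# Unsorted input in which 1.1 and 5.5 each appear in two separate runs.
print(run_lengths_1d([1.1, 1.1, 1.1, 5.5, 4.4, 4.4, 1.1, 1.1, 5.5]))  # [3, 1, 2, 2, 1]
```

Both inputs give the same result because only the positions where a value changes matter, not the values themselves.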
The data can be nested, but runs don’t cross list boundaries.
>>> array = ak.Array([[1.1, 1.1, 1.1, 2.2, 3.3], [3.3, 4.4], [4.4, 5.5]])
>>> ak.run_lengths(array)
<Array [[3, 1, 1], [1, 1], [1, 1]] type='3 * var * int64'>
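The list-boundary rule can be sketched in plain Python by recursing until the innermost lists are reached and counting runs separately in each one. This assumes uniformly nested Python lists and uses a hypothetical helper name, run_lengths_nested; it mirrors the documented behavior but is not the library's implementation.

```python
from itertools import groupby


def run_lengths_nested(data):
    """Run lengths at the deepest level of a uniformly nested Python list.

    Hypothetical helper for illustration only; it is not an Awkward Array API.
    """
    # If this list contains lists, it is not the deepest level yet: recurse
    # into each sublist so that runs are counted independently per sublist.
    if any(isinstance(item, list) for item in data):
        return [run_lengths_nested(item) for item in data]
    # Deepest level: count consecutive equal values (strings count as values).
    return [sum(1 for _ in run) for _, run in groupby(data)]


# The 3.3 ending the first list and the 3.3 starting the second are separate runs.
print(run_lengths_nested([[1.1, 1.1, 1.1, 2.2, 3.3], [3.3, 4.4], [4.4, 5.5]]))
# [[3, 1, 1], [1, 1], [1, 1]]
```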
This function recognizes strings as distinguishable values.
>>> array = ak.Array([["one", "one"], ["one", "two", "two"], ["three", "two", "two"]])
>>> ak.run_lengths(array)
<Array [[2], [1, 2], [1, 2]] type='3 * var * int64'>
Note that this can be combined with ak.argsort and ak.unflatten to compute a “group by” operation:
>>> array = ak.Array([{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1},
...                   {"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}])
>>> sorted = array[ak.argsort(array.x)]
>>> sorted.x
<Array [1, 1, 1, 2, 2, 3] type='6 * int64'>
>>> ak.run_lengths(sorted.x)
<Array [3, 2, 1] type='3 * int64'>
>>> ak.unflatten(sorted, ak.run_lengths(sorted.x)).show()
[[{x: 1, y: 1.1}, {x: 1, y: 1.1}, {x: 1, y: 1.1}],
 [{x: 2, y: 2.2}, {x: 2, y: 2.2}],
 [{x: 3, y: 3.3}]]
Unlike a database “group by,” this operation can be applied in bulk to many sublists (though the run lengths need to be fully flattened to be used as counts for ak.unflatten, and you need to specify axis=-1 as the depth).
>>> array = ak.Array([[{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1}],
...                   [{"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}]])
>>> sorted = array[ak.argsort(array.x)]
>>> sorted.x
<Array [[1, 1, 2], [1, 2, 3]] type='2 * var * int64'>
>>> ak.run_lengths(sorted.x)
<Array [[2, 1], [1, 1, 1]] type='2 * var * int64'>
>>> counts = ak.flatten(ak.run_lengths(sorted.x), axis=None)
>>> ak.unflatten(sorted, counts, axis=-1).show()
[[[{x: 1, y: 1.1}, {x: 1, y: 1.1}], [{x: 2, y: 2.2}]],
 [[{x: 1, y: 1.1}], [{x: 2, y: 2.2}], [{x: 3, y: 3.3}]]]
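To make the three steps of the flat “group by” recipe explicit (sort by key, measure runs of equal keys, split at those lengths), here is a plain-Python analogue using only the standard library. It is a sketch of the idea rather than of Awkward Array's internals; the variable names are illustrative, and the comments map each step to the role played by ak.argsort, ak.run_lengths, and ak.unflatten in the examples above.

```python
from itertools import groupby

records = [
    {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1},
    {"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2},
]

# Step 1 (the role of ak.argsort): indices that sort the records by key "x".
order = sorted(range(len(records)), key=lambda i: records[i]["x"])
sorted_records = [records[i] for i in order]

# Step 2 (the role of ak.run_lengths): sizes of the runs of equal keys.
keys = (record["x"] for record in sorted_records)
counts = [sum(1 for _ in run) for _, run in groupby(keys)]

# Step 3 (the role of ak.unflatten): cut the sorted list into chunks of those sizes.
groups, start = [], 0
for count in counts:
    groups.append(sorted_records[start:start + count])
    start += count

print(counts)                                         # [3, 2, 1]
print([[r["x"] for r in group] for group in groups])  # [[1, 1, 1], [2, 2], [3]]
```

Sorting first is what makes the run lengths meaningful as group sizes: without it, equal keys that are not adjacent would be counted as separate runs.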
See also ak.num, ak.argsort, ak.unflatten.