# How to reduce dimensions (sum/min/any/all)

After elementwise functions, dimension-reducer functions are the most commonly used. These functions replace a list of numbers with a single, scalar number by adding, multiplying, minimizing, maximizing, or performing logical-or (“any”) or logical-and (“all”).

These are also called aggregation functions; in relational databases, SQL, and data-frames, aggregations are applied after a “group by” operation. Awkward Array doesn’t have “group by” operations; lists are already grouped.

import awkward as ak
import numpy as np
## First reducer: `ak.sum`

To illustrate all of these functions, let’s consider addition. Given an array:

array = ak.Array([[1, 2, 3], [4, 5], [], [6]])

`ak.sum()` with no arguments adds all of the values in the nested lists, just like `np.sum`.

ak.sum(array)
21

With Awkward Arrays, it’s usually more useful to supply an `axis` argument to reduce one dimension, rather than all dimensions.

For reasons that will be explained below, `axis=-1` is the most frequently useful.

ak.sum(array, axis=-1)
[6, 9, 0, 6] --------------- type: 4 * int64

### The `axis` argument

Before getting deeper into the `axis` argument, let’s consider a NumPy array with more dimensions.

array3d = np.array([
    [
        [    1,     2,     3,     4,     5],
        [   10,    20,    30,    40,    50],
        [  100,   200,   300,   400,   500],
    ],
    [
        [0.1  , 0.2  , 0.3  , 0.4  , 0.5  ],
        [0.01 , 0.02 , 0.03 , 0.04 , 0.05 ],
        [0.001, 0.002, 0.003, 0.004, 0.005],
    ],
])
with np.printoptions(suppress=True):
    print(array3d)
[[[ 1. 2. 3. 4. 5. ]
[ 10. 20. 30. 40. 50. ]
[100. 200. 300. 400. 500. ]]
[[ 0.1 0.2 0.3 0.4 0.5 ]
[ 0.01 0.02 0.03 0.04 0.05 ]
[ 0.001 0.002 0.003 0.004 0.005]]]

This array has 3 dimensions, so in addition to `axis=None` (reduce everything to a scalar), there are 3 possible `axis` values.

The first case, `axis=0`, adds the first 3×5 block to the second 3×5 block, i.e. summing over the first (length-2) dimension. Thus, the `1` is added to `0.1`, the `2` is added to `0.2`, and so on until the `500` is added to `0.005`.

with np.printoptions(suppress=True):
    print(np.sum(array3d, axis=0))
[[ 1.1 2.2 3.3 4.4 5.5 ]
[ 10.01 20.02 30.03 40.04 50.05 ]
[100.001 200.002 300.003 400.004 500.005]]

The second case, `axis=1`, adds vertically within each 3×5 block, i.e. summing over the second (length-3) dimension. What’s left are two lists of length 5.

with np.printoptions(suppress=True):
    print(np.sum(array3d, axis=1))
[[111. 222. 333. 444. 555. ]
[ 0.111 0.222 0.333 0.444 0.555]]

The third case, `axis=2`, adds horizontally within each 3×5 block, i.e. summing over the third (length-5) dimension. What’s left are two lists of length 3.

with np.printoptions(suppress=True):
    print(np.sum(array3d, axis=2))
[[ 15. 150. 1500. ]
[ 1.5 0.15 0.015]]

Since a negative `axis` counts from the other end of the scale, `axis=0` is equivalent to `axis=-3`, `axis=1` is equivalent to `axis=-2`, and `axis=2` is equivalent to `axis=-1`.
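
The equivalence between positive and negative `axis` values can be checked directly in NumPy. The sketch below uses a small stand-in array of the same 2×3×5 shape (not the `array3d` defined earlier) and the illustrative name `checks`.

```python
import numpy as np

# A stand-in 3-dimensional array with shape (2, 3, 5).
stand_in = np.arange(2 * 3 * 5).reshape(2, 3, 5)

checks = []
for axis in range(stand_in.ndim):
    # Subtracting the number of dimensions maps 0 -> -3, 1 -> -2, 2 -> -1.
    negative_axis = axis - stand_in.ndim
    same = np.array_equal(
        np.sum(stand_in, axis=axis),
        np.sum(stand_in, axis=negative_axis),
    )
    checks.append((axis, negative_axis, same))

print(checks)
```

Each tuple pairs a positive `axis` with its negative counterpart and reports whether the two sums agree.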

### The `axis` argument with ragged lists

Awkward Arrays allow the lengths of lists in an array to differ, so we can have

array_ragged = ak.Array([
[ 1, 2, 3 ],
[ 10, 20 ],
[100, 200, 300, 400],
])
array_ragged
[[1, 2, 3], [10, 20], [100, 200, 300, 400]] ---------------------- type: 3 * var * int64

As before, `axis=-1` sums over the innermost lists, replacing each of the 3 horizontal rows with a sum.

ak.sum(array_ragged, axis=-1)
[6, 30, 1000] --------------- type: 3 * int64

And `axis=-2` sums vertically, replacing each of the 4 vertical columns with a sum. Since the list lengths differ, some of the places where we might expect to see a value are empty gaps, which contribute nothing to the result.

ak.sum(array_ragged, axis=-2)
[111, 222, 303, 400] --------------- type: 4 * int64

We also have to choose a convention: should the values be left-aligned or right-aligned within their lists? Awkward Array chooses left-aligned.

In ragged data from real datasets, summing over whole lists usually has more meaning than summing over parts of different lists, so `axis=-1` is usually the most meaningful choice of `axis`.

### The `axis` argument with missing data

Just as empty gaps contribute nothing to the sum, missing values (`None`) don’t contribute anything, either.

array_ragged = ak.Array([
[None, None, 3, 4],
[ 10, None, 30 ],
[ 100, 200, 300, 400],
])
array_ragged
[[None, None, 3, 4], [10, None, 30], [100, 200, 300, 400]] ---------------------- type: 3 * var * ?int64

`axis=-1` sums over each inner list, horizontally, replacing it with a scalar.

ak.sum(array_ragged, axis=-1)
[7, 40, 1000] --------------- type: 3 * int64

And `axis=-2` sums over the outer dimension, vertically.

ak.sum(array_ragged, axis=-2)
[110, 200, 333, 404] --------------- type: 4 * int64

For `ak.sum()`, each `None` has the same effect as a `0` value; for `ak.prod()` (multiplication), each `None` has the same effect as a `1` value; and so on.

## The `keepdims` argument

Sometimes, you want to replace lists with a length-1 list, rather than a scalar. `keepdims=True` does that.

ak.sum(array_ragged, axis=-1, keepdims=True)
[[7], [40], [1000]] ------------------- type: 3 * 1 * int64

ak.sum(array_ragged, axis=-2, keepdims=True)
[[110, 200, 333, 404]] ---------------------- type: 1 * var * int64

The `keepdims` argument is particularly useful for `ak.argmin()` and `ak.argmax()`, which return positions in a list where the value is minimized or maximized. Those positions can only be used as slice indexes if they’re at the right nesting level, which `keepdims=True` maintains.

## Other reducers

- The `ak.prod()` reducer multiplies, rather than adding.
- `ak.min()` and `ak.max()` minimize and maximize, returning `None` for empty lists.
- `ak.argmin()` and `ak.argmax()` return the index positions of the minimum or maximum value, with `None` for empty lists.
- `ak.nansum()`, `ak.nanprod()`, `ak.nanmin()`, `ak.nanmax()`, `ak.nanargmin()`, and `ak.nanargmax()` ignore floating-point `nan` values before operating, the way that all reducers ignore `None` values before operating.
- `ak.count_nonzero()` counts non-zero values.
- `ak.count()` simply counts values. In NumPy, there’s no need for such a function because it would return constants (drawn from the NumPy array’s `shape`), but for ragged arrays, it counts the number of values that enter into a reduction. `ak.num()` also returns lengths of lists, but in a way that’s more useful for slicing; `ak.count()` is useful as the denominator of expressions in which another reducer (with the same `axis` and `keepdims` choices) is in the numerator.
- `ak.any()` and `ak.all()` reduce like logical-or and logical-and, which makes them particularly useful in slices (below).

## Reducing over “any” and “all”

`ak.any()` and `ak.all()` reduce boolean arrays, asking if a predicate is satisfied by “any” item or “all” items, respectively.

array_bool = ak.Array([
[False, False, True, True],
[False, True, False, True],
[False, True, True, True],
])
array_bool
[[False, False, True, True], [False, True, False, True], [False, True, True, True]] ---------------------------- type: 3 * var * bool

ak.any(array_bool, axis=-1)
[True, True, True] -------------- type: 3 * bool


Since logical-or is like addition of booleans and logical-and is like multiplication, these reducers could have been replaced with `ak.sum()` and `ak.prod()`, but they’re very useful to have because they make some boolean-array slices easier to read.

array = ak.Array([[0, 1, 2], [], [-3, 4], [-5], [-6, -7, -8, -9]])
array
[[0, 1, 2], [], [-3, 4], [-5], [-6, -7, -8, -9]] --------------------- type: 5 * var * int64

Select *whole lists* if *any* of their values are negative:

```
array[ak.any(array < 0, axis=-1)]   # selects [[-3, 4], [-5], [-6, -7, -8, -9]]
```

Select *whole lists* if *all* of their values are negative:

```
array[ak.all(array < 0, axis=-1)]   # selects [[], [-5], [-6, -7, -8, -9]]
```

(If a list is empty, all of its elements trivially satisfy any constraint, so an empty list passes an “all” selection.)

In both cases above, the selection can be read like an English sentence, “select lists if *any*…” or “select lists if *all*…”.

## Heterogeneous data and records cannot be reduced

These two kinds of data types are not reducible. Heterogeneous data allows an array to have multiple numbers of dimensions, so the problem is ill-posed:

ak.sum(ak.Array([[1.1, 2.2, 3.3], [], 4.4, 5.5]))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[25], line 1
----> 1 ak.sum(ak.Array([[1.1, 2.2, 3.3], [], 4.4, 5.5]))
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:64, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
62 # Failed to find a custom overload, so resume the original function
63 try:
---> 64 next(gen_or_result)
65 except StopIteration as err:
66 return err.value
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_sum.py:210, in sum(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)
207 yield (array,)
209 # Implementation
--> 210 return _impl(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_sum.py:277, in _impl(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)
274 layout = ctx.unwrap(array, allow_record=False, primitive_policy="error")
275 reducer = ak._reducers.Sum()
--> 277 out = ak._do.reduce(
278 layout,
279 reducer,
280 axis=axis,
281 mask=mask_identity,
282 keepdims=keepdims,
283 behavior=ctx.behavior,
284 )
285 return ctx.wrap(out, highlevel=highlevel, allow_other=True)
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_do.py:281, in reduce(layout, reducer, axis, mask, keepdims, behavior)
269 parts = remove_structure(
270 layout,
271 flatten_records=False,
(...)
275 list_to_regular=True,
276 )
278 if len(parts) > 1:
279 # We know that `flatten_records` must fail, so the only other type
280 # that can return multiple parts here is the union array
--> 281 raise ValueError(
282 "cannot use axis=None on an array containing irreducible unions"
283 )
284 elif len(parts) == 0:
285 layout = ak.contents.EmptyArray()
ValueError: cannot use axis=None on an array containing irreducible unions
This error occurred while calling
ak.sum(
<Array [[1.1, 2.2, 3.3], [], 4.4, 5.5] type='4 * union[var * float6...'>
)

And records are sometimes used to represent data with coordinates; applying `ak.sum()` to non-Cartesian coordinates would be a subtle error.

ak.sum(ak.Array([{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}]), axis=-1)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[26], line 1
----> 1 ak.sum(ak.Array([{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}]), axis=-1)
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:64, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
62 # Failed to find a custom overload, so resume the original function
63 try:
---> 64 next(gen_or_result)
65 except StopIteration as err:
66 return err.value
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_sum.py:210, in sum(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)
207 yield (array,)
209 # Implementation
--> 210 return _impl(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_sum.py:277, in _impl(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)
274 layout = ctx.unwrap(array, allow_record=False, primitive_policy="error")
275 reducer = ak._reducers.Sum()
--> 277 out = ak._do.reduce(
278 layout,
279 reducer,
280 axis=axis,
281 mask=mask_identity,
282 keepdims=keepdims,
283 behavior=ctx.behavior,
284 )
285 return ctx.wrap(out, highlevel=highlevel, allow_other=True)
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_do.py:333, in reduce(layout, reducer, axis, mask, keepdims, behavior)
331 parents = ak.index.Index64.zeros(layout.length, layout.backend.index_nplike)
332 shifts = None
--> 333 next = layout._reduce_next(
334 reducer,
335 negaxis,
336 starts,
337 shifts,
338 parents,
339 1,
340 mask,
341 keepdims,
342 behavior,
343 )
345 return next[0]
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/contents/recordarray.py:888, in RecordArray._reduce_next(self, reducer, negaxis, starts, shifts, parents, outlength, mask, keepdims, behavior)
886 reducer_recordclass = find_record_reducer(reducer, self, behavior)
887 if reducer_recordclass is None:
--> 888 raise TypeError(
889 "no ak.{} overloads for custom types: {}".format(
890 reducer.name, ", ".join(self.fields)
891 )
892 )
893 else:
894 # Positional reducers ultimately need to do more work when rebuilding the result
895 # so asking for a mask doesn't help us!
896 reducer_should_mask = mask and not reducer.needs_position
TypeError: no ak.sum overloads for custom types: x, y
This error occurred while calling
ak.sum(
<Array [{x: 1.1, y: [1]}, {...}] type='2 * {x: float64, y: var * in...'>
axis = -1
)