# How to reduce dimensions (sum/min/any/all)

After elementwise functions, dimension-reducer functions are the most commonly used. These functions replace a list of numbers with a single, scalar number by adding, multiplying, minimizing, maximizing, or performing logical-or (“any”) or logical-and (“all”).

These are also called aggregation functions; in relational databases, SQL, and data-frames, aggregations are applied after a “group by” operation. Awkward Array doesn’t have “group by” operations; lists are already grouped.

```import awkward as ak
import numpy as np
```

## First reducer: `ak.sum`

To illustrate all of these functions, let’s consider addition. Given an array:

```array = ak.Array([[1, 2, 3], [4, 5], [], [6]])
```

`ak.sum()` with no `axis` argument adds all of the values in the nested lists, just like `np.sum`.

```ak.sum(array)
```
```21
```

With Awkward Arrays, it’s usually more useful to supply an `axis` argument to reduce one dimension, rather than all dimensions.

For reasons that will be explained below, `axis=-1` is the most frequently useful.

```ak.sum(array, axis=-1)
```
```[6,
9,
0,
6]
---------------
type: 4 * int64```

### The `axis` argument

Before getting deeper into the `axis` argument, let’s consider a NumPy array with more dimensions.

```array3d = np.array([
    [
        [    1,     2,     3,     4,     5],
        [   10,    20,    30,    40,    50],
        [  100,   200,   300,   400,   500],
    ],
    [
        [0.1  , 0.2  , 0.3  , 0.4  , 0.5  ],
        [0.01 , 0.02 , 0.03 , 0.04 , 0.05 ],
        [0.001, 0.002, 0.003, 0.004, 0.005],
    ],
])

with np.printoptions(suppress=True):
    print(array3d)
```
```[[[  1.      2.      3.      4.      5.   ]
[ 10.     20.     30.     40.     50.   ]
[100.    200.    300.    400.    500.   ]]

[[  0.1     0.2     0.3     0.4     0.5  ]
[  0.01    0.02    0.03    0.04    0.05 ]
[  0.001   0.002   0.003   0.004   0.005]]]
```

This array has 3 dimensions, so in addition to `axis=None` (reduce everything to a scalar), there are 3 possible axis values.

The first case, `axis=0`, adds the first 3×5 block to the second 3×5 block, i.e. summing over the first (length-2) dimension. Thus, the `1` is added to `0.1`, the `2` is added to `0.2`, and so on until the `500` is added to `0.005`.

```with np.printoptions(suppress=True):
    print(np.sum(array3d, axis=0))
```
```[[  1.1     2.2     3.3     4.4     5.5  ]
[ 10.01   20.02   30.03   40.04   50.05 ]
[100.001 200.002 300.003 400.004 500.005]]
```

The second case, `axis=1`, adds vertically within each 3×5 block, i.e. summing over the second (length-3) dimension. What’s left are two lists of length 5.

```with np.printoptions(suppress=True):
    print(np.sum(array3d, axis=1))
```
```[[111.    222.    333.    444.    555.   ]
[  0.111   0.222   0.333   0.444   0.555]]
```

The third case, `axis=2`, adds horizontally within each 3×5 block, i.e. summing over the third (length-5) dimension. What’s left are two lists of length 3.

```with np.printoptions(suppress=True):
    print(np.sum(array3d, axis=2))
```
```[[  15.     150.    1500.   ]
[   1.5      0.15     0.015]]
```

Since a negative `axis` counts backward from the innermost dimension,

• `axis=0` is equivalent to `axis=-3`

• `axis=1` is equivalent to `axis=-2`

• `axis=2` is equivalent to `axis=-1`.
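These equivalences can be checked directly in NumPy. The sketch below uses a stand-in array named `cube` (an assumption for this example) with the same 2×3×5 shape, and also records the shape that remains after each reduction.

```python
import numpy as np

# A stand-in array with the same (2, 3, 5) shape as the example above.
cube = np.arange(30).reshape(2, 3, 5)

shapes = []
for positive, negative in [(0, -3), (1, -2), (2, -1)]:
    from_front = np.sum(cube, axis=positive)
    from_back = np.sum(cube, axis=negative)

    # Both spellings name the same dimension, so the results agree.
    assert np.array_equal(from_front, from_back)

    # The reduced dimension disappears from the shape.
    shapes.append(from_front.shape)

print(shapes)  # [(3, 5), (2, 5), (2, 3)]
```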

### The `axis` argument with ragged lists

Awkward Arrays allow the lengths of lists in an array to differ, so we can have

```array_ragged = ak.Array([
[  1,   2,   3     ],
[ 10,  20          ],
[100, 200, 300, 400],
])
array_ragged
```
```[[1, 2, 3],
[10, 20],
[100, 200, 300, 400]]
----------------------
type: 3 * var * int64```

As before, `axis=-1` sums over the innermost lists, replacing each of the 3 horizontal rows with a sum.

```ak.sum(array_ragged, axis=-1)
```
```[6,
30,
1000]
---------------
type: 3 * int64```

And `axis=-2` (the same as `axis=0` for this two-dimensional array) sums vertically, replacing each of the 4 vertical columns with a sum. Since the list lengths differ, some of the positions where we might expect to see a value are empty gaps, which contribute nothing to the result.

```ak.sum(array_ragged, axis=0)
```
```[111,
222,
303,
400]
---------------
type: 4 * int64```

We also have to choose a convention: should the values be left-aligned or right-aligned within their lists? Awkward Array chooses left alignment.

In ragged data from real datasets, summing over whole lists usually has more meaning than summing over parts of different lists, so `axis=-1` is usually the most meaningful choice of `axis`.

### The `axis` argument with missing data

Just as empty gaps contribute nothing to the sum, missing values (`None`) don’t contribute anything, either.

```array_ragged = ak.Array([
[None, None,    3,    4],
[  10, None,   30      ],
[ 100,  200,  300,  400],
])
array_ragged
```
```[[None, None, 3, 4],
[10, None, 30],
[100, 200, 300, 400]]
----------------------
type: 3 * var * ?int64```

`axis=-1` sums over each inner list, horizontally, replacing it with a scalar.

```ak.sum(array_ragged, axis=-1)
```
```[7,
40,
1000]
---------------
type: 3 * int64```

And `axis=-2` sums over the outer dimension, vertically.

```ak.sum(array_ragged, axis=-2)
```
```[110,
200,
333,
404]
---------------
type: 4 * int64```

For `ak.sum()`, each `None` has the same effect as a `0` value; for `ak.prod()` (multiplication), each `None` has the same effect as a `1` value; and so on for the other reducers, each with its own identity.

## The `keepdims` argument

Sometimes you want to replace each list with a length-1 list rather than a scalar. `keepdims=True` does that.

```ak.sum(array_ragged, axis=-1, keepdims=True)
```
```[[7],
[40],
[1000]]
-------------------
type: 3 * 1 * int64```
```ak.sum(array_ragged, axis=-2, keepdims=True)
```
```[[110, 200, 333, 404]]
----------------------
type: 1 * var * int64```

The `keepdims` argument is particularly useful for `ak.argmin()` and `ak.argmax()`, which return positions in a list where the value is minimized or maximized. Those positions can only be used as slice indexes if they’re at the right nesting level, which `keepdims=True` maintains.

## Reducing over “any” and “all”

`ak.any()` and `ak.all()` reduce boolean arrays, asking if a predicate is satisfied by “any” item or “all” items, respectively.

```array_bool = ak.Array([
[False, False,  True,  True],
[False,  True, False,  True],
[False,  True,  True,  True],
])
array_bool
```
```[[False, False, True, True],
[False, True, False, True],
[False, True, True, True]]
----------------------------
type: 3 * var * bool```
```ak.any(array_bool, axis=-1)
```
```[True,
True,
True]
--------------
type: 3 * bool```
```ak.any(array_bool, axis=-2)
```
```[False,
True,
True,
True]
--------------
type: 4 * bool```
```ak.all(array_bool, axis=-1)
```
```[False,
False,
False]
--------------
type: 3 * bool```
```ak.all(array_bool, axis=-2)
```
```[False,
False,
False,
True]
--------------
type: 4 * bool```

Since logical-or is like addition of booleans and logical-and is like multiplication, these reducers could have been replaced with `ak.sum()` and `ak.prod()`, but they’re very useful to have because they make some boolean-array slices easier to read.

```array = ak.Array([[0, 1, 2], [], [-3, 4], [-5], [-6, -7, -8, -9]])
array
```
```[[0, 1, 2],
[],
[-3, 4],
[-5],
[-6, -7, -8, -9]]
---------------------
type: 5 * var * int64```

Select whole lists if any of their values are negative:

```array[ak.any(array < 0, axis=-1)]
```
```[[-3, 4],
[-5],
[-6, -7, -8, -9]]
---------------------
type: 3 * var * int64```

Select whole lists if all of their values are negative:

```array[ak.all(array < 0, axis=-1)]
```
```[[],
[-5],
[-6, -7, -8, -9]]
---------------------
type: 3 * var * int64```

(If a list is empty, all of its elements satisfy a constraint.)

In both cases above, the selection can be read like an English sentence, “select lists if any…” or “select lists if all…”.

## Heterogeneous data and records cannot be reduced

Neither of these two kinds of data can be reduced. In heterogeneous data, different elements of the same array can have different numbers of dimensions, so it is ambiguous which dimension to reduce and the problem is ill-posed:

```ak.sum(ak.Array([[1.1, 2.2, 3.3], [], 4.4, 5.5]))
```
```---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[25], line 1
----> 1 ak.sum(ak.Array([[1.1, 2.2, 3.3], [], 4.4, 5.5]))

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:64, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
62 # Failed to find a custom overload, so resume the original function
63 try:
---> 64     next(gen_or_result)
65 except StopIteration as err:
66     return err.value

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_sum.py:210, in sum(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)
207 yield (array,)
209 # Implementation
--> 210 return _impl(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_sum.py:277, in _impl(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)
274     layout = ctx.unwrap(array, allow_record=False, primitive_policy="error")
275 reducer = ak._reducers.Sum()
--> 277 out = ak._do.reduce(
278     layout,
279     reducer,
280     axis=axis,
282     keepdims=keepdims,
283     behavior=ctx.behavior,
284 )
285 return ctx.wrap(out, highlevel=highlevel, allow_other=True)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_do.py:244, in reduce(layout, reducer, axis, mask, keepdims, behavior)
232 parts = remove_structure(
233     layout,
234     flatten_records=False,
(...)
238     list_to_regular=True,
239 )
241 if len(parts) > 1:
242     # We know that `flatten_records` must fail, so the only other type
243     # that can return multiple parts here is the union array
--> 244     raise ValueError(
245         "cannot use axis=None on an array containing irreducible unions"
246     )
247 elif len(parts) == 0:
248     layout = ak.contents.EmptyArray()

ValueError: cannot use axis=None on an array containing irreducible unions

This error occurred while calling

ak.sum(
<Array [[1.1, 2.2, 3.3], [], 4.4, 5.5] type='4 * union[var * float6...'>
)
```

And records are sometimes used to represent data with coordinates; applying `ak.sum()` to non-Cartesian coordinates would be a subtle error.

```ak.sum(ak.Array([{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}]), axis=-1)
```
```---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[26], line 1
----> 1 ak.sum(ak.Array([{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}]), axis=-1)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:64, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
62 # Failed to find a custom overload, so resume the original function
63 try:
---> 64     next(gen_or_result)
65 except StopIteration as err:
66     return err.value

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_sum.py:210, in sum(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)
207 yield (array,)
209 # Implementation
--> 210 return _impl(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_sum.py:277, in _impl(array, axis, keepdims, mask_identity, highlevel, behavior, attrs)
274     layout = ctx.unwrap(array, allow_record=False, primitive_policy="error")
275 reducer = ak._reducers.Sum()
--> 277 out = ak._do.reduce(
278     layout,
279     reducer,
280     axis=axis,
282     keepdims=keepdims,
283     behavior=ctx.behavior,
284 )
285 return ctx.wrap(out, highlevel=highlevel, allow_other=True)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_do.py:296, in reduce(layout, reducer, axis, mask, keepdims, behavior)
294 parents = ak.index.Index64.zeros(layout.length, layout.backend.index_nplike)
295 shifts = None
--> 296 next = layout._reduce_next(
297     reducer,
298     negaxis,
299     starts,
300     shifts,
301     parents,
302     1,
304     keepdims,
305     behavior,
306 )
308 return next[0]

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/contents/recordarray.py:888, in RecordArray._reduce_next(self, reducer, negaxis, starts, shifts, parents, outlength, mask, keepdims, behavior)
886 reducer_recordclass = find_record_reducer(reducer, self, behavior)
887 if reducer_recordclass is None:
--> 888     raise TypeError(
889         "no ak.{} overloads for custom types: {}".format(
890             reducer.name, ", ".join(self.fields)
891         )
892     )
893 else:
894     # Positional reducers ultimately need to do more work when rebuilding the result