---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.14.4
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---
How to filter with ragged arrays
================================
```{code-cell} ipython3
import awkward as ak
import numpy as np
```
## What is awkward indexing?
One of the most powerful features of NumPy is the expressiveness of its indexing system. A NumPy array [can be sliced in many different ways](https://numpy.org/doc/stable/user/basics.indexing.html#basic-indexing), such as with a single integer, or an array of integers. Awkward Array implements most of these indexing styles, but adds an additional variant: _awkward indexing_.
+++
Consider the following ragged array:
```{code-cell} ipython3
array = ak.Array(
[
[
[0.0, 1.1, 2.2],
[3.3, 4.4, 5.5, 6.6],
[7.7],
],
[],
[
[8.8, 9.9, 10.10, 11.11, 12.12],
],
]
)
array
```
We can easily pull out the first two items with a simple slice
```{code-cell} ipython3
array[..., :2]
```
But what if we wanted to pull out a different number of items for each sublist, e.g. to produce the following array:
```
[[[], [3.3], [7.7]],
[],
[[10.10, 11.11, 12.12]]]
----------------------------------------------
type: 3 * var * var * float64
```
+++
To produce this result, we need awkward indexing.
+++
(how-to-filter-masked:building-an-awkward-index)=
## Building an awkward index
+++
Awkward indexing requires an index array that
1. has a structure matching the array being sliced **up to** (but not including) the final dimension of the index
2. has at _least_ one ragged (`var`) dimension **or** contain missing values
By structure, we mean the number of sublists in each dimension, which can be seen with {func}`ak.num`:
+++
`axis=0` has a single list of three items:
```{code-cell} ipython3
ak.num(array, axis=0)
```
`axis=1` has three lists, the first with three items, the second with zero items, the third with a single item:
```{code-cell} ipython3
ak.num(array, axis=1)
```
To put this more simply, the final dimension of the awkward index is used to pull items out of the array. Therefore, Awkward needs the preceeding dimensions to line up!
+++
Recall that we wanted to pull out the following result from `array` using awkward indexing:
```
[[[], [3.3], [7.7]],
[],
[[10.10, 11.11, 12.12]]]
----------------------------------------------
type: 3 * var * var * float64
```
+++
It's clear that we want to pull specific items out of the _final_ dimension of the array. Let's find out where these particular items are located in their sublists. Awkward Array provides a special function {func}`ak.local_index` to find the index of each item in the array
```{code-cell} ipython3
ak.local_index(["x", "y", "z"])
```
The word "local" refers to the way that {func}`ak.local_index` computes the index of each item relative to the sublist in which it is found. e.g. for a two-dimensional array:
```{code-cell} ipython3
ak.local_index(
[
["up", "charm", "top"],
["down", "strange"],
["bottom"],
]
)
```
{func}`ak.local_index` also takes an `axis` parameter, but here we only need the default `axis=-1`. It can be seen that this local index has exactly the same _structure_ as `array`.
```{code-cell} ipython3
array
```
```{code-cell} ipython3
ak.local_index(array)
```
To create our awkward index, all we need to do is create an array _like_ `ak.local_index(array)`, but with only the local indices that we want to keep, i.e.
```{code-cell} ipython3
index = ak.Array(
[
[[], [0], [0]],
[],
[[2, 3, 4]],
]
)
```
We can see that this array matches the leading structure of `array`, and has at least one `var` dimension
```{code-cell} ipython3
index.type.show()
```
Let's see what slicing `array` with this awkward index looks like:
```{code-cell} ipython3
array[index]
```
Clearly this index produces the result that we were aiming for!
+++
(how-to-filter-ragged:indexing-with-argmin-and-argmax)=
## Indexing with `argmin` and `argmax`
+++
Awkward indexing is especially useful when combined with the positional {func}`ak.argmin` and {func}`ak.argmax` reducers. These functions accept an `keepdims=True` argument that can be used to keep _the same number of dimensions_ as the original array. There is also a `mask_identity` argument is explained in {ref}`how-to-filter-ragged:indexing-with-missing-values`. For now, we will set it to `False`.
```{code-cell} ipython3
array = ak.Array(
[
[10, 3, 2, 9],
[4, 5, 5, 12, 6],
[8, 9, -1],
]
)
array
```
With `keepdims=False`, all reducers collapse a dimension of the original array:
```{code-cell} ipython3
ak.argmin(array, axis=1, keepdims=False, mask_identity=False)
```
If we try and use this index to slice `array`, it will likely not produce the result we might initially expect:
```{code-cell} ipython3
array[ak.argmin(array, axis=1, keepdims=False, mask_identity=False)]
```
Instead of pulling out the smallest items in `array` along `axis=1`, we have simply re-arranged the sublists of `array` along `axis=0`. Our index has only a single dimension, so for each value in `ak.argmin(array, axis=-1)`, Awkward pulls out the corresponding item from `array`. We want to pull values out of the _second_ dimension, so our index array needs to be two dimensional.
+++
Let's now look at what happens with `keepdims=True`. The result is a two dimensional, fully regular array, with no missing values:
```{code-cell} ipython3
ak.argmin(array, axis=-1, keepdims=True, mask_identity=False)
```
Before we can use this as an index array, we need to convert _at least_ one dimension to a ragged dimension. This follows from rule (2) described in {ref}`how-to-filter-masked:building-an-awkward-index`.
```{code-cell} ipython3
ak.from_regular(
ak.argmin(array, axis=-1, keepdims=True, mask_identity=False)
)
```
We can now use this array to index into `array`:
```{code-cell} ipython3
array[
ak.from_regular(
ak.argmin(array, axis=-1, keepdims=True, mask_identity=False)
)
]
```
it produces the expected result!
+++
## Filtering with booleans
As described in {ref}`how-to-filter-masked:building-an-awkward-index`, Awkward Array's awkward indexing is a generalisation of the advanced indexing supported by NumPy. It is therefore reasonable to ask whether Awkward supports awkward indexing with
_boolean_ values, selecting only values for which the index is `True`.
Let's create an array of integers:
```{code-cell} ipython3
numbers = ak.Array(
[
[0, 1, 2, 3],
[4, 5, 6],
[8, 9, 10, 11, 12],
]
)
```
We can use awkward indexing to keep only the even values. Let's generate a boolean mask with the same structure as `numbers`. In order for there to be a single boolean value for each item in `numbers`, the filter array must have exactly the same number of elements. Ufuncs, such as {func}`np.mod`, are powerful tools for generating boolean masks, as they directly preserve the exact structure of the original array:
```{code-cell} ipython3
is_even = (numbers % 2) == 0
is_even
```
```{code-cell} ipython3
numbers
```
Now we can use `is_even` to slice `numbers`:
```{code-cell} ipython3
numbers[is_even]
```
Note that this is different to what would happen with NumPy's boolean indexing:
```{code-cell} ipython3
numbers_np = np.array(
[
[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11],
]
)
```
```{code-cell} ipython3
numbers_np[(numbers_np % 2) == 0]
```
NumPy, lacking a ragged array structure, has to flatten the result whereas Awkward Array preserves the number of dimensions in the result.
```{code-cell} ipython3
numbers[
[[True, False, True, False],
[False],
[False, True, False]]
]
```