# How to filter arrays: cutting vs. masking

```python
import awkward as ak
import numpy as np
```

## The problem with slicing

When you write a mathematical formula using binary operators like `+` and `*`, or NumPy universal functions (ufuncs) like `np.sqrt`, the shapes of nested lists must align. If the arrays in an expression were derived from a single array, this is often automatic. For instance,

```python
original_array = ak.Array([
    [
        {"title": "zero", "x": 0, "y": 0},
        {"title": "one", "x": 1, "y": 1.1},
        {"title": "two", "x": 2, "y": 2.2},
    ],
    [],
    [
        {"title": "three", "x": 3, "y": 3.3},
        {"title": "four", "x": 4, "y": 4.4},
    ],
    [
        {"title": "five", "x": 5, "y": 5.5},
    ],
    [
        {"title": "six", "x": 6, "y": 6.6},
        {"title": "seven", "x": 7, "y": 7.7},
        {"title": "eight", "x": 8, "y": 8.8},
        {"title": "nine", "x": 9, "y": 9.9},
    ],
])
```

```python
array_x = original_array.x
array_y = original_array.y
```

`array_x` and `array_y` have the same number of lists and the same number of items in each list because both are slices of `original_array`.

```python
array_x
```

```
[[0, 1, 2],
 [],
 [3, 4],
 [5],
 [6, 7, 8, 9]]
---------------------
type: 5 * var * int64
```

```python
array_y
```

```
[[0, 1.1, 2.2],
 [],
 [3.3, 4.4],
 [5.5],
 [6.6, 7.7, 8.8, 9.9]]
-----------------------
type: 5 * var * float64
```

Thus, they can be used together in a mathematical formula.

```python
array_x**2 + array_y**2
```

```
[[0, 2.21, 8.84],
 [],
 [19.9, 35.4],
 [55.2],
 [79.6, 108, 141, 179]]
-----------------------
type: 5 * var * float64
```

However, if only one array is sliced, or if the two arrays are sliced by different criteria, they may no longer line up:

```python
sliced_x = array_x[array_x > 3]
sliced_y = array_y[array_y > 3]
```

```python
sliced_x
```

```
[[],
 [],
 [4],
 [5],
 [6, 7, 8, 9]]
---------------------
type: 5 * var * int64
```

```python
sliced_y
```

```
[[],
 [],
 [3.3, 4.4],
 [5.5],
 [6.6, 7.7, 8.8, 9.9]]
-----------------------
type: 5 * var * float64
```

Notice that the first was sliced with `array_x > 3` and the second was sliced with `array_y > 3`, and as a result, the third list differs in length between the two arrays:

```python
sliced_x[2], sliced_y[2]
```

```
(<Array [4] type='1 * int64'>, <Array [3.3, 4.4] type='2 * float64'>)
```

If we try to use these together, we get a `ValueError` ("cannot broadcast nested list"):

```python
sliced_x**2 + sliced_y**2
```

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[11], line 1
----> 1 sliced_x**2 + sliced_y**2

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_operators.py:53, in _binary_method.<locals>.func(self, other)
51 if _disables_array_ufunc(other):
52     return NotImplemented
---> 53 return ufunc(self, other)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/highlevel.py:1511, in Array.__array_ufunc__(self, ufunc, method, *inputs, **kwargs)
1509 name = f"{type(ufunc).__module__}.{ufunc.__name__}.{method!s}"
1510 with ak._errors.OperationErrorContext(name, inputs, kwargs):
-> 1511     return ak._connect.numpy.array_ufunc(ufunc, method, inputs, kwargs)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_connect/numpy.py:466, in array_ufunc(ufunc, method, inputs, kwargs)
458         raise TypeError(
459             "no {}.{} overloads for custom types: {}".format(
460                 type(ufunc).__module__, ufunc.__name__, ", ".join(error_message)
461             )
462         )
464     return None
467     inputs, action, allow_records=False, function_name=ufunc.__name__
468 )
470 if len(out) == 1:
471     return wrap_layout(out[0], behavior=behavior, attrs=attrs)

966 backend = backend_of(*inputs, coerce_to_common=False)
967 isscalar = []
--> 968 out = apply_step(
969     backend,
971     action,
972     0,
973     depth_context,
974     lateral_context,
975     {
976         "allow_records": allow_records,
979         "numpy_to_regular": numpy_to_regular,
980         "regular_to_jagged": regular_to_jagged,
981         "function_name": function_name,
983     },
984 )
985 assert isinstance(out, tuple)
986 return tuple(broadcast_unpack(x, isscalar) for x in out)

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_broadcasting.py:946, in apply_step(backend, inputs, action, depth, depth_context, lateral_context, options)
944     return result
945 elif result is None:
--> 946     return continuation()
947 else:
948     raise AssertionError(result)

913 # Any non-string list-types?
914 elif any(x.is_list and not is_string_like(x) for x in contents):
917 # Any RecordArrays?
918 elif any(x.is_record for x in contents):

619         nextinputs.append(x)
620         nextparameters.append(NO_PARAMETERS)
--> 622 outcontent = apply_step(
623     backend,
624     nextinputs,
625     action,
626     depth + 1,
627     copy.copy(depth_context),
628     lateral_context,
629     options,
630 )
631 assert isinstance(outcontent, tuple)
632 parameters = parameters_factory(nextparameters, len(outcontent))

File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_broadcasting.py:946, in apply_step(backend, inputs, action, depth, depth_context, lateral_context, options)
944     return result
945 elif result is None:
--> 946     return continuation()
947 else:
948     raise AssertionError(result)

913 # Any non-string list-types?
914 elif any(x.is_list and not is_string_like(x) for x in contents):
917 # Any RecordArrays?
918 elif any(x.is_record for x in contents):

669 for x, x_is_string in zip(inputs, input_is_string):
670     if isinstance(x, listtypes) and not x_is_string:
--> 671         next_content = broadcast_to_offsets_avoiding_carry(x, offsets)
672         nextinputs.append(next_content)
673         nextparameters.append(x._parameters)

369         return list_content.content[:next_length]
370     else:
372 elif isinstance(list_content, ListArray):
373     # Is this list contiguous?
374     if index_nplike.array_equal(
375         list_content.starts.data[1:], list_content.stops.data[:-1]
376     ):
377         # Does this list match the offsets?

406     next_content = self._content[this_start:]
408 if index_nplike.known_data and not index_nplike.array_equal(
409     this_zero_offsets, offsets.data
410 ):
--> 411     raise ValueError("cannot broadcast nested list")
413 return ListOffsetArray(
414     offsets, next_content[: offsets[-1]], parameters=self._parameters
415 )

This error occurred while calling

<Array [[], [], ..., [25], [36, 49, 64, 81]] type='5 * var * int64'>
<Array [[], [], ..., [43.6, 59.3, 77.4, 98]] type='5 * var * float64'>
)
```

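Sometimes, these misalignments are overt, but sometimes they’re subtle and embedded deep within a very large array. You can start investigating a problem like this with `ak.num()`, which returns the number of items in each list, so comparing its output for the two arrays flags the lists that differ: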

```python
ak.num(sliced_x) != ak.num(sliced_y)
```

```
[False,
 False,
 True,
 False,
 False]
--------------
type: 5 * bool
```

```python
np.nonzero(ak.to_numpy(ak.num(sliced_x) != ak.num(sliced_y)))
```

```
(array([2]),)
```
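To make the length comparison concrete, here is a minimal plain-Python sketch of the same check on ordinary nested lists. The helper name `mismatched_positions` is hypothetical (it is not part of Awkward Array), and the input lists are copied from the `sliced_x` and `sliced_y` outputs above.

```python
def mismatched_positions(left, right):
    """Return the indices of inner lists whose lengths differ."""
    if len(left) != len(right):
        raise ValueError("outer lengths differ")
    return [i for i, (a, b) in enumerate(zip(left, right)) if len(a) != len(b)]


# Plain-list copies of the sliced arrays shown above.
sliced_x = [[], [], [4], [5], [6, 7, 8, 9]]
sliced_y = [[], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]]

print(mismatched_positions(sliced_x, sliced_y))  # [2]
```

This is only an illustration of the idea; on large arrays the vectorized `ak.num()` comparison is the approach the text recommends.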

But it’s also possible to avoid them in the first place.

The problem was that the two arrays’ shapes changed differently; instead, we’ll slice them in such a way that their shapes don’t change at all.

The `ak.mask()` function takes a boolean array, like a slice, but instead of removing the values that line up with `False`, it replaces them with `None`. The lengths of the lists therefore stay the same.

```python
ak.mask(array_x, array_x > 3)
```

```
[[None, None, None],
 [],
 [None, 4],
 [5],
 [6, 7, 8, 9]]
----------------------
type: 5 * var * ?int64
```
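The difference between cutting and masking can be sketched on ordinary nested lists. This is a plain-Python illustration of the semantics, not how Awkward Array implements them; the helper names `cut` and `mask` are hypothetical.

```python
def cut(nested, keep):
    """Drop items whose flag is False; list lengths change."""
    return [[v for v, k in zip(vals, flags) if k]
            for vals, flags in zip(nested, keep)]


def mask(nested, keep):
    """Replace items whose flag is False with None; list lengths are preserved."""
    return [[v if k else None for v, k in zip(vals, flags)]
            for vals, flags in zip(nested, keep)]


array_x = [[0, 1, 2], [], [3, 4], [5], [6, 7, 8, 9]]
keep = [[v > 3 for v in vals] for vals in array_x]  # same role as `array_x > 3`

print(cut(array_x, keep))   # [[], [], [4], [5], [6, 7, 8, 9]]
print(mask(array_x, keep))  # [[None, None, None], [], [None, 4], [5], [6, 7, 8, 9]]
```

Because `mask` never changes the number of items in a list, two arrays masked by different criteria still have matching shapes.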

It can also be accessed as an array property, with square brackets, so that it resembles a slice:

```python
masked_x = array_x.mask[array_x > 3]
masked_y = array_y.mask[array_y > 3]
```

```python
masked_x
```

```
[[None, None, None],
 [],
 [None, 4],
 [5],
 [6, 7, 8, 9]]
----------------------
type: 5 * var * ?int64
```

```python
masked_y
```

```
[[None, None, None],
 [],
 [3.3, 4.4],
 [5.5],
 [6.6, 7.7, 8.8, 9.9]]
------------------------
type: 5 * var * ?float64
```

The results of these two masks can be used in a mathematical expression because they line up:

```python
result = masked_x**2 + masked_y**2
result
```

```
[[None, None, None],
 [],
 [None, 35.4],
 [55.2],
 [79.6, 108, 141, 179]]
------------------------
type: 5 * var * ?float64
```
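The behavior seen here, where a `None` on either side produces `None` in the output, can be sketched in plain Python. The `combine` helper below is hypothetical and only mimics the observable result on nested lists; it makes no claim about Awkward Array's internals.

```python
def combine(left, right, op):
    """Apply op item by item; a None on either side yields None."""
    out = []
    for a_list, b_list in zip(left, right):
        if len(a_list) != len(b_list):
            # Mirrors the situation that raised ValueError for the cut arrays.
            raise ValueError("cannot broadcast nested list")
        out.append([None if a is None or b is None else op(a, b)
                    for a, b in zip(a_list, b_list)])
    return out


def sum_of_squares(x, y):
    return x**2 + y**2


# Plain-list copies of the masked arrays shown above.
masked_x = [[None, None, None], [], [None, 4], [5], [6, 7, 8, 9]]
masked_y = [[None, None, None], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]]
result = combine(masked_x, masked_y, sum_of_squares)
print(result[2][0], result[3])  # None [55.25]

# The cut versions fail because their third lists differ in length.
sliced_x = [[], [], [4], [5], [6, 7, 8, 9]]
sliced_y = [[], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]]
try:
    combine(sliced_x, sliced_y, sum_of_squares)
except ValueError as err:
    print("cut arrays:", err)
```

The masked inputs keep one slot per original item, so the pairing of `x` and `y` values is never ambiguous.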

Now only one problem remains: the `None` (missing) values might be undesirable in the output. There are several ways to get rid of them:

• `ak.drop_none()` eliminates `None`, like a slice, but it can be done once at the end of a calculation,

• `ak.fill_none()` replaces `None` with a chosen value,

• `ak.flatten()` removes list structure, and if the `None` values are at the level of a list (the ones in `result` aren’t), they’ll be removed too,

• `ak.singletons()` replaces `None` with `[]` and any other value `x` with `[x]`. The resulting lists all have length 0 or length 1.

```python
ak.drop_none(result, axis=1)
```

```
[[],
 [],
 [35.4],
 [55.2],
 [79.6, 108, 141, 179]]
-----------------------
type: 5 * var * float64
```

```python
ak.fill_none(result, -1, axis=1)
```

```
[[-1, -1, -1],
 [],
 [-1, 35.4],
 [55.2],
 [79.6, 108, 141, 179]]
-----------------------
type: 5 * var * float64
```

```python
ak.singletons(result, axis=1)
```

```
[[[], [], []],
 [],
 [[], [35.4]],
 [[55.2]],
 [[79.6], [108], [141], [179]]]
-------------------------------
type: 5 * var * var * float64
```
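For readers who find it easier to reason about ordinary lists, the three outputs above correspond roughly to the following plain-Python sketch. The helper functions are hypothetical stand-ins that act only on the inner lists (the `axis=1` case), and the values in `result` are the rounded numbers as displayed above.

```python
def drop_none(nested):
    """Remove None items from each inner list."""
    return [[v for v in vals if v is not None] for vals in nested]


def fill_none(nested, value):
    """Replace None items with a chosen value."""
    return [[value if v is None else v for v in vals] for vals in nested]


def singletons(nested):
    """Turn None into [] and any other value x into [x]."""
    return [[[] if v is None else [v] for v in vals] for vals in nested]


result = [[None, None, None], [], [None, 35.4], [55.2], [79.6, 108, 141, 179]]

print(drop_none(result))      # [[], [], [35.4], [55.2], [79.6, 108, 141, 179]]
print(fill_none(result, -1))  # [[-1, -1, -1], [], [-1, 35.4], [55.2], [79.6, 108, 141, 179]]
print(singletons(result))     # [[[], [], []], [], [[], [35.4]], [[55.2]], [[79.6], [108], [141], [179]]]
```

Plain lists carry no data type, so this sketch cannot show the type-level changes (such as `?float64` becoming `float64`) that the Awkward Array functions perform.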

As a final note, the difference between using `ak.drop_none()` and slicing with the result of `ak.is_none()` is that `ak.drop_none()` also removes “missingness” from the data type; a slice does not.

```python
result[~ak.is_none(result, axis=1)]
```

```
[[],
 [],
 [35.4],
 [55.2],
 [79.6, 108, 141, 179]]
------------------------
type: 5 * var * ?float64
```

(Note the `?` for “option-type” before `float64`. This could have consequences, good or bad, at a later stage in processing.)