How to filter with arrays containing missing values
import awkward as ak
import numpy as np
Indexing with missing values
In Building an awkward index, we looked at building arrays of integers to perform awkward indexing using ak.argmin() and ak.argmax(). In particular, the keepdims argument of ak.argmin() and ak.argmax() is very useful for creating arrays that can be used to index into the original array. However, reducers such as ak.argmax() behave differently when they are asked to operate upon empty lists.
Let’s first create an array that contains empty sublists:
array = ak.Array(
[
[],
[10, 3, 2, 9],
[4, 5, 5, 12, 6],
[],
[8, 9, -1],
]
)
array
[[], [10, 3, 2, 9], [4, 5, 5, 12, 6], [], [8, 9, -1]] --------------------- type: 5 * var * int64
Awkward reducers accept a mask_identity argument, which changes the ak.Array.type and the values of the result:
ak.argmax(array, keepdims=True, axis=-1, mask_identity=False)
[[-1], [0], [3], [-1], [1]] ------------------- type: 5 * 1 * int64
ak.argmax(array, keepdims=True, axis=-1, mask_identity=True)
[[None], [0], [3], [None], [1]] -------------------- type: 5 * 1 * ?int64
Setting mask_identity=False yields the identity value for the reducer instead of None when reducing empty lists. From the above examples of ak.argmax(), we can see that the identity for ak.argmax() is -1. What happens if we try to use the array produced with mask_identity=False to index into array?
As discussed in Indexing with argmin and argmax, we first need to convert at least one dimension to a ragged dimension:
index = ak.from_regular(
ak.argmax(array, keepdims=True, axis=-1, mask_identity=False)
)
Now, if we try to index into array with index, it raises an exception:
array[index]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/highlevel.py:1023, in Array.__getitem__(self, where)
1021 with ak._errors.SlicingErrorContext(self, where):
1022 return wrap_layout(
-> 1023 prepare_layout(self._layout[where]), self._behavior, allow_other=True
1024 )
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/content.py:517, in Content.__getitem__(self, where)
516 def __getitem__(self, where):
--> 517 return self._getitem(where)
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/content.py:570, in Content._getitem(self, where)
569 elif isinstance(where, ak.highlevel.Array):
--> 570 return self._getitem(where.layout)
572 # Convert between nplikes of different backends
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/content.py:645, in Content._getitem(self, where)
644 elif isinstance(where, Content):
--> 645 return self._getitem((where,))
647 elif is_sized_iterable(where):
648 # Do we have an array
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/content.py:562, in Content._getitem(self, where)
555 next = ak.contents.RegularArray(
556 this,
557 this.length,
558 1,
559 parameters=None,
560 )
--> 562 out = next._getitem_next(nextwhere[0], nextwhere[1:], None)
564 if out.length is not unknown_length and out.length == 0:
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/regulararray.py:703, in RegularArray._getitem_next(self, head, tail, advanced)
688 self._maybe_index_error(
689 self._backend[
690 "awkward_RegularArray_getitem_jagged_expand",
(...)
701 slicer=head,
702 )
--> 703 down = self._content._getitem_next_jagged(
704 multistarts, multistops, head._content, tail
705 )
707 return RegularArray(
708 down, headlength, self._length, parameters=self._parameters
709 )
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/listoffsetarray.py:417, in ListOffsetArray._getitem_next_jagged(self, slicestarts, slicestops, slicecontent, tail)
414 out = ak.contents.ListArray(
415 self.starts, self.stops, self._content, parameters=self._parameters
416 )
--> 417 return out._getitem_next_jagged(slicestarts, slicestops, slicecontent, tail)
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/listarray.py:541, in ListArray._getitem_next_jagged(self, slicestarts, slicestops, slicecontent, tail)
532 assert (
533 outoffsets.nplike is self._backend.index_nplike
534 and nextcarry.nplike is self._backend.index_nplike
(...)
539 and self._stops.nplike is self._backend.index_nplike
540 )
--> 541 self._maybe_index_error(
542 self._backend[
543 "awkward_ListArray_getitem_jagged_apply",
544 outoffsets.dtype.type,
545 nextcarry.dtype.type,
546 slicestarts.dtype.type,
547 slicestops.dtype.type,
548 sliceindex.dtype.type,
549 self._starts.dtype.type,
550 self._stops.dtype.type,
551 ](
552 outoffsets.data,
553 nextcarry.data,
554 slicestarts.data,
555 slicestops.data,
556 slicestarts.length,
557 sliceindex.data,
558 sliceindex.length,
559 self._starts.data,
560 self._stops.data,
561 self._content.length,
562 ),
563 slicer=ak.contents.ListArray(slicestarts, slicestops, slicecontent),
564 )
565 nextcontent = self._content._carry(nextcarry, True)
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/content.py:280, in Content._maybe_index_error(self, error, slicer)
279 message = self._backend.format_kernel_error(error)
--> 280 raise ak._errors.index_error(self, slicer, message)
IndexError: cannot slice ListArray (of length 5) with [[-1], [0], [3], [-1], [1]]: index out of range while attempting to get index -1 (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-25/awkward-cpp/src/cpu-kernels/awkward_ListArray_getitem_jagged_apply.cpp#L43)
The above exception was the direct cause of the following exception:
IndexError Traceback (most recent call last)
Cell In[6], line 1
----> 1 array[index]
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/highlevel.py:1021, in Array.__getitem__(self, where)
592 def __getitem__(self, where):
593 """
594 Args:
595 where (many types supported; see below): Index of positions to
(...)
1019 have the same dimension as the array being indexed.
1020 """
-> 1021 with ak._errors.SlicingErrorContext(self, where):
1022 return wrap_layout(
1023 prepare_layout(self._layout[where]), self._behavior, allow_other=True
1024 )
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/_errors.py:67, in ErrorContext.__exit__(self, exception_type, exception_value, traceback)
60 try:
61 # Handle caught exception
62 if (
63 exception_type is not None
64 and issubclass(exception_type, Exception)
65 and self.primary() is self
66 ):
---> 67 self.handle_exception(exception_type, exception_value)
68 finally:
69 # `_kwargs` may hold cyclic references, that we really want to avoid
70 # as this can lead to large buffers remaining in memory for longer than absolutely necessary
71 # Let's just clear this, now.
72 self._kwargs.clear()
File ~/micromamba/envs/awkward-docs/lib/python3.10/site-packages/awkward/_errors.py:82, in ErrorContext.handle_exception(self, cls, exception)
80 self.decorate_exception(cls, exception)
81 else:
---> 82 raise self.decorate_exception(cls, exception)
IndexError: cannot slice ListArray (of length 5) with [[-1], [0], [3], [-1], [1]]: index out of range while attempting to get index -1 (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-25/awkward-cpp/src/cpu-kernels/awkward_ListArray_getitem_jagged_apply.cpp#L43)
This error occurred while attempting to slice
<Array [[], [10, 3, 2, 9], ..., [], [8, 9, -1]] type='5 * var * int64'>
with
<Array [[-1], [0], [3], [-1], [1]] type='5 * var * int64'>
From the error message, it is clear that for some sublist(s) the index -1 is out of range. This makes sense; some of our sublists are empty, meaning that there is no valid integer to index into them.
Now let’s look at the result of indexing with mask_identity=True.
index = ak.argmax(array, keepdims=True, axis=-1, mask_identity=True)
Because it contains an option type, index already satisfies rule (2) in Building an awkward index, and we do not need to convert it to a ragged array. We can see that indexing with it succeeds:
array[index]
[[None], [10], [12], [None], [9]] ---------------------- type: 5 * var * ?int64
Here, the missing values in the index array correspond to missing values in the output array.
Indexing with missing sublists
Ragged indexing also supports using None in place of empty sublists within an index. For example, given the following array:
array = ak.Array(
[
[10, 3, 2, 9],
[4, 5, 5, 12, 6],
[],
[8, 9, -1],
]
)
array
[[10, 3, 2, 9], [4, 5, 5, 12, 6], [], [8, 9, -1]] --------------------- type: 4 * var * int64
let’s build a ragged index to pull out some particular values. Rather than using empty lists, we can use None to mask out sublists that we don’t care about:
array[
[
[0, 1],
None,
[],
[2],
],
]
[[10, 3], None, [], [-1]] ----------------------------- type: 4 * option[var * int64]
If we compare this with simply providing an empty sublist,
array[
[
[0, 1],
[],
[],
[2],
],
]
[[10, 3], [], [], [-1]] --------------------- type: 4 * var * int64
we can see that the None value introduces an option-type into the final result. None values can be used at any level in the index array to introduce an option-type at that depth in the result.