How to filter with arrays containing missing values

import awkward as ak
import numpy as np

Indexing with missing values

In Building an awkward index, we looked at building arrays of integers to perform awkward indexing using ak.argmin() and ak.argmax(). In particular, the keepdims argument of these reducers is very useful for creating arrays that can be used to index into the original array. However, reducers such as ak.argmax() behave differently when they are asked to operate upon empty lists.

Let’s first create an array that contains empty sublists:

array = ak.Array(
    [
        [],
        [10, 3, 2, 9],
        [4, 5, 5, 12, 6],
        [],
        [8, 9, -1],
    ]
)
array
[[],
 [10, 3, 2, 9],
 [4, 5, 5, 12, 6],
 [],
 [8, 9, -1]]
---------------------
type: 5 * var * int64

Awkward reducers accept a mask_identity argument, which changes the ak.Array.type and the values of the result:

ak.argmax(array, keepdims=True, axis=-1, mask_identity=False)
[[-1],
 [0],
 [3],
 [-1],
 [1]]
-------------------
type: 5 * 1 * int64
ak.argmax(array, keepdims=True, axis=-1, mask_identity=True)
[[None],
 [0],
 [3],
 [None],
 [1]]
--------------------
type: 5 * 1 * ?int64

Setting mask_identity=False yields the identity value for the reducer instead of None when reducing empty lists. From the above examples, we can see that the identity for ak.argmax() is -1. What happens if we try to use the array produced with mask_identity=False to index into array?

As discussed in Indexing with argmin and argmax, we first need to convert at least one dimension to a ragged dimension:

index = ak.from_regular(
    ak.argmax(array, keepdims=True, axis=-1, mask_identity=False)
)

Now, if we try to index into array with index, it will raise an exception:

array[index]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/highlevel.py:950, in Array.__getitem__(self, where)
    949 with ak._errors.SlicingErrorContext(self, where):
--> 950     out = self._layout[where]
    951     if isinstance(out, ak.contents.NumpyArray):

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/content.py:550, in Content.__getitem__(self, where)
    549 def __getitem__(self, where):
--> 550     return self._getitem(where)

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/content.py:603, in Content._getitem(self, where)
    602 elif isinstance(where, ak.highlevel.Array):
--> 603     return self._getitem(where.layout)
    605 # Convert between nplikes of different backends

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/content.py:678, in Content._getitem(self, where)
    677 elif isinstance(where, Content):
--> 678     return self._getitem((where,))
    680 elif is_sized_iterable(where):
    681     # Do we have an array

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/content.py:595, in Content._getitem(self, where)
    588 next = ak.contents.RegularArray(
    589     this,
    590     this.length,
    591     1,
    592     parameters=None,
    593 )
--> 595 out = next._getitem_next(nextwhere[0], nextwhere[1:], None)
    597 if out.length is not unknown_length and out.length == 0:

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/regulararray.py:707, in RegularArray._getitem_next(self, head, tail, advanced)
    692 self._handle_error(
    693     self._backend[
    694         "awkward_RegularArray_getitem_jagged_expand",
   (...)
    705     slicer=head,
    706 )
--> 707 down = self._content._getitem_next_jagged(
    708     multistarts, multistops, head._content, tail
    709 )
    711 return RegularArray(
    712     down, headlength, self._length, parameters=self._parameters
    713 )

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/listoffsetarray.py:435, in ListOffsetArray._getitem_next_jagged(self, slicestarts, slicestops, slicecontent, tail)
    432 out = ak.contents.ListArray(
    433     self.starts, self.stops, self._content, parameters=self._parameters
    434 )
--> 435 return out._getitem_next_jagged(slicestarts, slicestops, slicecontent, tail)

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/listarray.py:500, in ListArray._getitem_next_jagged(self, slicestarts, slicestops, slicecontent, tail)
    491 assert (
    492     outoffsets.nplike is self._backend.index_nplike
    493     and nextcarry.nplike is self._backend.index_nplike
   (...)
    498     and self._stops.nplike is self._backend.index_nplike
    499 )
--> 500 self._handle_error(
    501     self._backend[
    502         "awkward_ListArray_getitem_jagged_apply",
    503         outoffsets.dtype.type,
    504         nextcarry.dtype.type,
    505         slicestarts.dtype.type,
    506         slicestops.dtype.type,
    507         sliceindex.dtype.type,
    508         self._starts.dtype.type,
    509         self._stops.dtype.type,
    510     ](
    511         outoffsets.data,
    512         nextcarry.data,
    513         slicestarts.data,
    514         slicestops.data,
    515         slicestarts.length,
    516         sliceindex.data,
    517         sliceindex.length,
    518         self._starts.data,
    519         self._stops.data,
    520         self._content.length,
    521     ),
    522     slicer=ak.contents.ListArray(slicestarts, slicestops, slicecontent),
    523 )
    524 nextcontent = self._content._carry(nextcarry, True)

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/contents/content.py:288, in Content._handle_error(self, error, slicer)
    287 else:
--> 288     raise ak._errors.index_error(self, slicer, message)

IndexError: cannot slice ListArray (of length 5) with [[-1], [0], [3], [-1], [1]]: index out of range while attempting to get index -1 (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-15/awkward-cpp/src/cpu-kernels/awkward_ListArray_getitem_jagged_apply.cpp#L43)

The above exception was the direct cause of the following exception:

IndexError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 array[index]

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/highlevel.py:949, in Array.__getitem__(self, where)
    520 def __getitem__(self, where):
    521     """
    522     Args:
    523         where (many types supported; see below): Index of positions to
   (...)
    947     have the same dimension as the array being indexed.
    948     """
--> 949     with ak._errors.SlicingErrorContext(self, where):
    950         out = self._layout[where]
    951         if isinstance(out, ak.contents.NumpyArray):

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/_errors.py:56, in ErrorContext.__exit__(self, exception_type, exception_value, traceback)
     53 try:
     54     # Handle caught exception
     55     if exception_type is not None and self.primary() is self:
---> 56         self.handle_exception(exception_type, exception_value)
     57 finally:
     58     # `_kwargs` may hold cyclic references, that we really want to avoid
     59     # as this can lead to large buffers remaining in memory for longer than absolutely necessary
     60     # Let's just clear this, now.
     61     self._kwargs.clear()

File ~/micromamba-root/envs/awkward-docs/lib/python3.10/site-packages/awkward/_errors.py:71, in ErrorContext.handle_exception(self, cls, exception)
     69     self.decorate_exception(cls, exception)
     70 else:
---> 71     raise self.decorate_exception(cls, exception)

IndexError: cannot slice ListArray (of length 5) with [[-1], [0], [3], [-1], [1]]: index out of range while attempting to get index -1 (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-15/awkward-cpp/src/cpu-kernels/awkward_ListArray_getitem_jagged_apply.cpp#L43)

This error occurred while attempting to slice

    <Array [[], [10, 3, 2, 9], ..., [], [8, 9, -1]] type='5 * var * int64'>

with

    <Array [[-1], [0], [3], [-1], [1]] type='5 * var * int64'>

From the error message, it is clear that for some sublist(s) the index -1 is out of range. This makes sense; some of our sublists are empty, meaning that there is no valid integer to index into them.

Now let’s look at the result of indexing with mask_identity=True.

index = ak.argmax(array, keepdims=True, axis=-1, mask_identity=True)

Because it contains an option type, index already satisfies rule (2) in Building an awkward index, and we do not need to convert it to a ragged array. We can see that this index succeeds:

array[index]
[[None],
 [10],
 [12],
 [None],
 [9]]
----------------------
type: 5 * var * ?int64

Here, the missing values in the index array correspond to missing values in the output array.

Indexing with missing sublists

Ragged indexing also supports using None in place of empty sublists within an index. For example, given the following array

array = ak.Array(
    [
        [10, 3, 2, 9],
        [4, 5, 5, 12, 6],
        [],
        [8, 9, -1],
    ]
)
array
[[10, 3, 2, 9],
 [4, 5, 5, 12, 6],
 [],
 [8, 9, -1]]
---------------------
type: 4 * var * int64

let’s build a ragged index to pull out some particular values. Rather than using empty lists, we can use None to mask out sublists that we don’t care about:

array[
    [
        [0, 1],
        None,
        [],
        [2],
    ],
]
[[10, 3],
 None,
 [],
 [-1]]
-----------------------------
type: 4 * option[var * int64]

If we compare this with simply providing an empty sublist,

array[
    [
        [0, 1],
        [],
        [],
        [2],
    ],
]
[[10, 3],
 [],
 [],
 [-1]]
---------------------
type: 4 * var * int64

we can see that the None value introduces an option-type into the final result. None values can be used at any level in the index array to introduce an option-type at that depth in the result.