How to examine an array with simple slicing
Slicing data from an array is a basic operation in array-oriented data analysis. Awkward Array extends NumPy’s slicing capabilities to handle nested and ragged data structures. This tutorial illustrates several ways to slice an array.
For a complete list of slicing features, see ak.Array.__getitem__().
import awkward as ak
import numpy as np
Basic slicing with ranges
Much like NumPy, you can slice Awkward Arrays using simple ranges specified with colons (start:stop:step). Here’s an example of a regular (non-ragged) Awkward Array:
array = ak.Array(np.arange(10)**2) # squaring numbers for clarity
array
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
----------------
backend: cpu
nbytes: 80 B
type: 10 * int64
To select the first five elements:
array[:5]
[0, 1, 4, 9, 16]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64
To select from the fifth-to-last element onward:
array[-5:]
[25, 36, 49, 64, 81]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64
To select every other element starting from the second:
array[1::2]
[1, 9, 25, 49, 81]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64
Multiple ranges for multiple dimensions
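Similarly, you can slice multidimensional data with one range or index per dimension, separated by commas. The following applies the same slice to a three-dimensional NumPy array and to an Awkward Array made from it: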
np_array3d = np.arange(2*3*5).reshape(2, 3, 5)
np_array3d
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
array3d = ak.Array(np_array3d)
array3d
[[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]], [[15, 16, 17, 18, 19], [20, 21, 22, 23, 24], [25, 26, 27, 28, 29]]]
--------------------------------------------------------------------
backend: cpu
nbytes: 240 B
type: 2 * 3 * 5 * int64
np_array3d[1, ::2, 1:-1]
array([[16, 17, 18],
       [26, 27, 28]])
array3d[1, ::2, 1:-1]
[[16, 17, 18], [26, 27, 28]]
-------------------
backend: cpu
nbytes: 48 B
type: 2 * 3 * int64
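Just as with NumPy, a single colon (:) means “take everything from this dimension,” and an ellipsis (...) stands for as many colons as are needed to cover the dimensions that are not otherwise mentioned. The following two slices are therefore equivalent: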
array3d[:, :, 1:-1]
[[[1, 2, 3], [6, 7, 8], [11, 12, 13]], [[16, 17, 18], [21, 22, 23], [26, 27, 28]]]
--------------------------------------------
backend: cpu
nbytes: 144 B
type: 2 * 3 * 3 * int64
array3d[..., 1:-1]
[[[1, 2, 3], [6, 7, 8], [11, 12, 13]], [[16, 17, 18], [21, 22, 23], [26, 27, 28]]]
--------------------------------------------
backend: cpu
nbytes: 144 B
type: 2 * 3 * 3 * int64
Boolean array slices
As with NumPy’s advanced indexing, an array of booleans filters individual items. For instance, consider an array of booleans constructed by asking which elements of array are greater than 20:
array > 20
[False, False, False, False, False, True, True, True, True, True]
---------------
backend: cpu
nbytes: 10 B
type: 10 * bool
When applied to array between square brackets, the boolean array eliminates all items for which array > 20 is False:
array[array > 20]
[25, 36, 49, 64, 81]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64
Boolean array slicing is more flexible than range slicing because the True and False values may follow any pattern, not just a regular stride. The following selects only the even numbers:
array % 2 == 0
[True, False, True, False, True, False, True, False, True, False]
---------------
backend: cpu
nbytes: 10 B
type: 10 * bool
array[array % 2 == 0]
[0, 4, 16, 36, 64]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64
Integer array slices
You can also use arrays of integer indices to select specific elements.
indices = ak.Array([2, 5, 3])
array[indices]
[4, 25, 9]
---------------
backend: cpu
nbytes: 24 B
type: 3 * int64
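If you pass the indices directly between the array’s square brackets, nest them within a second pair of square brackets so that they form a list rather than a tuple. Without the inner brackets, array[2, 5, 3] would be interpreted as one index per dimension, which fails for this one-dimensional array.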
array[[2, 5, 3]]
[4, 25, 9]
---------------
backend: cpu
nbytes: 24 B
type: 3 * int64
In addition to picking elements out of order, you can pick the same element multiple times.
array[[2, 5, 5, 5, 5, 5, 3]]
[4, 25, 25, 25, 25, 25, 9]
---------------
backend: cpu
nbytes: 56 B
type: 7 * int64
Any selection that a boolean array can make, an integer array can also make (using the positions of the True values), but only integer arrays can reorder or duplicate elements.
Ragged array slicing
One of the unique features of Awkward Array is its ability to handle ragged arrays efficiently. Here’s an example of a ragged array:
ragged_array = ak.Array([[10, 20, 30], [40], [], [50, 60]])
ragged_array
[[10, 20, 30], [40], [], [50, 60]]
---------------------
backend: cpu
nbytes: 88 B
type: 4 * var * int64
A single integer selects one sublist:
ragged_array[1]
[40]
---------------
backend: cpu
nbytes: 8 B
type: 1 * int64
A range given after a comma is applied to every sublist independently, even though the sublists have different lengths:
ragged_array[:, :2] # get first two elements of each sublist
[[10, 20], [40], [], [50, 60]]
---------------------
backend: cpu
nbytes: 80 B
type: 4 * var * int64
You can also mix different kinds of slices in one expression, one per dimension. This lets you select ranges within nested lists of different lengths, which goes beyond NumPy’s capabilities because NumPy arrays cannot be ragged. In the following example, the boolean array ak.num(ragged_array) > 1 keeps only the sublists with more than one element. The range 1: then skips the first element of each remaining sublist:
ragged_array[ak.num(ragged_array) > 1, 1:]
[[20, 30], [60]]
---------------------
backend: cpu
nbytes: 48 B
type: 2 * var * int64
Boolean array slicing with missing data
When working with boolean arrays for slicing, the arrays can include None (missing) values. Awkward Array handles missing data gracefully during boolean slicing:
bool_mask = ak.Array([True, None, False, True])
array[bool_mask]
[0, None, 9]
----------------
backend: cpu
nbytes: 40 B
type: 3 * ?int64
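Each True keeps the corresponding item, each False drops it, and each None produces a None in the result. That is why the output type is ?int64 (an option type). Because bool_mask has only four elements, here it lines up with the first four items of array (0, 1, 4, 9); in practice, a mask should have the same length as the array it filters. This ability to cope with missing data, without failing or needing imputation, is invaluable in data analysis tasks where missing data is common.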