How to examine an array with simple slicing

Slicing data from an array is a basic operation in array-oriented data analysis. Awkward Array extends NumPy’s slicing capabilities to handle nested and ragged data structures. This tutorial illustrates several ways to slice an array.

For a complete list of slicing features, see ak.Array.__getitem__().

import awkward as ak
import numpy as np

Basic slicing with ranges

Much like NumPy, you can slice Awkward Arrays using simple ranges specified with colons (start:stop:step). Here’s an example of a regular (non-ragged) Awkward Array:

array = ak.Array(np.arange(10)**2)  # squaring numbers for clarity
array
[0,
 1,
 4,
 9,
 16,
 25,
 36,
 49,
 64,
 81]
----------------
backend: cpu
nbytes: 80 B
type: 10 * int64

To select the first five elements:

array[:5]
[0,
 1,
 4,
 9,
 16]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64

To select from the fifth-to-last onward:

array[-5:]
[25,
 36,
 49,
 64,
 81]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64

To select every other element starting from the second:

array[1::2]
[1,
 9,
 25,
 49,
 81]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64

Multiple ranges for multiple dimensions

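Similarly, for multidimensional data, you can give one range or index per dimension, separated by commas. The same slice applied to a NumPy array and to its Awkward Array equivalent selects the same elements: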

np_array3d = np.arange(2*3*5).reshape(2, 3, 5)
np_array3d
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
array3d = ak.Array(np_array3d)
array3d
[[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]],
 [[15, 16, 17, 18, 19], [20, 21, 22, 23, 24], [25, 26, 27, 28, 29]]]
--------------------------------------------------------------------
backend: cpu
nbytes: 240 B
type: 2 * 3 * 5 * int64
np_array3d[1, ::2, 1:-1]
array([[16, 17, 18],
       [26, 27, 28]])
array3d[1, ::2, 1:-1]
[[16, 17, 18],
 [26, 27, 28]]
-------------------
backend: cpu
nbytes: 48 B
type: 2 * 3 * int64

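Just as with NumPy, a single colon (:) means “take everything from this dimension” and an ellipsis (...) stands in for as many single colons as are needed to cover the dimensions that are not written out explicitly, so the following two slices are equivalent.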

array3d[:, :, 1:-1]
[[[1, 2, 3], [6, 7, 8], [11, 12, 13]],
 [[16, 17, 18], [21, 22, 23], [26, 27, 28]]]
--------------------------------------------
backend: cpu
nbytes: 144 B
type: 2 * 3 * 3 * int64
array3d[..., 1:-1]
[[[1, 2, 3], [6, 7, 8], [11, 12, 13]],
 [[16, 17, 18], [21, 22, 23], [26, 27, 28]]]
--------------------------------------------
backend: cpu
nbytes: 144 B
type: 2 * 3 * 3 * int64

Boolean array slices

Like NumPy’s advanced slicing, an array of booleans filters individual items. For instance, consider an array of booleans constructed by asking which elements of array are greater than 20:

array > 20
[False,
 False,
 False,
 False,
 False,
 True,
 True,
 True,
 True,
 True]
---------------
backend: cpu
nbytes: 10 B
type: 10 * bool

When applied to array between square brackets, the boolean array eliminates all items in which array > 20 is False:

array[array > 20]
[25,
 36,
 49,
 64,
 81]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64

Boolean array slicing is more powerful than range slicing because the True and False values may have any pattern. The following selects only even numbers.

array % 2 == 0
[True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False]
---------------
backend: cpu
nbytes: 10 B
type: 10 * bool
array[array % 2 == 0]
[0,
 4,
 16,
 36,
 64]
---------------
backend: cpu
nbytes: 40 B
type: 5 * int64

Integer array slices

You can also use arrays of integer indices to select specific elements.

indices = ak.Array([2, 5, 3])
array[indices]
[4,
 25,
 9]
---------------
backend: cpu
nbytes: 24 B
type: 3 * int64

If you pass indices directly between the array’s square brackets, nest them in their own square brackets so that they form a list. Without the inner brackets, Python passes a tuple, which is interpreted as one index per dimension.

array[[2, 5, 3]]
[4,
 25,
 9]
---------------
backend: cpu
nbytes: 24 B
type: 3 * int64

In addition to picking elements out of order, you can pick the same element multiple times.

array[[2, 5, 5, 5, 5, 5, 3]]
[4,
 25,
 25,
 25,
 25,
 25,
 9]
---------------
backend: cpu
nbytes: 56 B
type: 7 * int64

Any slices that could be performed by boolean arrays can be performed by integer arrays, but only integer arrays can reorder and duplicate elements.

Ragged array slicing

One of the unique features of Awkward Array is its ability to handle ragged arrays efficiently. Here’s an example of a ragged array:

ragged_array = ak.Array([[10, 20, 30], [40], [], [50, 60]])
ragged_array
[[10, 20, 30],
 [40],
 [],
 [50, 60]]
---------------------
backend: cpu
nbytes: 88 B
type: 4 * var * int64

You can select an individual sublist with an integer index:

ragged_array[1]
[40]
---------------
backend: cpu
nbytes: 8 B
type: 1 * int64

And you can perform slices that operate across the sublists:

ragged_array[:, :2]  # get first two elements of each sublist
[[10, 20],
 [40],
 [],
 [50, 60]]
---------------------
backend: cpu
nbytes: 80 B
type: 4 * var * int64

Mixing boolean masks, ranges, and single indices in one slice lets you select ranges inside nested lists of different lengths, which NumPy’s rectangular arrays cannot express. Here’s an example that keeps only the sublists with more than one element and skips the first element of each:

ragged_array[ak.num(ragged_array) > 1, 1:]
[[20, 30],
 [60]]
---------------------
backend: cpu
nbytes: 48 B
type: 2 * var * int64

Boolean array slicing with missing data

Boolean arrays used for slicing can include None (missing) values. Rather than raising an error, Awkward Array passes each None in the mask through as a None in the result:

bool_mask = ak.Array([True, None, False, True])
array[bool_mask]
[0,
 None,
 9]
----------------
backend: cpu
nbytes: 40 B
type: 3 * ?int64

This ability to slice with an incomplete mask, without failing and without imputing the missing values first, is useful in data analysis tasks where missing data is common.