How to create arrays of lists

import awkward as ak
import numpy as np

From Python lists

If you have a collection of Python lists, the easiest way to turn them into an Awkward Array is to pass them to the ak.Array constructor, which recognizes a non-dict, non-NumPy iterable and calls ak.from_iter().

python_lists = [[1, 2, 3], [], [4, 5], [6], [7, 8, 9, 10]]
python_lists
[[1, 2, 3], [], [4, 5], [6], [7, 8, 9, 10]]
awkward_array = ak.Array(python_lists)
awkward_array
[[1, 2, 3],
 [],
 [4, 5],
 [6],
 [7, 8, 9, 10]]
---------------------
backend: cpu
nbytes: 128 B
type: 5 * var * int64

The lists of lists can be arbitrarily deep.

python_lists = [[[[], [1, 2, 3]]], [[[4, 5]]], []]
python_lists
[[[[], [1, 2, 3]]], [[[4, 5]]], []]
awkward_array = ak.Array(python_lists)
awkward_array
[[[[], [1, 2, 3]]],
 [[[4, 5]]],
 []]
---------------------------------
backend: cpu
nbytes: 128 B
type: 3 * var * var * var * int64

The “var *” in the type string indicates nested levels of variable-length lists. This is an array of lists of lists of lists of integers.

The advantage of an Awkward Array is that all of the numerical data are in one array buffer, so calculations such as NumPy universal functions are vectorized across the whole array.

np.sqrt(awkward_array)
[[[[], [1, 1.41, 1.73]]],
 [[[2, 2.24]]],
 []]
-----------------------------------
backend: cpu
nbytes: 128 B
type: 3 * var * var * var * float64

Unlike Python lists, Awkward Arrays have a single, homogeneous type. A Python list doesn’t care if numbers appear at two different levels of nesting, but to an Awkward Array that is a significant difference.

union_array = ak.Array([[[[], [1, 2, 3]]], [[4, 5]], []])
union_array
[[[[], [1, 2, 3]]],
 [[4, 5]],
 []]
---------------------------------------------------------
backend: cpu
nbytes: 156 B
type: 3 * var * var * union[
    var * int64,
    int64
]

In this example, the innermost type is a “union”: within each element of the array, the integers are nested either two levels deep or three levels deep.

union_array.type
ArrayType(ListType(ListType(UnionType([ListType(NumpyType('int64')), NumpyType('int64')]))), 3, None)

Some operations are possible with union arrays, but not all. (Iteration in Numba is one example of an operation that is not supported.)

From NumPy arrays

The ak.Array constructor loads NumPy arrays differently from Python lists. The inner dimensions of a NumPy array are guaranteed to have the same lengths, so they are interpreted as a fixed-length list type.

numpy_array = np.arange(2 * 3 * 5).reshape(2, 3, 5)
numpy_array
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
regular_array = ak.Array(numpy_array)
regular_array
[[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]],
 [[15, 16, 17, 18, 19], [20, 21, 22, 23, 24], [25, 26, 27, 28, 29]]]
--------------------------------------------------------------------
backend: cpu
nbytes: 240 B
type: 2 * 3 * 5 * int64

The type in this case has no “var *” in it, only “2 *”, “3 *”, and “5 *”. It’s a length-2 array of length-3 lists containing length-5 lists of integers.

Furthermore, if NumPy arrays are nested within Python lists (or other iterables), they’ll be treated as variable-length (“var *”) because there’s no guarantee at the start of a sequence that all NumPy arrays in the sequence will have the same shape.

numpy_arrays = [
    np.arange(3 * 5).reshape(3, 5),
    np.arange(3 * 5, 2 * 3 * 5).reshape(3, 5),
]
numpy_arrays
[array([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]]),
 array([[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]])]
irregular_array = ak.Array(numpy_arrays)
irregular_array
[[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]],
 [[15, 16, 17, 18, 19], [20, 21, 22, 23, 24], [25, 26, 27, 28, 29]]]
--------------------------------------------------------------------
backend: cpu
nbytes: 320 B
type: 2 * var * var * int64

Both regular_array and irregular_array have the same data values:

regular_array.to_list() == irregular_array.to_list()
True

but they have different types:

regular_array.type, irregular_array.type
(ArrayType(RegularType(RegularType(NumpyType('int64'), 5), 3), 2, None),
 ArrayType(ListType(ListType(NumpyType('int64'))), 2, None))

This can make a difference in some operations, such as broadcasting.

If you want more control over this, use the explicit ak.from_iter() and ak.from_numpy() functions instead of the general-purpose ak.Array constructor.

Unflattening

Another difference between ak.from_iter() and ak.from_numpy() is that iteration over Python lists is slow and necessarily copies the data, whereas ingesting a NumPy array is zero-copy. (You can see that it’s zero-copy by changing the data in-place.)

In some cases, list-making can be vectorized. If you have a flat NumPy array of data and an array of “counts” that add up to the length of the data, then you can ak.unflatten() it.

data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
counts = np.array([3, 0, 1, 4])

unflattened = ak.unflatten(data, counts)
unflattened
[[1, 2, 3],
 [],
 [4],
 [5, 6, 7, 8]]
---------------------
backend: cpu
nbytes: 104 B
type: 4 * var * int64

The first list has length 3, the second has length 0, the third has length 1, and the last has length 4. This is close to Awkward Array’s internal representation of variable-length lists, so it can be performed quickly.

This function is named ak.unflatten() because it reverses the effect of ak.flatten(), using the list lengths that ak.num() returns:

ak.flatten(unflattened)
[1,
 2,
 3,
 4,
 5,
 6,
 7,
 8]
---------------
backend: cpu
nbytes: 64 B
type: 8 * int64
ak.num(unflattened)
[3,
 0,
 1,
 4]
---------------
backend: cpu
nbytes: 32 B
type: 4 * int64

With ArrayBuilder

ak.ArrayBuilder is described in more detail in this tutorial, but in short, you can construct arrays of lists with its begin_list/end_list methods or its list context manager.

(This is what ak.from_iter() uses internally to accumulate lists.)

builder = ak.ArrayBuilder()

builder.begin_list()
builder.append(1)
builder.append(2)
builder.append(3)
builder.end_list()

builder.begin_list()
builder.end_list()

builder.begin_list()
builder.append(4)
builder.append(5)
builder.end_list()

array = builder.snapshot()
array
[[1, 2, 3],
 [],
 [4, 5]]
---------------------
backend: cpu
nbytes: 72 B
type: 3 * var * int64
builder = ak.ArrayBuilder()

with builder.list():
    builder.append(1)
    builder.append(2)
    builder.append(3)

with builder.list():
    pass

with builder.list():
    builder.append(4)
    builder.append(5)

array = builder.snapshot()
array
[[1, 2, 3],
 [],
 [4, 5]]
---------------------
backend: cpu
nbytes: 72 B
type: 3 * var * int64

In Numba

Functions that Numba Just-In-Time (JIT) compiles can use ak.ArrayBuilder or construct flat data and “counts” arrays for ak.unflatten().

(At this time, Numba can’t use context managers (the with statement) in fully compiled code. An ak.ArrayBuilder can’t be constructed inside a JIT-compiled function or converted to an array there with snapshot; both steps must happen outside the compiled context. Similarly, ak.* functions like ak.unflatten() can’t be called inside a JIT-compiled function, only outside.)

import numba as nb
@nb.jit
def append_list(builder, start, stop):
    builder.begin_list()
    for x in range(start, stop):
        builder.append(x)
    builder.end_list()


@nb.jit
def example(builder):
    append_list(builder, 1, 4)
    append_list(builder, 999, 999)
    append_list(builder, 4, 6)
    return builder


builder = example(ak.ArrayBuilder())

array = builder.snapshot()
array
[[1, 2, 3],
 [],
 [4, 5]]
---------------------
backend: cpu
nbytes: 72 B
type: 3 * var * int64
@nb.jit
def append_list(i, data, j, counts, start, stop):
    for x in range(start, stop):
        data[i] = x
        i += 1
    counts[j] = stop - start
    j += 1
    return i, j


@nb.jit
def example():
    data = np.empty(5, np.int64)
    counts = np.empty(3, np.int64)
    i, j = 0, 0
    i, j = append_list(i, data, j, counts, 1, 4)
    i, j = append_list(i, data, j, counts, 999, 999)
    i, j = append_list(i, data, j, counts, 4, 6)
    return data, counts


data, counts = example()

array = ak.unflatten(data, counts)
array
[[1, 2, 3],
 [],
 [4, 5]]
---------------------
backend: cpu
nbytes: 72 B
type: 3 * var * int64