How to concatenate and interleave arrays#

import awkward as ak
import numpy as np
import pandas as pd

Simple concatenation#

ak.concatenate() is an analog of np.concatenate (in fact, you can call np.concatenate on Awkward Arrays and it dispatches to ak.concatenate()). Unlike NumPy's version, it applies to data with arbitrary structure:

array1 = ak.Array([
    [{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}, {"x": 3.3, "y": [1, 2, 3]}],
    [],
    [{"x": 4.4, "y": [1, 2, 3, 4]}, {"x": 5.5, "y": [1, 2, 3, 4, 5]}],
])
array2 = ak.Array([
    [{"x": 6.6, "y": [1, 2, 3, 4, 5, 6]}],
    [{"x": 7.7, "y": [1, 2, 3, 4, 5, 6, 7]}],
])
ak.concatenate([array1, array2])
[[{x: 1.1, y: [1]}, {x: 2.2, y: [...]}, {x: 3.3, y: [1, 2, 3]}],
 [],
 [{x: 4.4, y: [1, 2, 3, 4]}, {x: 5.5, y: [1, ..., 5]}],
 [{x: 6.6, y: [1, 2, 3, 4, 5, 6]}],
 [{x: 7.7, y: [1, 2, 3, 4, 5, 6, 7]}]]
----------------------------------------------------------------
backend: cpu
nbytes: 392 B
type: 5 * var * {
    x: float64,
    y: var * int64
}

The arrays can even have different data types, in which case the output has a union type.

array3 = ak.Array([{"z": None}, {"z": 0}, {"z": 123}])
ak.concatenate([array1, array2, array3])
[[{x: 1.1, y: [1]}, {x: 2.2, y: [...]}, {x: 3.3, y: [1, 2, 3]}],
 [],
 [{x: 4.4, y: [1, 2, 3, 4]}, {x: 5.5, y: [1, ..., 5]}],
 [{x: 6.6, y: [1, 2, 3, 4, 5, 6]}],
 [{x: 7.7, y: [1, 2, 3, 4, 5, 6, 7]}],
 {z: None},
 {z: 0},
 {z: 123}]
--------------------------------------------------------------------------------------------------------------
backend: cpu
nbytes: 504 B
type: 8 * union[
    var * {
        x: float64,
        y: var * int64
    },
    {
        z: ?int64
    }
]

Keep in mind, however, that some operations can’t deal with union-types (heterogeneous data), so you might want to avoid this.

Interleaving lists with axis > 0#

The default axis=0 returns an array whose length is equal to the sum of the lengths of the input arrays.

Other axis values combine lists within the arrays, as long as the arrays have the same lengths.

array1 = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
array2 = ak.Array([[10, 20], [30], [40, 50, 60, 70]])
len(array1), len(array2)
(3, 3)
ak.concatenate([array1, array2], axis=1)
[[1.1, 2.2, 3.3, 10, 20],
 [30],
 [4.4, 5.5, 40, 50, 60, 70]]
----------------------------
backend: cpu
nbytes: 128 B
type: 3 * var * float64

This can be used in some non-trivial ways: sometimes a problem that doesn’t seem to have anything to do with concatenation can be solved this way.

For instance, suppose that you have to pad some lists so that they start and stop with 0 (for some window-averaging procedure, perhaps). You can make the pad as a new array:

pad = np.zeros(len(array1))[:, np.newaxis]
pad
array([[0.],
       [0.],
       [0.]])

and concatenate it with axis=1 to get the desired effect:

ak.concatenate([pad, array1, pad], axis=1)
[[0, 1.1, 2.2, 3.3, 0],
 [0, 0],
 [0, 4.4, 5.5, 0]]
-----------------------
backend: cpu
nbytes: 120 B
type: 3 * var * float64

Or similarly, to duplicate the first value and the last value of each list (without affecting empty lists):

ak.concatenate([array1[:, :1], array1, array1[:, -1:]], axis=1)
[[1.1, 1.1, 2.2, 3.3, 3.3],
 [],
 [4.4, 4.4, 5.5, 5.5]]
---------------------------
backend: cpu
nbytes: 104 B
type: 3 * var * float64

The same applies for more deeply nested lists and axis > 1. Remember that axis=-1 starts counting from the innermost dimension, outward.

Emulating NumPy’s “stack” functions#

np.stack, np.hstack, np.vstack, and np.dstack are concatenations with np.newaxis (reshaping to add a dimension of length 1).

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.stack([a, b])
array([[1, 2, 3],
       [4, 5, 6]])
np.concatenate([a[np.newaxis], b[np.newaxis]], axis=0)
array([[1, 2, 3],
       [4, 5, 6]])
np.stack([a, b], axis=1)
array([[1, 4],
       [2, 5],
       [3, 6]])
np.concatenate([a[:, np.newaxis], b[:, np.newaxis]], axis=1)
array([[1, 4],
       [2, 5],
       [3, 6]])
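The other stack functions can be written the same way for these one-dimensional inputs. This sketch compares each one against an explicit np.concatenate; note that for one-dimensional inputs np.hstack needs no new axis at all:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# vstack: each 1-D input becomes one row, then rows are joined along axis 0.
vstack_like = np.concatenate([a[np.newaxis], b[np.newaxis]], axis=0)

# hstack: 1-D inputs are joined end to end, with no new axis needed.
hstack_like = np.concatenate([a, b], axis=0)

# dstack: each 1-D input is viewed with shape (1, N, 1), then joined along axis 2.
dstack_like = np.concatenate(
    [a[np.newaxis, :, np.newaxis], b[np.newaxis, :, np.newaxis]], axis=2
)

print(np.array_equal(vstack_like, np.vstack([a, b])))  # True
print(np.array_equal(hstack_like, np.hstack([a, b])))  # True
print(np.array_equal(dstack_like, np.dstack([a, b])))  # True
```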

Since ak.concatenate() has the same interface as np.concatenate and Awkward Arrays can also be sliced with np.newaxis, they can be stacked the same way, with the addition of arbitrary data structures.

a = ak.Array([[1], [1, 2], [1, 2, 3]])
b = ak.Array([[4], [4, 5], [4, 5, 6]])
ak.concatenate([a[np.newaxis], b[np.newaxis]], axis=0)
[[[1], [1, 2], [1, 2, 3]],
 [[4], [4, 5], [4, 5, 6]]]
--------------------------
backend: cpu
nbytes: 152 B
type: 2 * 3 * var * int64
ak.concatenate([a[:, np.newaxis], b[:, np.newaxis]], axis=1)
[[[1], [4]],
 [[1, 2], [4, 5]],
 [[1, 2, 3], [4, 5, 6]]]
-------------------------
backend: cpu
nbytes: 192 B
type: 3 * 2 * var * int64

Differences from Pandas#

Concatenation in Awkward Array combines arrays lengthwise: by adding the lengths of the arrays or adding the lengths of lists within an array. It does not refer to adding fields to a record (that is, “adding columns to a table”). To add fields to a record, see ak.zip() or ak.Array.__setitem__() in the guides “how to zip/unzip and project” and “how to add fields”. This is important to note because pandas.concat does both, depending on its axis argument (and there’s no equivalent in NumPy).

Here’s a table-like example of concatenation in Awkward Array:

array1 = ak.Array({"column": [[1, 2, 3], [], [4, 5]]})
array2 = ak.Array({"column": [[1.1, 2.2, 3.3], [], [4.4, 5.5]]})
array1
[{column: [1, 2, 3]},
 {column: []},
 {column: [4, 5]}]
-------------------------------------
backend: cpu
nbytes: 72 B
type: 3 * {
    column: var * int64
}
array2
[{column: [1.1, 2.2, 3.3]},
 {column: []},
 {column: [4.4, 5.5]}]
---------------------------------------
backend: cpu
nbytes: 72 B
type: 3 * {
    column: var * float64
}
ak.concatenate([array1, array2], axis=0)
[{column: [1, 2, 3]},
 {column: []},
 {column: [4, 5]},
 {column: [1.1, 2.2, 3.3]},
 {column: []},
 {column: [4.4, 5.5]}]
---------------------------------------
backend: cpu
nbytes: 136 B
type: 6 * {
    column: var * float64
}

This is like Pandas for axis=0,

df1 = pd.DataFrame({"column": [[1, 2, 3], [], [4, 5]]})
df2 = pd.DataFrame({"column": [[1.1, 2.2, 3.3], [], [4.4, 5.5]]})
df1
      column
0  [1, 2, 3]
1         []
2     [4, 5]
df2
            column
0  [1.1, 2.2, 3.3]
1               []
2       [4.4, 5.5]
pd.concat([df1, df2], axis=0)
            column
0        [1, 2, 3]
1               []
2           [4, 5]
0  [1.1, 2.2, 3.3]
1               []
2       [4.4, 5.5]

But for axis=1, they’re quite different:

ak.concatenate([array1, array2], axis=1)
[{column: [1, 2, 3, 1.1, 2.2, 3.3]},
 {column: []},
 {column: [4, 5, 4.4, 5.5]}]
---------------------------------------
backend: cpu
nbytes: 112 B
type: 3 * {
    column: var * float64
}
pd.concat([df1, df2], axis=1)
      column           column
0  [1, 2, 3]  [1.1, 2.2, 3.3]
1         []               []
2     [4, 5]       [4.4, 5.5]

ak.concatenate() accepts any axis less than the number of dimensions in the arrays, but Pandas has only two choices, axis=0 and axis=1.

Fields (“columns”) of an Awkward Array are unrelated to array dimensions. If you want what pandas.concat does with axis=1, you would use ak.zip():

ak.zip({"column1": array1.column, "column2": array2.column}, depth_limit=1)
[{column1: [1, 2, 3], column2: [1.1, 2.2, 3.3]},
 {column1: [], column2: []},
 {column1: [4, 5], column2: [4.4, 5.5]}]
------------------------------------------------------------------
backend: cpu
nbytes: 144 B
type: 3 * {
    column1: var * int64,
    column2: var * float64
}

The depth_limit prevents ak.zip() from interleaving the lists further. Without it, the lists themselves are zipped element by element:

ak.zip({"column1": array1.column, "column2": array2.column})
[[{column1: 1, column2: 1.1}, {...}, {column1: 3, column2: 3.3}],
 [],
 [{column1: 4, column2: 4.4}, {column1: 5, column2: 5.5}]]
-----------------------------------------------------------------
backend: cpu
nbytes: 112 B
type: 3 * var * {
    column1: int64,
    column2: float64
}

which Pandas doesn’t do because lists in Pandas cells are Python objects that it doesn’t modify.