ak.cartesian
Defined in awkward.operations.ak_cartesian on line 16.
- ak.cartesian(arrays, axis=1, *, nested=None, parameters=None, with_name=None, highlevel=True, behavior=None)
- Parameters
arrays (dict or iterable of arrays) – Each value in this dict or iterable can be any array-like data that ak.to_layout recognizes.
axis (int) – The dimension at which this operation is applied. The outermost dimension is 0, followed by 1, etc., and negative values count backward from the innermost: -1 is the innermost dimension, -2 is the next level up, etc.
nested (None, True, False, or iterable of str or int) – If None or False, all combinations of elements from the arrays are produced at the same level of nesting; if True, they are grouped in nested lists by combinations that share a common item from each of the arrays; if an iterable of str or int, group common items for a chosen set of keys from the arrays dict or integer slots of the arrays iterable.
parameters (None or dict) – Parameters for the new ak.contents.RecordArray node that is created by this operation.
with_name (None or str) – Assigns a "__record__" name to the new ak.contents.RecordArray node that is created by this operation (overriding parameters, if necessary).
highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.contents.Content subclass.
behavior (None or dict) – Custom ak.behavior for the output array, if high-level.
Computes a Cartesian product (i.e. cross product) of data from a set of arrays. This operation creates records (if arrays is a dict) or tuples (if arrays is another kind of iterable) that hold the combinations of elements, and it can introduce new levels of nesting.
As a simple example with axis=0, the Cartesian product of
>>> one = ak.Array([1, 2, 3])
>>> two = ak.Array(["a", "b"])
is
>>> ak.cartesian([one, two], axis=0).show()
[(1, 'a'),
 (1, 'b'),
 (2, 'a'),
 (2, 'b'),
 (3, 'a'),
 (3, 'b')]
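For readers who know the standard library, the axis=0 case can be modelled in plain Python with itertools.product. This is only a sketch of the semantics on ordinary lists, not how Awkward Array computes the result, and the helper name cartesian_axis0 is hypothetical.

```python
import itertools


def cartesian_axis0(*lists):
    """Plain-Python model of ak.cartesian(lists, axis=0) (hypothetical helper)."""
    # itertools.product varies the rightmost input fastest, which gives the
    # same ordering as the output displayed above.
    return list(itertools.product(*lists))


pairs = cartesian_axis0([1, 2, 3], ["a", "b"])
print(pairs)
```

Every element of the first list is paired with every element of the second, so the result has len(one) * len(two) entries.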
With nesting, a new level of nested lists is created to group combinations that share the same element from one into the same list.
>>> ak.cartesian([one, two], axis=0, nested=True).show()
[[(1, 'a'), (1, 'b')],
 [(2, 'a'), (2, 'b')],
 [(3, 'a'), (3, 'b')]]
The primary purpose of this function, however, is to compute a different Cartesian product for each element of an array: in other words, axis=1.
The following arrays each have four elements.
>>> one = ak.Array([[1, 2, 3], [], [4, 5], [6]])
>>> two = ak.Array([["a", "b"], ["c"], ["d"], ["e", "f"]])
The default axis=1 produces 6 pairs from the Cartesian product of [1, 2, 3] and ["a", "b"], 0 pairs from [] and ["c"], 2 pairs from [4, 5] and ["d"], and 2 pairs from [6] and ["e", "f"].
>>> ak.cartesian([one, two]).show()
[[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')],
 [],
 [(4, 'd'), (5, 'd')],
 [(6, 'e'), (6, 'f')]]
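The per-entry behaviour of axis=1 can be checked with a small plain-Python model: the sublists are matched up entry by entry, and a separate product is taken inside each match. The sketch below works on ordinary lists and is not Awkward Array's implementation; cartesian_axis1 is a hypothetical name.

```python
import itertools


def cartesian_axis1(one, two):
    """Plain-Python model of ak.cartesian([one, two], axis=1) (hypothetical helper)."""
    # Match the sublists entry by entry, then take the product within each match.
    return [list(itertools.product(xs, ys)) for xs, ys in zip(one, two)]


one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]

result = cartesian_axis1(one, two)
counts = [len(entry) for entry in result]
print(counts)  # [6, 0, 2, 2]
```

Each count is the product of the two sublist lengths, which is why an empty sublist on either side yields an empty result for that entry.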
The nesting depth is the same as the original arrays; with nested=True, the nesting depth is increased by 1 and tuples are grouped by their first element.
>>> ak.cartesian([one, two], nested=True).show()
[[[(1, 'a'), (1, 'b')], [(2, 'a'), (2, ...)], [(3, 'a'), (3, 'b')]],
 [],
 [[(4, 'd')], [(5, 'd')]],
 [[(6, 'e'), (6, 'f')]]]
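The grouping introduced by nested=True can be pictured as one extra list comprehension level. The following is a plain-Python model on ordinary lists, offered as an illustration only; cartesian_axis1_nested is a hypothetical helper, not part of Awkward Array.

```python
def cartesian_axis1_nested(one, two):
    """Plain-Python model of ak.cartesian([one, two], nested=True) (hypothetical helper)."""
    # For each entry, build one inner list per element x of the first array,
    # holding every pair (x, y) with y drawn from the second array.
    return [[[(x, y) for y in ys] for x in xs] for xs, ys in zip(one, two)]


one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]

grouped = cartesian_axis1_nested(one, two)
print(grouped[2])  # [[(4, 'd')], [(5, 'd')]]
```

In this model the number of groups in each entry equals the length of the corresponding sublist of one, so an empty sublist produces no groups at all.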
These tuples are ak.contents.RecordArray nodes with unnamed fields. To name the fields, we can pass one and two in a dict, rather than a list.
>>> ak.cartesian({"x": one, "y": two}).show()
[[{x: 1, y: 'a'}, {x: 1, y: 'b'}, {...}, ..., {x: 3, y: 'a'}, {x: 3, y: 'b'}],
 [],
 [{x: 4, y: 'd'}, {x: 5, y: 'd'}],
 [{x: 6, y: 'e'}, {x: 6, y: 'f'}]]
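To see how the dict keys become field names, the record form can be modelled with ordinary Python dicts. This is a sketch of the semantics only, assuming plain lists as input; cartesian_records is a hypothetical helper and does not reflect Awkward Array's internals.

```python
import itertools


def cartesian_records(named):
    """Plain-Python model of ak.cartesian(named, axis=1) for a dict (hypothetical helper)."""
    keys = list(named)
    out = []
    # Walk the arrays entry by entry; each combination becomes one record
    # whose fields carry the names of the dict keys.
    for sublists in zip(*named.values()):
        out.append([dict(zip(keys, combo)) for combo in itertools.product(*sublists)])
    return out


one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]

records = cartesian_records({"x": one, "y": two})
print(records[3])  # [{'x': 6, 'y': 'e'}, {'x': 6, 'y': 'f'}]
```

The combinations and their order are the same as in the tuple form; only the way each field is addressed changes.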
With more than two arrays in the Cartesian product, nested can specify which are grouped and which are not. For example,
>>> one = ak.Array([1, 2, 3, 4])
>>> two = ak.Array([1.1, 2.2, 3.3])
>>> three = ak.Array(["a", "b"])
can be left entirely ungrouped:
>>> ak.cartesian([one, two, three], axis=0).show()
[(1, 1.1, 'a'),
 (1, 1.1, 'b'),
 (1, 2.2, 'a'),
 (1, 2.2, 'b'),
 (1, 3.3, 'a'),
 (1, 3.3, 'b'),
 (2, 1.1, 'a'),
 (2, 1.1, 'b'),
 (2, 2.2, 'a'),
 (2, 2.2, 'b'),
 ...,
 (3, 2.2, 'b'),
 (3, 3.3, 'a'),
 (3, 3.3, 'b'),
 (4, 1.1, 'a'),
 (4, 1.1, 'b'),
 (4, 2.2, 'a'),
 (4, 2.2, 'b'),
 (4, 3.3, 'a'),
 (4, 3.3, 'b')]
can be grouped by one (adding 1 more dimension):
>>> ak.cartesian([one, two, three], axis=0, nested=[0]).show()
[[(1, 1.1, 'a'), (1, 1.1, 'b'), (1, 2.2, 'a')],
 [(1, 2.2, 'b'), (1, 3.3, 'a'), (1, 3.3, 'b')],
 [(2, 1.1, 'a'), (2, 1.1, 'b'), (2, 2.2, 'a')],
 [(2, 2.2, 'b'), (2, 3.3, 'a'), (2, 3.3, 'b')],
 [(3, 1.1, 'a'), (3, 1.1, 'b'), (3, 2.2, 'a')],
 [(3, 2.2, 'b'), (3, 3.3, 'a'), (3, 3.3, 'b')],
 [(4, 1.1, 'a'), (4, 1.1, 'b'), (4, 2.2, 'a')],
 [(4, 2.2, 'b'), (4, 3.3, 'a'), (4, 3.3, 'b')]]
can be grouped by one and two (adding 2 more dimensions):
>>> ak.cartesian([one, two, three], axis=0, nested=[0, 1]).show()
[[[(1, 1.1, 'a'), (1, 1.1, 'b')], [...], [(1, 3.3, 'a'), (1, 3.3, ...)]],
 [[(2, 1.1, 'a'), (2, 1.1, 'b')], [...], [(2, 3.3, 'a'), (2, 3.3, ...)]],
 [[(3, 1.1, 'a'), (3, 1.1, 'b')], [...], [(3, 3.3, 'a'), (3, 3.3, ...)]],
 [[(4, 1.1, 'a'), (4, 1.1, 'b')], [...], [(4, 3.3, 'a'), (4, 3.3, ...)]]]
or grouped by unique one-two pairs (adding 1 more dimension):
>>> ak.cartesian([one, two, three], axis=0, nested=[1]).show()
[[(1, 1.1, 'a'), (1, 1.1, 'b')],
 [(1, 2.2, 'a'), (1, 2.2, 'b')],
 [(1, 3.3, 'a'), (1, 3.3, 'b')],
 [(2, 1.1, 'a'), (2, 1.1, 'b')],
 [(2, 2.2, 'a'), (2, 2.2, 'b')],
 [(2, 3.3, 'a'), (2, 3.3, 'b')],
 [(3, 1.1, 'a'), (3, 1.1, 'b')],
 [(3, 2.2, 'a'), (3, 2.2, 'b')],
 [(3, 3.3, 'a'), (3, 3.3, 'b')],
 [(4, 1.1, 'a'), (4, 1.1, 'b')],
 [(4, 2.2, 'a'), (4, 2.2, 'b')],
 [(4, 3.3, 'a'), (4, 3.3, 'b')]]
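The shapes of the nested=[0, 1] and nested=[1] outputs displayed above can be reproduced with nested list comprehensions over plain Python lists. This is a model of the displayed results under that assumption, not a description of how Awkward Array builds them; the variable names by_one_and_two and by_pairs are chosen for illustration.

```python
one = [1, 2, 3, 4]
two = [1.1, 2.2, 3.3]
three = ["a", "b"]

# Model of the nested=[0, 1] display: one list per element of `one`,
# and inside it one list per element of `two`.
by_one_and_two = [[[(x, y, z) for z in three] for y in two] for x in one]

# Model of the nested=[1] display: a single level of groups,
# one group per (x, y) pair.
by_pairs = [[(x, y, z) for z in three] for x in one for y in two]

print(len(by_one_and_two), len(by_one_and_two[0]), len(by_one_and_two[0][0]))  # 4 3 2
print(len(by_pairs), len(by_pairs[0]))  # 12 2
```

Both models contain the same 24 tuples in the same order; they differ only in where the list boundaries fall.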
The order of the output is fixed: it is always lexicographical in the order that the arrays are written.
To emulate an SQL or Pandas “group by” operation, put the keys that you wish to group by first and use nested=[0] or nested=[n] to group by unique n-tuples. If necessary, record keys can later be reordered with a list of strings in ak.Array.__getitem__.
To get list index positions in the tuples/records, rather than data from the original arrays, use ak.argcartesian instead of ak.cartesian. The ak.argcartesian form can be particularly useful as nested indexing in ak.Array.__getitem__.
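The relationship between index positions and data can be illustrated with a plain-Python model: take the product of the positions within each pair of sublists, then look those positions up again to recover the values. The helper argcartesian_axis1 below is hypothetical and operates on ordinary lists; it only sketches the idea and is not the library's implementation.

```python
import itertools


def argcartesian_axis1(one, two):
    """Plain-Python model of ak.argcartesian([one, two], axis=1) (hypothetical helper)."""
    # Same pairing as the data version, but over positions instead of values.
    return [
        list(itertools.product(range(len(xs)), range(len(ys))))
        for xs, ys in zip(one, two)
    ]


one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]

index_pairs = argcartesian_axis1(one, two)
print(index_pairs[2])  # [(0, 0), (1, 0)]

# Looking the positions back up in the original sublists recovers the data.
values = [
    [(xs[i], ys[j]) for i, j in entry]
    for entry, xs, ys in zip(index_pairs, one, two)
]
print(values[2])  # [(4, 'd'), (5, 'd')]
```

Keeping the positions rather than the values makes it possible to apply the same selection to other arrays that share the same list structure.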