# How to find the best match between two collections using Cartesian (cross) product#

In high energy physics (HEP), `ak.combinations()` is often needed to find particles whose trajectories are close to each other, separately in many high-energy collision events (`axis=1`). In some applications, the two collections that need to be matched are simulated particles and reconstructed versions of those particles (“gen-reco matching”), and in other applications, the two collections are different types of particles, such as electrons and jets.

I’ll describe how to solve such a problem on this page, but avoid domain-specific jargon by casting it as a problem of finding the distance between bunnies and foxes—if a bunny is too close to a fox, it will get eaten!

```import awkward as ak
import numpy as np
```

## Setting up the problem#

In 1000 separate yards (big suburb), there’s a random number of bunnies and a random number of foxes, each with random x, y positions. We’re making ragged arrays of records using `ak.unflatten()` and `ak.zip()`.

```np.random.seed(12345)

number_of_bunnies = np.random.poisson(3.5, 1000)   # average of 3.5 bunnies/yard
number_of_foxes = np.random.poisson(1.5, 1000)     # average of 1.5 foxes/yard

bunny_xy = np.random.normal(0, 1, (number_of_bunnies.sum(), 2))
fox_xy = np.random.normal(0, 1, (number_of_foxes.sum(), 2))

bunnies = ak.unflatten(ak.zip({"x": bunny_xy[:, 0], "y": bunny_xy[:, 1]}), number_of_bunnies)
foxes = ak.unflatten(ak.zip({"x": fox_xy[:, 0], "y": fox_xy[:, 1]}), number_of_foxes)
```
```bunnies
```
```[[{x: -1, y: -0.727}, {x: 0.735, y: -0.594}, {x: -0.588, y: 0.718}],
[{x: -0.449, y: -1.61}, {x: 1.59, y: -0.627}, ..., {...}, {x: 1.49, y: -1.42}],
[{x: 0.441, y: -0.705}, {x: -1.96, y: 0.31}],
[{x: -0.121, y: -0.232}, {x: -0.44, y: -2.03}, ..., {x: 0.0405, y: 1.24}],
[{x: -0.703, y: -0.049}, {x: -0.209, y: ..., ...}, ..., {x: 0.525, y: 0.265}],
[{x: -0.958, y: 0.119}, {x: 0.885, y: -0.266}],
[{x: -1.28, y: 0.235}, {x: 0.625, y: ..., ...}, {x: -0.748, y: -0.336}],
[{x: 0.37, y: 0.582}, {x: -0.594, y: -2.68}, ..., {...}, {x: 0.412, y: 0.32}],
[{x: 0.57, y: 0.342}, {x: -0.416, y: -0.973}],
[{x: -0.0416, y: 0.802}, {x: -0.144, y: ..., ...}, ..., {x: -0.635, y: -1.71}],
...,
[{x: 1.7, y: -0.564}],
[{x: 1.11, y: -0.309}, {x: 0.155, y: -0.458}, ..., {x: 0.352, y: -1.18}],
[{x: -0.534, y: 0.555}, {x: -1.07, y: -1.37}, {x: -0.771, y: 0.699}],
[{x: -0.586, y: 1.86}],
[{x: 0.0432, y: -1.22}, {x: -1.15, y: 0.45}],
[{x: -0.868, y: 1.79}, {x: 0.999, y: -1.16}, ..., {x: -0.877, y: 0.203}],
[{x: 0.73, y: 0.335}, {x: -0.372, y: -0.659}, {x: 0.946, y: -1.1}],
[{x: -0.622, y: -1.26}],
[{x: 1.47, y: 1.5}, {x: 1.94, y: -2.37}]]
--------------------------------------------------------------------------------
type: 1000 * var * {
x: float64,
y: float64
}```
```foxes
```
```[[{x: 0.183, y: -1.17}],
[{x: 0.497, y: 0.761}],
[{x: -1.06, y: 0.0902}],
[{x: -0.106, y: -1.28}],
[{x: 0.338, y: 0.642}],
[{x: -0.833, y: -1.11}, {x: -0.163, y: ..., ...}, {x: -0.548, y: -0.546}],
[{x: 0.89, y: 0.67}, {x: -1.24, y: 0.543}, {...}, {x: 0.211, y: -1.03}],
[{x: 0.411, y: 1.23}, {x: -0.51, y: 0.931}, {...}, {x: 0.953, y: 1.18}],
[{x: -3.47, y: -0.273}],
[{x: -0.553, y: 0.339}],
...,
[{x: 0.856, y: 1.06}, {x: 1.12, y: -0.0269}],
[],
[],
[],
[{x: -1.51, y: 0.912}],
[{x: -0.665, y: -0.175}],
[{x: -0.401, y: -1.2}, {x: -1.33, y: -1.06}],
[],
[{x: -0.0623, y: -0.596}, {x: 1.35, y: ..., ...}, ..., {x: 0.052, y: -0.242}]]
-------------------------------------------------------------------------------
type: 1000 * var * {
x: float64,
y: float64
}```

## Find all combinations#

In each yard, we find all bunny-fox pairs, regardless of whether they’re close or not using `ak.cartesian()`, and then unpacking the pairs with `ak.unzip()`.

```pair_bunnies, pair_foxes = ak.unzip(ak.cartesian([bunnies, foxes]))
```

These two arrays, `pair_bunnies` and `pair_foxes`, have the same type as `bunnies` and `foxes`, but different numbers of items in each list because now they’re paired to match each other. Both kinds of animals are duplicated to enable this match.

```pair_bunnies
```
```[[{x: -1, y: -0.727}, {x: 0.735, y: -0.594}, {x: -0.588, y: 0.718}],
[{x: -0.449, y: -1.61}, {x: 1.59, y: -0.627}, ..., {...}, {x: 1.49, y: -1.42}],
[{x: 0.441, y: -0.705}, {x: -1.96, y: 0.31}],
[{x: -0.121, y: -0.232}, {x: -0.44, y: -2.03}, ..., {x: 0.0405, y: 1.24}],
[{x: -0.703, y: -0.049}, {x: -0.209, y: ..., ...}, ..., {x: 0.525, y: 0.265}],
[{x: -0.958, y: 0.119}, {x: -0.958, y: ..., ...}, ..., {x: 0.885, y: -0.266}],
[{x: -1.28, y: 0.235}, {x: -1.28, y: 0.235}, ..., {x: -0.748, y: -0.336}],
[{x: 0.37, y: 0.582}, {x: 0.37, y: 0.582}, ..., {...}, {x: 0.412, y: 0.32}],
[{x: 0.57, y: 0.342}, {x: -0.416, y: -0.973}],
[{x: -0.0416, y: 0.802}, {x: -0.144, y: ..., ...}, ..., {x: -0.635, y: -1.71}],
...,
[{x: 1.7, y: -0.564}, {x: 1.7, y: -0.564}],
[],
[],
[],
[{x: 0.0432, y: -1.22}, {x: -1.15, y: 0.45}],
[{x: -0.868, y: 1.79}, {x: 0.999, y: -1.16}, ..., {x: -0.877, y: 0.203}],
[{x: 0.73, y: 0.335}, {x: 0.73, y: 0.335}, ..., {...}, {x: 0.946, y: -1.1}],
[],
[{x: 1.47, y: 1.5}, {x: 1.47, y: 1.5}, {...}, ..., {...}, {x: 1.94, y: -2.37}]]
--------------------------------------------------------------------------------
type: 1000 * var * {
x: float64,
y: float64
}```
```pair_foxes
```
```[[{x: 0.183, y: -1.17}, {x: 0.183, y: -1.17}, {x: 0.183, y: -1.17}],
[{x: 0.497, y: 0.761}, {x: 0.497, y: 0.761}, ..., {...}, {x: 0.497, y: 0.761}],
[{x: -1.06, y: 0.0902}, {x: -1.06, y: 0.0902}],
[{x: -0.106, y: -1.28}, {x: -0.106, y: ..., ...}, ..., {x: -0.106, y: -1.28}],
[{x: 0.338, y: 0.642}, {x: 0.338, y: 0.642}, ..., {...}, {x: 0.338, y: 0.642}],
[{x: -0.833, y: -1.11}, {x: -0.163, y: ..., ...}, ..., {x: -0.548, y: -0.546}],
[{x: 0.89, y: 0.67}, {x: -1.24, y: 0.543}, ..., {...}, {x: 0.211, y: -1.03}],
[{x: 0.411, y: 1.23}, {x: -0.51, y: 0.931}, ..., {...}, {x: 0.953, y: 1.18}],
[{x: -3.47, y: -0.273}, {x: -3.47, y: -0.273}],
[{x: -0.553, y: 0.339}, {x: -0.553, y: ..., ...}, ..., {x: -0.553, y: 0.339}],
...,
[{x: 0.856, y: 1.06}, {x: 1.12, y: -0.0269}],
[],
[],
[],
[{x: -1.51, y: 0.912}, {x: -1.51, y: 0.912}],
[{x: -0.665, y: -0.175}, {x: -0.665, ...}, {...}, {x: -0.665, y: -0.175}],
[{x: -0.401, y: -1.2}, {x: -1.33, y: -1.06}, ..., {...}, {x: -1.33, y: -1.06}],
[],
[{x: -0.0623, y: -0.596}, {x: 1.35, y: ..., ...}, ..., {x: 0.052, y: -0.242}]]
--------------------------------------------------------------------------------
type: 1000 * var * {
x: float64,
y: float64
}```

The two arrays have the same list lengths as each other because they came from the same `ak.unzip()`.

```ak.num(pair_bunnies), ak.num(pair_foxes)
```
```(<Array [3, 8, 2, 8, 5, 6, 12, 20, ..., 0, 0, 2, 4, 6, 0, 8] type='1000 * int64'>,
<Array [3, 8, 2, 8, 5, 6, 12, 20, ..., 0, 0, 2, 4, 6, 0, 8] type='1000 * int64'>)
```

## Calculating distances#

Since the arrays have the same shapes, they can be used in the same mathematical formula. Here’s the formula for distance:

```distances = np.sqrt((pair_bunnies.x - pair_foxes.x)**2 + (pair_bunnies.y - pair_foxes.y)**2)
distances
```
```[[1.27, 0.799, 2.04],
[2.55, 1.77, 3.37, 1.57, 0.352, 2, 0.758, 2.39],
[1.7, 0.924],
[1.05, 0.819, 1.38, 3.03, 1.44, 2.22, 2.89, 2.53],
[1.25, 2.61, 0.953, 0.685, 0.421],
[1.24, 0.88, 0.781, 1.92, 1.3, 1.46],
[2.21, 0.31, 1.36, 1.95, 0.656, 1.93, ..., 1.17, 1.92, 1.01, 0.788, 1.18],
[0.645, 0.947, 1.42, 0.834, 4.03, 3.61, ..., 3.12, 0.906, 1.11, 1.48, 1.02],
[4.08, 3.13],
[0.69, 2.84, 0.312, 2.05],
...,
[1.83, 0.789],
[],
[],
[],
[2.64, 0.585],
[1.97, 1.94, 2.51, 0.434],
[1.91, 2.49, 0.545, 1.04, 1.35, 2.28],
[],
[2.6, 0.607, 2.03, 2.25, 2.67, 3.33, 2.26, 2.84]]
-----------------------------------------------------------------------------
type: 1000 * var * float64```

Let’s say that 1 unit is close enough for a bunny to be eaten.

```eaten = (distances < 1)
eaten
```
```[[False, True, False],
[False, False, False, False, True, False, True, False],
[False, True],
[False, True, False, False, False, False, False, False],
[False, False, True, True, True],
[False, True, True, False, False, False],
[False, True, False, False, True, ..., False, False, False, True, False],
[True, True, False, True, False, False, ..., False, True, False, False, False],
[False, False],
[True, False, True, False],
...,
[False, True],
[],
[],
[],
[False, True],
[False, False, False, True],
[False, False, True, False, False, False],
[],
[False, True, False, False, False, False, False, False]]
--------------------------------------------------------------------------------
type: 1000 * var * bool```

This is great (not for the bunnies, but perhaps for the foxes). However, if we want to use this information on the original arrays, we’re stuck: this array has a different shape from the original `bunnies` (and the original `foxes`).

Perhaps the question we really wanted to ask is, “For each bunny, is there any fox that can eat it?”

## Combinations with `nested=True`#

Asking a question about any fox means performing a reducer, `ak.any()`, over lists, one list per bunny. The list would be all of the foxes in its yard. For that, we’ll need to pass `nested=True` to `ak.cartesian()`.

```pair_bunnies, pair_foxes = ak.unzip(ak.cartesian([bunnies, foxes], nested=True))
```

Now `pair_bunnies` and `pair_foxes` are one list-depth deeper than the original `bunnies` and `foxes`.

```pair_bunnies
```
```[[[{x: -1, y: -0.727}], [{x: 0.735, ...}], [{x: -0.588, y: 0.718}]],
[[{x: -0.449, y: -1.61}], [{x: 1.59, ...}], ..., [{x: 1.49, y: -1.42}]],
[[{x: 0.441, y: -0.705}], [{x: -1.96, y: 0.31}]],
[[{x: -0.121, y: -0.232}], [{x: -0.44, ...}], ..., [{x: 0.0405, y: 1.24}]],
[[{x: -0.703, y: -0.049}], [{x: -0.209, ...}], ..., [{x: 0.525, y: 0.265}]],
[[{x: -0.958, y: 0.119}, {x: -0.958, ...}, {x: -0.958, y: 0.119}], [...]],
[[{x: -1.28, y: 0.235}, {x: -1.28, ...}, ..., {x: -1.28, y: 0.235}], ...],
[[{x: 0.37, y: 0.582}, {x: 0.37, y: ..., ...}, ..., {x: 0.37, y: 0.582}], ...],
[[{x: 0.57, y: 0.342}], [{x: -0.416, y: -0.973}]],
[[{x: -0.0416, y: 0.802}], [{x: -0.144, ...}], ..., [{x: -0.635, y: -1.71}]],
...,
[[{x: 1.7, y: -0.564}, {x: 1.7, y: -0.564}]],
[[], [], [], []],
[[], [], []],
[[]],
[[{x: 0.0432, y: -1.22}], [{x: -1.15, y: 0.45}]],
[[{x: -0.868, y: 1.79}], [{x: 0.999, ...}], ..., [{x: -0.877, y: 0.203}]],
[[{x: 0.73, y: 0.335}, {x: 0.73, y: 0.335}], ..., [{x: 0.946, ...}, ...]],
[[]],
[[{x: 1.47, y: 1.5}, {x: 1.47, y: 1.5}, {...}, {x: 1.47, y: 1.5}], [...]]]
--------------------------------------------------------------------------------
type: 1000 * var * var * {
x: float64,
y: float64
}```
```pair_foxes
```
```[[[{x: 0.183, y: -1.17}], [{x: 0.183, ...}], [{x: 0.183, y: -1.17}]],
[[{x: 0.497, y: 0.761}], [{x: 0.497, ...}], ..., [{x: 0.497, y: 0.761}]],
[[{x: -1.06, y: 0.0902}], [{x: -1.06, y: 0.0902}]],
[[{x: -0.106, y: -1.28}], [{x: -0.106, ...}], ..., [{x: -0.106, y: -1.28}]],
[[{x: 0.338, y: 0.642}], [{x: 0.338, ...}], ..., [{x: 0.338, y: 0.642}]],
[[{x: -0.833, y: -1.11}, {x: -0.163, ...}, {x: -0.548, y: -0.546}], ...],
[[{x: 0.89, y: 0.67}, {x: -1.24, ...}, {...}, {x: 0.211, y: -1.03}], ...],
[[{x: 0.411, y: 1.23}, {x: -0.51, ...}, {...}, {x: 0.953, y: 1.18}], ...],
[[{x: -3.47, y: -0.273}], [{x: -3.47, y: -0.273}]],
[[{x: -0.553, y: 0.339}], [{x: -0.553, ...}], ..., [{x: -0.553, y: 0.339}]],
...,
[[{x: 0.856, y: 1.06}, {x: 1.12, y: -0.0269}]],
[[], [], [], []],
[[], [], []],
[[]],
[[{x: -1.51, y: 0.912}], [{x: -1.51, y: 0.912}]],
[[{x: -0.665, y: -0.175}], [{x: -0.665, ...}], ..., [{x: -0.665, y: -0.175}]],
[[{x: -0.401, y: -1.2}, {x: -1.33, y: -1.06}], ..., [{x: -0.401, ...}, ...]],
[[]],
[[{x: -0.0623, y: -0.596}, {x: 1.35, ...}, ..., {x: 0.052, y: -0.242}], ...]]
-------------------------------------------------------------------------------
type: 1000 * var * var * {
x: float64,
y: float64
}```

We can compute `distances` in the same way, though it’s also one list-depth deeper.

```distances = np.sqrt((pair_bunnies.x - pair_foxes.x)**2 + (pair_bunnies.y - pair_foxes.y)**2)
distances
```
```[[[1.27], [0.799], [2.04]],
[[2.55], [1.77], [3.37], [1.57], [0.352], [2], [0.758], [2.39]],
[[1.7], [0.924]],
[[1.05], [0.819], [1.38], [3.03], [1.44], [2.22], [2.89], [2.53]],
[[1.25], [2.61], [0.953], [0.685], [0.421]],
[[1.24, 0.88, 0.781], [1.92, 1.3, 1.46]],
[[2.21, 0.31, 1.36, 1.95], [0.656, ..., 1.17], [1.92, 1.01, 0.788, 1.18]],
[[0.645, 0.947, 1.42, 0.834], [4.03, ...], ..., [0.906, 1.11, 1.48, 1.02]],
[[4.08], [3.13]],
[[0.69], [2.84], [0.312], [2.05]],
...,
[[1.83, 0.789]],
[[], [], [], []],
[[], [], []],
[[]],
[[2.64], [0.585]],
[[1.97], [1.94], [2.51], [0.434]],
[[1.91, 2.49], [0.545, 1.04], [1.35, 2.28]],
[[]],
[[2.6, 0.607, 2.03, 2.25], [2.67, 3.33, 2.26, 2.84]]]
----------------------------------------------------------------------------
type: 1000 * var * var * float64```

Similarly for `eaten`.

```eaten = (distances < 1)
eaten
```
```[[[False], [True], [False]],
[[False], [False], [False], [False], [True], [False], [True], [False]],
[[False], [True]],
[[False], [True], [False], [False], [False], [False], [False], [False]],
[[False], [False], [True], [True], [True]],
[[False, True, True], [False, False, False]],
[[False, True, False, False], [True, ...], [False, False, True, False]],
[[True, True, False, True], [False, ...], ..., [True, False, False, False]],
[[False], [False]],
[[True], [False], [True], [False]],
...,
[[False, True]],
[[], [], [], []],
[[], [], []],
[[]],
[[False], [True]],
[[False], [False], [False], [True]],
[[False, False], [True, False], [False, False]],
[[]],
[[False, True, False, False], [False, False, False, False]]]
-----------------------------------------------------------------------------
type: 1000 * var * var * bool```

Now each inner list of booleans is answering the questions, “Can fox 0 eat me?”, “Can fox 1 eat me?”, …, “Can fox n eat me?” and there are exactly as many of these lists as there are bunnies. Applying `ak.any()` over the innermost lists (`axis=-1`),

```bunny_eaten = ak.any(eaten, axis=-1)
bunny_eaten
```
```[[False, True, False],
[False, False, False, False, True, False, True, False],
[False, True],
[False, True, False, False, False, False, False, False],
[False, False, True, True, True],
[True, False],
[True, True, True],
[True, False, False, False, True],
[False, False],
[True, False, True, False],
...,
[True],
[False, False, False, False],
[False, False, False],
[False],
[False, True],
[False, False, False, True],
[False, True, False],
[False],
[True, False]]
---------------------------------------------------------
type: 1000 * var * bool```

We’ve now answered the question, “Can any fox eat me?” for each bunny. After the mayhem, these are the bunnies we have left:

```bunnies[~bunny_eaten]
```
```[[{x: -1, y: -0.727}, {x: -0.588, y: 0.718}],
[{x: -0.449, y: -1.61}, {x: 1.59, y: -0.627}, ..., {...}, {x: 1.49, y: -1.42}],
[{x: 0.441, y: -0.705}],
[{x: -0.121, y: -0.232}, {x: 0.23, y: 0.0559}, ..., {x: 0.0405, y: 1.24}],
[{x: -0.703, y: -0.049}, {x: -0.209, y: -1.91}],
[{x: 0.885, y: -0.266}],
[],
[{x: -0.594, y: -2.68}, {x: 0.235, y: -1.13}, {x: 0.878, y: -1.94}],
[{x: 0.57, y: 0.342}, {x: -0.416, y: -0.973}],
[{x: -0.144, y: -2.47}, {x: -0.635, y: -1.71}],
...,
[],
[{x: 1.11, y: -0.309}, {x: 0.155, y: -0.458}, ..., {x: 0.352, y: -1.18}],
[{x: -0.534, y: 0.555}, {x: -1.07, y: -1.37}, {x: -0.771, y: 0.699}],
[{x: -0.586, y: 1.86}],
[{x: 0.0432, y: -1.22}],
[{x: -0.868, y: 1.79}, {x: 0.999, y: -1.16}, {x: -0.389, y: 2.32}],
[{x: 0.73, y: 0.335}, {x: 0.946, y: -1.1}],
[{x: -0.622, y: -1.26}],
[{x: 1.94, y: -2.37}]]
--------------------------------------------------------------------------------
type: 1000 * var * {
x: float64,
y: float64
}```

Whereas there was originally an average of 3.5 bunnies per yard, by construction,

```ak.mean(ak.num(bunnies, axis=1))
```
```3.527
```

Now there’s only

```ak.mean(ak.num(bunnies[~bunny_eaten], axis=1))
```
```2.557
```

left.

## Asymmetry in the problem#

The way we performed this calculation was asymmetric: for each bunny, we asked if it was eaten. We could have performed a similar, but different, calculation to ask, which foxes get to eat? To do that, we must reverse the order of arguments because `nested=True` groups from the left.

```pair_foxes, pair_bunnies = ak.unzip(ak.cartesian([foxes, bunnies], nested=True))

distances = np.sqrt((pair_foxes.x - pair_bunnies.x)**2 + (pair_foxes.y - pair_bunnies.y)**2)

eating = (distances < 1)

fox_eats = ak.any(eating, axis=-1)

foxes[fox_eats]
```
```[[{x: 0.183, y: -1.17}],
[{x: 0.497, y: 0.761}],
[{x: -1.06, y: 0.0902}],
[{x: -0.106, y: -1.28}],
[{x: 0.338, y: 0.642}],
[{x: -0.163, y: 0.496}, {x: -0.548, y: -0.546}],
[{x: 0.89, y: 0.67}, {x: -1.24, y: 0.543}, {x: 0.0186, y: -0.154}],
[{x: 0.411, y: 1.23}, {x: -0.51, y: 0.931}, {x: 0.953, y: 1.18}],
[],
[{x: -0.553, y: 0.339}],
...,
[{x: 1.12, y: -0.0269}],
[],
[],
[],
[{x: -1.51, y: 0.912}],
[{x: -0.665, y: -0.175}],
[{x: -0.401, y: -1.2}],
[],
[{x: 1.35, y: 0.905}]]
--------------------------------------------------------------------
type: 1000 * var * {
x: float64,
y: float64
}```