# Min/max/sort one array by another

A common task in data analysis is to select the items from one array that minimize or maximize another, or to sort one array by the values of another.

```import awkward as ak
```

## Naive attempt goes wrong

For instance, in

```data = ak.Array([
[
{"title": "zero", "x": 0, "y": 0},
{"title": "two", "x": 2, "y": 2.2},
{"title": "one", "x": 1, "y": 1.1},
],
[],
[
{"title": "four", "x": 4, "y": 4.4},
{"title": "three", "x": 3, "y": 3.3},
],
[
{"title": "five", "x": 5, "y": 5.5},
],
[
{"title": "eight", "x": 8, "y": 8.8},
{"title": "six", "x": 6, "y": 6.6},
{"title": "nine", "x": 9, "y": 9.9},
{"title": "seven", "x": 7, "y": 7.7},
],
])
```

you may want to score each record with a computed value, such as `x**2 + y**2`, and then select the record with the highest score from each list.

```score = data.x**2 + data.y**2
score
```
```[[0, 8.84, 2.21],
[],
[35.4, 19.9],
[55.2],
[141, 79.6, 179, 108]]
-----------------------
type: 5 * var * float64```

At first, it would seem that `ak.argmax()` is what you need to identify the item with the highest score from each list and select it from `data`.

```best_index = ak.argmax(score, axis=1)
best_index
```
```[1,
None,
0,
0,
2]
----------------
type: 5 * ?int64```

However, if you attempt to slice the `data` with this, you’ll either get an indexing error or lists instead of records:

```data[best_index]
```
```[[],
None,
[{title: 'zero', x: 0, y: 0}, {...}, {title: 'one', x: 1, y: 1.1}],
[{title: 'zero', x: 0, y: 0}, {...}, {title: 'one', x: 1, y: 1.1}],
[{title: 'four', x: 4, y: 4.4}, {title: 'three', x: 3, y: 3.3}]]
--------------------------------------------------------------------
type: 5 * option[var * {
title: string,
x: int64,
y: float64
}]```

## What happened?

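Following the logic for reducers, the `ak.argmax()` function returns an array with one fewer dimension than the input: the `data` is an array of lists of records, but `best_index` is an array of integers. Used as a slice, a one-dimensional array of integers picks whole lists from the outer dimension rather than records from within each list. We want an array of lists of integers.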

The `keepdims=True` parameter can ensure that the output has the same number of dimensions as the input:

```best_index = ak.argmax(score, axis=1, keepdims=True)
best_index
```
```[[1],
[None],
[0],
[0],
[2]]
--------------------
type: 5 * 1 * ?int64```

Now these integers are at the same level of depth as the records that we want to select:

```result = data[best_index]
result
```
```[[{title: 'two', x: 2, y: 2.2}],
[None],
[{title: 'four', x: 4, y: 4.4}],
[{title: 'five', x: 5, y: 5.5}],
[{title: 'nine', x: 9, y: 9.9}]]
---------------------------------
type: 5 * var * ?{
title: string,
x: int64,
y: float64
}```

In the above, each length-1 list contains the record with the highest `score`. Even the empty list, for which the `ak.argmax()` is missing (`None`), is now a length-1 list containing `None`. We can remove this length-1 list structure with a slice:

```result[:, 0]
```
```[{title: 'two', x: 2, y: 2.2},
None,
{title: 'four', x: 4, y: 4.4},
{title: 'five', x: 5, y: 5.5},
{title: 'nine', x: 9, y: 9.9}]
-------------------------------
type: 5 * ?{
title: string,
x: int64,
y: float64
}```

To summarize this as a handy idiom, the way to get the record with maximum `data.x**2 + data.y**2` from an array of lists of records named `data` is

```data[ak.argmax(data.x**2 + data.y**2, axis=1, keepdims=True)][:, 0]
```
```[{title: 'two', x: 2, y: 2.2},
None,
{title: 'four', x: 4, y: 4.4},
{title: 'five', x: 5, y: 5.5},
{title: 'nine', x: 9, y: 9.9}]
-------------------------------
type: 5 * ?{
title: string,
x: int64,
y: float64
}```

For an array of lists of lists of records, you would use `axis=2` and the final slice would be `[:, :, 0]`, and so on.

## Sorting by another array

In addition to selecting items corresponding to the minimum or maximum of some other array, we may want to sort by another array. Just as `ak.argmin()` and `ak.argmax()` are the functions that would convey indexes from one array to another, `ak.argsort()` conveys sorted indexes from one array to another array. However, `ak.argsort()` always maintains the total number of dimensions, so we don’t need to worry about `keepdims`.

```sorted_indexes = ak.argsort(score)
sorted_indexes
```
```[[0, 2, 1],
[],
[1, 0],
[0],
[1, 3, 0, 2]]
---------------------
type: 5 * var * int64```
```data[sorted_indexes]
```
```[[{title: 'zero', x: 0, y: 0}, {...}, {title: 'two', x: 2, y: 2.2}],
[],
[{title: 'three', x: 3, y: 3.3}, {title: 'four', x: 4, y: 4.4}],
[{title: 'five', x: 5, y: 5.5}],
[{title: 'six', x: 6, y: 6.6}, {...}, ..., {title: 'nine', x: 9, y: 9.9}]]
---------------------------------------------------------------------------
type: 5 * var * {
title: string,
x: int64,
y: float64
}```

This sorted data has the same type as `data`:

```data.type.show()
```
```5 * var * {
title: string,
x: int64,
y: float64
}
```

It’s exactly what we want. `ak.argsort()` is easier to use than `ak.argmin()` and `ak.argmax()` because it needs neither `keepdims=True` nor a final slice to remove an extra dimension.

## Getting the top n items

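The `ak.min()`, `ak.max()`, `ak.argmin()`, and `ak.argmax()` functions select one extreme value. If you want the top n items (with n ≠ 1), you can use `ak.sort()` or `ak.argsort()`, followed by a slice. Because the sort is ascending by default, the slice `[:, :2]` below keeps the two lowest-scoring records from each list: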

```top2 = data[ak.argsort(score)][:, :2]
top2
```
```[[{title: 'zero', x: 0, y: 0}, {title: 'one', x: 1, y: 1.1}],
[],
[{title: 'three', x: 3, y: 3.3}, {title: 'four', x: 4, y: 4.4}],
[{title: 'five', x: 5, y: 5.5}],
[{title: 'six', x: 6, y: 6.6}, {title: 'seven', x: 7, y: 7.7}]]
-----------------------------------------------------------------
type: 5 * var * {
title: string,
x: int64,
y: float64
}```

Notice, though, that not all of these lists have length 2. The lists with 0 or 1 input items have 0 or 1 output items: these lists have length at most 2. That may be fine, but the example with `ak.argmax()`, above, resulted in `None` for an empty list. We could emulate that with `ak.pad_none()`.

```padded = ak.pad_none(top2, 2, axis=1)
padded
```
```[[{title: 'zero', x: 0, y: 0}, {title: 'one', x: 1, y: 1.1}],
[None, None],
[{title: 'three', x: 3, y: 3.3}, {title: 'four', x: 4, y: 4.4}],
[{title: 'five', x: 5, y: 5.5}, None],
[{title: 'six', x: 6, y: 6.6}, {title: 'seven', x: 7, y: 7.7}]]
-----------------------------------------------------------------
type: 5 * var * ?{
title: string,
x: int64,
y: float64
}```

The data type still says “`var *`”, meaning that the lists are allowed to be variable-length, even though they happen to all have length 2. At this point, we might not care because that’s all we need in order to convert these fields into NumPy arrays (e.g. for some machine learning process):

```ak.to_numpy(padded.x)
```
```masked_array(
data=[[0, 1],
[--, --],
[3, 4],
[5, --],
[6, 7]],
mask=[[False, False],
[ True,  True],
[False, False],
[False,  True],
[False, False]],
fill_value=999999)
```
```ak.to_numpy(padded.y)
```
```masked_array(
data=[[0.0, 1.1],
[--, --],
[3.3, 4.4],
[5.5, --],
[6.6, 7.7]],
mask=[[False, False],
[ True,  True],
[False, False],
[False,  True],
[False, False]],
fill_value=1e+20)
```

Or we might want to force the data type to ensure that the lists have length 2, using `ak.to_regular()`, `ak.enforce_type()`, or just by passing `clip=True` in the original `ak.pad_none()`.

```ak.to_regular(padded, axis=1)
```
```[[{title: 'zero', x: 0, y: 0}, {title: 'one', x: 1, y: 1.1}],
[None, None],
[{title: 'three', x: 3, y: 3.3}, {title: 'four', x: 4, y: 4.4}],
[{title: 'five', x: 5, y: 5.5}, None],
[{title: 'six', x: 6, y: 6.6}, {title: 'seven', x: 7, y: 7.7}]]
-----------------------------------------------------------------
type: 5 * 2 * ?{
title: string,
x: int64,
y: float64
}```

(Now the list lengths are “`2 *`”, rather than “`var *`”.)