# How to compute statistics on dimensions (mean/var/std)

Awkward Array provides several functions for statistical analysis that operate on ragged arrays. These are dimensional reducers, like `ak.sum()`, `ak.min()`, `ak.any()`, and `ak.all()` in the previous section, but they compute quantities such as the mean, variance, standard deviation, and higher moments. This section also covers functions for linear regression and correlation.

```
import awkward as ak
import numpy as np
```

## Basic statistical functions

### Mean, variance, and standard deviation

To compute the mean, variance, and standard deviation of an array, use `ak.mean()`, `ak.var()`, and `ak.std()`. Unlike the NumPy functions with the same names, these functions apply to arrays with variable-length dimensions and missing values (but not to heterogeneous dimensionality or records; see the last section of the reducing guide).

```
array = ak.Array([[0, 1.1, 2.2], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])
```

```
ak.mean(array, axis=-1)
```

```
[1.1,
 3.85,
 5.5,
 8.25]
-----------------
type: 4 * float64
```

```
ak.var(array, axis=-1)
```

```
[0.807,
 0.302,
 0,
 1.51]
-----------------
type: 4 * float64
```

```
ak.std(array, axis=-1)
```

```
[0.898,
 0.55,
 0,
 1.23]
-----------------
type: 4 * float64
```

These functions also have counterparts that ignore `nan` values: `ak.nanmean()`, `ak.nanvar()`, and `ak.nanstd()`.

```
array_with_nan = ak.Array([[0, 1.1, np.nan], [3.3, 4.4], [np.nan], [6.6, np.nan, 8.8, 9.9]])
```

```
ak.nanmean(array_with_nan, axis=-1)
```

```
[0.55,
 3.85,
 None,
 8.43]
------------------
type: 4 * ?float64
```

```
ak.nanvar(array_with_nan, axis=-1)
```

```
[0.303,
 0.302,
 None,
 1.88]
------------------
type: 4 * ?float64
```

```
ak.nanstd(array_with_nan, axis=-1)
```

```
[0.55,
 0.55,
 None,
 1.37]
------------------
type: 4 * ?float64
```

Note that floating-point `nan` is different from a missing value (`None`). Unlike `nan`, missing values can appear in integer arrays, and whole lists can be missing as well. In both families of functions, missing values are ignored if they are in the dimension being reduced and otherwise pass through to the output, just as the `nan`-ignoring functions ignore `nan`.

```
array_with_None = ak.Array([[0, 1.1, 2.2], None, [None, 4.4], [5.5], [6.6, np.nan, 8.8, 9.9]])
```

```
ak.mean(array_with_None, axis=-1)
```

```
[1.1,
 None,
 4.4,
 5.5,
 nan]
------------------
type: 5 * ?float64
```

```
ak.nanmean(array_with_None, axis=-1)
```

```
[1.1,
 None,
 4.4,
 5.5,
 8.43]
------------------
type: 5 * ?float64
```

### Moments

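For higher moments, use `ak.moment()`. It computes raw moments: the `n`-th moment is the mean of each value raised to the `n`-th power. Note that the third raw moment is not the skewness, which is the third central moment divided by the cube of the standard deviation. For example, to calculate the third raw moment, you would do the following: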

```
ak.moment(array, 3, axis=-1)
```

```
[3.99,
 60.6,
 166,
 599]
-----------------
type: 4 * float64
```

## Correlation and covariance

For correlation and covariance between two arrays, use `ak.corr()` and `ak.covar()`.

```
array_x = ak.Array([[0, 1.1, 2.2], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])
array_y = ak.Array([[0, 1, 2], [3, 4], [5], [6, 7, 8, 9]])
```

```
ak.corr(array_x, array_y, axis=-1)
```

```
[1,
 1,
 nan,
 1]
-----------------
type: 4 * float64
```

```
ak.covar(array_x, array_y, axis=-1)
```

```
[0.733,
 0.275,
 0,
 1.38]
-----------------
type: 4 * float64
```

## Linear fits

To perform linear fits, use `ak.linear_fit()`. Instead of reducing each list to a number, it reduces each list to a record that has `intercept`, `slope`, `intercept_error`, and `slope_error` fields. (These “errors” are uncertainty estimates of the intercept and slope parameters, assuming that the underlying generator of data is truly linear.)

```
ak.linear_fit(array_x, array_y, axis=-1)
```

```
[{intercept: 0, slope: 0.909, intercept_error: 0.913, slope_error: 0.643},
 {intercept: 0, slope: 0.909, intercept_error: 5, slope_error: 1.29},
 {intercept: nan, slope: nan, intercept_error: inf, slope_error: inf},
 {intercept: 0, slope: 0.909, intercept_error: 3.39, slope_error: 0.407}]
--------------------------------------------------------------------------
type: 4 * LinearFit[
    intercept: float64,
    slope: float64,
    intercept_error: float64,
    slope_error: float64
]
```

Ordinary least squares linear fits can be computed by a closed-form formula, without approximation or iteration, so a fit can be thought of as a reducer like the mean or other moments, but with greater fidelity to the data because it models a general linear correlation. For example, some statistical models achieve high granularity by segmenting a dataset in some meaningful way and then summarizing the data in each segment (such as a regression decision tree). Performing a linear fit on each segment fine-tunes the model more than just taking the average of the data in each segment.

## Peak to peak

The peak-to-peak function `ak.ptp()` can be used to find the range (maximum - minimum) of data along an axis. It’s more convenient than calling `ak.min()` and `ak.max()` separately.

```
ak.ptp(array, axis=-1)
```

```
[2.2,
 1.1,
 0,
 3.3]
------------------
type: 4 * ?float64
```

## Softmax

The softmax function is useful in machine learning, particularly in the context of logistic regression and neural networks. Awkward Array provides `ak.softmax()` to compute softmax values of an array.

Note that this function does not reduce a dimension; it computes one output value for each input value, but each output value is normalized by all the other values in the same list.

Also note that only `axis=-1` (innermost lists) is supported by `ak.softmax()`.

```
ak.softmax(array, axis=-1)
```

```
[[0.0768, 0.231, 0.693],
 [0.25, 0.75],
 [1],
 [0.0249, 0.0748, 0.225, 0.675]]
--------------------------------
type: 4 * var * float64
```

## Example uses in data analysis

Here is an example that normalizes an input array to have an overall mean of 0 and standard deviation of 1:

```
array = ak.Array([[1.1, 2.2, 3.3], [4.4, 5.5], [6.6, 7.7, 8.8, 9.9]])
```

```
(array - ak.mean(array)) / ak.std(array)
```

```
[[-1.55, -1.16, -0.775],
 [-0.387, 0],
 [0.387, 0.775, 1.16, 1.55]]
----------------------------
type: 3 * var * float64
```

And here’s another example that normalizes each list within the array separately, so that every list has a mean of 0 and a standard deviation of 1:

```
(array - ak.mean(array, axis=-1)) / ak.std(array, axis=-1)
```

```
[[-1.22, 4.94e-16, 1.22],
 [-1, 1],
 [-1.34, -0.447, 0.447, 1.34]]
------------------------------
type: 3 * var * float64
```