ak.to_dataframe

Defined in awkward.operations.ak_to_dataframe on line 12.

ak.to_dataframe(array, *, how='inner', levelname=lambda i: ..., anonymous='values')
Parameters
  • array – Array-like data (anything ak.to_layout recognizes).

  • how (None or str) – Passed to pd.merge to combine DataFrames for each multiplicity into one DataFrame. If None, a list of Pandas DataFrames is returned.

  • levelname (int -> str) – Function that computes the name of each level of the row index from its depth (the number of levels deep).

  • anonymous (str) – Column name to use if the array does not contain records; otherwise, column names are derived from record fields.
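As a sketch of the levelname interface, the hypothetical function below maps a depth to a level name. It assumes the outermost level is depth 0 and is written only to reproduce the names "entry", "subentry" and "subsubentry" seen in the first example, so it should not be read as the library's own default.

```python
def entry_levelname(depth: int) -> str:
    """Hypothetical levelname function: 0 -> "entry", 1 -> "subentry", ..."""
    # Each level below the outermost gets one more "sub" prefix.
    return "sub" * depth + "entry"


# Names for a three-level row index, outermost level first.
names = [entry_levelname(i) for i in range(3)]
print(names)  # ['entry', 'subentry', 'subsubentry']
```

It could be passed as levelname=entry_levelname; any other int -> str callable, such as lambda i: f"level{i}", should be accepted the same way.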

Converts Awkward data structures into Pandas MultiIndex rows and columns. The resulting DataFrame(s) contain no Awkward structures.

In general, ak.Array structures can’t be losslessly converted into a single DataFrame; different fields in a record structure might have different nested list lengths, but a DataFrame can have only one index.

If how is None, this function always returns a list of DataFrames (even if it contains only one DataFrame); otherwise how is passed to pd.merge to merge them into a single DataFrame, with the associated loss of information (rows dropped or missing values introduced, depending on how).
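To make the index problem concrete, here is a plain-Python sketch (no Awkward or pandas involved; the helper row_index is hypothetical) that lists the (entry, subentry) rows each field of the x/y record array from the how="inner" and how="outer" examples would need:

```python
def row_index(lists):
    """Hypothetical helper: (entry, subentry) pairs for a list of lists."""
    return [(entry, sub) for entry, inner in enumerate(lists) for sub in range(len(inner))]


# Fields of the record array used in the how="inner"/how="outer" examples.
x = [[], [1], [1, 2], [1, 2, 3], [1, 2, 3, 4]]
y = [[4.4, 3.3, 2.2, 1.1], [3.3, 2.2, 1.1], [2.2, 1.1], [1.1], []]

x_index = row_index(x)
y_index = row_index(y)

# Both fields have 10 rows in total, but not the same rows, so they
# cannot share a single DataFrame index without dropping or padding data.
print(len(x_index), len(y_index))  # 10 10
print(x_index == y_index)          # False
```

Because the two fields need different rows, they cannot share one index; presumably this is why such fields end up in separate DataFrames when how is None, to be reconciled by pd.merge otherwise.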

In the following example, nested lists are converted into MultiIndex rows. The index level names "entry", "subentry" and "subsubentry" can be controlled with the levelname parameter. The column name "values" is assigned because this array has no fields; it can be controlled with the anonymous parameter.

>>> ak.to_dataframe(ak.Array([[[1.1, 2.2], [], [3.3]],
...                           [],
...                           [[4.4], [5.5, 6.6]],
...                           [[7.7]],
...                           [[8.8]]]))
                            values
entry subentry subsubentry
0     0        0               1.1
               1               2.2
      2        0               3.3
2     0        0               4.4
      1        0               5.5
               1               6.6
3     0        0               7.7
4     0        0               8.8
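The mapping from nested lists to MultiIndex rows can be modeled in plain Python. The sketch below is an illustration, not Awkward's implementation: the helper flatten_with_index is hypothetical, and it assumes every leaf sits at the same nesting depth, as in the example above. Each list level adds one index component, and empty lists contribute no rows, which is why entry 1 and subentry 1 of entry 0 do not appear in the output.

```python
def flatten_with_index(data, prefix=()):
    """Hypothetical helper: yield (index_tuple, value) for each leaf of nested lists."""
    for position, item in enumerate(data):
        if isinstance(item, list):
            # Descending into a list adds one more index level.
            yield from flatten_with_index(item, prefix + (position,))
        else:
            yield prefix + (position,), item


data = [[[1.1, 2.2], [], [3.3]], [], [[4.4], [5.5, 6.6]], [[7.7]], [[8.8]]]
rows = list(flatten_with_index(data))

# Level names assumed to follow the pattern shown above; "values" is the
# column name used when the array has no fields.
names = ["sub" * i + "entry" for i in range(3)]
print(*names, "values")
for index, value in rows:
    print(*index, value)
```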

In this example, nested records are converted into MultiIndex columns. (MultiIndex rows and columns can be mixed; these examples are deliberately simple.)

>>> ak.to_dataframe(ak.Array([
...     {"I": {"a": _, "b": {"i": _}}, "II": {"x": {"y": {"z": _}}}}
...     for _ in range(0, 50, 10)]))
        I      II
        a   b   x
            i   y
                z
entry
0       0   0   0
1      10  10  10
2      20  20  20
3      30  30  30
4      40  40  40
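Similarly, each leaf field of a nested record becomes one column whose label is its path of field names. The hypothetical helper field_paths below sketches this in plain Python. Padding shorter paths with empty strings is an assumption made here so that all labels have the same number of levels; it matches the blank header cells above but may not be how Awkward builds them internally. Tuples like these could be passed to pandas.MultiIndex.from_tuples to build a column index.

```python
def field_paths(record, prefix=()):
    """Hypothetical helper: tuple of field names for every leaf of nested dicts."""
    for name, value in record.items():
        if isinstance(value, dict):
            # Nested record: extend the path with this field name.
            yield from field_paths(value, prefix + (name,))
        else:
            yield prefix + (name,)


record = {"I": {"a": 0, "b": {"i": 0}}, "II": {"x": {"y": {"z": 0}}}}
paths = list(field_paths(record))

# Pad shorter paths with "" so every column label has the same number of levels.
depth = max(len(p) for p in paths)
columns = [p + ("",) * (depth - len(p)) for p in paths]
print(columns)
```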

The following two examples show how fields with lists of different lengths are merged. With how="inner" (the default), only subentries that exist for all fields are preserved; with how="outer", all subentries are preserved at the cost of introducing missing values (NaN).

>>> ak.to_dataframe(ak.Array([{"x": [], "y": [4.4, 3.3, 2.2, 1.1]},
...                           {"x": [1], "y": [3.3, 2.2, 1.1]},
...                           {"x": [1, 2], "y": [2.2, 1.1]},
...                           {"x": [1, 2, 3], "y": [1.1]},
...                           {"x": [1, 2, 3, 4], "y": []}]),
...                          how="inner")
                x    y
entry subentry
1     0         1  3.3
2     0         1  2.2
      1         2  1.1
3     0         1  1.1
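The inner merge can be reproduced in plain Python by treating each field as a mapping from (entry, subentry) to value and keeping only the keys both fields share. This sketches the join semantics only (the helper indexed is hypothetical); the actual merge is done by pd.merge on the (entry, subentry) row index.

```python
def indexed(lists):
    """Hypothetical helper: map (entry, subentry) -> value for a list of lists."""
    return {
        (entry, sub): value
        for entry, inner in enumerate(lists)
        for sub, value in enumerate(inner)
    }


x = indexed([[], [1], [1, 2], [1, 2, 3], [1, 2, 3, 4]])
y = indexed([[4.4, 3.3, 2.2, 1.1], [3.3, 2.2, 1.1], [2.2, 1.1], [1.1], []])

# Inner join: keep only the keys present in both fields, in sorted order.
inner = {key: (x[key], y[key]) for key in sorted(x.keys() & y.keys())}
for (entry, sub), (xv, yv) in inner.items():
    print(entry, sub, xv, yv)
```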

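The same with how="outer" (x is shown as floating-point because pandas represents the missing values as NaN, which an integer column cannot hold):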

>>> ak.to_dataframe(ak.Array([{"x": [], "y": [4.4, 3.3, 2.2, 1.1]},
...                           {"x": [1], "y": [3.3, 2.2, 1.1]},
...                           {"x": [1, 2], "y": [2.2, 1.1]},
...                           {"x": [1, 2, 3], "y": [1.1]},
...                           {"x": [1, 2, 3, 4], "y": []}]),
...                          how="outer")
                  x    y
entry subentry
0     0         NaN  4.4
      1         NaN  3.3
      2         NaN  2.2
      3         NaN  1.1
1     0         1.0  3.3
      1         NaN  2.2
      2         NaN  1.1
2     0         1.0  2.2
      1         2.0  1.1
3     0         1.0  1.1
      1         2.0  NaN
      2         3.0  NaN
4     0         1.0  NaN
      1         2.0  NaN
      2         3.0  NaN
      3         4.0  NaN
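The outer merge differs only in taking the union of keys and filling the gaps with NaN. Another plain-Python sketch of the semantics (again with the hypothetical helper indexed), which yields the same 16 rows as the table above:

```python
import math


def indexed(lists):
    """Hypothetical helper: map (entry, subentry) -> value for a list of lists."""
    return {
        (entry, sub): value
        for entry, inner in enumerate(lists)
        for sub, value in enumerate(inner)
    }


x = indexed([[], [1], [1, 2], [1, 2, 3], [1, 2, 3, 4]])
y = indexed([[4.4, 3.3, 2.2, 1.1], [3.3, 2.2, 1.1], [2.2, 1.1], [1.1], []])

# Outer join: keep every key that appears in either field, in sorted order,
# and use NaN where a field has no value for that (entry, subentry).
outer = {
    key: (x.get(key, math.nan), y.get(key, math.nan))
    for key in sorted(x.keys() | y.keys())
}
for (entry, sub), (xv, yv) in outer.items():
    print(entry, sub, xv, yv)
```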