ak.to_dataframe
Defined in awkward.operations.ak_to_dataframe on line 12.
ak.to_dataframe(array, *, how='inner', levelname=lambda i: ..., anonymous='values')
Parameters
- array – Array-like data (anything ak.to_layout recognizes).
- how (None or str) – Passed to pd.merge to combine DataFrames for each multiplicity into one DataFrame. If None, a list of Pandas DataFrames is returned.
- levelname (int -> str) – Computes a name for each level of the row index from the number of levels deep.
- anonymous (str) – Column name to use if the array does not contain records; otherwise, column names are derived from record fields.
Converts Awkward data structures into Pandas MultiIndex rows and columns. The resulting DataFrame(s) contain no Awkward structures.
ak.Array structures can’t be losslessly converted into a single DataFrame; different fields in a record structure might have different nested list lengths, but a DataFrame can have only one index.
If how is None, this function always returns a list of DataFrames (even if it contains only one DataFrame); otherwise how is passed to pd.merge to merge them into a single DataFrame, with the associated loss of data.
In the following example, nested lists are converted into MultiIndex rows. The index level names "entry", "subentry" and "subsubentry" can be controlled with the levelname parameter. The column name "values" is assigned because this array has no fields; it can be controlled with the anonymous parameter.
>>> ak.to_dataframe(ak.Array([[[1.1, 2.2], [], [3.3]],
...                           [],
...                           [[4.4], [5.5, 6.6]],
...                           [[7.7]],
...                           [[8.8]]]))
                            values
entry subentry subsubentry
0     0        0               1.1
               1               2.2
      2        0               3.3
2     0        0               4.4
      1        0               5.5
               1               6.6
3     0        0               7.7
4     0        0               8.8
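The row index in the example above can be modelled in plain Python. The following is only an illustrative sketch, not the library's implementation: flatten_rows is a hypothetical helper, and its default levelname is an assumption chosen to reproduce the level names shown above.

```python
def flatten_rows(data, levelname=lambda i: "sub" * i + "entry"):
    """Flatten nested lists into (index_tuple, value) rows.

    Hypothetical helper for illustration; the default levelname is an
    assumption that yields "entry", "subentry", "subsubentry", ...
    """
    rows = []
    max_depth = 0

    def walk(node, path):
        nonlocal max_depth
        if isinstance(node, list):
            # Each list level contributes one position to the row index.
            for position, child in enumerate(node):
                walk(child, path + (position,))
        else:
            # Leaves become rows; empty lists contribute no rows at all.
            rows.append((path, node))
            max_depth = max(max_depth, len(path))

    walk(data, ())
    names = [levelname(i) for i in range(max_depth)]
    return names, rows


names, rows = flatten_rows(
    [[[1.1, 2.2], [], [3.3]], [], [[4.4], [5.5, 6.6]], [[7.7]], [[8.8]]]
)
print(names)
for index, value in rows:
    print(index, value)
```

Each index tuple corresponds to one row of the MultiIndex. Entry 1 never appears because its list is empty, just as in the DataFrame above.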
In this example, nested records are converted into MultiIndex columns. (MultiIndex rows and columns can be mixed; these examples are deliberately simple.)
>>> ak.to_dataframe(ak.Array([
...     {"I": {"a": _, "b": {"i": _}}, "II": {"x": {"y": {"z": _}}}}
...     for _ in range(0, 50, 10)]))
        I      II
        a   b   x
            i   y
                z
entry
0       0   0   0
1      10  10  10
2      20  20  20
3      30  30  30
4      40  40  40
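The column header in the example above can be understood as one tuple of field names per leaf of the record. The sketch below uses a hypothetical helper, column_paths, and is not the library's code. Padding shorter paths with empty strings is an assumption made to match the blank cells in the printed header.

```python
def column_paths(record, prefix=()):
    """Yield (path_tuple, value) for every leaf of a nested dict.

    Hypothetical helper for illustration only.
    """
    for field, value in record.items():
        if isinstance(value, dict):
            # Nested records extend the column path by one level.
            yield from column_paths(value, prefix + (field,))
        else:
            yield prefix + (field,), value


record = {"I": {"a": 10, "b": {"i": 10}}, "II": {"x": {"y": {"z": 10}}}}
paths = dict(column_paths(record))

# Assumption: shorter paths are padded with "" so that every column
# has the same number of levels, as a MultiIndex requires.
depth = max(len(path) for path in paths)
padded = {path + ("",) * (depth - len(path)): value for path, value in paths.items()}

for path in padded:
    print(path)
```

The three padded tuples line up with the three columns shown above, read from top to bottom.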
The following two examples show how fields whose lists have different lengths are merged. With how="inner" (default), only subentries that exist for all fields are preserved; with how="outer", all subentries are preserved at the expense of requiring missing values.
>>> ak.to_dataframe(ak.Array([{"x": [], "y": [4.4, 3.3, 2.2, 1.1]},
...                           {"x": [1], "y": [3.3, 2.2, 1.1]},
...                           {"x": [1, 2], "y": [2.2, 1.1]},
...                           {"x": [1, 2, 3], "y": [1.1]},
...                           {"x": [1, 2, 3, 4], "y": []}]),
...                 how="inner")
                x    y
entry subentry
1     0         1  3.3
2     0         1  2.2
      1         2  1.1
3     0         1  1.1
The same with how="outer":
>>> ak.to_dataframe(ak.Array([{"x": [], "y": [4.4, 3.3, 2.2, 1.1]},
...                           {"x": [1], "y": [3.3, 2.2, 1.1]},
...                           {"x": [1, 2], "y": [2.2, 1.1]},
...                           {"x": [1, 2, 3], "y": [1.1]},
...                           {"x": [1, 2, 3, 4], "y": []}]),
...                 how="outer")
                  x    y
entry subentry
0     0         NaN  4.4
      1         NaN  3.3
      2         NaN  2.2
      3         NaN  1.1
1     0         1.0  3.3
      1         NaN  2.2
      2         NaN  1.1
2     0         1.0  2.2
      1         2.0  1.1
3     0         1.0  1.1
      1         2.0  NaN
      2         3.0  NaN
4     0         1.0  NaN
      1         2.0  NaN
      2         3.0  NaN
      3         4.0  NaN
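The difference between the two merges can be reproduced without Pandas by treating each field as a mapping from (entry, subentry) to a value. This is a minimal sketch of the join semantics only: index_field and merge are hypothetical helpers, and None stands in for the NaN that Pandas uses for missing values.

```python
def index_field(lists):
    """Map (entry, subentry) -> value for one field's list of lists."""
    return {
        (entry, subentry): value
        for entry, items in enumerate(lists)
        for subentry, value in enumerate(items)
    }


def merge(left, right, how):
    """Join two indexed fields on their (entry, subentry) keys."""
    if how == "inner":
        keys = left.keys() & right.keys()  # keys present in both fields
    elif how == "outer":
        keys = left.keys() | right.keys()  # keys present in either field
    else:
        raise ValueError(f"unsupported how: {how!r}")
    # None marks a value that is missing for that key.
    return {key: (left.get(key), right.get(key)) for key in sorted(keys)}


x = index_field([[], [1], [1, 2], [1, 2, 3], [1, 2, 3, 4]])
y = index_field([[4.4, 3.3, 2.2, 1.1], [3.3, 2.2, 1.1], [2.2, 1.1], [1.1], []])

inner = merge(x, y, "inner")
outer = merge(x, y, "outer")

print(len(inner), len(outer))
for key, values in inner.items():
    print(key, values)
```

The inner join keeps the same four (entry, subentry) pairs as the how="inner" DataFrame, and the outer join keeps all sixteen pairs, with a missing value wherever one field has no element at that position.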