How to convert to/from ROOT RDataFrame

ROOT's RDataFrame is a declarative, parallel framework for data analysis and manipulation. RDataFrame reads columnar data via a data source. Transformations can then be applied to the data to select rows and/or define new columns, and to produce results such as histograms.

import awkward as ak
import ROOT

Welcome to JupyROOT 6.26/10

From Awkward to RDataFrame

The function for Awkward → RDataFrame conversion is ak.to_rdataframe().

This function takes a dictionary of the form { <column name string> : <Awkward Array> } and always returns a cppyy.gbl.ROOT.RDF.RInterface object.

array_x = ak.Array(
    [
        {"x": [1.1, 1.2, 1.3]},
        {"x": [2.1, 2.2]},
        {"x": [3.1]},
        {"x": [4.1, 4.2, 4.3, 4.4]},
        {"x": [5.1]},
    ]
)
array_y = ak.Array([1, 2, 3, 4, 5])
array_z = ak.Array([[1.1], [2.1, 2.3, 2.4], [3.1], [4.1, 4.2, 4.3], [5.1]])

The arrays given for each column must all have the same length:

assert len(array_x) == len(array_y) == len(array_z)
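
The same length check can be wrapped in a small helper that reports which column is at fault. This is a minimal sketch, not part of the Awkward Array API: `common_length` is a hypothetical name, and plain Python lists stand in for Awkward Arrays, since both support `len()`.

```python
def common_length(columns):
    """Return the shared length of all columns, or raise ValueError."""
    if not columns:
        raise ValueError("no columns given")
    # Record the length of every column so a mismatch can be reported by name.
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    return next(iter(lengths.values()))


# Plain lists stand in for the Awkward Arrays defined above.
columns = {
    "x": [[1.1, 1.2, 1.3], [2.1, 2.2], [3.1], [4.1, 4.2, 4.3, 4.4], [5.1]],
    "y": [1, 2, 3, 4, 5],
    "z": [[1.1], [2.1, 2.3, 2.4], [3.1], [4.1, 4.2, 4.3], [5.1]],
}
print(common_length(columns))  # prints 5
```

Calling such a helper on the dictionary before passing it to ak.to_rdataframe() would surface a length mismatch as a Python error that names the columns involved.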

The dictionary key defines a column name in RDataFrame.

df = ak.to_rdataframe({"x": array_x, "y": array_y, "z": array_z})

The ak.to_rdataframe() function presents a view of the Awkward Array, generated on demand, as an RDataFrame source. Generating the Awkward RDataSource C++ code adds a small overhead. This operation does not execute the RDataFrame event loop, and the array data are not copied.
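
Because the conversion itself does not run the event loop, the data are only read once a result is requested. The following is a minimal sketch of that behaviour, assuming `df` is the object returned by ak.to_rdataframe() above. `count_rows_passing` is a hypothetical helper name; Filter, Count, and GetValue are standard RDataFrame methods.

```python
def count_rows_passing(df, selection):
    """Count the rows of an RDataFrame that satisfy a selection string."""
    # Filter and Count only book the computation; no data are read yet.
    booked = df.Filter(selection).Count()
    # Requesting the value is what triggers the RDataFrame event loop.
    return booked.GetValue()


# Hypothetical usage with the dataframe built above:
# n_selected = count_rows_passing(df, "y > 2")
```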

The column readers are generated based on the run-time type of the views. Here is a description of the RDataFrame columns:

df.Describe().Print()

Dataframe from datasource Custom Datasource

Property                Value
--------                -----
Columns in total            4
Columns from defines        1
Event loops run             0
Processing slots            1

Column          Type                            Origin
------          ----                            ------
awkward_index_  long                            Define
x               awkward::Record_cZovKxoiVwo     Dataset
y               int64_t                         Dataset
z               ROOT::VecOps::RVec<double>      Dataset

The x column contains an Awkward Array with a generated type name, awkward::Record_cZovKxoiVwo.

Awkward Arrays are dynamically typed, so in a C++ context the type name is a hash. In practice, there is no need to know the type: C++ code should use the placeholder type specifier auto, so that the type of the variable being declared is deduced automatically from its initializer.

From RDataFrame to Awkward

The function for RDataFrame → Awkward conversion is ak.from_rdataframe(). It takes the RDataFrame and a tuple of strings naming the RDataFrame columns to convert, and always returns an ak.Array.

array = ak.from_rdataframe(
    df,
    columns=(
        "x",
        "y",
        "z",
    ),
)
array

[{y: 1, z: [1.1], x: {x: [1.1, ..., 1.3]}},
 {y: 2, z: [2.1, 2.3, 2.4], x: {x: [2.1, ...]}},
 {y: 3, z: [3.1], x: {x: [3.1]}},
 {y: 4, z: [4.1, 4.2, 4.3], x: {x: [4.1, ...]}},
 {y: 5, z: [5.1], x: {x: [5.1]}}]
------------------------------------------------
type: 5 * {
    y: int64,
    z: var * float64,
    x: {
        x: var * float64
    }
}
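
Rather than typing the column names by hand, the tuple can be built from the dataframe itself. This is a minimal sketch under two assumptions: `convertible_columns` is a hypothetical helper, and the `awkward_index_` column listed by Describe() above is a helper column that should be left out of the conversion. GetColumnNames is a standard RDataFrame method.

```python
def convertible_columns(df, skip=("awkward_index_",)):
    """Return the dataframe's column names as a tuple, minus skipped ones."""
    # GetColumnNames() yields C++ strings; str() makes them plain Python strings.
    names = (str(name) for name in df.GetColumnNames())
    # Assumption: names listed in `skip` are helper columns, not user data.
    return tuple(name for name in names if name not in skip)


# Hypothetical usage with the dataframe built above:
# array = ak.from_rdataframe(df, columns=convertible_columns(df))
```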