How to convert to/from ROOT with Uproot

Uproot defaults to reading data as Awkward Arrays, so there usually isn’t any extra work to do. But there are caveats, mostly with the legacy version of Uproot (Uproot 3).

To find out which version you’re using:

  • if you import uproot3, then it’s Uproot 3;

  • if you import uproot or import uproot4, then it’s Uproot 4.

import awkward as ak
import numpy as np
import uproot3
import uproot
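If it is unclear which of these packages an environment actually provides, the installed distributions can be queried without importing them. The following is a minimal sketch that uses only the standard library. The helper name uproot_versions is hypothetical, and it assumes the packages were installed under the distribution names uproot3, uproot4, and uproot.

```python
from importlib import metadata


def uproot_versions():
    """Map each Uproot distribution name to its installed version, or None."""
    versions = {}
    for dist in ("uproot3", "uproot4", "uproot"):
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None  # this distribution is not installed
    return versions


for dist, version in uproot_versions().items():
    print(dist, "->", version if version is not None else "not installed")
```

The version string reported for uproot is worth reading, since that import name has been reused across major releases.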

From ROOT to Awkward with Uproot 4

By default, Uproot 4 delivers data from ROOT files as Awkward 1 arrays, even though the Awkward library isn’t one of Uproot’s formal dependencies. (If you try to use Uproot 4 without having Awkward, you’ll quickly be presented with an ImportError and suggestions about how to proceed.)

To start, open a file and look at the objects it contains.

up4_file = uproot.open("http://scikit-hep.org/uproot3/examples/HZZ.root")
up4_file.classnames()
{'events': 'TTree'}

From the above, we learn that "events" is a TTree, so we read its metadata with:

up4_events = up4_file["events"]
up4_events
<TTree 'events' (51 branches) at 0x7f1c3ebcac50>

And then look at its branches and their types.

up4_events.show()
name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
NJet                 | int32_t                  | AsDtype('>i4')
Jet_Px               | float[]                  | AsJagged(AsDtype('>f4'))
Jet_Py               | float[]                  | AsJagged(AsDtype('>f4'))
Jet_Pz               | float[]                  | AsJagged(AsDtype('>f4'))
Jet_E                | float[]                  | AsJagged(AsDtype('>f4'))
Jet_btag             | float[]                  | AsJagged(AsDtype('>f4'))
Jet_ID               | bool[]                   | AsJagged(AsDtype('bool'))
NMuon                | int32_t                  | AsDtype('>i4')
Muon_Px              | float[]                  | AsJagged(AsDtype('>f4'))
Muon_Py              | float[]                  | AsJagged(AsDtype('>f4'))
Muon_Pz              | float[]                  | AsJagged(AsDtype('>f4'))
Muon_E               | float[]                  | AsJagged(AsDtype('>f4'))
Muon_Charge          | int32_t[]                | AsJagged(AsDtype('>i4'))
Muon_Iso             | float[]                  | AsJagged(AsDtype('>f4'))
NElectron            | int32_t                  | AsDtype('>i4')
Electron_Px          | float[]                  | AsJagged(AsDtype('>f4'))
Electron_Py          | float[]                  | AsJagged(AsDtype('>f4'))
Electron_Pz          | float[]                  | AsJagged(AsDtype('>f4'))
Electron_E           | float[]                  | AsJagged(AsDtype('>f4'))
Electron_Charge      | int32_t[]                | AsJagged(AsDtype('>i4'))
Electron_Iso         | float[]                  | AsJagged(AsDtype('>f4'))
NPhoton              | int32_t                  | AsDtype('>i4')
Photon_Px            | float[]                  | AsJagged(AsDtype('>f4'))
Photon_Py            | float[]                  | AsJagged(AsDtype('>f4'))
Photon_Pz            | float[]                  | AsJagged(AsDtype('>f4'))
Photon_E             | float[]                  | AsJagged(AsDtype('>f4'))
Photon_Iso           | float[]                  | AsJagged(AsDtype('>f4'))
MET_px               | float                    | AsDtype('>f4')
MET_py               | float                    | AsDtype('>f4')
MChadronicBottom_px  | float                    | AsDtype('>f4')
MChadronicBottom_py  | float                    | AsDtype('>f4')
MChadronicBottom_pz  | float                    | AsDtype('>f4')
MCleptonicBottom_px  | float                    | AsDtype('>f4')
MCleptonicBottom_py  | float                    | AsDtype('>f4')
MCleptonicBottom_pz  | float                    | AsDtype('>f4')
MChadronicWDecayQ... | float                    | AsDtype('>f4')
MChadronicWDecayQ... | float                    | AsDtype('>f4')
MChadronicWDecayQ... | float                    | AsDtype('>f4')
MChadronicWDecayQ... | float                    | AsDtype('>f4')
MChadronicWDecayQ... | float                    | AsDtype('>f4')
MChadronicWDecayQ... | float                    | AsDtype('>f4')
MClepton_px          | float                    | AsDtype('>f4')
MClepton_py          | float                    | AsDtype('>f4')
MClepton_pz          | float                    | AsDtype('>f4')
MCleptonPDGid        | int32_t                  | AsDtype('>i4')
MCneutrino_px        | float                    | AsDtype('>f4')
MCneutrino_py        | float                    | AsDtype('>f4')
MCneutrino_pz        | float                    | AsDtype('>f4')
NPrimaryVertices     | int32_t                  | AsDtype('>i4')
triggerIsoMu24       | bool                     | AsDtype('bool')
EventWeight          | float                    | AsDtype('>f4')

Some of these branches have a single value per event (e.g. "MET_px" has type float) and some have multiple values per event (e.g. "Muon_Px" has type float[]).

Regardless of type, they are all returned as Awkward Arrays:

array = up4_events["MET_px"].array()
array
<Array [5.91, 24.8, -25.8, ... 79.9, 19.7] type='2421 * float32'>
type(array)
awkward.highlevel.Array
array = up4_events["Muon_Px"].array()
array
<Array [[-52.9, 37.7], ... 1.14], [23.9]] type='2421 * var * float32'>
type(array)
awkward.highlevel.Array

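This is because library="ak" is the default. Setting it to another value, such as library="np", returns NumPy arrays instead (for jagged branches, a NumPy array of dtype=object whose elements are themselves arrays).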

array = up4_events["MET_px"].array(library="np")
array
array([  5.912771,  24.765203, -25.785088, ...,  18.101646,  79.87519 ,
        19.713749], dtype=float32)
type(array)
numpy.ndarray
array = up4_events["Muon_Px"].array(library="np")
array
array([array([-52.899456,  37.73778 ], dtype=float32),
       array([-0.81645936], dtype=float32),
       array([48.98783  ,  0.8275667], dtype=float32), ...,
       array([-29.756786], dtype=float32),
       array([1.1418698], dtype=float32),
       array([23.913206], dtype=float32)], dtype=object)
type(array)
numpy.ndarray

Uproot’s arrays method (plural) returns a “package” of related arrays, which for the Awkward library means arrays presented as records.

arrays = up4_events.arrays(["Muon_Px", "Muon_Py", "Muon_Pz"])
arrays
<Array [{Muon_Px: [-52.9, 37.7, ... 54.7]}] type='2421 * {"Muon_Px": var * float...'>
ak.type(arrays)
2421 * {"Muon_Px": var * float32, "Muon_Py": var * float32, "Muon_Pz": var * float32}
arrays["Muon_Px"]
<Array [[-52.9, 37.7], ... 1.14], [23.9]] type='2421 * var * float32'>

With arrays, the how="zip" option attempts to ak.zip lists with common list lengths.

Note that the ak.type below is var * {all fields}, rather than {field: var, field: var, ...}.

arrays = up4_events.arrays(["Muon_Px", "Muon_Py", "Muon_Pz"], how="zip")
arrays
<Array [{Muon: [{Px: -52.9, ... Pz: 54.7}]}] type='2421 * {"Muon": var * {"Px": ...'>
ak.type(arrays)
2421 * {"Muon": var * {"Px": float32, "Py": float32, "Pz": float32}}
arrays.Muon.Px
<Array [[-52.9, 37.7], ... 1.14], [23.9]] type='2421 * var * float32'>

If some of the branches cannot be combined because they have different multiplicities, they are kept separate.

arrays = up4_events.arrays(["Muon_Px", "Muon_Py", "Muon_Pz", "Jet_Px", "Jet_Py", "Jet_Pz"], how="zip")
arrays
<Array [{Muon: [{Px: -52.9, ... Jet: []}] type='2421 * {"Muon": var * {"Px": flo...'>
ak.type(arrays)
2421 * {"Muon": var * {"Px": float32, "Py": float32, "Pz": float32}, "Jet": var * {"Px": float32, "Py": float32, "Pz": float32}}
arrays.Muon.Px
<Array [[-52.9, 37.7], ... 1.14], [23.9]] type='2421 * var * float32'>
arrays.Jet.Px
<Array [[], [-38.9], ... [-36.4, -15.3], []] type='2421 * var * float32'>

From Awkward to ROOT with Uproot 4

Not implemented yet: see the bottom of this page for writing files with Uproot 3.

From ROOT to Awkward with Uproot 3

Some of the arrays returned by Uproot 3 are NumPy arrays and others are Awkward 0 (i.e. “old library”) arrays, depending on whether Awkward is needed.

Uproot 4 is recommended unless you’re dealing with legacy software built on Uproot 3.

To start, open a file and look at the objects it contains.

up3_file = uproot3.open("http://scikit-hep.org/uproot3/examples/HZZ.root")
up3_file.classnames()
[(b'events;1', 'TTree')]

From the above, we learn that "events" is a TTree, so we read its metadata with:

up3_events = up3_file["events"]
up3_events
<TTree b'events' at 0x7f1c3c3cf250>

And then look at its branches and their types.

up3_events.show()
NJet                       (no streamer)              asdtype('>i4')
Jet_Px                     (no streamer)              asjagged(asdtype('>f4'))
Jet_Py                     (no streamer)              asjagged(asdtype('>f4'))
Jet_Pz                     (no streamer)              asjagged(asdtype('>f4'))
Jet_E                      (no streamer)              asjagged(asdtype('>f4'))
Jet_btag                   (no streamer)              asjagged(asdtype('>f4'))
Jet_ID                     (no streamer)              asjagged(asdtype('bool'))
NMuon                      (no streamer)              asdtype('>i4')
Muon_Px                    (no streamer)              asjagged(asdtype('>f4'))
Muon_Py                    (no streamer)              asjagged(asdtype('>f4'))
Muon_Pz                    (no streamer)              asjagged(asdtype('>f4'))
Muon_E                     (no streamer)              asjagged(asdtype('>f4'))
Muon_Charge                (no streamer)              asjagged(asdtype('>i4'))
Muon_Iso                   (no streamer)              asjagged(asdtype('>f4'))
NElectron                  (no streamer)              asdtype('>i4')
Electron_Px                (no streamer)              asjagged(asdtype('>f4'))
Electron_Py                (no streamer)              asjagged(asdtype('>f4'))
Electron_Pz                (no streamer)              asjagged(asdtype('>f4'))
Electron_E                 (no streamer)              asjagged(asdtype('>f4'))
Electron_Charge            (no streamer)              asjagged(asdtype('>i4'))
Electron_Iso               (no streamer)              asjagged(asdtype('>f4'))
NPhoton                    (no streamer)              asdtype('>i4')
Photon_Px                  (no streamer)              asjagged(asdtype('>f4'))
Photon_Py                  (no streamer)              asjagged(asdtype('>f4'))
Photon_Pz                  (no streamer)              asjagged(asdtype('>f4'))
Photon_E                   (no streamer)              asjagged(asdtype('>f4'))
Photon_Iso                 (no streamer)              asjagged(asdtype('>f4'))
MET_px                     (no streamer)              asdtype('>f4')
MET_py                     (no streamer)              asdtype('>f4')
MChadronicBottom_px        (no streamer)              asdtype('>f4')
MChadronicBottom_py        (no streamer)              asdtype('>f4')
MChadronicBottom_pz        (no streamer)              asdtype('>f4')
MCleptonicBottom_px        (no streamer)              asdtype('>f4')
MCleptonicBottom_py        (no streamer)              asdtype('>f4')
MCleptonicBottom_pz        (no streamer)              asdtype('>f4')
MChadronicWDecayQuark_px   (no streamer)              asdtype('>f4')
MChadronicWDecayQuark_py   (no streamer)              asdtype('>f4')
MChadronicWDecayQuark_pz   (no streamer)              asdtype('>f4')
MChadronicWDecayQuarkBar_px
                           (no streamer)              asdtype('>f4')
MChadronicWDecayQuarkBar_py
                           (no streamer)              asdtype('>f4')
MChadronicWDecayQuarkBar_pz
                           (no streamer)              asdtype('>f4')
MClepton_px                (no streamer)              asdtype('>f4')
MClepton_py                (no streamer)              asdtype('>f4')
MClepton_pz                (no streamer)              asdtype('>f4')
MCleptonPDGid              (no streamer)              asdtype('>i4')
MCneutrino_px              (no streamer)              asdtype('>f4')
MCneutrino_py              (no streamer)              asdtype('>f4')
MCneutrino_pz              (no streamer)              asdtype('>f4')
NPrimaryVertices           (no streamer)              asdtype('>i4')
triggerIsoMu24             (no streamer)              asdtype('bool')
EventWeight                (no streamer)              asdtype('>f4')

Some of these branches have a single value per event (e.g. "MET_px" has interpretation asdtype('>f4')) and some have multiple values per event (e.g. "Muon_Px" has interpretation asjagged(asdtype('>f4'))).

Data that can be interpreted asdtype are returned as NumPy arrays:

array = up3_events.array("MET_px")
array
array([  5.912771,  24.765203, -25.785088, ...,  18.101646,  79.87519 ,
        19.713749], dtype=float32)
type(array)
numpy.ndarray

NumPy arrays can be converted to Awkward 1 (i.e. “new library”) by passing them to the ak.Array constructor or the ak.from_numpy function.

ak.Array(array)
<Array [5.91, 24.8, -25.8, ... 79.9, 19.7] type='2421 * float32'>

And data that require asjagged or other specialized interpretations are returned as Awkward 0 arrays:

array = up3_events.array("Muon_Px")
array
<JaggedArray [[-52.899456 37.73778] [-0.81645936] [48.98783 0.8275667] ... [-29.756786] [1.1418698] [23.913206]] at 0x7f1c3c384690>
type(array)
awkward0.array.jagged.JaggedArray

Awkward 0 arrays can be converted to Awkward 1 by passing them to the ak.from_awkward0 function. (There’s also an ak.to_awkward0 for the other direction; conversions are usually zero-copy and quick.)

ak.from_awkward0(array)
<Array [[-52.9, 37.7], ... 1.14], [23.9]] type='2421 * var * float32'>

(Unlike Uproot 4, there isn’t a way to specify which library you want for returning output.)

Uproot 3’s arrays method (plural) returns “packages” of related arrays as Python dicts:

arrays = up3_events.arrays(["Muon_Px", "Muon_Py", "Muon_Pz"])
arrays
{b'Muon_Px': <JaggedArray [[-52.899456 37.73778] [-0.81645936] [48.98783 0.8275667] ... [-29.756786] [1.1418698] [23.913206]] at 0x7f1c3c604290>,
 b'Muon_Py': <JaggedArray [[-11.654672 0.6934736] [-24.404259] [-21.723139 29.800508] ... [-15.303859] [63.60957] [-35.665077]] at 0x7f1c3c6046d0>,
 b'Muon_Pz': <JaggedArray [[-8.160793 -11.307582] [20.199968] [11.168285 36.96519] ... [-52.66375] [162.17632] [54.719437]] at 0x7f1c3c38ef10>}

Note that the dict keys are bytestrings (type bytes, rather than str), and that the arrays are separate entities, so converting all of them requires a loop.

{name.decode(): ak.from_awkward0(array) for name, array in arrays.items()}
{'Muon_Px': <Array [[-52.9, 37.7], ... 1.14], [23.9]] type='2421 * var * float32'>,
 'Muon_Py': <Array [[-11.7, 0.693], ... 63.6], [-35.7]] type='2421 * var * float32'>,
 'Muon_Pz': <Array [[-8.16, -11.3], ... 162], [54.7]] type='2421 * var * float32'>}

From Awkward to ROOT with Uproot 3

Since ROOT file-writing is only implemented in Uproot 3, you’ll need to take into consideration whether an array is flat, and therefore NumPy, or jagged, and therefore Awkward 0 (i.e. “old library”).

To open a file for writing, use uproot3.recreate, rather than uproot3.open.

file = uproot3.recreate("/tmp/example.root")
file
<TFileRecreate b'example.root' at 0x7f1c3c3a7890>

The uproot3.newtree function creates a tree that can be written. The data types for each branch have to be specified.

file["tree1"] = uproot3.newtree({"branch1": int, "branch2": np.float32})

The method for writing is extend, which can be called as many times as needed to write array chunks to the file.

The chunks should be large (each represents a ROOT TBasket) and must include equal-length arrays for each branch.

file["tree1"].extend({"branch1": np.array([0, 1, 2, 3, 4]),
                      "branch2": np.array([0.0, 1.1, 2.2, 3.3, 4.4], dtype=np.float32)})
file["tree1"].extend({"branch1": np.array([5, 6, 7, 8, 9]),
                      "branch2": np.array([5.5, 6.6, 7.7, 8.8, 9.9], dtype=np.float32)})
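Because every array in a chunk must have the same length, it can help to check a chunk before handing it to extend. The helper below is a hypothetical sketch, not part of Uproot. It relies only on len(), so it is shown with plain lists.

```python
def check_chunk(chunk):
    """Return the common length of all branch arrays, or raise ValueError."""
    lengths = {name: len(values) for name, values in chunk.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"branch lengths differ: {lengths}")
    # A chunk with no branches has nothing to compare, so report length 0.
    return next(iter(lengths.values()), 0)


# Plain lists are used here; NumPy arrays work the same way because only len() is needed.
good = {"branch1": [0, 1, 2, 3, 4], "branch2": [0.0, 1.1, 2.2, 3.3, 4.4]}
bad = {"branch1": [0, 1, 2], "branch2": [0.0, 1.1]}

print(check_chunk(good))
try:
    check_chunk(bad)
except ValueError as err:
    print("rejected:", err)
```

Calling such a check before each extend reports a mismatch with the branch names attached, rather than leaving it to surface during writing.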

To write a jagged array, it must be in Awkward 0 format. You may need to use ak.to_awkward0.

ak0_array = ak.to_awkward0(ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]]))
ak0_array
<JaggedArray [[1.1 2.2 3.3] [] [4.4 5.5]] at 0x7f1c3c3479d0>

And you will need its counts. (This is the Awkward 0 equivalent of ak.num.)

ak0_array.counts
array([3, 0, 2], dtype=int64)

The branch’s type has to be constructed with the uproot3.newbranch function and has to include a size, into which the counts will be written.

file["tree2"] = uproot3.newtree({"branch3": uproot3.newbranch(np.dtype("f8"), size="n")})

Fill each chunk by assigning the branch data and the counts in each extend.

file["tree2"].extend({"branch3": ak0_array, "n": ak0_array.counts})

File closure can also be guaranteed by opening the file with uproot3.recreate in a context manager (a Python with statement); here it is closed explicitly.

file.close()