How to create arrays with LayoutBuilder (more control)

What is LayoutBuilder

ak.layout.LayoutBuilder is a low-level builder that constructs layouts, that is, ak.layout.Content arrays. It must be initialized with a JSON string that represents a valid ak.forms.Form.

A layout consists of composable ak.layout.Content elements that determine how an array is structured. The layout may be considered a “low-level” view, as it distinguishes between arrays that have the same logical meaning (i.e. same JSON output and high-level type) but different

  • node types, such as ak.layout.ListArray64 and ak.layout.ListOffsetArray64,

  • integer type specialization, such as ak.layout.ListArray64 and ak.layout.ListArray32,

  • or specific values, such as gaps in an ak.layout.ListArray64.
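
As a rough illustration of the first and last points, the sketch below uses plain Python lists as stand-ins for layout buffers. It makes no Awkward Array calls, and the variable names (content, offsets, starts, stops, padded_content) are chosen here for illustration only. It shows two different buffer arrangements that decode to the same logical lists:

```python
# Illustrative only: plain Python lists stand in for the layout buffers.
content = [1.1, 2.2, 3.3, 4.4, 5.5]

# ListOffsetArray-style: a single offsets buffer;
# list i spans content[offsets[i]:offsets[i + 1]].
offsets = [0, 3, 3, 5]
from_offsets = [content[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

# ListArray-style: separate starts and stops buffers, so the content may be
# stored out of order and may contain gaps that no list refers to.
padded_content = [4.4, 5.5, 999.0, 1.1, 2.2, 3.3]  # 999.0 is an unreachable gap
starts = [3, 0, 0]
stops = [6, 0, 2]
from_starts_stops = [padded_content[a:b] for a, b in zip(starts, stops)]

print(from_offsets)
print(from_starts_stops)
```

Both print statements show the same three lists, even though the underlying buffers differ. That difference is what the low-level view exposes and the high-level view hides.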

ak.forms.Form describes a low-level data type or “form”. There is an exact one-to-one relationship between each ak.layout.Content class and each Form.

ak.layout.LayoutBuilder helps you create these “low-level” views that are described by the form. Once the builder is initialized, it can only build a specific view determined by the layout form.

LayoutBuilder vs ArrayBuilder

The biggest difference between a LayoutBuilder and an ak.ArrayBuilder is that the data types you can append to a LayoutBuilder are restricted by its Form, while an ArrayBuilder accepts any data type. That flexibility comes at a performance cost.

Appending

import awkward as ak

To create an ak.layout.LayoutBuilder, you need a valid ak.forms.Form in JSON format to initialize it. This ak.forms.Form determines which commands and which data types the builder accepts.

Here is an example of a JSON form describing a union-type array, ak.layout.UnionArray8_64:

form = """
{
  "class": "UnionArray8_64",
  "tags": "i8",
  "index": "i64",
  "contents": [
      "float64",
      "bool",
      "int64"
  ],
  "form_key": "node0"
}
  """

Once a layout builder has been created from this form, its form cannot be changed. The builder accepts only the data types specified in the form: float64, bool, or int64. The corresponding append methods are float64, boolean, and int64; their names are similar to the data type names.

builder = ak.layout.LayoutBuilder32(form)

A tag is associated with each of the UnionArray contents. The tags are contiguous integers, starting with 0 for the first content.
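
To see which tag belongs to which content, the numbering can be read off the form's "contents" list. The sketch below uses only the standard json module. It checks that the string is well-formed JSON, not that it is a valid ak.forms.Form, and the name tag_of is introduced here for illustration:

```python
import json

form = """
{
  "class": "UnionArray8_64",
  "tags": "i8",
  "index": "i64",
  "contents": ["float64", "bool", "int64"],
  "form_key": "node0"
}
"""

parsed = json.loads(form)  # raises json.JSONDecodeError on malformed JSON

# Tags are the positions of the entries in the "contents" list, starting at 0.
tag_of = {content: tag for tag, content in enumerate(parsed["contents"])}
for content, tag in tag_of.items():
    print(tag, content)
```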

A tag command has to be issued prior to each data method:

builder.tag(0)
builder.float64(1.1)
builder.tag(1)
builder.boolean(False)
builder.tag(2)
builder.int64(11)

The contents can be filled in any order: tag selects the content, and the next command fills it.

builder.tag(0)
builder.float64(2.2)
builder.tag(1)
builder.boolean(False)
builder.tag(0)
builder.float64(2.2)
builder.tag(0)
builder.float64(3.3)
builder.tag(1)
builder.boolean(True)
builder.tag(0)
builder.float64(4.4)
builder.tag(1)
builder.boolean(False)
builder.tag(1)
builder.boolean(True)
builder.tag(0)
builder.float64(-2.2)

Snapshot

To turn a LayoutBuilder into a layout, call snapshot. This is an inexpensive operation that may be done multiple times; the builder is unaffected.

layout = builder.snapshot()
layout
<UnionArray8_64>
    <tags><Index8 i="[0 1 2 0 1 0 0 1 0 1 1 0]" offset="0" length="12" at="0x000002e83840"/></tags>
    <index><Index64 i="[0 0 0 1 1 2 3 2 4 3 4 5]" offset="0" length="12" at="0x000002f19590"/></index>
    <content tag="0">
        <NumpyArray format="d" shape="6" data="1.1 2.2 2.2 3.3 4.4 -2.2" at="0x000002f19a20"/>
    </content>
    <content tag="1">
        <NumpyArray format="?" shape="5" data="false false true false true" at="0x000002f04730"/>
    </content>
    <content tag="2">
        <NumpyArray format="l" shape="1" data="11" at="0x000002f1ba30"/>
    </content>
</UnionArray8_64>
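
The tags and index buffers in this snapshot can be reproduced by hand. The following is a minimal pure-Python sketch of the bookkeeping, assuming that index records each value's position within its own content buffer. It is not the library's implementation, and commands, tags, index, and contents are illustrative names:

```python
# Each pair mirrors one builder.tag(t) call followed by one data call,
# in the same order as the commands issued above.
commands = [
    (0, 1.1), (1, False), (2, 11),
    (0, 2.2), (1, False), (0, 2.2), (0, 3.3), (1, True),
    (0, 4.4), (1, False), (1, True), (0, -2.2),
]

tags, index = [], []
contents = [[], [], []]  # one buffer per union content: float64, bool, int64

for tag, value in commands:
    tags.append(tag)
    index.append(len(contents[tag]))  # position the value takes in its buffer
    contents[tag].append(value)

print(tags)   # [0, 1, 2, 0, 1, 0, 0, 1, 0, 1, 1, 0]
print(index)  # [0, 0, 0, 1, 1, 2, 3, 2, 4, 3, 4, 5]
```

Element i of the union is then contents[tags[i]][index[i]].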

If you want to use the layout as a high-level array for normal analysis, remember to convert it.

array = ak.Array(layout)
array
<Array [1.1, False, 11, ... False, True, -2.2] type='12 * union[float64, bool, i...'>

Nested lists

To fill data inside a list, use the following commands:

  • begin_list/end_list

Here is an example of a list offset array form:

form = """
{
  "class": "ListOffsetArray64",
  "offsets": "i64",
  "content": "float64",
  "form_key": "node0"
}
"""

Create a builder from the form:

builder = ak.layout.LayoutBuilder32(form)

and append the data between begin_list and end_list:

builder.begin_list()
builder.float64(1.1)
builder.float64(2.2)
builder.float64(3.3)
builder.end_list()

To append an empty list:

builder.begin_list()
builder.end_list()

and continue:

builder.begin_list()
builder.float64(4.4)
builder.float64(5.5)
builder.end_list()

Remember, you can take a snapshot at any time:

layout = builder.snapshot()
layout
<ListOffsetArray64>
    <offsets><Index64 i="[0 3 3 5]" offset="0" length="4" at="0x000002f331a0"/></offsets>
    <content><NumpyArray format="d" shape="5" data="1.1 2.2 3.3 4.4 5.5" at="0x000002f351b0"/></content>
</ListOffsetArray64>
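
The offsets in this snapshot follow directly from the begin_list/end_list calls. As a minimal sketch, the hypothetical ToyListBuilder class below (not part of Awkward Array) mimics the bookkeeping in plain Python, assuming that each end_list records the current content length:

```python
class ToyListBuilder:
    """Hypothetical stand-in that mimics offsets bookkeeping; not the real builder."""

    def __init__(self):
        self.offsets = [0]
        self.content = []

    def begin_list(self):
        pass  # nothing to record: a list starts where the previous one ended

    def float64(self, x):
        self.content.append(float(x))

    def end_list(self):
        self.offsets.append(len(self.content))  # close the list at the current length


toy = ToyListBuilder()
for values in ([1.1, 2.2, 3.3], [], [4.4, 5.5]):
    toy.begin_list()
    for x in values:
        toy.float64(x)
    toy.end_list()

print(toy.offsets)  # [0, 3, 3, 5]
print(toy.content)  # [1.1, 2.2, 3.3, 4.4, 5.5]
```

The empty list shows up as a repeated offset (3, 3), with nothing added to the content.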

Nested records

With a RecordArray form, you cannot specify which field to fill: the fields alternate in the order given by the form.

form = """
{
"class": "RecordArray",
"contents": {
    "one": "float64",
    "two": "int64"
},
"form_key": "node0"
}
"""
builder = ak.layout.LayoutBuilder32(form)

# the fields alternate
builder.float64(1.1)  # "one"
builder.int64(2)      # "two"
builder.float64(3.3)  # "one"
builder.int64(4)      # "two"
layout = builder.snapshot()
layout
<RecordArray length="2">
    <field index="0" key="one">
        <NumpyArray format="d" shape="2" data="1.1 3.3" at="0x000002f391d0"/>
    </field>
    <field index="1" key="two">
        <NumpyArray format="l" shape="2" data="2 4" at="0x000002f3b1e0"/>
    </field>
</RecordArray>

The same applies when the record contents have the same type:

form = """
{
"class": "RecordArray",
"contents": {
    "one": "float64",
    "two": "float64"
},
"form_key": "node0"
}
"""
builder = ak.layout.LayoutBuilder32(form)

Because the contents have the same type, only the call order distinguishes the fields, which still alternate:

builder.float64(1.1)  # "one"
builder.float64(2.2)  # "two"
builder.float64(3.3)  # "one"
builder.float64(4.4)  # "two"
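
The alternation rule can be modeled as a round-robin over the field names. The sketch below is plain Python and makes no Awkward Array calls; it only shows which field each of the four values above would land in under that rule, with fields, columns, and next_field as illustrative names:

```python
from itertools import cycle

fields = ["one", "two"]                      # field order as given in the form
columns = {name: [] for name in fields}
next_field = cycle(fields)                   # one -> two -> one -> two -> ...

for value in [1.1, 2.2, 3.3, 4.4]:
    columns[next(next_field)].append(value)  # each value goes to the next field in turn

print(columns)  # {'one': [1.1, 3.3], 'two': [2.2, 4.4]}
```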

A more complex example

A more complex example that contains both nested lists and records:

form = """
{
  "class": "ListOffsetArray64",
  "offsets": "i64",
  "content": {
      "class": "RecordArray",
      "contents": {
          "x": {
              "class": "NumpyArray",
              "primitive": "float64",
              "form_key": "node2"
          },
          "y": {
              "class": "ListOffsetArray64",
              "offsets": "i64",
              "content": {
                  "class": "NumpyArray",
                  "primitive": "int64",
                  "form_key": "node4"
              },
              "form_key": "node3"
          }
      },
      "form_key": "node1"
  },
  "form_key": "node0"
}
  """

Create a builder from the form:

builder = ak.layout.LayoutBuilder32(form)

Start appending the data:

builder.begin_list()
builder.float64(1.1)
builder.begin_list()
builder.int64(1)
builder.end_list()
builder.float64(2.2)
builder.begin_list()
builder.int64(1)
builder.int64(2)
builder.end_list()
builder.end_list()

builder.begin_list()
builder.end_list()

builder.begin_list()
builder.float64(3.3)
builder.begin_list()
builder.int64(1)
builder.int64(2)
builder.int64(3)
builder.end_list()
builder.end_list()

and take a snapshot:

layout = builder.snapshot()
layout
<ListOffsetArray64>
    <offsets><Index64 i="[0 2 2 3]" offset="0" length="4" at="0x000002f45230"/></offsets>
    <content><RecordArray length="3">
        <field index="0" key="x">
            <NumpyArray format="d" shape="3" data="1.1 2.2 3.3" at="0x000002f47240"/>
        </field>
        <field index="1" key="y">
            <ListOffsetArray64>
                <offsets><Index64 i="[0 1 3 6]" offset="0" length="4" at="0x000002f49250"/></offsets>
                <content><NumpyArray format="l" shape="6" data="1 1 2 1 2 3" at="0x000002f4b260"/></content>
            </ListOffsetArray64>
        </field>
    </RecordArray></content>
</ListOffsetArray64>
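
To check that these buffers encode the intended structure, they can be decoded by hand. The sketch below copies the offsets and data from the snapshot above into plain Python lists (the variable names are illustrative) and rebuilds the nested lists of records:

```python
# Buffers copied from the snapshot above.
outer_offsets = [0, 2, 2, 3]
x = [1.1, 2.2, 3.3]
y_offsets = [0, 1, 3, 6]
y_content = [1, 1, 2, 1, 2, 3]

# Rebuild one record per entry of the RecordArray.
records = [
    {"x": x[i], "y": y_content[y_offsets[i]:y_offsets[i + 1]]}
    for i in range(len(x))
]

# Group the records into the outer lists.
nested = [
    records[outer_offsets[i]:outer_offsets[i + 1]]
    for i in range(len(outer_offsets) - 1)
]

for item in nested:
    print(item)
```

This gives two records in the first list, an empty second list, and one record in the third, matching the three begin_list/end_list groups issued above.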

Error handling

The commands given to the LayoutBuilder must be in the order described by its Form. Issuing a non-conforming command, or issuing a command in the wrong order, is treated as a user error. As soon as an unexpected command is issued, the builder stops appending data. It is not possible to recover from this state. All you can do is take a snapshot to recover the accumulated data.
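
The behavior described above can be pictured with a toy model. The hypothetical ToyRecordBuilder class below is not part of Awkward Array. It assumes a record form with fields "one" (float64) and "two" (int64), and it raises ValueError on a bad command, which is a choice made for this sketch: this section does not state how the real builder signals the error.

```python
class ToyRecordBuilder:
    """Hypothetical stand-in for a builder over a {"one": float64, "two": int64} record."""

    expected = ["float64", "int64"]  # command order fixed by the form
    names = ["one", "two"]

    def __init__(self):
        self.columns = {name: [] for name in self.names}
        self.position = 0
        self.failed = False

    def _append(self, command, value):
        if self.failed or command != self.expected[self.position]:
            self.failed = True  # permanent: no later command clears it
            raise ValueError(f"unexpected command {command!r}")
        self.columns[self.names[self.position]].append(value)
        self.position = (self.position + 1) % len(self.expected)

    def float64(self, value):
        self._append("float64", value)

    def int64(self, value):
        self._append("int64", value)

    def snapshot(self):
        # Still available after a failure: returns what was accumulated so far.
        return {name: list(values) for name, values in self.columns.items()}


builder = ToyRecordBuilder()
builder.float64(1.1)
builder.int64(2)

errors = []
for command, value in [("int64", 3), ("float64", 3.3)]:  # the first is out of order
    try:
        getattr(builder, command)(value)
    except ValueError as err:
        errors.append(str(err))

recovered = builder.snapshot()
print(errors)
print(recovered)
```

In this model the second command is also rejected, even though float64 would normally be valid at that point, because the failed state is permanent. Only the data appended before the error survives in the snapshot.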