ak.ArrayBuilder
Defined in awkward.highlevel on line 2585.
- class ak.ArrayBuilder(*, behavior=None, attrs=None, initial=1024, resize=8)
- Parameters:
behavior (None or dict) – Custom ak.behavior for arrays built by this ArrayBuilder.
initial (int) – Initial size (in bytes) of buffers used by the ak::ArrayBuilder.
resize (float) – Resize multiplier for buffers used by the ak::ArrayBuilder; should be strictly greater than 1.
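To get a feel for how initial and resize interact, the following sketch assumes a simple geometric growth policy: a buffer starts at initial bytes and, each time it fills up, its capacity is multiplied by resize. This is a simplified model for intuition, not a description of the actual ak::ArrayBuilder buffer management, and buffer_capacities is a hypothetical helper.

```python
import math


def buffer_capacities(initial: int, resize: float, needed: int) -> list:
    """Capacities (in bytes) a buffer would pass through to hold `needed` bytes.

    Hypothetical helper: assumes plain geometric growth (capacity *= resize),
    which is a simplification of what a real growable buffer may do.
    """
    if resize <= 1:
        # A multiplier of 1 or less could never enlarge the buffer.
        raise ValueError("resize must be strictly greater than 1")
    capacities = [initial]
    while capacities[-1] < needed:
        capacities.append(math.ceil(capacities[-1] * resize))
    return capacities


# With the defaults (initial=1024, resize=8), about 1 MB fits after four resizes.
print(buffer_capacities(1024, 8, 1_000_000))
# [1024, 8192, 65536, 524288, 4194304]
```

Under this model, a larger resize means fewer reallocations at the cost of more potentially unused capacity, and the requirement that resize be strictly greater than 1 is what guarantees each resize makes progress.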
General tool for building arrays of nested data structures from a sequence of commands. Most data types can be constructed by calling commands in the right order, similar to printing tokens to construct JSON output.
To illustrate how this works, consider the following example:
b = ak.ArrayBuilder()
# fill commands          # as JSON      # current array type
##########################################################################################
b.begin_list()           # [            # 0 * var * unknown     (initially, the type is unknown)
b.integer(1)             #   1,         # 0 * var * int64
b.integer(2)             #   2,         # 0 * var * int64
b.real(3)                #   3.0        # 0 * var * float64     (all the integers have become floats)
b.end_list()             # ],           # 1 * var * float64     (closed first list; array length is 1)
b.begin_list()           # [            # 1 * var * float64
b.end_list()             # ],           # 2 * var * float64     (closed empty list; array length is 2)
b.begin_list()           # [            # 2 * var * float64
b.integer(4)             #   4,         # 2 * var * float64
b.null()                 #   null,      # 2 * var * ?float64    (now the floats are nullable)
b.integer(5)             #   5          # 2 * var * ?float64
b.end_list()             # ],           # 3 * var * ?float64
b.begin_list()           # [            # 3 * var * ?float64
b.begin_record()         #   {          # 3 * var * union[?float64, ?{}]
b.field("x")             #     "x":     # 3 * var * union[?float64, ?{x: unknown}]
b.integer(1)             #       1,     # 3 * var * union[?float64, ?{x: int64}]
b.field("y")             #     "y":     # 3 * var * union[?float64, ?{x: int64, y: unknown}]
b.begin_list()           #       [      # 3 * var * union[?float64, ?{x: int64, y: var * unknown}]
b.integer(2)             #         2,   # 3 * var * union[?float64, ?{x: int64, y: var * int64}]
b.integer(3)             #         3    # 3 * var * union[?float64, ?{x: int64, y: var * int64}]
b.end_list()             #       ]      # 3 * var * union[?float64, ?{x: int64, y: var * int64}]
b.end_record()           #   }          # 3 * var * union[?float64, ?{x: int64, y: var * int64}]
b.end_list()             # ]            # 4 * var * union[?float64, ?{x: int64, y: var * int64}]
To get an array, we take a snapshot of the ArrayBuilder’s current state.
>>> b.snapshot()
<Array [[1, 2, 3], ..., [{x: 1, y: ..., ...}]] type='4 * var * union[?float...'>
>>> b.snapshot().show()
[[1, 2, 3],
 [],
 [4, None, 5],
 [{x: 1, y: [2, 3]}]]
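For numbers, the type column in the walk-through follows two rules: integers widen to floats once a float arrives, and a single null makes the type optional. The sketch below is a hypothetical toy model of just those two rules (it ignores lists, records, and unions), meant to make the progression int64, float64, ?float64 easy to trace; infer_leaf_type is an illustrative name, not part of Awkward Array, and this is not how ArrayBuilder is implemented.

```python
# Hypothetical toy model, not Awkward Array's implementation: it covers only the
# numeric-widening and nullability steps traced in the walk-through above.
RANK = {"unknown": 0, "int64": 1, "float64": 2}


def infer_leaf_type(values) -> str:
    """Infer a leaf type string from a flat sequence of ints, floats, and Nones."""
    kind, nullable = "unknown", False
    for v in values:
        if v is None:
            nullable = True                  # one null is enough to make it "?..."
            continue
        if isinstance(v, bool):
            raise TypeError("booleans are outside this toy model")
        if isinstance(v, int):
            new = "int64"
        elif isinstance(v, float):
            new = "float64"
        else:
            raise TypeError(f"unsupported value: {v!r}")
        if RANK[new] > RANK[kind]:           # widen (int64 -> float64), never narrow
            kind = new
    return ("?" if nullable else "") + kind


# Leaf values of the first three lists, in fill order.
print(infer_leaf_type([1, 2]))                    # int64
print(infer_leaf_type([1, 2, 3.0]))               # float64
print(infer_leaf_type([1, 2, 3.0, 4, None, 5]))   # ?float64
```

The three calls correspond to the type column after b.integer(2), b.real(3), and b.null() in the walk-through.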
The full set of filling commands is the following.
- null: appends a None value.
- boolean: appends True or False.
- integer: appends an integer.
- real: appends a floating-point value.
- complex: appends a complex value.
- datetime: appends a datetime value.
- timedelta: appends a timedelta value.
- bytestring: appends an unencoded string (raw bytes).
- string: appends a UTF-8 encoded string.
- begin_list: begins filling a list; must be closed with end_list.
- end_list: ends a list.
- begin_tuple: begins filling a tuple; must be closed with end_tuple.
- index: selects a tuple slot to fill; must be followed by a command that actually fills that slot.
- end_tuple: ends a tuple.
- begin_record: begins filling a record; must be closed with end_record.
- field: selects a record field to fill; must be followed by a command that actually fills that field.
- end_record: ends a record.
- extend: appends all the items from an iterable.
- list: context manager for begin_list and end_list.
- tuple: context manager for begin_tuple and end_tuple.
- record: context manager for begin_record and end_record.
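To make the JSON analogy concrete, here is a hypothetical stack-based model of a subset of these commands in plain Python. ToyBuilder is an illustrative name, not part of Awkward Array: it only records nested Python data (no type inference, no columnar buffers), but it shows why every begin_* needs a matching end_* and why field must be followed by a value.

```python
import json


class ToyBuilder:
    """Hypothetical stack-based model of the fill commands.

    It records plain Python data only; unlike ak.ArrayBuilder it does no type
    inference and produces no columnar buffers.
    """

    def __init__(self):
        self._items = []              # completed top-level items
        self._stack = [self._items]   # the innermost open container is last
        self._key = None              # field name waiting for its value

    def _put(self, value):
        container = self._stack[-1]
        if isinstance(container, dict):
            if self._key is None:
                raise ValueError("field(key) must come before a value in a record")
            container[self._key] = value
            self._key = None
        else:
            container.append(value)

    def null(self):
        self._put(None)

    def integer(self, x):
        self._put(int(x))

    def real(self, x):
        self._put(float(x))

    def begin_list(self):
        new = []
        self._put(new)
        self._stack.append(new)

    def end_list(self):
        if len(self._stack) == 1 or not isinstance(self._stack[-1], list):
            raise ValueError("end_list without a matching begin_list")
        self._stack.pop()

    def begin_record(self):
        new = {}
        self._put(new)
        self._stack.append(new)

    def field(self, key):
        self._key = key
        return self                   # allows chaining, as in field("x").integer(1)

    def end_record(self):
        if not isinstance(self._stack[-1], dict):
            raise ValueError("end_record without a matching begin_record")
        self._stack.pop()

    def to_json(self):
        return json.dumps(self._items)


# Replay the command sequence from the walk-through above.
b = ToyBuilder()
b.begin_list()
b.integer(1)
b.integer(2)
b.real(3)
b.end_list()
b.begin_list()
b.end_list()
b.begin_list()
b.integer(4)
b.null()
b.integer(5)
b.end_list()
b.begin_list()
b.begin_record()
b.field("x").integer(1)
b.field("y")
b.begin_list()
b.integer(2)
b.integer(3)
b.end_list()
b.end_record()
b.end_list()
print(b.to_json())
# [[1, 2, 3.0], [], [4, null, 5], [{"x": 1, "y": [2, 3]}]]
```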
ArrayBuilders can be used in Numba: they can be passed as arguments to a Numba-compiled function or returned as return values. (Since ArrayBuilder works by accumulating side-effects, it’s not strictly necessary to return the object.)
The primary limitation is that ArrayBuilders cannot be created and snapshot cannot be called inside the Numba-compiled function. Awkward Array uses Numba as a transformer: ak.Array and an empty ak.ArrayBuilder go in and a filled ak.ArrayBuilder is the result; snapshot can be called outside of the compiled function.
Also, context managers (Python’s with statement) are not supported in Numba yet, so the list, tuple, and record methods are not available in Numba-compiled functions.
Here is an example of filling an ArrayBuilder in Numba, which makes a tree of dynamic depth.
>>> import numpy as np
>>> import numba as nb
>>> @nb.njit
... def deepnesting(builder, probability):
...     if np.random.uniform(0, 1) > probability:
...         builder.append(np.random.normal())
...     else:
...         builder.begin_list()
...         for i in range(np.random.poisson(3)):
...             deepnesting(builder, probability**2)
...         builder.end_list()
...
>>> builder = ak.ArrayBuilder()
>>> deepnesting(builder, 0.9)
>>> builder.snapshot()
<Array [[[-0.523, ..., [[2.16, ...], ...]]]] type='1 * var * var * union[fl...'>
>>> builder.type.show()
1 * var * var * union[
    float64,
    var * union[
        var * union[
            float64,
            var * unknown
        ],
        float64
    ]
]
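In deepnesting, a call opens a new list when the uniform draw does not exceed probability, and each child call receives probability**2. The chance of nesting therefore shrinks doubly exponentially with depth, which keeps the tree finite in practice while still letting its depth vary from run to run. The plain-Python sketch below (nesting_probabilities is a hypothetical helper, no Numba or Awkward Array involved) tabulates that chance, p ** (2 ** d), for each depth d.

```python
def nesting_probabilities(p: float, depth: int) -> list:
    """Chance that a call at each depth opens a list (rather than appending a
    leaf) when every recursive level squares the probability, as deepnesting does.

    Hypothetical helper: level d uses p ** (2 ** d).
    """
    return [p ** (2 ** d) for d in range(depth)]


for d, prob in enumerate(nesting_probabilities(0.9, 6)):
    print(f"depth {d}: {prob:.4f}")
# depth 0: 0.9000
# depth 1: 0.8100
# depth 2: 0.6561
# depth 3: 0.4305
# depth 4: 0.1853
# depth 5: 0.0343
```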
Note that this is a general method for building arrays; if the type is known in advance, more specialized procedures can be faster. This should be considered the “least effort” approach.
- _layout
- _behavior = None
- _attrs = None
- classmethod _wrap(layout, behavior=None, attrs=None)
- Parameters:
layout (ak._ext.ArrayBuilder) – Low-level builder to wrap.
behavior (None or dict) – Custom ak.behavior for arrays built by this ArrayBuilder.
Wraps a low-level ak._ext.ArrayBuilder as a high-level ak.ArrayBuilder.
The ak.ArrayBuilder constructor creates a new ak._ext.ArrayBuilder with no accumulated data, but Numba needs to wrap existing data when returning from a lowered function.
- property attrs: awkward._attrs.Attrs
The mapping containing top-level metadata, which is serialised with the array during pickling.
Keys prefixed with @ are identified as “transient” attributes, which are discarded prior to pickling, permitting the storage of non-pickleable types.
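The transient-key rule can be pictured as a filter applied before the attributes are serialised. The sketch below uses a hypothetical helper, picklable_attrs, written only to illustrate the rule; it is not Awkward Array's internal code, and the attribute names are made up.

```python
import pickle


def picklable_attrs(attrs: dict) -> dict:
    """Hypothetical helper: drop transient ("@"-prefixed) keys before pickling."""
    return {
        key: value
        for key, value in attrs.items()
        if not (isinstance(key, str) and key.startswith("@"))
    }


attrs = {"source": "run-42", "@cache": lambda x: x}  # lambdas cannot be pickled
restored = pickle.loads(pickle.dumps(picklable_attrs(attrs)))
print(restored)  # {'source': 'run-42'}
```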
- property behavior
The behavior parameter passed into this ArrayBuilder’s constructor.
- If a dict, this behavior overrides the global ak.behavior. Any keys in the global ak.behavior but not this behavior are still valid, but any keys in both are overridden by this behavior. Keys with a None value are equivalent to missing keys, so this behavior can effectively remove keys from the global ak.behavior.
- If None, the Array defaults to the global ak.behavior.
See ak.behavior for a list of recognized key patterns and their meanings.
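The override rules above amount to a dictionary merge in which the per-builder mapping wins and None values delete keys. Here is a minimal sketch of those rules; effective_behavior is a hypothetical helper, and the plain string keys and values stand in for the classes and functions a real ak.behavior would hold.

```python
from typing import Optional


def effective_behavior(global_behavior: dict, local_behavior: Optional[dict]) -> dict:
    """Hypothetical helper: combine the global mapping with a per-builder one."""
    if local_behavior is None:
        return dict(global_behavior)      # None: use the global mapping as-is
    merged = dict(global_behavior)
    merged.update(local_behavior)         # keys in both: the local value wins
    # A key mapped to None counts as missing, so it hides the global entry.
    return {key: value for key, value in merged.items() if value is not None}


global_behavior = {"point": "GlobalPoint", "line": "GlobalLine"}
local_behavior = {"point": "LocalPoint", "line": None}
print(effective_behavior(global_behavior, local_behavior))  # {'point': 'LocalPoint'}
```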
- tolist()
Converts the accumulated array into Python objects; same as ak.to_list (but without the underscore, like NumPy’s tolist).
- to_list()
Converts the accumulated array into Python objects; same as ak.to_list.
- to_numpy(allow_missing=True)
Converts the accumulated array into a NumPy array, if possible; same as ak.to_numpy.
- property type
The high-level type of the accumulated array; same as ak.type.
Note that the outermost element of an Array’s type is always an ak.types.ArrayType, which specifies the number of elements in the array.
The type of an ak.contents.Content (from ak.Array.layout) is not wrapped by an ak.types.ArrayType.
- property typestr
The high-level type of this accumulated array, presented as a string.
- __len__()
The current length of the accumulated array.
- __str__()
- __repr__()
- _repr(limit_cols)
- show(limit_rows=20, limit_cols=80, *, type=False, named_axis=False, nbytes=False, backend=False, all=False, stream=STDOUT, formatter=None, precision=3)
- Parameters:
limit_rows (int) – Maximum number of rows (lines) to use in the output.
limit_cols (int) – Maximum number of columns (characters wide).
type (bool) – If True, print the type as well. (Doesn’t count toward number of rows/lines limit.)
named_axis (bool) – If True, print the named axis as well. (Doesn’t count toward number of rows/lines limit.)
nbytes (bool) – If True, print the number of bytes as well. (Doesn’t count toward number of rows/lines limit.)
backend (bool) – If True, print the backend of the array as well. (Doesn’t count toward number of rows/lines limit.)
all (bool) – If True, print the ‘type’, ‘named axis’, ‘nbytes’, and ‘backend’ of the array. (Doesn’t count toward number of rows/lines limit.)
stream (object with a write(str) method or None) – Stream to write the output to. If None, return a string instead of writing to a stream.
formatter (Mapping or None) – Mapping of types/type-classes to string formatters. If None, use the default formatter.
Display the contents of the array builder within limit_rows and limit_cols, using ellipsis (...) for hidden nested data.
The formatter argument controls the formatting of individual values (cf. https://numpy.org/doc/stable/reference/generated/numpy.set_printoptions.html). As Awkward Array does not implement strings as a NumPy dtype, the numpystr key is ignored; instead, a "bytes" and/or "str" key is considered when formatting string values, falling back upon "str_kind".
This method takes a snapshot of the data and calls show on it; note that taking a snapshot copies the data.
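The fallback order for string values can be read as a small lookup. In this sketch, pick_string_formatter and its repr default are assumptions made for illustration; only the key names "bytes", "str", and "str_kind" come from the description above. With the real method, such a mapping would be passed as the formatter argument of show.

```python
def pick_string_formatter(formatter, value, default=repr):
    """Hypothetical lookup: "bytes" or "str" first, then "str_kind", then default."""
    specific = "bytes" if isinstance(value, bytes) else "str"
    for key in (specific, "str_kind"):
        if formatter is not None and key in formatter:
            return formatter[key]
    return default


formatter = {"bytes": lambda b: b.hex(), "str_kind": str.upper}
print(pick_string_formatter(formatter, b"\x01\xff")(b"\x01\xff"))  # 01ff
print(pick_string_formatter(formatter, "hi")("hi"))                # HI
```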
- __array__(dtype=None, copy=None)
Intercepts attempts to convert a #snapshot of this array into a NumPy array and either performs a conversion if possible or raises an error.
See ak.Array.__array__ for a more complete description.
- __arrow_array__(type=None)
- property numba_type
The type of this Array when it is used in Numba. It contains enough information to generate low-level code for accessing any element, down to the leaves.
See Numba documentation on types and signatures.
- __bool__()
- snapshot()
Converts the currently accumulated data into an ak.Array.
The currently accumulated data are copied into the new array.
- null()
Appends a None value at the current position in the accumulated array.
- boolean(x)
Appends a boolean value x at the current position in the accumulated array.
- integer(x)
Appends an integer x at the current position in the accumulated array.
- real(x)
Appends a floating point number x at the current position in the accumulated array.
- complex(x)
Appends a complex number x at the current position in the accumulated array.
- datetime(x)
Appends a datetime value x at the current position in the accumulated array.
- timedelta(x)
Appends a timedelta value x at the current position in the accumulated array.
- bytestring(x)
Appends an unencoded string (raw bytes) x at the current position in the accumulated array.
- string(x)
Appends a UTF-8 encoded string x at the current position in the accumulated array.
- begin_list()
Begins filling a list; must be closed with #end_list.
For example,
>>> builder = ak.ArrayBuilder()
>>> builder.begin_list()
>>> builder.real(1.1)
>>> builder.real(2.2)
>>> builder.real(3.3)
>>> builder.end_list()
>>> builder.begin_list()
>>> builder.end_list()
>>> builder.begin_list()
>>> builder.real(4.4)
>>> builder.real(5.5)
>>> builder.end_list()
produces
>>> builder.show()
[[1.1, 2.2, 3.3],
 [],
 [4.4, 5.5]]
- end_list()
Ends a list.
- begin_tuple(numfields)
Begins filling a tuple with numfields fields; must be closed with #end_tuple.
For example,
>>> builder = ak.ArrayBuilder()
>>> builder.begin_tuple(3)
>>> builder.index(0).integer(1)
>>> builder.index(1).real(1.1)
>>> builder.index(2).string("one")
>>> builder.end_tuple()
>>> builder.begin_tuple(3)
>>> builder.index(0).integer(2)
>>> builder.index(1).real(2.2)
>>> builder.index(2).string("two")
>>> builder.end_tuple()
produces
>>> builder.show()
[(1, 1.1, 'one'),
 (2, 2.2, 'two')]
- index(i)
- Parameters:
i (int) – The tuple slot to fill.
Prepares to fill a tuple slot; see #begin_tuple for an example.
This method also returns the ak.ArrayBuilder, so that it can be chained with the value that fills the slot.
- end_tuple()
Ends a tuple.
- begin_record(name=None)
Begins filling a record with an optional name; must be closed with #end_record.
For example,
>>> builder = ak.ArrayBuilder()
>>> builder.begin_record("points")
>>> builder.field("x").real(1)
>>> builder.field("y").real(1.1)
>>> builder.end_record()
>>> builder.begin_record("points")
>>> builder.field("x").real(2)
>>> builder.field("y").real(2.2)
>>> builder.end_record()
produces
>>> builder.show()
[{x: 1, y: 1.1},
 {x: 2, y: 2.2}]
with type
>>> builder.type.show()
2 * points[
    x: float64,
    y: float64
]
The record type is named "points" because its "__record__" parameter is set to that value:
>>> builder.snapshot().layout.parameters
{'__record__': 'points'}
The "__record__" parameter can be used to add behavior to the records in the array, as described in ak.Array, ak.Record, and ak.behavior.
- field(key)
- Parameters:
key (str) – The field key to fill.
Prepares to fill a field; see #begin_record for an example.
This method also returns the ak.ArrayBuilder, so that it can be chained with the value that fills the field.
- end_record()
Ends a record.
- append(obj)
- Parameters:
obj – The data to append (None, bool, int, float, bytes, str, or anything recognized by ak.from_iter).
Appends any type, which can be a shorthand for #null, #boolean, #integer, #real, #bytestring, or #string, but also an ak.Array or ak.Record to reference values from an existing dataset, or any Python object to convert to Awkward Array.
If obj is an iterable (including dict), this is equivalent to ak.from_iter except that it fills an existing ak.ArrayBuilder, rather than creating a new one.
- extend(obj)
- Parameters:
obj (iterable) – Iterable of data to extend this ArrayBuilder with.
Appends every value from obj.
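One way to picture append and extend is as a translation from a Python object into the fill commands listed earlier. The pure-Python sketch below, built around a hypothetical helper commands_for, returns the command names such a translation could produce; it does not call Awkward Array and is only a model for intuition. Mapping tuples to begin_tuple/index/end_tuple is an assumption here; complex, datetime, and timedelta values, as well as ak.Array and ak.Record arguments (which append handles by reference), are left out.

```python
def commands_for(obj) -> list:
    """Hypothetical: fill commands that appending `obj` could stand for.

    A model for intuition only; it does not call Awkward Array.
    """
    if obj is None:
        return ["null()"]
    if isinstance(obj, bool):        # test bool before int: bool subclasses int
        return [f"boolean({obj!r})"]
    if isinstance(obj, int):
        return [f"integer({obj!r})"]
    if isinstance(obj, float):
        return [f"real({obj!r})"]
    if isinstance(obj, bytes):
        return [f"bytestring({obj!r})"]
    if isinstance(obj, str):
        return [f"string({obj!r})"]
    if isinstance(obj, dict):        # a dict becomes a record, one field per key
        out = ["begin_record()"]
        for key, value in obj.items():
            out.append(f"field({key!r})")
            out.extend(commands_for(value))
        return out + ["end_record()"]
    if isinstance(obj, tuple):       # assumption: tuples fill tuple slots
        out = [f"begin_tuple({len(obj)})"]
        for i, value in enumerate(obj):
            out.append(f"index({i})")
            out.extend(commands_for(value))
        return out + ["end_tuple()"]
    if hasattr(obj, "__iter__"):     # any other iterable becomes a list
        out = ["begin_list()"]
        for value in obj:
            out.extend(commands_for(value))
        return out + ["end_list()"]
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def extend_commands(iterable) -> list:
    """extend(obj) appends every value, i.e. one append per item."""
    return [cmd for item in iterable for cmd in commands_for(item)]


print(commands_for([1, None, {"x": 2.5}]))
print(extend_commands([1, None, {"x": 2.5}]))
```

The difference between the two printed lists is the outer begin_list()/end_list() pair: append adds the whole iterable as one list item, while extend adds each of its items separately.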
- list()
Context manager to prevent unpaired #begin_list and #end_list. The example in the #begin_list documentation can be rewritten as
>>> builder = ak.ArrayBuilder()
>>> with builder.list():
...     builder.real(1.1)
...     builder.real(2.2)
...     builder.real(3.3)
...
>>> with builder.list():
...     pass
...
>>> with builder.list():
...     builder.real(4.4)
...     builder.real(5.5)
...
to produce the same result.
>>> builder.show()
[[1.1, 2.2, 3.3],
 [],
 [4.4, 5.5]]
Since context managers aren’t yet supported by Numba, this method can’t be used in Numba.
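The pairing guarantee comes from the usual contextlib pattern: the opening command runs on entry and the closing one on exit. The sketch below shows that pattern on a hypothetical recorder class, PairedCommands, rather than on ak.ArrayBuilder. Closing the list in a finally block is a choice made in this sketch; the documentation here does not say what the real method does if the body raises.

```python
from contextlib import contextmanager


class PairedCommands:
    """Hypothetical recorder showing how a context manager keeps commands paired."""

    def __init__(self):
        self.log = []

    def begin_list(self):
        self.log.append("begin_list")

    def end_list(self):
        self.log.append("end_list")

    def real(self, x):
        self.log.append(f"real({x!r})")

    @contextmanager
    def list(self):
        self.begin_list()
        try:
            yield self
        finally:
            # Runs on normal exit and if the body raises, so no list is left open.
            self.end_list()


p = PairedCommands()
with p.list():
    p.real(1.1)
with p.list():
    pass
print(p.log)
# ['begin_list', 'real(1.1)', 'end_list', 'begin_list', 'end_list']
```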
- tuple(numfields)
Context manager to prevent unpaired #begin_tuple and #end_tuple. The example in the #begin_tuple documentation can be rewritten as
>>> builder = ak.ArrayBuilder()
>>> with builder.tuple(3):
...     builder.index(0).integer(1)
...     builder.index(1).real(1.1)
...     builder.index(2).string("one")
...
>>> with builder.tuple(3):
...     builder.index(0).integer(2)
...     builder.index(1).real(2.2)
...     builder.index(2).string("two")
...
to produce the same result.
>>> builder.show()
[(1, 1.1, 'one'),
 (2, 2.2, 'two')]
Since context managers aren’t yet supported by Numba, this method can’t be used in Numba.
- record(name=None)
Context manager to prevent unpaired #begin_record and #end_record. The example in the #begin_record documentation can be rewritten as
>>> builder = ak.ArrayBuilder()
>>> with builder.record("points"):
...     builder.field("x").real(1)
...     builder.field("y").real(1.1)
...
>>> with builder.record("points"):
...     builder.field("x").real(2)
...     builder.field("y").real(2.2)
...
to produce the same result.
>>> builder.show()
[{x: 1, y: 1.1},
 {x: 2, y: 2.2}]
Since context managers aren’t yet supported by Numba, this method can’t be used in Numba.