ak.Array#
Defined in awkward.highlevel on line 137.
- class ak.Array(data, *, behavior=None, with_name=None, check_valid=False, backend=None, attrs=None, named_axis=None)#
- Parameters:
data (ak.contents.Content, ak.Array, np.ndarray, cp.ndarray, pyarrow.*, str, dict, or iterable) – Data to wrap or convert into an array.
If a NumPy array, the regularity of its dimensions is preserved and the data are viewed, not copied.
CuPy arrays are treated the same way as NumPy arrays except that they default to backend="cuda", rather than backend="cpu".
If a pyarrow object, calls ak.from_arrow, preserving as much metadata as possible, usually zero-copy.
If a dict of str → columns, combines the columns into an array of records (like Pandas’s DataFrame constructor).
If a string, the data are assumed to be JSON.
If an iterable, calls ak.from_iter, which assumes all dimensions have irregular lengths.
behavior (None or dict) – Custom ak.behavior for this Array only.
with_name (None or str) – Gives tuples and records a name that can be used to override their behavior (see below).
check_valid (bool) – If True, verify that the layout is valid.
backend (None, "cpu", "jax", "cuda") – If "cpu", the Array will be placed in main memory for use with other "cpu" Arrays and Records; if "cuda", the Array will be placed in GPU global memory using CUDA; if "jax", the structure is copied to the CPU for use with JAX. If None, the data are left untouched.
High-level array that can contain data of any type.
For most users, this is the only class in Awkward Array that matters: it is the entry point for data analysis with an emphasis on usability. It intentionally has a minimum of methods, preferring standalone functions like:
ak.num(array1)
ak.combinations(array1)
ak.cartesian([array1, array2])
ak.zip({"x": array1, "y": array2, "z": array3})
instead of bound methods like:
array1.num()
array1.combinations()
array1.cartesian([array2, array3])
array1.zip(...) # ?
because its namespace is valuable for domain-specific parameters and functionality. For example, if records contain a field named "num", they can be accessed as:
array1.num
instead of:
array1["num"]
without any confusion or interference from ak.num. The same is true for domain-specific methods that have been attached to the data. For instance, an analysis of mailing addresses might have a function that computes zip codes, which can be attached to the data with a method like:
latlon.zip()
without any confusion or interference from ak.zip. Custom methods like this can be added with ak.behavior, and so the namespace of Array attributes must be kept clear for such applications.
See also ak.Record.
Interfaces to other libraries#
NumPy#
When NumPy universal functions (ufuncs) are applied to an ak.Array, they are passed through the Awkward data structure, applied to the numerical data at its leaves, and the output maintains the original structure.
For example,
>>> array = ak.Array([[1, 4, 9], [], [16, 25]])
>>> np.sqrt(array)
<Array [[1, 2, 3], [], [4, 5]] type='3 * var * float64'>
See also ak.Array.__array_ufunc__.
Some NumPy functions other than ufuncs are also handled properly, provided that NumPy >= 1.17 is used (see NEP 18) and an Awkward override exists. That is, np.concatenate can be used on an Awkward Array because ak.concatenate exists.
Pandas#
Ragged arrays (list type) can be converted into Pandas MultiIndex rows and nested records can be converted into MultiIndex columns. If the Awkward Array has only one “branch” of nested lists (i.e. different record fields do not have different-length lists, but a single chain of lists-of-lists is okay), then it can be losslessly converted into a single DataFrame. Otherwise, multiple DataFrames are needed, though they can be merged (with a loss of information).
The ak.to_dataframe function performs this conversion; if how=None, it returns a list of DataFrames; otherwise, how is passed to pd.merge when merging the resultant DataFrames.
Numba#
Arrays can be used in Numba: they can be passed as arguments to a Numba-compiled function or returned as return values. The only limitation is that Awkward Arrays cannot be created inside the Numba-compiled function; to make outputs, consider ak.ArrayBuilder.
Arrow#
Arrays are convertible to and from Apache Arrow, a standard for representing nested data structures in columnar arrays. See ak.to_arrow and ak.from_arrow.
JAX#
Derivatives of a calculation on one or more ak.Arrays can be calculated with JAX, but only if the array functions in ak/numpy are used, not the functions in the jax library directly (apart from e.g. jax.grad).
Like NumPy ufuncs, the function and its derivatives are evaluated on the numeric leaves of the data structure, maintaining structure in the output.
- _cpp_type = None#
- _layout#
- _behavior = None#
- _attrs = None#
- _histogram_module_#
- _update_class()#
- property attrs: awkward._attrs.Attrs#
The mapping containing top-level metadata, which is serialised with the array during pickling.
Keys prefixed with @ are identified as “transient” attributes which are discarded prior to pickling, permitting the storage of non-pickleable types.
- property layout#
The composable ak.contents.Content elements that determine how this Array is structured.
This may be considered a “low-level” view, as it distinguishes between arrays that have the same logical meaning (i.e. same JSON output and high-level type) but different
- node types, such as ak.contents.ListArray and ak.contents.ListOffsetArray,
- integer type specialization, such as int64 vs int32,
- or specific values, such as gaps in an ak.contents.ListArray.
The ak.contents.Content elements are fully composable, whereas an Array is not; the high-level Array is a single-layer “shell” around its layout.
Layouts are rendered as XML instead of a nested list. For example, the following array:
ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
is presented as:
<Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>
but array.layout is presented as:
<ListOffsetArray len='3'>
    <offsets><Index dtype='int64' len='4'>
        [0 3 3 5]
    </Index></offsets>
    <content>
        <NumpyArray dtype='float64' len='5'>[1.1 2.2 3.3 4.4 5.5]</NumpyArray>
    </content>
</ListOffsetArray>
(with truncation for large arrays).
- property behavior#
The behavior parameter passed into this Array’s constructor.
- If a dict, this behavior overrides the global ak.behavior. Any keys in the global ak.behavior but not this behavior are still valid, but any keys in both are overridden by this behavior. Keys with a None value are equivalent to missing keys, so this behavior can effectively remove keys from the global ak.behavior.
- If None, the Array defaults to the global ak.behavior.
See ak.behavior for a list of recognized key patterns and their meanings.
- property named_axis: awkward._namedaxis.AxisMapping#
- property mask#
Whereas
array[array_of_booleans]
removes elements from array in which array_of_booleans is False,
array.mask[array_of_booleans]
returns data with the same length as the original array, but with None wherever array_of_booleans is False. Such an output can be used in mathematical expressions with the original array because they are still aligned.
See filtering and ak.mask.
- tolist()#
Converts this Array into Python objects; same as ak.to_list (but without the underscore, like NumPy’s [tolist](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tolist.html)).
- to_list()#
Converts this Array into Python objects; same as ak.to_list.
- to_numpy(allow_missing=True)#
Converts this Array into a NumPy array, if possible; same as ak.to_numpy.
- property nbytes#
The total number of bytes in all the ak.index.Index and ak.contents.NumpyArray buffers in this array tree.
It does not count buffers that must be kept in memory because of ownership, but are not directly used in the array. Nor does it count the (small) Python objects that reference the (large) array buffers.
- property ndim#
Number of dimensions (nested variable-length lists and/or regular arrays) before reaching a numeric type or a record.
There may be nested lists within the record, as field values, but this number of dimensions does not count those.
(Some fields may have different depths than others, which is why they are not counted.)
- property fields#
List of field names or tuple slot numbers (as strings) of the outermost record or tuple in this array.
If the array contains nested records, only the fields of the outermost record are shown. If it contains tuples instead of records, its fields are string representations of integers, such as “0”, “1”, “2”, etc. The records or tuples may be within multiple layers of nested lists.
If the array contains neither tuples nor records, it is an empty list.
See also ak.fields.
- property is_tuple#
If True, the top-most record structure has no named fields, i.e. it’s a tuple.
- _ipython_key_completions_()#
- property type#
The high-level type of this Array; same as ak.type.
Note that the outermost element of an Array’s type is always an ak.types.ArrayType, which specifies the number of elements in the array.
The type of an ak.contents.Content (from ak.Array.layout) is not wrapped by an ak.types.ArrayType.
- property typestr#
The high-level type of this Array, presented as a string.
- _repr(limit_cols)#
- show(limit_rows=20, limit_cols=80, *, type=False, named_axis=False, nbytes=False, backend=False, all=False, stream=STDOUT, formatter=None, precision=3)#
- Parameters:
limit_rows (int) – Maximum number of rows (lines) to use in the output.
limit_cols (int) – Maximum number of columns (characters wide).
type (bool) – If True, print the type as well. (Doesn’t count toward number of rows/lines limit.)
named_axis (bool) – If True, print the named axis as well. (Doesn’t count toward number of rows/lines limit.)
nbytes (bool) – If True, print the number of bytes as well. (Doesn’t count toward number of rows/lines limit.)
backend (bool) – If True, print the backend of the array as well. (Doesn’t count toward number of rows/lines limit.)
all (bool) – If True, print the ‘type’, ‘named axis’, ‘nbytes’, and ‘backend’ of the array. (Doesn’t count toward number of rows/lines limit.)
stream (object with a write(str) method or None) – Stream to write the output to. If None, return a string instead of writing to a stream.
formatter (Mapping or None) – Mapping of types/type-classes to string formatters. If None, use the default formatter.
Display the contents of the array within limit_rows and limit_cols, using ellipsis (…) for hidden nested data.
The formatter argument controls the formatting of individual values (cf. https://numpy.org/doc/stable/reference/generated/numpy.set_printoptions.html). As Awkward Array does not implement strings as a NumPy dtype, the numpystr key is ignored; instead, a “bytes” and/or “str” key is considered when formatting string values, falling back upon “str_kind”.
- _repr_mimebundle_(include=None, exclude=None)#
- numba_type()#
The type of this Array when it is used in Numba. It contains enough information to generate low-level code for accessing any element, down to the leaves.
See [Numba documentation](https://numba.pydata.org/numba-doc/dev/reference/types.html) on types and signatures.
- cpp_type()#
The C++ type of this Array when it is used in cppyy.
cpp_type (None or str): Generated on demand when the Array needs to be passed to a C++ (possibly templated) function defined by a `cppyy` compiler.
See [cppyy documentation](https://cppyy.readthedocs.io/en/latest/index.html) on types and signatures.
Classes#
|