ak.transform
Defined in awkward.operations.ak_transform on line 27.
ak.transform(transformation, array, *more_arrays, depth_context=None, lateral_context=None, allow_records=True, broadcast_parameters_rule='intersect', left_broadcast=True, right_broadcast=True, numpy_to_regular=False, regular_to_jagged=False, return_value='simplified', expect_return_value=False, highlevel=True, behavior=None, attrs=None)

Parameters:
- transformation (callable) – Function to apply to each node of the array. See below for details.
- array – Array-like data (anything ak.to_layout recognizes), but not an ak.Record or ak.record.Record.
- more_arrays – Additional arrays to be broadcasted together (with first array) and used together in the transformation. See below for details.
- depth_context (None or dict) – User data to propagate through the transformation. New data added to depth_context is available to the entire subtree at which it is added, but no other subtrees. For example, data added during the transformation will not be in the original depth_context after the transformation.
- lateral_context (None or dict) – User data to propagate through the transformation. New data added to lateral_context is available at any later step of the depth-first walk over the tree, including other subtrees. For example, data added during the transformation will be in the original lateral_context after the transformation.
- allow_records (bool) – If False and the recursive walk encounters any ak.contents.RecordArray nodes, an error is raised.
- broadcast_parameters_rule (str) – Rule for broadcasting parameters, one of: "intersect", "all_or_nothing", "one_to_one", "none".
- left_broadcast (bool) – If more_arrays are provided, the parameter determines whether the arrays are left-broadcasted, which is Awkward-like broadcasting.
- right_broadcast (bool) – If more_arrays are provided, the parameter determines whether the arrays are right-broadcasted, which is NumPy-like broadcasting.
- numpy_to_regular (bool) – If True, multidimensional ak.contents.NumpyArray nodes are converted into ak.contents.RegularArray nodes before calling transformation.
- regular_to_jagged (bool) – If True, regular-type lists are converted into variable-length lists before calling transformation.
- return_value ("none", "original", "simplified") – If "none", the output of this function is None; if "original", untouched nodes surrounding the ones replaced by the transformation are returned in their original state; if "simplified", the ak.Content.simplified constructor is used on the surrounding nodes to ensure that option-type and union-type nodes are not nested inappropriately. Note that if return_value is "none", the only way to get information out of this function is through the lateral_context.
- expect_return_value (bool) – If True, raise a RuntimeError if the transformer does not terminate the recursion.
- highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.contents.Content subclass.
- behavior (None or dict) – Custom ak.behavior for the output array, if high-level.
- attrs (None or dict) – Custom attributes for the output array, if high-level.
Applies a transformation function to every node of an Awkward array or arrays to either obtain a transformed copy or extract data from a walk over the arrays’ low-level layout nodes.

This is a public interface to the infrastructure that is used to implement most Awkward Array operations. As such, it’s very powerful, but low-level.

Here is a “hello world” example:

>>> def say_hello(layout, depth, **kwargs):
...     print("Hello", type(layout).__name__, "at", depth)
...
>>> array = ak.Array([[1.1, 2.2, "three"], [], None, [4.4, 5.5]])
>>> ak.transform(say_hello, array, return_value="none")
Hello IndexedOptionArray at 1
Hello ListOffsetArray at 1
Hello UnionArray at 2
Hello NumpyArray at 2
Hello ListOffsetArray at 2
Hello NumpyArray at 3
In the above, say_hello is called on every node of the array, which has a lot of nodes because it has nested lists, missing data, and a union of different types. The data types are low-level “layouts,” subclasses of ak.contents.Content, rather than high-level ak.Array.

The primary purpose of this function is to allow you to edit one level of structure without having to worry about what it’s embedded in. Suppose, for instance, you want to apply NumPy’s np.round function to numerical data, regardless of what lists or other structures they’re embedded in.

The return value must be a subclass of ak.contents.Content (to replace the array node) or None (to leave the array node unchanged).

>>> def rounder(layout, **kwargs):
...     if layout.is_numpy:
...         return ak.contents.NumpyArray(
...             np.round(layout.data).astype(np.int32)
...         )
...
>>> array = ak.Array(
...     [[[[[1.1, 2.2, 3.3], []], None], []],
...      [[[[4.4, 5.5]]]]]
... )
>>> ak.transform(rounder, array).show(type=True)
type: 2 * var * var * option[var * var * int32]
[[[[[1, 2, 3], []], None], []],
 [[[[4, 6]]]]]
If you pass multiple arrays to this function (more_arrays), those arrays will be broadcasted and all inputs, at the same level of depth and structure, will be passed to the transformation function as a group.

Here is an example with broadcasting:

>>> def combine(layouts, **kwargs):
...     assert len(layouts) == 2
...     if layouts[0].is_numpy and layouts[1].is_numpy:
...         return ak.contents.NumpyArray(
...             layouts[0].data + 10 * layouts[1].data
...         )
...
>>> array1 = ak.Array([[1, 2, 3], [], None, [4, 5]])
>>> array2 = ak.Array([1, 2, 3, 4])
>>> ak.transform(combine, array1, array2)
<Array [[11, 12, 13], [], None, [44, 45]] type='4 * option[var * int64]'>

The 1 and 4 from array2 are broadcasted to the [1, 2, 3] and the [4, 5] of array1, and the other elements disappear because they are broadcasted with an empty list and a missing value. Note that the first argument of this transformation function is a list of layouts, not a single layout. There are always 2 layouts because 2 arrays were passed to ak.transform.

Signature of the transformation function
If there is only one array, the first argument of transformation is an ak.contents.Content instance. If there are multiple arrays (more_arrays), the first argument is a list of ak.contents.Content instances.

All other arguments can be absorbed into a **kwargs because they will always be passed to your function by keyword. They are:

- depth (int): The current list depth, where 1 is the outermost array and higher numbers are deeper levels of list nesting. This does not count nesting of other data structures, such as option-types and records.
- depth_context (None or dict): Any user-specified data. You can add to this dict during transformation; changes would only be seen in the subtree’s nodes.
- lateral_context (None or dict): Any user-specified data. You can add to this dict during transformation; changes would be seen in any node visited later in the depth-first search.
- continuation (callable): Zero-argument function that continues the recursion from this point in the walk, so that you can perform post-processing instead of pre-processing.
For completeness, the following arguments are also passed to transformation, but you usually won’t need them:

- behavior (None or dict): Behavior that would be attached to the output array(s) if highlevel.
- backend (array library / kernel library shim): Handle to the NumPy library, CuPy, etc., depending on the type of arrays.
- options (dict): Options provided to ak.transform.
If there is only one array, the transformation function must either return None or return an ak.contents.Content.

If there are multiple arrays (more_arrays), then the transformation function may return one array or a tuple of arrays. (The preferred type is a tuple, even if it has length 1.)

The final return value of ak.transform is a new array or tuple of arrays constructed by replacing nodes when transformation returns an ak.contents.Content or tuple of ak.contents.Content, and leaving nodes unchanged when transformation returns None. If transformation returns length-1 tuples, the final output is an array, not a length-1 tuple.

If return_value is "none", ak.transform returns None. This is useful for functions that return non-array data through lateral_context. The other two choices, "original" and "simplified", determine how untouched array nodes, the ones that are not modified by the transformation function, are returned. With "original", they are returned without modification, which might result in illegal combinations of option-type and union-type, which would raise an error. With "simplified", the surrounding array nodes are simplified upon reconstruction. For example, if the transformation puts a new ak.contents.ByteMaskedArray inside an existing ak.contents.ByteMaskedArray, the two will be consolidated into a single option-type array node.

Contexts
The depth_context and lateral_context allow you to pass your own data into the transformation as well as communicate between calls of transformation on different nodes. The depth_context limits this communication to descendants of the subtree in which the data were added; lateral_context does not have this limit. (depth_context is shallow-copied at each node during descent; lateral_context is never copied.)

For example, consider this array:

>>> array = ak.Array([
...     [{"x": [1], "y": 1.1}, {"x": [1, 2], "y": 2.2}, {"x": [1, 2, 3], "y": 3.3}],
...     [],
...     [{"x": [1, 2, 3, 4], "y": 4.4}, {"x": [1, 2, 3, 4, 5], "y": 5.5}],
... ])
If we accumulate node type names using depth_context,

>>> def crawl(layout, depth_context, **kwargs):
...     depth_context["types"] = depth_context["types"] + (type(layout).__name__,)
...     print(depth_context["types"])
...
>>> context = {"types": ()}
>>> ak.transform(crawl, array, depth_context=context, return_value="none")
('ListOffsetArray',)
('ListOffsetArray', 'RecordArray')
('ListOffsetArray', 'RecordArray', 'ListOffsetArray')
('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray')
('ListOffsetArray', 'RecordArray', 'NumpyArray')
>>> context
{'types': ()}
The data in depth_context["types"] represents a path from the root of the tree to the current node. There is never, for instance, more than one leaf-type (ak.contents.NumpyArray) in the tuple. Also, the context is unchanged outside of the function.

On the other hand, if we do the same with a lateral_context,

>>> def crawl(layout, lateral_context, **kwargs):
...     lateral_context["types"] = lateral_context["types"] + (type(layout).__name__,)
...     print(lateral_context["types"])
...
>>> context = {"types": ()}
>>> ak.transform(crawl, array, lateral_context=context, return_value="none")
('ListOffsetArray',)
('ListOffsetArray', 'RecordArray')
('ListOffsetArray', 'RecordArray', 'ListOffsetArray')
('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray')
('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray', 'NumpyArray')
>>> context
{'types': ('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray', 'NumpyArray')}
The data accumulate through the walk over the tree. There are two leaf-types (ak.contents.NumpyArray) in the tuple because this tree has two leaves. The data are even available outside of the function, so lateral_context can be paired with return_value="none" to extract non-array data, rather than transforming the array.

The visitation order is stable: a recursive walk always proceeds through the same tree in the same order.
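The copy semantics described in this section can be modeled without Awkward at all. The sketch below is a toy, not the library's implementation: walk and visit are hypothetical helpers, and nested Python lists stand in for layout nodes. It assumes only what is stated above, namely that depth_context is shallow-copied at each node and lateral_context is never copied.

```python
import copy


def walk(node, visit, depth_context, lateral_context):
    """Toy depth-first walk over nested lists (stand-ins for layout nodes)."""
    # Shallow-copy depth_context at every node: additions reach descendants only.
    depth_context = copy.copy(depth_context)
    # lateral_context is passed along uncopied: additions reach every later node.
    visit(node, depth_context, lateral_context)
    if isinstance(node, list):
        for child in node:
            walk(child, visit, depth_context, lateral_context)


def visit(node, depth_context, lateral_context):
    label = "list" if isinstance(node, list) else "leaf"
    depth_context["path"] = depth_context["path"] + (label,)
    lateral_context["seen"] = lateral_context["seen"] + (label,)
    if label == "leaf":
        lateral_context["leaf_paths"].append(depth_context["path"])


tree = [["a"], "b"]
depth_ctx = {"path": ()}
lateral_ctx = {"seen": (), "leaf_paths": []}
walk(tree, visit, depth_ctx, lateral_ctx)

print(depth_ctx)                  # {'path': ()}
print(lateral_ctx["seen"])        # ('list', 'list', 'leaf', 'leaf')
print(lateral_ctx["leaf_paths"])  # [('list', 'list', 'leaf'), ('list', 'leaf')]
```

Each leaf sees only its own root-to-leaf path in the depth context, while the lateral context accumulates every node visited so far and remains populated after the walk.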
Continuation
The transformation function is given an input, untransformed layout or layouts. Some algorithms need to perform a correction on transformed outputs, so continuation() can be called at any point to continue descending but obtain the transformed result.

For example, this function inserts an option-type at every level of an array:

>>> def insert_optiontype(layout, continuation, **kwargs):
...     return ak.contents.UnmaskedArray(continuation())
...
>>> array = ak.Array([[[[[1.1, 2.2, 3.3], []]], []], [[[[4.4, 5.5]]]]])
>>> array.type.show()
2 * var * var * var * var * float64

>>> array2 = ak.transform(insert_optiontype, array)
>>> array2.type.show()
2 * option[var * option[var * option[var * option[var * ?float64]]]]
In the original array, every node is an ak.contents.ListOffsetArray except the leaf, which is an ak.contents.NumpyArray. The call to continuation() returns an ak.contents.ListOffsetArray with its contents transformed, which is the argument of a new ak.contents.UnmaskedArray.

To see this process as it happens, we can add print statements to the function.

>>> def insert_optiontype(input, continuation, **kwargs):
...     print("before", input.form.type)
...     output = ak.contents.UnmaskedArray(continuation())
...     print("after ", output.form.type)
...     return output
...
>>> ak.transform(insert_optiontype, array)
before var * var * var * var * float64
before var * var * var * float64
before var * var * float64
before var * float64
before float64
after  ?float64
after  option[var * ?float64]
after  option[var * option[var * ?float64]]
after  option[var * option[var * option[var * ?float64]]]
after  option[var * option[var * option[var * option[var * ?float64]]]]
<Array [[[[[1.1, ..., 3.3], ...]], ...], ...] type='2 * option[var * option...'>
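The pre-processing versus post-processing distinction can also be seen in a toy model that does not use Awkward. In the sketch below, apply and prune_empty are hypothetical helpers operating on nested Python lists; it mimics the continuation protocol as described above (return None to leave a node alone, call continuation() to obtain the transformed children) and is not how Awkward implements it.

```python
def apply(node, transformation):
    """Toy recursive walk over nested lists, mimicking the continuation protocol."""

    def continuation():
        # Default descent: rebuild this node from its transformed children.
        if isinstance(node, list):
            return [apply(child, transformation) for child in node]
        return node

    result = transformation(node, continuation=continuation)
    # None means "no replacement here": keep walking as usual.
    return continuation() if result is None else result


def prune_empty(node, continuation, **kwargs):
    if isinstance(node, list):
        children = continuation()  # the children have already been pruned
        return [child for child in children if child != []]
    return None  # leave leaves untouched


tree = [[[], []], [1, []], []]
print(apply(tree, prune_empty))  # [[1]]
```

The inner list [[], []] becomes empty only after its own children are pruned, so it can be removed only by acting on the transformed result. Filtering before descending would leave an empty list behind.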
Broadcasting
When multiple arrays are provided (more_arrays), all of the arrays are broadcasted during the walk so that the transformation function is eventually provided with a list of layouts that have compatible types (for mathematical operations, etc.).

For instance, given these two arrays:

>>> array1 = ak.Array([[1, 2, 3], [], None, [4, 5]])
>>> array2 = ak.Array([10, 20, 30, 40])
The following single-array function shows the nodes encountered when walking down either one of them.

>>> def one_array(layout, **kwargs):
...     print(type(layout).__name__)
...
>>> ak.transform(one_array, array1, return_value="none")
IndexedOptionArray
ListOffsetArray
NumpyArray
>>> ak.transform(one_array, array2, return_value="none")
NumpyArray
The first array has three nested nodes; the second has only one node.
However, when the following two-array function is applied,

>>> def two_arrays(layouts, **kwargs):
...     assert len(layouts) == 2
...     print(type(layouts[0]).__name__, ak.to_list(layouts[0]))
...     print(type(layouts[1]).__name__, ak.to_list(layouts[1]))
...     print()
...
>>> ak.transform(two_arrays, array1, array2)
RegularArray [[[1, 2, 3], [], None, [4, 5]]]
RegularArray [[10, 20, 30, 40]]

IndexedOptionArray [[1, 2, 3], [], None, [4, 5]]
NumpyArray [10, 20, 30, 40]

ListArray [[1, 2, 3], [], [4, 5]]
NumpyArray [10, 20, 40]

NumpyArray [1, 2, 3, 4, 5]
NumpyArray [10, 10, 10, 40, 40]

(<Array [[1, 2, 3], [], None, [4, 5]] type='4 * option[var * int64]'>,
 <Array [[10, 10, 10], [], None, [40, 40]] type='4 * option[var * int64]'>)
The incompatible types of the two arrays eventually become the same type by duplicating and removing values wherever necessary. If you cannot perform an operation on an ak.contents.ListArray and an ak.contents.NumpyArray, wait for a later iteration, in which both will be ak.contents.NumpyArray (if the original arrays are broadcastable).

The return value, without transformation, is the same as what ak.broadcast_arrays would return. See ak.broadcast_arrays for an explanation of left_broadcast and right_broadcast.

Broadcasting Parameters
When broadcasting multiple arrays with parameters, there are different ways of assigning parameters to the outputs. The assignment of array parameters happens at every level above the transformation action.
The method of parameter assignment used by the broadcasting routine is controlled by the broadcast_parameters_rule option, which can take one of the following values:

- "intersect": The parameters of each output array will correspond to the intersection of the parameters from each of the input arrays.
- "all_or_nothing": If the parameters of the input arrays are all equal, then they will be used for each output array. Otherwise, the output arrays will not be given parameters.
- "one_to_one": If the number of output arrays matches the number of input arrays, then the output arrays are given the parameters of the input arrays. Otherwise, a ValueError is raised.
- "none": The output arrays will not be given parameters.
Performance Tip
ak.transform will traverse the layout of (potentially multiple) arrays once. This can be useful if one wants to apply a batch of transformations in one single layout traversal. Traversing the layout multiple times can be inefficient.

Consider the following example:

>>> def batch_of_operations(array):
...     return np.sqrt(np.sin(array) + 1) - 1
...
>>> def apply_batch_of_operations(layout, **kwargs):
...     if layout.is_numpy:
...         return ak.contents.NumpyArray(
...             batch_of_operations(layout.data)
...         )
...
>>> array = ak.Array(
...     [[[[[1.1, 2.2, 3.3], []], None], []],
...      [[[[4.4, 5.5]]]]]
... )
>>> %timeit ak.transform(apply_batch_of_operations, array)
...
68.5 μs ± 663 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit batch_of_operations(array)
...
1.07 ms ± 39.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
The first %timeit cell shows the time it takes to apply the batch of operations using ak.transform, which applies all of the operations in a single traversal of the layout. The second %timeit cell shows the runtime of applying the operations directly to the array, which traverses the layout multiple times: one layout traversal for each operation.

See also: ak.is_valid and ak.valid_when to check the validity of transformed outputs.