ak.transform
------------

.. py:module: ak.transform

Defined in `awkward.operations.ak_transform `__ on `line 14 `__.

.. py:function:: ak.transform(transformation, array, *more_arrays, depth_context=None, lateral_context=None, allow_records=True, broadcast_parameters_rule='intersect', left_broadcast=True, right_broadcast=True, numpy_to_regular=False, regular_to_jagged=False, return_value='simplified', highlevel=True, behavior=None)

    :param transformation: Function to apply to each node of the array. See below for details.
    :type transformation: callable
    :param array: Array-like data (anything :py:obj:`ak.to_layout` recognizes), but not an :py:obj:`ak.Record` or :py:obj:`ak.record.Record`.
    :param more_arrays: Additional arrays to be broadcasted together (with the first ``array``) and used together in the transformation. See below for details.
    :param depth_context: User data to propagate through the transformation. New data added to ``depth_context`` is available to the entire *subtree* at which it is added, but to no other *subtrees*. For example, data added during the transformation will not be in the original ``depth_context`` after the transformation.
    :type depth_context: None or dict
    :param lateral_context: User data to propagate through the transformation. New data added to ``lateral_context`` is available at any later step of the depth-first walk over the tree, including *other subtrees*. For example, data added during the transformation will be in the original ``lateral_context`` after the transformation.
    :type lateral_context: None or dict
    :param allow_records: If False and the recursive walk encounters any :py:obj:`ak.contents.RecordArray` nodes, an error is raised.
    :type allow_records: bool
    :param broadcast_parameters_rule: Rule for broadcasting parameters, one of:

        - ``"intersect"``
        - ``"all_or_nothing"``
        - ``"one_to_one"``
        - ``"none"``

    :type broadcast_parameters_rule: str
    :param left_broadcast: If ``more_arrays`` are provided, this parameter determines whether the arrays are left-broadcasted, which is Awkward-like broadcasting.
    :type left_broadcast: bool
    :param right_broadcast: If ``more_arrays`` are provided, this parameter determines whether the arrays are right-broadcasted, which is NumPy-like broadcasting.
    :type right_broadcast: bool
    :param numpy_to_regular: If True, multidimensional :py:obj:`ak.contents.NumpyArray` nodes are converted into :py:obj:`ak.contents.RegularArray` nodes before calling ``transformation``.
    :type numpy_to_regular: bool
    :param regular_to_jagged: If True, regular-type lists are converted into variable-length lists before calling ``transformation``.
    :type regular_to_jagged: bool
    :param return_value: If ``"none"``, this function returns None; if ``"original"``, untouched nodes surrounding the ones replaced by the ``transformation`` are returned in their original state; if ``"simplified"``, the :py:obj:`ak.Content.simplified` constructor is used on the surrounding nodes to ensure that option-type and union-type nodes are not nested inappropriately. Note that if ``return_value`` is ``"none"``, the only way to get information out of this function is through the ``lateral_context``.
    :type return_value: ``"none"``, ``"original"``, or ``"simplified"``
    :param highlevel: If True, return an :py:obj:`ak.Array`; otherwise, return a low-level :py:obj:`ak.contents.Content` subclass.
    :type highlevel: bool
    :param behavior: Custom :py:obj:`ak.behavior` for the output array, if high-level.
    :type behavior: None or dict

    Applies a ``transformation`` function to every node of an Awkward array or arrays to either obtain a transformed copy or extract data from a walk over the arrays' low-level layout nodes.
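
    As a minimal sketch of these two modes (assuming ``awkward`` is installed and imported as ``ak``; the helper names ``negate`` and ``count_leaves`` are hypothetical, chosen only for illustration), the first call below produces a transformed copy and the second extracts data without building a new array:

    .. code-block:: python

        import awkward as ak

        array = ak.Array([[1.5, 2.5], [], None, [3.5]])

        # Mode 1: transformed copy. Only the leaf NumpyArray node is replaced;
        # for every other node the (hypothetical) helper returns None, which
        # leaves that node as it is.
        def negate(layout, **kwargs):
            if layout.is_numpy:
                return ak.contents.NumpyArray(-layout.data)
            return None

        negated = ak.transform(negate, array)
        print(negated.to_list())  # [[-1.5, -2.5], [], None, [-3.5]]

        # Mode 2: data extraction. Nothing is replaced; a count is accumulated
        # in a user-supplied lateral_context dict shared across the whole walk.
        def count_leaves(layout, lateral_context, **kwargs):
            if layout.is_numpy:
                lateral_context["leaves"] += 1

        context = {"leaves": 0}
        ak.transform(count_leaves, array, lateral_context=context, return_value="none")
        print(context["leaves"])  # 1

    Because returning None leaves a node unchanged, ``negate`` only has to handle the leaf case; the option-type and list nodes around it are rebuilt automatically.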

    This is a public interface to the infrastructure that is used to implement most Awkward Array operations. As such, it's very powerful, but low-level. Here is a "hello world" example:

    .. code-block:: python

        >>> def say_hello(layout, depth, **kwargs):
        ...     print("Hello", type(layout).__name__, "at", depth)
        ...
        >>> array = ak.Array([[1.1, 2.2, "three"], [], None, [4.4, 5.5]])
        >>> ak.transform(say_hello, array, return_value="none")
        Hello IndexedOptionArray at 1
        Hello ListOffsetArray at 1
        Hello UnionArray at 2
        Hello NumpyArray at 2
        Hello ListOffsetArray at 2
        Hello NumpyArray at 3

    In the above, ``say_hello`` is called on every node of the ``array``, which has a lot of nodes because it has nested lists, missing data, and a union of different types. The objects passed to it are low-level "layouts," subclasses of :py:obj:`ak.contents.Content`, rather than high-level :py:obj:`ak.Array` objects.

    The primary purpose of this function is to allow you to edit one level of structure without having to worry about what it's embedded in. Suppose, for instance, you want to apply NumPy's ``np.round`` function to numerical data, regardless of what lists or other structures they're embedded in. The return value of ``transformation`` must be an instance of an :py:obj:`ak.contents.Content` subclass (to replace the array node) or None (to leave the array node unchanged).

    .. code-block:: python

        >>> def rounder(layout, **kwargs):
        ...     if layout.is_numpy:
        ...         return ak.contents.NumpyArray(
        ...             np.round(layout.data).astype(np.int32)
        ...         )
        ...
        >>> array = ak.Array(
        ...     [[[[[1.1, 2.2, 3.3], []], None], []],
        ...      [[[[4.4, 5.5]]]]]
        ... )
        >>> ak.transform(rounder, array).show(type=True)
        type: 2 * var * var * option[var * var * int32]
        [[[[[1, 2, 3], []], None], []],
         [[[[4, 6]]]]]

    If you pass multiple arrays to this function (``more_arrays``), those arrays will be broadcasted and all inputs, at the same level of depth and structure, will be passed to the ``transformation`` function as a group.

    Here is an example with broadcasting:

    .. code-block:: python

        >>> def combine(layouts, **kwargs):
        ...     assert len(layouts) == 2
        ...     if layouts[0].is_numpy and layouts[1].is_numpy:
        ...         return ak.contents.NumpyArray(
        ...             layouts[0].data + 10 * layouts[1].data
        ...         )
        ...
        >>> array1 = ak.Array([[1, 2, 3], [], None, [4, 5]])
        >>> array2 = ak.Array([1, 2, 3, 4])
        >>> ak.transform(combine, array1, array2)
        <Array [[11, 12, 13], [], None, [44, 45]] type='4 * option[var * int64]'>

    The ``1`` and ``4`` from ``array2`` are broadcasted to the ``[1, 2, 3]`` and the ``[4, 5]`` of ``array1``, and the other elements disappear because they are broadcasted with an empty list and a missing value.

    Note that the first argument of this ``transformation`` function is a *list* of layouts, not a single layout. There are always 2 layouts because 2 arrays were passed to :py:obj:`ak.transform`.

    Signature of the transformation function
    ========================================

    If there is only one array, the first argument of ``transformation`` is an :py:obj:`ak.contents.Content` instance. If there are multiple arrays (``more_arrays``), the first argument is a list of :py:obj:`ak.contents.Content` instances.

    All other arguments can be absorbed into a ``**kwargs`` because they will always be passed to your function by keyword. They are:

    * depth (int): The current list depth, where 1 is the outermost array and higher numbers are deeper levels of list nesting. This does not count nesting of other data structures, such as option-types and records.
    * depth_context (None or dict): Any user-specified data. You can add to this dict during transformation; changes will only be seen in the subtree's nodes.
    * lateral_context (None or dict): Any user-specified data. You can add to this dict during transformation; changes will be seen in any node visited later in the depth-first search.
    * continuation (callable): Zero-argument function that continues the recursion from this point in the walk, so that you can perform post-processing instead of pre-processing.
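
    The sketch below illustrates this keyword-argument convention: the transformation names only the arguments it uses (here ``depth`` and ``lateral_context``) and lets ``**kwargs`` absorb the rest. It assumes ``awkward`` is imported as ``ak``; the helper name ``record_depth`` is hypothetical, and the node classes in the expected output follow the "hello world" example above.

    .. code-block:: python

        import awkward as ak

        # Hypothetical helper: record (node class, list depth) for every node.
        # **kwargs absorbs depth_context, continuation, behavior, backend, options.
        def record_depth(layout, depth, lateral_context, **kwargs):
            lateral_context["depths"].append((type(layout).__name__, depth))

        array = ak.Array([[1, 2], None, [3]])
        context = {"depths": []}
        ak.transform(record_depth, array, lateral_context=context, return_value="none")
        print(context["depths"])
        # [('IndexedOptionArray', 1), ('ListOffsetArray', 1), ('NumpyArray', 2)]

    The option-type node and the list node it wraps report the same depth, because option-types do not count toward list depth.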

    For completeness, the following arguments are also passed to ``transformation``, but you usually won't need them:

    * behavior (None or dict): Behavior that would be attached to the output array(s) if ``highlevel``.
    * backend (array library / kernel library shim): Handle to the NumPy library, CuPy, etc., depending on the type of arrays.
    * options (dict): Options provided to :py:obj:`ak.transform`.

    If there is only one array, the ``transformation`` function must either return None or return an :py:obj:`ak.contents.Content`. If there are multiple arrays (``more_arrays``), then the transformation function may return one array or a tuple of arrays. (The preferred type is a tuple, even if it has length 1.)

    The final return value of :py:obj:`ak.transform` is a new array or tuple of arrays constructed by replacing nodes when ``transformation`` returns an :py:obj:`ak.contents.Content` or tuple of :py:obj:`ak.contents.Content`, and leaving nodes unchanged when ``transformation`` returns None. If ``transformation`` returns length-1 tuples, the final output is an array, not a length-1 tuple.

    If ``return_value`` is ``"none"``, :py:obj:`ak.transform` returns None. This is useful for functions that return non-array data through ``lateral_context``. The other two choices, ``"original"`` and ``"simplified"``, determine how untouched array nodes (the ones that are *not* modified by the ``transformation`` function) are returned. With ``"original"``, they are returned without modification, which might result in illegal nestings of option-type and union-type nodes that would raise an error. With ``"simplified"``, the surrounding array nodes are simplified upon reconstruction. For example, if the ``transformation`` puts a new :py:obj:`ak.contents.ByteMaskedArray` inside an existing :py:obj:`ak.contents.ByteMaskedArray`, the two will be consolidated into a single option-type array node.
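
    A sketch of these return conventions for multiple arrays, assuming ``awkward`` is imported as ``ak`` (the helper names ``sum_and_difference`` and ``product`` are hypothetical): returning a 2-tuple of layouts yields two output arrays, while returning a length-1 tuple yields a single array.

    .. code-block:: python

        import awkward as ak

        array1 = ak.Array([[1, 2, 3], [], [4, 5]])
        array2 = ak.Array([10, 20, 30])

        # Hypothetical helper: two outputs per leaf, so ak.transform returns a tuple.
        def sum_and_difference(layouts, **kwargs):
            left, right = layouts
            if left.is_numpy and right.is_numpy:
                return (
                    ak.contents.NumpyArray(left.data + right.data),
                    ak.contents.NumpyArray(left.data - right.data),
                )
            return None  # not yet at the leaves: keep descending

        total, difference = ak.transform(sum_and_difference, array1, array2)
        print(total.to_list())       # [[11, 12, 13], [], [34, 35]]
        print(difference.to_list())  # [[-9, -8, -7], [], [-26, -25]]

        # Hypothetical helper: a length-1 tuple gives a single array, not a 1-tuple.
        def product(layouts, **kwargs):
            left, right = layouts
            if left.is_numpy and right.is_numpy:
                return (ak.contents.NumpyArray(left.data * right.data),)
            return None

        result = ak.transform(product, array1, array2)
        print(isinstance(result, ak.Array))  # True
        print(result.to_list())              # [[10, 20, 30], [], [120, 150]]

    Both helpers return None until both layouts are :py:obj:`ak.contents.NumpyArray`, relying on broadcasting to line up the values of ``array2`` with the lists of ``array1``.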

    Contexts
    ========

    The ``depth_context`` and ``lateral_context`` allow you to pass your own data into the transformation as well as communicate between calls of ``transformation`` on different nodes. The ``depth_context`` limits this communication to descendants of the subtree in which the data were added; ``lateral_context`` does not have this limit. (``depth_context`` is shallow-copied at each node during descent; ``lateral_context`` is never copied.)

    For example, consider this array:

    .. code-block:: python

        >>> array = ak.Array([
        ...     [{"x": [1], "y": 1.1}, {"x": [1, 2], "y": 2.2}, {"x": [1, 2, 3], "y": 3.3}],
        ...     [],
        ...     [{"x": [1, 2, 3, 4], "y": 4.4}, {"x": [1, 2, 3, 4, 5], "y": 5.5}],
        ... ])

    Suppose we accumulate node type names using ``depth_context``:

    .. code-block:: python

        >>> def crawl(layout, depth_context, **kwargs):
        ...     depth_context["types"] = depth_context["types"] + (type(layout).__name__,)
        ...     print(depth_context["types"])
        ...
        >>> context = {"types": ()}
        >>> ak.transform(crawl, array, depth_context=context, return_value="none")
        ('ListOffsetArray',)
        ('ListOffsetArray', 'RecordArray')
        ('ListOffsetArray', 'RecordArray', 'ListOffsetArray')
        ('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray')
        ('ListOffsetArray', 'RecordArray', 'NumpyArray')
        >>> context
        {'types': ()}

    The data in ``depth_context["types"]`` represent a path from the root of the tree to the current node. There is never, for instance, more than one leaf-type (:py:obj:`ak.contents.NumpyArray`) in the tuple. Also, the ``context`` is unchanged outside of the function.

    On the other hand, suppose we do the same with a ``lateral_context``:

    .. code-block:: python

        >>> def crawl(layout, lateral_context, **kwargs):
        ...     lateral_context["types"] = lateral_context["types"] + (type(layout).__name__,)
        ...     print(lateral_context["types"])
        ...
        >>> context = {"types": ()}
        >>> ak.transform(crawl, array, lateral_context=context, return_value="none")
        ('ListOffsetArray',)
        ('ListOffsetArray', 'RecordArray')
        ('ListOffsetArray', 'RecordArray', 'ListOffsetArray')
        ('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray')
        ('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray', 'NumpyArray')
        >>> context
        {'types': ('ListOffsetArray', 'RecordArray', 'ListOffsetArray', 'NumpyArray', 'NumpyArray')}

    Now the data accumulate through the walk over the tree. There are two leaf-types (:py:obj:`ak.contents.NumpyArray`) in the tuple because this tree has two leaves. The data are even available outside of the function, so ``lateral_context`` can be paired with ``return_value="none"`` to extract non-array data, rather than transforming the array.

    The visitation order is stable: a recursive walk always proceeds through the same tree in the same order.

    Continuation
    ============

    The ``transformation`` function receives the input layout or layouts before they are transformed. Some algorithms need to correct the transformed output instead, so ``continuation()`` can be called at any point to continue the descent and obtain the transformed result, which the function can then modify before returning it.

    For example, this function inserts an option-type at every level of an array:

    .. code-block:: python

        >>> def insert_optiontype(layout, continuation, **kwargs):
        ...     return ak.contents.UnmaskedArray(continuation())
        ...
        >>> array = ak.Array([[[[[1.1, 2.2, 3.3], []]], []], [[[[4.4, 5.5]]]]])
        >>> array.type.show()
        2 * var * var * var * var * float64
        >>> array2 = ak.transform(insert_optiontype, array)
        >>> array2.type.show()
        2 * option[var * option[var * option[var * option[var * ?float64]]]]

    In the original array, every node is an :py:obj:`ak.contents.ListOffsetArray` except the leaf, which is an :py:obj:`ak.contents.NumpyArray`.
    The call to ``continuation()`` returns an :py:obj:`ak.contents.ListOffsetArray` with its contents transformed, which is the argument of a new :py:obj:`ak.contents.UnmaskedArray`. To see this process as it happens, we can add ``print`` statements to the function.

    .. code-block:: python

        >>> def insert_optiontype(layout, continuation, **kwargs):
        ...     print("before", layout.form.type)
        ...     output = ak.contents.UnmaskedArray(continuation())
        ...     print("after ", output.form.type)
        ...     return output
        ...
        >>> ak.transform(insert_optiontype, array)
        before var * var * var * var * float64
        before var * var * var * float64
        before var * var * float64
        before var * float64
        before float64
        after  ?float64
        after  option[var * ?float64]
        after  option[var * option[var * ?float64]]
        after  option[var * option[var * option[var * ?float64]]]
        after  option[var * option[var * option[var * option[var * ?float64]]]]

    Broadcasting
    ============

    When multiple arrays are provided (``more_arrays``), all of the arrays are broadcasted during the walk so that the ``transformation`` function is eventually provided with a list of layouts that have compatible types (for mathematical operations, etc.).

    For instance, given these two arrays:

    .. code-block:: python

        >>> array1 = ak.Array([[1, 2, 3], [], None, [4, 5]])
        >>> array2 = ak.Array([10, 20, 30, 40])

    The following single-array function shows the nodes encountered when walking down either one of them.

    .. code-block:: python

        >>> def one_array(layout, **kwargs):
        ...     print(type(layout).__name__)
        ...
        >>> ak.transform(one_array, array1, return_value="none")
        IndexedOptionArray
        ListOffsetArray
        NumpyArray
        >>> ak.transform(one_array, array2, return_value="none")
        NumpyArray

    The first array has three nested nodes; the second has only one node. However, when the following two-array function is applied, the nodes are broadcasted step by step:

    .. code-block:: python

        >>> def two_arrays(layouts, **kwargs):
        ...     assert len(layouts) == 2
        ...     print(type(layouts[0]).__name__, ak.to_list(layouts[0]))
        ...     print(type(layouts[1]).__name__, ak.to_list(layouts[1]))
        ...     print()
        ...
        >>> ak.transform(two_arrays, array1, array2)
        RegularArray [[[1, 2, 3], [], None, [4, 5]]]
        RegularArray [[10, 20, 30, 40]]

        IndexedOptionArray [[1, 2, 3], [], None, [4, 5]]
        NumpyArray [10, 20, 30, 40]

        ListArray [[1, 2, 3], [], [4, 5]]
        NumpyArray [10, 20, 40]

        NumpyArray [1, 2, 3, 4, 5]
        NumpyArray [10, 10, 10, 40, 40]

        (<Array [[1, 2, 3], [], None, [4, 5]] type='4 * option[var * int64]'>, <Array [[10, 10, 10], [], None, [40, 40]] type='4 * option[var * int64]'>)

    The incompatible types of the two arrays eventually become the same type by duplicating and removing values wherever necessary. If you cannot perform an operation on an :py:obj:`ak.contents.ListArray` and an :py:obj:`ak.contents.NumpyArray`, return None and wait for a later iteration, in which both will be :py:obj:`ak.contents.NumpyArray` (if the original arrays are broadcastable).

    The return value, without transformation, is the same as what :py:obj:`ak.broadcast_arrays` would return.

    See :py:obj:`ak.broadcast_arrays` for an explanation of ``left_broadcast`` and ``right_broadcast``.

    Broadcasting Parameters
    =======================

    When broadcasting multiple arrays with parameters, there are different ways of assigning parameters to the outputs. The assignment of array parameters happens at every level above the transformation action. The method of parameter assignment used by the broadcasting routine is controlled by the ``broadcast_parameters_rule`` option, which can take one of the following values:

    ``"intersect"``
        The parameters of each output array will correspond to the intersection of the parameters from each of the input arrays.

    ``"all_or_nothing"``
        If the parameters of the input arrays are all equal, then they will be used for each output array. Otherwise, the output arrays will not be given parameters.

    ``"one_to_one"``
        If the number of output arrays matches the number of input arrays, then the output arrays are given the parameters of the input arrays. Otherwise, a ``ValueError`` is raised.

    ``"none"``
        The output arrays will not be given parameters.
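
    As a sketch of how ``broadcast_parameters_rule`` affects the output, the example below attaches a custom parameter to the outermost list node with :py:obj:`ak.with_parameter` and inspects the outermost parameters of the result with :py:obj:`ak.parameters`. The parameter name ``"unit"`` and the helper ``add`` are hypothetical choices for illustration, and ``awkward`` is assumed to be imported as ``ak``.

    .. code-block:: python

        import awkward as ak

        # Hypothetical helper: add the leaves once both inputs are NumpyArrays.
        def add(layouts, **kwargs):
            left, right = layouts
            if left.is_numpy and right.is_numpy:
                return ak.contents.NumpyArray(left.data + right.data)
            return None

        # "unit" is an arbitrary, user-chosen parameter on the outermost list node.
        meters = ak.with_parameter(ak.Array([[1, 2], [3]]), "unit", "m")
        more_meters = ak.with_parameter(ak.Array([[10, 20], [30]]), "unit", "m")
        seconds = ak.with_parameter(ak.Array([[10, 20], [30]]), "unit", "s")

        # Default rule "intersect": matching parameters survive, differing ones are dropped.
        same = ak.transform(add, meters, more_meters)
        mixed = ak.transform(add, meters, seconds)

        # Rule "none": no parameters are assigned, even when the inputs agree.
        bare = ak.transform(add, meters, more_meters, broadcast_parameters_rule="none")

        print(ak.parameters(same))   # {'unit': 'm'}
        print(ak.parameters(mixed))  # {}
        print(ak.parameters(bare))   # {}

    The values themselves are the same in all three results; only the parameters attached to the rebuilt list node differ.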

    See also: :py:obj:`ak.is_valid` and :py:obj:`ak.valid_when` to check the validity of transformed outputs.
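
    For instance, a transformation that replaces a leaf with a shorter array leaves the surrounding list offsets pointing past the end of the new content. A minimal sketch, assuming ``awkward`` is imported as ``ak`` (the helper ``truncate_leaves`` is hypothetical and deliberately buggy), shows :py:obj:`ak.is_valid` catching this:

    .. code-block:: python

        import awkward as ak

        array = ak.Array([[1, 2, 3], [], [4, 5]])

        # Hypothetical, deliberately buggy helper: drops the last leaf value,
        # so the list offsets [0, 3, 3, 5] now overrun the 4-element content.
        def truncate_leaves(layout, **kwargs):
            if layout.is_numpy:
                return ak.contents.NumpyArray(layout.data[:-1])
            return None

        # A transformation that returns None everywhere leaves the array intact.
        good = ak.transform(lambda layout, **kwargs: None, array)
        bad = ak.transform(truncate_leaves, array)

        print(ak.is_valid(good))  # True
        print(ak.is_valid(bad))   # False

    Checking the result this way is cheap compared with debugging a failure later, when the invalid array is first printed or converted.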