ak.to_buffers
Defined in awkward.operations.ak_to_buffers on line 16.
- ak.to_buffers(array, container=None, buffer_key='{form_key}-{attribute}', form_key='node{id}', *, id_start=0, backend=None, byteorder=ak._util.native_byteorder)
- Parameters:
  - array – Array-like data (anything ak.to_layout recognizes).
  - container (None or MutableMapping) – A mapping from str to NumPy arrays (or Python buffers) that represent the decomposed Awkward Array. This container is only assumed to have a __setitem__ method that accepts strings as keys. If None, a new dict is created.
  - buffer_key (str or callable) – Python format string containing "{form_key}" and/or "{attribute}", or a function that takes these (and/or layout) as keyword arguments and returns a string to use as the key for a buffer in the container. The form_key is the result of applying form_key (below), and the attribute is a hard-coded string naming the buffer's role (e.g. "data", "offsets", "index").
  - form_key (str or callable) – Python format string containing "{id}", or a function that takes this (and/or layout) as a keyword argument and returns a string to use as the key for a Form node. Together, buffer_key and form_key link the attributes of each Form node to data in the container.
  - id_start (int) – Starting id to use in form_key and hence buffer_key. This integer increases in a depth-first walk over the array nodes and can be used to generate unique keys for each Form.
  - backend ("cpu", "cuda", "jax", None) – Backend to use to generate the values that are put into the container. The "cpu" backend makes NumPy arrays, which are in main memory (i.e. not on a GPU) and satisfy Python's Buffer protocol. If the buffers in array already use the requested backend, they are not copied. If backend is None (the default), the backend of the layout is used to generate the buffers.
  - byteorder ("<", ">") – Endianness of the buffers written to container. If it does not match the current system's byteorder, the arrays are copied.
Decomposes an Awkward Array into a Form and a collection of memory buffers, so that data can be losslessly written to file formats and storage devices that only map names to binary blobs (such as a filesystem directory).
This function returns a 3-tuple:
(form, length, container)
where form is an ak.forms.Form (whose string representation is JSON), length is an integer (len(array)), and container is either the MutableMapping you passed in or a new dict containing the buffers (as NumPy arrays).

These are also the first three arguments of ak.from_buffers, so a full round-trip is

>>> reconstituted = ak.from_buffers(*ak.to_buffers(original))
The container argument lets you specify your own MutableMapping, which might be an interface to some storage format or device (e.g. h5py). It's okay if the container drops NumPy's dtype and shape information, leaving raw bytes, since dtype and shape can be reconstituted from the ak.forms.NumpyForm.

The buffer_key and form_key arguments let you configure the names of the buffers added to the container and the string labels on each Form node, so that the two can be uniquely matched later. buffer_key and form_key are distinct arguments for two reasons: to allow for more indirection (buffer keys can differ from Form keys, as long as there is a way to map them to each other), and because some Form nodes, such as ak.forms.ListForm and ak.forms.UnionForm, have more than one attribute (starts and stops for ak.forms.ListForm; tags and index for ak.forms.UnionForm).

Awkward 1.x also included partition numbers ("part0-", "part1-", …) in the buffer keys. From version 2.x onward, partitioning is handled externally by Dask, but partition numbers can be emulated by prepending a fixed "partN-" string to the buffer_key. The array represents exactly one partition.

Here is a simple example:
>>> original = ak.Array([[1, 2, 3], [], [4, 5]])
>>> form, length, container = ak.to_buffers(original)
>>> print(form)
{
    "class": "ListOffsetArray",
    "offsets": "i64",
    "content": {
        "class": "NumpyArray",
        "primitive": "int64",
        "form_key": "node1"
    },
    "form_key": "node0"
}
>>> length
3
>>> container
{'node0-offsets': array([0, 3, 3, 5]), 'node1-data': array([1, 2, 3, 4, 5])}
which may be read back with
>>> ak.from_buffers(form, length, container)
<Array [[1, 2, 3], [], [4, 5]] type='3 * var * int64'>
If you intend to use this function for saving data, you may want to pack the array first with ak.to_packed.

See also ak.from_buffers and ak.to_packed.