How to ensure that an array is valid
Awkward Arrays are complex data structures with their own rules for internal consistency. In principle, all data sources should serve valid array structures and all operations on valid structures should return valid structures. However, errors sometimes happen.
Awkward Array’s compiled routines check for validity in the course of computation, so that errors are reported as Python exceptions, rather than undefined behavior or segmentation faults. However, those errors can be hard to understand because the invalid structure might have been constructed much earlier in a program than the point where it is discovered.
For that reason, you have tools to check an Awkward Array’s internal validity: ak.is_valid(), ak.validity_error(), and the check_valid argument to constructors like ak.Array.
import awkward as ak
To demonstrate, here’s a valid array:
array_is_valid = ak.Array([[0, 1, 2], [], [3, 4], [5], [6, 7, 8, 9]])
array_is_valid
[[0, 1, 2],
 [],
 [3, 4],
 [5],
 [6, 7, 8, 9]]
---------------------
backend: cpu
nbytes: 128 B
type: 5 * var * int64
And here is a copy of it, which I will make invalid by overwriting one of its offsets.
array_is_invalid = ak.copy(array_is_valid)
array_is_invalid.layout
<ListOffsetArray len='5'>
    <offsets><Index dtype='int64' len='6'>
        [ 0 3 3 5 6 10]
    </Index></offsets>
    <content><NumpyArray dtype='int64' len='10'>[0 1 2 3 4 5 6 7 8 9]</NumpyArray></content>
</ListOffsetArray>
array_is_invalid.layout.offsets.data
array([ 0, 3, 3, 5, 6, 10])
array_is_invalid.layout.offsets.data[3] = 100
array_is_invalid.layout
<ListOffsetArray len='5'>
    <offsets><Index dtype='int64' len='6'>
        [ 0 3 3 100 6 10]
    </Index></offsets>
    <content><NumpyArray dtype='int64' len='10'>[0 1 2 3 4 5 6 7 8 9]</NumpyArray></content>
</ListOffsetArray>
The ak.is_valid() function only tells us whether an array is valid or not:
ak.is_valid(array_is_valid)
True
ak.is_valid(array_is_invalid)
False
But the ak.validity_error() function tells us what the error was (if any):
ak.validity_error(array_is_valid)
''
ak.validity_error(array_is_invalid)
'at highlevel ("<class \'awkward.contents.listoffsetarray.ListOffsetArray\'>"): stop[i] > len(content) at i=2 (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-43/awkward-cpp/src/cpu-kernels/awkward_ListArray_validity.cpp#L24)'
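To read that message: in a ListOffsetArray, list i spans content[offsets[i]:offsets[i + 1]], so offsets[i] is its start and offsets[i + 1] is its stop. Overwriting offsets[3] with 100 gives list 2 a stop that lies beyond the 10 items in the content. The plain-Python sketch below applies that kind of rule to the two offsets buffers shown earlier. The function name first_offsets_error and the exact set and order of checks are my assumptions for illustration; this is not Awkward Array's compiled kernel.

```python
def first_offsets_error(offsets, content_length):
    """Return a description of the first inconsistent list, or '' if none is found.

    Illustrative only: this mimics the kind of rule named in the error
    message above, not Awkward Array's actual implementation.
    """
    for i in range(len(offsets) - 1):
        # List i is content[start:stop].
        start, stop = offsets[i], offsets[i + 1]
        if start < 0:
            return f"start[i] < 0 at i={i}"
        if start > stop:
            return f"start[i] > stop[i] at i={i}"
        if stop > content_length:
            return f"stop[i] > len(content) at i={i}"
    return ""


# The original offsets are consistent with 10 content items.
print(repr(first_offsets_error([0, 3, 3, 5, 6, 10], 10)))
# The modified offsets fail at list 2, whose stop is 100.
print(repr(first_offsets_error([0, 3, 3, 100, 6, 10], 10)))
```

Only the first problem is reported: list 3 in the modified buffer is also inconsistent (its start, 100, exceeds its stop, 6), but the scan stops at list 2.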
If you suspect that an array is invalid or becomes invalid in the course of your program, you can either use these functions to check it or construct arrays with check_valid=True in the ak.Array constructor.
ak.Array(array_is_valid, check_valid=True)
[[0, 1, 2],
 [],
 [3, 4],
 [5],
 [6, 7, 8, 9]]
---------------------
backend: cpu
nbytes: 128 B
type: 5 * var * int64
ak.Array(array_is_invalid, check_valid=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[12], line 1
----> 1 ak.Array(array_is_invalid, check_valid=True)
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/highlevel.py:366, in Array.__init__(self, data, behavior, with_name, check_valid, backend, attrs, named_axis)
363 self._update_class()
365 if check_valid:
--> 366 ak.operations.validity_error(self, exception=True)
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:38, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
35 @wraps(func)
36 def dispatch(*args, **kwargs):
37 # NOTE: this decorator assumes that the operation is exposed under `ak.`
---> 38 with OperationErrorContext(name, args, kwargs):
39 gen_or_result = func(*args, **kwargs)
40 if isgenerator(gen_or_result):
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_errors.py:80, in ErrorContext.__exit__(self, exception_type, exception_value, traceback)
78 self._slate.__dict__.clear()
79 # Handle caught exception
---> 80 raise self.decorate_exception(exception_type, exception_value)
81 else:
82 # Step out of the way so that another ErrorContext can become primary.
83 if self.primary() is self:
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/_dispatch.py:64, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
62 # Failed to find a custom overload, so resume the original function
63 try:
---> 64 next(gen_or_result)
65 except StopIteration as err:
66 return err.value
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_validity_error.py:31, in validity_error(array, exception)
28 yield (array,)
30 # Implementation
---> 31 return _impl(array, exception)
File ~/micromamba/envs/awkward-docs/lib/python3.11/site-packages/awkward/operations/ak_validity_error.py:41, in _impl(array, exception)
38 out = ak._do.validity_error(layout, path="highlevel")
40 if out not in (None, "") and exception:
---> 41 raise ValueError(out)
42 else:
43 return out
ValueError: at highlevel ("<class 'awkward.contents.listoffsetarray.ListOffsetArray'>"): stop[i] > len(content) at i=2 (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-43/awkward-cpp/src/cpu-kernels/awkward_ListArray_validity.cpp#L24)
This error occurred while calling
ak.validity_error(
<Array [[0, 1, 2], [], ..., [], [6, 7, 8, 9]] type='5 * var * int64'>
exception = True
)