How to examine an array’s type#
The type of an Awkward Array can be determined using the ak.type() function, or the ak.Array.type attribute of an array. It describes both the data types of an array, e.g. float64, and the structure of the array (how many dimensions, which dimensions are ragged, which dimensions contain missing values, etc.).
Array types#
import awkward as ak
array = ak.Array(
    [
        ["Mr.", "Blue,", "you", "did", "it", "right"],
        ["But", "soon", "comes", "Mr.", "Night"],
        ["creepin'", "over"],
    ]
)
array.type.show()
3 * var * string
array.type.show() displays an extended subset of the Datashape language, which describes both the shape and the layout of an array in the form of units and dimensions. array.type actually returns an ak.types.Type object, which can be inspected:
array.type
ArrayType(ListType(ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'})), 3, None)
ak.Array.type always returns an ak.types.ArrayType object describing the outermost length of the array, which is always known [1]. The ak.types.ArrayType wraps an ak.types.Type object, which represents an array of “something”. For example, an array of integers:
ak.Array([1, 2, 3]).type
ArrayType(NumpyType('int64'), 3, None)
The outermost ak.types.ArrayType object indicates that this array has a known length of 3. Its content
ak.Array([1, 2, 3]).type.content
NumpyType('int64')
describes the array itself, which is an array of np.int64.
Regular vs ragged dimensions#
Regular arrays and ragged arrays have different types:
import numpy as np
regular = ak.from_numpy(np.arange(8).reshape(2, 4))
ragged = ak.from_regular(regular)
regular.type.show()
ragged.type.show()
2 * 4 * int64
2 * var * int64
In the Datashape language, ragged dimensions are described as var, whilst regular (fixed) dimensions are expressed by an integer representing their size. At the type level, the ragged type object does not contain any size information, as the size is no longer a constant part of the type:
regular.type.content.size
4
ragged.type.content.size
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 ragged.type.content.size
AttributeError: 'ListType' object has no attribute 'size'
Records and tuples#
An Awkward Array with records is expressed using curly braces, resembling a JSON object or Python dictionary:
poet_records = ak.Array(
    [
        {"first": "William", "last": "Shakespeare"},
        {"first": "Sylvia", "last": "Plath"},
        {"first": "Homer", "last": "Simpson"},
    ]
)
poet_records.type.show()
3 * {
    first: string,
    last: string
}
whereas an array with tuples is expressed using parentheses, resembling a Python tuple:
poet_tuples = ak.Array(
    [
        ("William", "Shakespeare"),
        ("Sylvia", "Plath"),
        ("Homer", "Simpson"),
    ]
)
poet_tuples.type.show()
3 * (
    string,
    string
)
The ak.types.RecordType object contains information such as whether the record is a tuple, e.g.
poet_records.type.content.is_tuple
False
poet_tuples.type.content.is_tuple
True
Let’s look at the type of a simpler array:
ak.type([{"x": 1, "y": 2}, {"x": 3, "y": 4}])
ArrayType(RecordType([NumpyType('int64'), NumpyType('int64')], ['x', 'y']), 2, None)
Missing items#
Missing items are represented by either the option[...] token or the ? token, whichever is more readable for the type in question:
missing = ak.Array([33.0, None, 15.5, 99.1])
missing.type.show()
4 * ?float64
Awkward’s ak.types.OptionType object is used to represent this datashape type:
missing.type
ArrayType(OptionType(NumpyType('float64')), 4, None)
Unions#
A union is formed whenever multiple types are required for a particular dimension, e.g. if we concatenate two arrays with different records:
mixed = ak.concatenate(
    (
        [{"x": 1}],
        [{"y": 2}],
    )
)
mixed.type.show()
2 * union[
    {
        x: int64
    },
    {
        y: int64
    }
]
From the printed type, we can see that the formed union has two possible types. We can inspect these from the ak.types.UnionType object in mixed.type.content:
mixed.type.content
UnionType([RecordType([NumpyType('int64')], ['x']), RecordType([NumpyType('int64')], ['y'])])
mixed.type.content.contents[0].show()
{
    x: int64
}
mixed.type.content.contents[1].show()
{
    y: int64
}
Strings#
Awkward Array implements strings as views over a 1D array of uint8 characters (char):
ak.type("hello world")
ArrayType(NumpyType('uint8', parameters={'__array__': 'char'}), 11, None)
This concept extends to an array of strings:
array = ak.Array(
["Mr.", "Blue,", "you", "did", "it", "right"]
)
array.type
ArrayType(ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'}), 6, None)
array is a list of strings, which is represented as a list-of-list-of-char. When we evaluate str(array.type) (or directly print this value with array.type.show()), Awkward returns a readable type-string:
array.type.show()
6 * string
Scalar types#
As discussed in Array types, all ak.types.Type objects are array-types, e.g. ak.types.NumpyType is the type of a NumPy (or CuPy, etc.) array of a fixed dtype:
import numpy as np
ak.type(np.arange(3))
ArrayType(NumpyType('int64'), 3, None)
Let’s now consider the following array of records:
record_array = ak.Array([
    {'x': 10, 'y': 11}
])
record_array.type
ArrayType(RecordType([NumpyType('int64'), NumpyType('int64')], ['x', 'y']), 1, None)
The resulting type object is an ak.types.ArrayType of ak.types.RecordType. This record-type represents an array of records, built from two NumPy arrays. From outside-to-inside, we can read the type object as:
- An array of length 1
- that is an array of records with two fields ‘x’ and ‘y’
- which are both NumPy arrays of np.int64 type.
Now, what happens if we pull out a single record and inspect its type?
record = record_array[0]
record.type
ScalarType(RecordType([NumpyType('int64'), NumpyType('int64')], ['x', 'y']), None)
Unlike the ak.types.ArrayType objects returned by ak.type() for arrays, ak.Record.type always returns an ak.types.ScalarType object. Reading the returned type again from outside-to-inside, we have:
- A scalar taken from an array
- that is an array of records with two fields ‘x’ and ‘y’
- which are both NumPy arrays of np.int64 type.
Like ak.types.ArrayType, ak.types.ScalarType is an outermost type, but unlike ak.types.ArrayType it does more than add length information; it also removes a dimension from the final type!
[1] Except for typetracer arrays, which are used in the dask-awkward integration.