How to examine an array’s type

The type of an Awkward Array can be determined using the ak.type() function or the ak.Array.type attribute of an array. It describes both the data types of an array, e.g. float64, and the structure of the array (how many dimensions it has, which dimensions are ragged, which dimensions contain missing values, etc.).

Array types

import awkward as ak

array = ak.Array(
    [
        ["Mr.", "Blue,", "you", "did", "it", "right"],
        ["But", "soon", "comes", "Mr.", "Night"],
        ["creepin'", "over"],
    ]
)
array.type.show()
3 * var * string

array.type.show() displays an extended subset of the Datashape language, which describes both the shape and the layout of an array in the form of units and dimensions. array.type itself returns an ak.types.Type object, which can be inspected:

array.type
ArrayType(ListType(ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'})), 3, None)

ak.Array.type always returns an ak.types.ArrayType object describing the outermost length of the array, which is always known.[1] The ak.types.ArrayType wraps an ak.types.Type object, which represents an array of “something”. For example, an array of integers:

ak.Array([1, 2, 3]).type
ArrayType(NumpyType('int64'), 3, None)

The outermost ak.types.ArrayType object indicates that this array has a known length of 3. Its content

ak.Array([1, 2, 3]).type.content
NumpyType('int64')

describes the array itself, which is an array of np.int64.

Regular vs ragged dimensions

Regular arrays and ragged arrays have different types:

import numpy as np

regular = ak.from_numpy(np.arange(8).reshape(2, 4))
ragged = ak.from_regular(regular)

regular.type.show()
ragged.type.show()
2 * 4 * int64
2 * var * int64

In the Datashape language, ragged dimensions are described as var, whilst regular (fixed) dimensions are expressed by an integer representing their size. At the type level, the ragged type object does not contain any size information, as it is no longer a constant part of the type:

regular.type.content.size
4
ragged.type.content.size
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[7], line 1
----> 1 ragged.type.content.size

AttributeError: 'ListType' object has no attribute 'size'

Records and tuples

An Awkward Array with records is expressed using curly braces, resembling a JSON object or Python dictionary:

poet_records = ak.Array(
    [
        {"first": "William", "last": "Shakespeare"},
        {"first": "Sylvia", "last": "Plath"},
        {"first": "Homer", "last": "Simpson"},
    ]
)

poet_records.type.show()
3 * {
    first: string,
    last: string
}

whereas an array with tuples is expressed using parentheses, resembling a Python tuple:

poet_tuples = ak.Array(
    [
        ("William", "Shakespeare"),
        ("Sylvia", "Plath"),
        ("Homer", "Simpson"),
    ]
)

poet_tuples.type.show()
3 * (
    string,
    string
)

The ak.types.RecordType object contains information such as whether the record is a tuple, e.g.

poet_records.type.content.is_tuple
False
poet_tuples.type.content.is_tuple
True

Let’s look at the type of a simpler array:

ak.type([{"x": 1, "y": 2}, {"x": 3, "y": 4}])
ArrayType(RecordType([NumpyType('int64'), NumpyType('int64')], ['x', 'y']), 2, None)

Missing items

Missing items are represented by either the option[...] or the ? token, whichever is more readable:

missing = ak.Array([33.0, None, 15.5, 99.1])
missing.type.show()
4 * ?float64

Awkward’s ak.types.OptionType object is used to represent this datashape type:

missing.type
ArrayType(OptionType(NumpyType('float64')), 4, None)

Unions

A union is formed whenever multiple types are required for a particular dimension, e.g. if we concatenate two arrays with different records:

mixed = ak.concatenate(
    (
        [{"x": 1}],
        [{"y": 2}],
    )
)
mixed.type.show()
2 * union[
    {
        x: int64
    },
    {
        y: int64
    }
]

From the printed type, we can see that the formed union has two possible types. We can inspect these from the ak.types.UnionType object in mixed.type.content:

mixed.type.content
UnionType([RecordType([NumpyType('int64')], ['x']), RecordType([NumpyType('int64')], ['y'])])
mixed.type.content.contents[0].show()
{
    x: int64
}
mixed.type.content.contents[1].show()
{
    y: int64
}

Strings

Awkward Array implements strings as views over a 1D array of uint8 characters (char):

ak.type("hello world")
ArrayType(NumpyType('uint8', parameters={'__array__': 'char'}), 11, None)

This concept extends to an array of strings:

array = ak.Array(
    ["Mr.", "Blue,", "you", "did", "it", "right"]
)
array.type
ArrayType(ListType(NumpyType('uint8', parameters={'__array__': 'char'}), parameters={'__array__': 'string'}), 6, None)

array is a list of strings, which is represented as a list-of-list-of-char. When we evaluate str(array.type) (or directly print this value with array.type.show()), Awkward returns a readable type-string:

array.type.show()
6 * string

Scalar types

As discussed in Array types, all ak.types.Type objects are array-types, e.g. ak.types.NumpyType is the type of a NumPy (or CuPy, etc.) array of a fixed dtype:

import numpy as np

ak.type(np.arange(3))
ArrayType(NumpyType('int64'), 3, None)

Let’s now consider the following array of records:

record_array = ak.Array([
    {'x': 10, 'y': 11}
])
record_array.type
ArrayType(RecordType([NumpyType('int64'), NumpyType('int64')], ['x', 'y']), 1, None)

The resulting type object is an ak.types.ArrayType of ak.types.RecordType. This record-type represents an array of records, built from two NumPy arrays. From outside-to-inside, we can read the type object as:

  • An array of length 1

  • that is an array of records with two fields ‘x’ and ‘y’

  • which are both NumPy arrays of np.int64 type.

Now, what happens if we pull out a single record and inspect its type?

record = record_array[0]
record.type
ScalarType(RecordType([NumpyType('int64'), NumpyType('int64')], ['x', 'y']), None)

Unlike the ak.types.ArrayType objects returned by ak.type() for arrays, ak.Record.type always returns an ak.types.ScalarType object. Reading the returned type again from outside-to-inside, we have:

  • A scalar taken from an array

  • that is an array of records with two fields ‘x’ and ‘y’

  • which are both NumPy arrays of np.int64 type.

Like ak.types.ArrayType, ak.types.ScalarType is an outermost type, but unlike ak.types.ArrayType it does more than add length information; it also removes a dimension from the final type!