What is an Awkward Array?

Efficiency and generality

Arrays are the most efficient data structures for sequential numeric processing, and NumPy makes it easy to interact with arrays in Python. However, NumPy’s arrays are rectangular tables or tensors that cannot express variable-length structures.

General tree-like data are often expressed using JSON, but at the expense of memory use and processing speed.

Awkward Arrays are general tree-like data structures, like JSON, but contiguous in memory and operated upon with compiled, vectorized code like NumPy. They’re basic building blocks for data analyses that are, well, more awkward than those involving neat tables.

This library was originally developed for high-energy particle physics. Particle physics datasets have rich data structures that usually can’t be flattened into rectangular arrays, but physicists need to process them efficiently because the datasets are enormous. Awkward Arrays combine generic data structures with high-performance number-crunching.

Let’s illustrate this with a non-physics dataset: maps of bike routes in my hometown of Chicago. You can also follow this as a video tutorial.

JSON to array

Here is a GeoJSON of bike paths throughout the city of Chicago.

If you dig into the JSON, you’ll see that it contains street names, metadata, and longitude, latitude coordinates all along the bike paths.

Start by loading them into Python as Python objects,

import urllib.request
import json

url = "https://raw.githubusercontent.com/Chicago/osd-bike-routes/master/data/Bikeroutes.geojson"
bikeroutes_json = urllib.request.urlopen(url).read()
bikeroutes_pyobj = json.loads(bikeroutes_json)

and then as an Awkward Array (actually an ak.Record because the top-level construct is a JSON object).

import awkward as ak

bikeroutes = ak.from_json(bikeroutes_json)
# Alternatively, bikeroutes = ak.Record(bikeroutes_pyobj)
bikeroutes
<Record ... [-87.7, 42], [-87.7, 42]]]}}]} type='{"type": string, "crs": {"type"...'>

We only see a part of the data and its type if we don’t deliberately expand it out.

Data types

To get a full view of the type (Awkward’s equivalent of a NumPy dtype + shape), use the ak.type function. The display format adheres to Datashape syntax, when possible.

ak.type(bikeroutes)
{"type": string, "crs": {"type": string, "properties": {"name": string}}, "features": var * {"type": string, "properties": {"STREET": string, "TYPE": string, "BIKEROUTE": string, "F_STREET": string, "T_STREET": option[string]}, "geometry": {"type": string, "coordinates": var * var * var * float64}}}

In the above, {"field name": type, ...} denotes a record structure, which can be nested, and var indicates variable-length lists. The "coordinates" (at the end) are var * var * var * float64, lists of lists of lists of numbers, and any of these lists can have an arbitrary length.

In addition, there are strings (variable-length lists interpreted as text) and “option” types, meaning that values are allowed to be null.

Slicing

NumPy-like slicing extracts structures within the array. The slice may consist of integers, ranges, and many other slice types, like NumPy, and commas indicate different slices applied to different dimensions. Since the bike routes dataset contains records, we can use strings to select nested fields sequentially.

bikeroutes["features", "geometry", "coordinates"]
<Array [[[[-87.8, 41.9], ... [-87.7, 42]]]] type='1061 * var * var * var * float64'>

Alternatively, we could use dots for record field specifiers (if the field names are syntactically allowed in Python):

bikeroutes.features.geometry.coordinates
<Array [[[[-87.8, 41.9], ... [-87.7, 42]]]] type='1061 * var * var * var * float64'>

Slicing by field names (even if the records those fields belong to are nested within lists) slices across all elements of the lists. We can pick out just one object by putting integers in the square brackets:

bikeroutes["features", "geometry", "coordinates", 100, 0]
<Array [[-87.7, 42], ... [-87.7, 42]] type='7 * var * float64'>

or

bikeroutes.features.geometry.coordinates[100, 0]
<Array [[-87.7, 42], ... [-87.7, 42]] type='7 * var * float64'>

or even

bikeroutes.features[100].geometry.coordinates[0]
<Array [[-87.7, 42], ... [-87.7, 42]] type='7 * var * float64'>

(The strings that select record fields may be placed before or after integers and other slice types.)

To get full detail of one structured object, we can use the ak.to_list function, which converts Awkward records and lists into Python dicts and lists.

ak.to_list(bikeroutes.features[751])
{'type': 'Feature',
 'properties': {'STREET': 'E 26TH ST',
  'TYPE': '1',
  'BIKEROUTE': 'EXISTING BIKE LANE',
  'F_STREET': 'S STATE ST',
  'T_STREET': 'S DR MARTIN LUTHER KING JR DR'},
 'geometry': {'type': 'MultiLineString',
  'coordinates': [[[-87.62685625163756, 41.845587148411795],
    [-87.62675996392576, 41.84558902593194],
    [-87.62637708895348, 41.845596494328554],
    [-87.62626461651281, 41.845598326696425],
    [-87.62618268489399, 41.84559966093136],
    [-87.6261438116618, 41.84560027230502],
    [-87.62613206507362, 41.845600474403334],
    [-87.6261027723024, 41.8456009526551],
    [-87.62579736038116, 41.84560626159298],
    [-87.62553890383363, 41.845610239979905],
    [-87.62532611036139, 41.845613593674],
    [-87.6247932635836, 41.84562202574476]],
   [[-87.62532611036139, 41.845613593674],
    [-87.6247932635836, 41.84562202574476]],
   [[-87.6247932635836, 41.84562202574476],
    [-87.62446484629729, 41.84562675013391],
    [-87.62444032614908, 41.845627092762086]],
   [[-87.6247932635836, 41.84562202574476],
    [-87.62446484629729, 41.84562675013391],
    [-87.62444032614908, 41.845627092762086]],
   [[-87.62444032614908, 41.845627092762086],
    [-87.62417259047609, 41.84563048939241]],
   [[-87.62417259047609, 41.84563048939241],
    [-87.62407957610536, 41.845631726253856],
    [-87.62363619038386, 41.84563829041728],
    [-87.62339190417225, 41.845641912449615],
    [-87.62213773032211, 41.8456604706941],
    [-87.620481318361, 41.84568497173672],
    [-87.62033059867875, 41.84568719208078],
    [-87.61886420422526, 41.84571018731772],
    [-87.61783987848477, 41.845726258794926],
    [-87.61768559736353, 41.84572529758383],
    [-87.61767695024436, 41.84572400878766]],
   [[-87.62417259047609, 41.84563048939241],
    [-87.62407957610536, 41.845631726253856],
    [-87.62363619038386, 41.84563829041728]]]}}

Looking at one record in full detail can make it clear why, for instance, the “coordinates” field contains lists of lists of lists: they are path segments that collectively form a route, and there are many routes, each associated with a named street. This item, number 751, is the E 26th St route between S State St and S Dr Martin Luther King Jr Dr, described by 7 segments. (Presumably, you have to pick up your bike and walk it.)

Variable-length lists

The last dimension of these lists always happens to have length 2. This is because it represents the longitude and latitude of each point along a path. You can see this with the ak.num function:

ak.num(bikeroutes.features[751].geometry.coordinates, axis=2)
<Array [[2, 2, 2, 2, 2, 2, ... 2], [2, 2, 2]] type='7 * var * int64'>

The axis is the depth at which this function is applied; the above could alternatively have been axis=-1 (deepest). Applying ak.num at a shallower axis tells us the number of points in each segment:

ak.num(bikeroutes.features[751].geometry.coordinates, axis=1)
<Array [12, 2, 3, 3, 2, 11, 3] type='7 * int64'>

and the number of segments:

ak.num(bikeroutes.features[751].geometry.coordinates, axis=0)
7

By verifying that all lists at this depth have length 2,

ak.all(ak.num(bikeroutes.features.geometry.coordinates, axis=-1) == 2)
True

we can be confident that we can select item 0 and item 1 without errors. Note that this is a major difference between variable-length lists and rectilinear arrays: in NumPy, a given index either exists for all nested lists or for none of them. For variable-length lists, we have to check (or ensure it with another selection).

Array math

We now know that the "coordinates" are longitude-latitude pairs, so let’s pull them out and name them as such. Item 0 of each of the deepest lists is the longitude and item 1 of each of the deepest lists is the latitude. We want to leave the structure of all lists other than the deepest untouched, which would mean a complete slice (colon : by itself) at each dimension except the last, but we can also use the ellipsis (...) shortcut from NumPy.

longitude = bikeroutes.features.geometry.coordinates[..., 0]
latitude = bikeroutes.features.geometry.coordinates[..., 1]
longitude, latitude
(<Array [[[-87.8, -87.8, ... -87.7, -87.7]]] type='1061 * var * var * float64'>,
 <Array [[[41.9, 41.9, 41.9, ... 42, 42, 42]]] type='1061 * var * var * float64'>)

Note that if we wanted to do this with Python objects, the above would have required many “append” operations in nested “for” loops. As Awkward Arrays, it’s just a slice.
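For comparison, here is a sketch of the nested-loop version on a tiny made-up stand-in for the "coordinates" nesting (routes, then segments, then points). The data values are illustrative assumptions; only the standard library is used.

```python
# Hypothetical miniature of the GeoJSON "coordinates" nesting:
# routes -> segments -> points -> [longitude, latitude]
coordinates = [
    [[[-87.8, 41.9], [-87.7, 42.0]]],
    [[[-87.6, 41.8]], [[-87.5, 41.7], [-87.4, 41.6]]],
]

longitude = []
for route in coordinates:
    route_lng = []
    for segment in route:
        segment_lng = []
        for point in segment:
            segment_lng.append(point[0])   # item 0 of the deepest list
        route_lng.append(segment_lng)
    longitude.append(route_lng)

print(longitude)
```

Three loops and three appends reproduce what `coordinates[..., 0]` expresses in a single slice.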

Now that we have arrays of pure numbers (albeit inside of variable-length nested lists), we can run NumPy functions on them. For example,

import numpy as np

np.add(longitude, 180)
<Array [[[92.2, 92.2, 92.2, ... 92.3, 92.3]]] type='1061 * var * var * float64'>

rotates the longitude points 180 degrees around the world while maintaining the triply nested structure. Any “universal function” (ufunc) will work, including ufuncs from libraries other than NumPy (such as SciPy, or a domain-specific package). Simple NumPy functions like addition have the usual shortcuts:

longitude + 180
<Array [[[92.2, 92.2, 92.2, ... 92.3, 92.3]]] type='1061 * var * var * float64'>

In addition, some functions other than ufuncs have an Awkward equivalent, such as ak.mean, which is the equivalent of NumPy’s np.mean (not a ufunc because it takes a whole array and returns one value).

ak.mean(longitude)
-87.67152377693318

Using an extension mechanism within NumPy (introduced in NumPy 1.17), we can use ak.mean and np.mean interchangeably.

np.mean(longitude)
-87.67152377693318

Awkward functions have all or most of the same arguments as their NumPy equivalents. For instance, we can compute the mean along an axis, such as axis=1, which gives us the mean longitude of each path, rather than a single mean of all points.

np.mean(longitude, axis=1)
<Array [[-87.8, -87.8, ... -87.7, -87.7]] type='1061 * var * ?float64'>

To focus our discussion, let’s say that we’re trying to find the length of each path in the dataset. To do this, we need to convert the degrees longitude and latitude into common distance units, and to work with smaller numbers, we’ll start by subtracting the mean.

At Chicago’s latitude, one degree of longitude is 82.7 km and one degree of latitude is 111.1 km, which we can use as conversion factors.
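The longitude factor can be roughly reproduced from the latitude factor: on a spherical Earth, meridians converge in proportion to the cosine of the latitude. The sketch below is an approximation under that assumption, and the latitude of 41.88° for Chicago is a value assumed here, not taken from the dataset.

```python
import math

KM_PER_DEG_LATITUDE = 111.1    # value quoted in the text
chicago_latitude_deg = 41.88   # assumption: approximate latitude of Chicago

# A degree of longitude shrinks by cos(latitude) relative to a degree of latitude.
km_per_deg_longitude = KM_PER_DEG_LATITUDE * math.cos(math.radians(chicago_latitude_deg))

print(round(km_per_deg_longitude, 1))   # close to the 82.7 km/deg used in the text
```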

km_east = (longitude - np.mean(longitude)) * 82.7 # km/deg
km_north = (latitude - np.mean(latitude)) * 111.1 # km/deg
km_east, km_north
(<Array [[[-9.68, -9.69, ... -3.58, -3.62]]] type='1061 * var * var * float64'>,
 <Array [[[6.68, 6.68, 6.67, ... 9.68, 9.72]]] type='1061 * var * var * float64'>)

To find distances between points, we first have to pair up points with their neighbors. Each path segment of \(N\) points has \(N-1\) pairs of neighbors. We can construct these pairs by making two partial copies of each list, one with everything except the first element and the other with everything except the last element, so that original index \(i\) can be compared with original index \(i+1\).

In plain NumPy, you would express it like this:

path = np.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])
path[1:] - path[:-1]
array([1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1])

Here path[1:] has the first element dropped and path[:-1] has the last element dropped, so their differences are the 8 steps between neighboring pairs among the 9 points in the original array. In this example, all differences are 1.1.

Here’s what that looks like for the first segment of the first bike path in our sample:

km_east[0, 0, 1:], km_east[0, 0, :-1]
(<Array [-9.69, -9.7, -9.71, ... -9.92, -9.92] type='15 * float64'>,
 <Array [-9.68, -9.69, -9.7, ... -9.91, -9.92] type='15 * float64'>)

and their differences are:

km_east[0, 0, 1:] - km_east[0, 0, :-1]
<Array [-0.00603, -0.0165, ... -0.00203] type='15 * float64'>

If we can do it for one list, we can do it for all of them by swapping index 0 with slice : in the first two dimensions.

km_east[:, :, 1:] - km_east[:, :, :-1]
<Array [[[-0.00603, -0.0165, ... -0.0385]]] type='1061 * var * var * float64'>

This expression subtracts pairs of neighboring points in all lists, each with a different length, maintaining the segments-within-paths structure.

Now that we know how to compute differences in \(x\) (km_east) and \(y\) (km_north) individually, we can compute distances using the distance formula: \(\sqrt{(x_i - x_{i + 1})^2 + (y_i - y_{i + 1})^2}\).

segment_length = np.sqrt(
    ( km_east[:, :, 1:] -  km_east[:, :, :-1])**2 +
    (km_north[:, :, 1:] - km_north[:, :, :-1])**2
)
segment_length
<Array [[[0.00603, 0.0165, ... 0.0523]]] type='1061 * var * var * float64'>

Going back to our example, item 751, these pairwise distances are

ak.to_list(segment_length[751])
[[0.007965725361784344,
  0.03167462986536074,
  0.009303698353632218,
  0.006777366140642045,
  0.0032155337770008734,
  0.0009717022897204878,
  0.0024230948099507807,
  0.025264451818903598,
  0.021378926022712963,
  0.01760196411492381,
  0.0440763850916575],
 [0.0440763850916575],
 [0.02716518085563118, 0.0020281735105371133],
 [0.02716518085563118, 0.0020281735105371133],
 [0.02214495567782706],
 [0.007693515757648775,
  0.03667525064914765,
  0.020206477031414642,
  0.10374066852963969,
  0.13701231191288935,
  0.012466958457596494,
  0.12129772855897196,
  0.08473055433052853,
  0.012759495625968342,
  0.0007293106274618985],
 [0.007693515757648775, 0.03667525064914765]]

for each of the segments in this discontiguous path. Some of these segments had only two longitude, latitude points, and hence they have only one distance (single-element lists).

To make path distances from the pairwise distances, we need to add them up. There’s an ak.sum (equivalent to np.sum) that we can use with axis=-1 to add up the innermost lists.

For item 751, this is

ak.to_list(ak.sum(segment_length[751], axis=-1))
[0.17065347764628935,
 0.0440763850916575,
 0.029193354366168295,
 0.029193354366168295,
 0.02214495567782706,
 0.5373122714812673,
 0.04436876640679643]

and in general, it’s

path_length = np.sum(segment_length, axis=-1)
path_length
<Array [[0.241], [0.0971], ... 0.347], [0.281]] type='1061 * var * float64'>

Notice that segment_length has type

ak.type(segment_length)
1061 * var * var * float64

and path_length has type

ak.type(path_length)
1061 * var * float64

The path_length has one fewer var dimension because we have summed over it. We can sum once more, over the discontiguous curves that 11 of the streets have, to get total lengths.

Since there are multiple paths for each bike route, we sum up the innermost dimension again:

route_length = np.sum(path_length, axis=-1)
route_length
<Array [0.241, 0.0971, 0.203, ... 0.347, 0.281] type='1061 * float64'>

Now there’s exactly one of these for each of the 1061 streets.

for i in range(10):
    print(bikeroutes.features.properties.STREET[i], "\t\t", route_length[i])
W FULLERTON AVE 		 0.24076035127094295
N LA CROSSE AVE 		 0.09706818131239836
S DR MARTIN LUTHER KING JR DR W 		 0.20258150113769838
W 51ST ST 		 0.8459916013923557
E 50TH ST 		 0.021616600297903087
W MARQUETTE RD 		 0.7926173720366738
W MARQUETTE RD 		 0.4040218089349682
W 83RD ST 		 0.20738439769758524
E 83RD ST 		 0.12660735184266853
E 103RD ST 		 0.2740970708688548

This would have been incredibly awkward to write using only NumPy, and slow if executed in Python loops.

Performance

The full analysis, expressed in Python for loops, would be:

%%timeit

route_length = []
for route in bikeroutes_pyobj["features"]:
    path_length = []
    for segment in route["geometry"]["coordinates"]:
        segment_length = []
        last = None
        for lng, lat in segment:
            km_east = lng * 82.7
            km_north = lat * 111.1
            if last is not None:
                dx2 = (km_east - last[0])**2
                dy2 = (km_north - last[1])**2
                segment_length.append(np.sqrt(dx2 + dy2))
            last = (km_east, km_north)
        path_length.append(sum(segment_length))
    route_length.append(sum(path_length))
67.9 ms ± 866 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

whereas for Awkward Arrays, it is:

%%timeit

km_east = bikeroutes.features.geometry.coordinates[..., 0] * 82.7
km_north = bikeroutes.features.geometry.coordinates[..., 1] * 111.1

segment_length = np.sqrt((km_east[:, :, 1:] - km_east[:, :, :-1])**2 +
                         (km_north[:, :, 1:] - km_north[:, :, :-1])**2)

path_length = np.sum(segment_length, axis=-1)
route_length = np.sum(path_length, axis=-1)
17.2 ms ± 799 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In addition to being more concise, the latter is typically 5‒8× faster, especially when we scale to ever-larger problems (the timings above show about 4× for this dataset).

The reasons for this speedup are all related to Awkward Array’s data structure, which is better suited to structured numerical math than Python objects are. Like a NumPy array, its numerical data are packed in memory-contiguous arrays of homogeneous type, which means that

  • only a single block of memory needs to be fetched from memory into the CPU cache (no “pointer chasing”),

  • data for fields other than the one being operated upon are not in the same buffer, so they don’t even need to be loaded (“columnar,” rather than “record-oriented”),

  • the data type can be evaluated once before applying a precompiled operation to a whole array buffer, rather than once before each element of a Python list.

This memory layout is especially good for applying one operation on all values in the array, thinking about the result, and then applying another. This is the “interactive” style of data analysis that you’re probably familiar with from NumPy and Pandas, especially if you use Jupyter notebooks. It does have a performance cost, however: array buffers need to be allocated and filled after each step of the process, and some of those might never be used again.

Just as NumPy can be accelerated by just-in-time compiling your code with Numba, Awkward Arrays can be accelerated in the same way. The speedups described on Numba’s website are possible because they avoid creating temporary, intermediate arrays and flushing the CPU cache with multiple passes over the same data. The Numba-accelerated equivalent of our bike routes example looks very similar to the pure Python code:

import numba as nb

@nb.jit
def compute_lengths(bikeroutes):
    route_length = np.zeros(len(bikeroutes.features))
    for i in range(len(bikeroutes.features)):
        for path in bikeroutes.features[i].geometry.coordinates:
            first = True
            last_east, last_north = 0.0, 0.0
            for lng_lat in path:
                km_east = lng_lat[0] * 82.7
                km_north = lng_lat[1] * 111.1
                if not first:
                    dx2 = (km_east - last_east)**2
                    dy2 = (km_north - last_north)**2
                    route_length[i] += np.sqrt(dx2 + dy2)
                first = False
                last_east, last_north = km_east, km_north
    return route_length

compute_lengths(bikeroutes)
array([0.24076035, 0.09706818, 0.2025815 , ..., 1.42737517, 0.34667691,
       0.28063495])

But it runs far faster than the pure Python code, by more than 100× in the timings shown here:

%%timeit

compute_lengths(bikeroutes)
539 µs ± 8.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

(Note that these are microseconds, not milliseconds.)

This improvement is due to a combination of streamlined data structures, precompiled logic, and minimizing the number of passes over the data. We haven’t even taken advantage of multithreading yet, which can multiply this speedup by (up to) the number of CPU cores your computer has. (See Numba’s parallel range, multithreading, and nogil mode for more.)

Internal structure

It’s possible to peek into this columnar structure (or manipulate it, if you’re a developer) by accessing the ak.Array’s layout. All of the columnar buffers are accessible this way.
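The core idea behind the ListOffsetArray64 nodes in the layout is that a variable-length list is stored as an offsets buffer pointing into flat content. The following pure-Python sketch imitates that idea with made-up buffers; `lists_from_offsets` is a hypothetical helper written for this illustration, not part of the Awkward Array API.

```python
def lists_from_offsets(offsets, content):
    """Rebuild Python lists from one offsets buffer and its content (illustrative helper)."""
    return [content[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

# Hypothetical buffers: flat numbers, grouped into points, grouped into segments.
numbers = [-87.8, 41.9, -87.7, 42.0, -87.6, 41.8]
point_offsets = [0, 2, 4, 6]   # every point is a (longitude, latitude) pair
segment_offsets = [0, 1, 3]    # segment 0 has 1 point, segment 1 has 2 points

points = lists_from_offsets(point_offsets, numbers)
segments = lists_from_offsets(segment_offsets, points)

print(segments)
```

Because all of the numbers live in one flat buffer, an operation such as adding a constant only needs to touch `numbers`; the offsets that describe the nesting can be reused unchanged.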

If you look carefully at the following, you’ll see that all values for each field are in a separate buffer; the last of these holds the longitude, latitude coordinates.

bikeroutes.layout
<Record at="0">
    <RecordArray length="1">
        <field index="0" key="type">
            <ListOffsetArray64>
                <parameters>
                    <param key="__array__">"string"</param>
                </parameters>
                <offsets><Index64 i="[0 17]" offset="0" length="2" at="0x000003752fc0"/></offsets>
                <content><NumpyArray format="B" shape="17" data="70 101 97 116 117 ... 99 116 105 111 110" at="0x0000037526a0">
                    <parameters>
                        <param key="__array__">"char"</param>
                    </parameters>
                </NumpyArray></content>
            </ListOffsetArray64>
        </field>
        <field index="1" key="crs">
            <RecordArray length="1">
                <field index="0" key="type">
                    <ListOffsetArray64>
                        <parameters>
                            <param key="__array__">"string"</param>
                        </parameters>
                        <offsets><Index64 i="[0 4]" offset="0" length="2" at="0x000003754fd0"/></offsets>
                        <content><NumpyArray format="B" shape="4" data="110 97 109 101" at="0x000003756fe0">
                            <parameters>
                                <param key="__array__">"char"</param>
                            </parameters>
                        </NumpyArray></content>
                    </ListOffsetArray64>
                </field>
                <field index="1" key="properties">
                    <RecordArray length="1">
                        <field index="0" key="name">
                            <ListOffsetArray64>
                                <parameters>
                                    <param key="__array__">"string"</param>
                                </parameters>
                                <offsets><Index64 i="[0 29]" offset="0" length="2" at="0x0000037573f0"/></offsets>
                                <content><NumpyArray format="B" shape="29" data="117 114 110 58 111 ... 67 82 83 56 52" at="0x000003759400">
                                    <parameters>
                                        <param key="__array__">"char"</param>
                                    </parameters>
                                </NumpyArray></content>
                            </ListOffsetArray64>
                        </field>
                    </RecordArray>
                </field>
            </RecordArray>
        </field>
        <field index="2" key="features">
            <ListOffsetArray64>
                <offsets><Index64 i="[0 1061]" offset="0" length="2" at="0x000003759810"/></offsets>
                <content><RecordArray length="1061">
                    <field index="0" key="type">
                        <ListOffsetArray64>
                            <parameters>
                                <param key="__array__">"string"</param>
                            </parameters>
                            <offsets><Index64 i="[0 7 14 21 28 35 42 49 56 63 ... 7364 7371 7378 7385 7392 7399 7406 7413 7420 7427]" offset="0" length="1062" at="0x000003b166a0"/></offsets>
                            <content><NumpyArray format="B" shape="7427" data="70 101 97 116 117 ... 97 116 117 114 101" at="0x0000039c5950">
                                <parameters>
                                    <param key="__array__">"char"</param>
                                </parameters>
                            </NumpyArray></content>
                        </ListOffsetArray64>
                    </field>
                    <field index="1" key="properties">
                        <RecordArray length="1061">
                            <field index="0" key="STREET">
                                <ListOffsetArray64>
                                    <parameters>
                                        <param key="__array__">"string"</param>
                                    </parameters>
                                    <offsets><Index64 i="[0 15 30 61 70 79 93 107 116 125 ... 14063 14074 14083 14098 14107 14122 14134 14146 14158 14170]" offset="0" length="1062" at="0x000003b196b0"/></offsets>
                                    <content><NumpyArray format="B" shape="14170" data="87 32 70 85 76 ... 78 32 65 86 69" at="0x0000039d05c0">
                                        <parameters>
                                            <param key="__array__">"char"</param>
                                        </parameters>
                                    </NumpyArray></content>
                                </ListOffsetArray64>
                            </field>
                            <field index="1" key="TYPE">
                                <ListOffsetArray64>
                                    <parameters>
                                        <param key="__array__">"string"</param>
                                    </parameters>
                                    <offsets><Index64 i="[0 1 2 3 4 5 6 7 8 9 ... 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065]" offset="0" length="1062" at="0x000003b1c6c0"/></offsets>
                                    <content><NumpyArray format="B" shape="1065" data="52 52 49 52 52 ... 57 57 57 49 49" at="0x00000375d830">
                                        <parameters>
                                            <param key="__array__">"char"</param>
                                        </parameters>
                                    </NumpyArray></content>
                                </ListOffsetArray64>
                            </field>
                            <field index="2" key="BIKEROUTE">
                                <ListOffsetArray64>
                                    <parameters>
                                        <param key="__array__">"string"</param>
                                    </parameters>
                                    <offsets><Index64 i="[0 22 44 62 84 106 124 144 166 188 ... 22285 22312 22332 22359 22386 22413 22440 22467 22485 22503]" offset="0" length="1062" at="0x0000039b50b0"/></offsets>
                                    <content><NumpyArray format="B" shape="22503" data="82 69 67 79 77 ... 32 76 65 78 69" at="0x000003b0d000">
                                        <parameters>
                                            <param key="__array__">"char"</param>
                                        </parameters>
                                    </NumpyArray></content>
                                </ListOffsetArray64>
                            </field>
                            <field index="3" key="F_STREET">
                                <ListOffsetArray64>
                                    <parameters>
                                        <param key="__array__">"string"</param>
                                    </parameters>
                                    <offsets><Index64 i="[0 11 22 31 62 93 111 123 133 147 ... 13305 13317 13334 13343 13353 13365 13374 13386 13397 13410]" offset="0" length="1062" at="0x0000039b80c0"/></offsets>
                                    <content><NumpyArray format="B" shape="13410" data="87 32 71 82 65 ... 76 32 65 86 69" at="0x0000039cbc20">
                                        <parameters>
                                            <param key="__array__">"char"</param>
                                        </parameters>
                                    </NumpyArray></content>
                                </ListOffsetArray64>
                            </field>
                            <field index="4" key="T_STREET">
                                <IndexedOptionArray64>
                                    <index><Index64 i="[0 1 2 3 4 5 6 7 8 9 ... 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059]" offset="0" length="1061" at="0x0000039c2110"/></index>
                                    <content><ListOffsetArray64>
                                        <parameters>
                                            <param key="__array__">"string"</param>
                                        </parameters>
                                        <offsets><Index64 i="[0 11 22 31 50 79 91 107 118 130 ... 13520 13534 13546 13559 13569 13584 13593 13604 13619 13634]" offset="0" length="1061" at="0x0000039bf100"/></offsets>
                                        <content><NumpyArray format="B" shape="13634" data="87 32 71 82 65 ... 83 32 65 86 69" at="0x0000039c77c0">
                                            <parameters>
                                                <param key="__array__">"char"</param>
                                            </parameters>
                                        </NumpyArray></content>
                                    </ListOffsetArray64></content>
                                </IndexedOptionArray64>
                            </field>
                        </RecordArray>
                    </field>
                    <field index="2" key="geometry">
                        <RecordArray length="1061">
                            <field index="0" key="type">
                                <ListOffsetArray64>
                                    <parameters>
                                        <param key="__array__">"string"</param>
                                    </parameters>
                                    <offsets><Index64 i="[0 15 30 45 60 75 90 105 120 135 ... 15780 15795 15810 15825 15840 15855 15870 15885 15900 15915]" offset="0" length="1062" at="0x000003b1f6d0"/></offsets>
                                    <content><NumpyArray format="B" shape="15915" data="77 117 108 116 105 ... 116 114 105 110 103" at="0x000003b08ba0">
                                        <parameters>
                                            <param key="__array__">"char"</param>
                                        </parameters>
                                    </NumpyArray></content>
                                </ListOffsetArray64>
                            </field>
                            <field index="1" key="coordinates">
                                <ListOffsetArray64>
                                    <offsets><Index64 i="[0 1 2 3 4 5 6 7 8 9 ... 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084]" offset="0" length="1062" at="0x000003b226e0"/></offsets>
                                    <content><ListOffsetArray64>
                                        <offsets><Index64 i="[0 16 32 39 69 71 101 118 126 131 ... 48105 48110 48135 48146 48159 48172 48251 48331 48351 48362]" offset="0" length="1085" at="0x000003b13690"/></offsets>
                                        <content><ListOffsetArray64>
                                            <offsets><Index64 i="[0 2 4 6 8 10 12 14 16 18 ... 96706 96708 96710 96712 96714 96716 96718 96720 96722 96724]" offset="0" length="48363" at="0x000003a95650"/></offsets>
                                            <content><NumpyArray format="d" shape="96724" data="-87.7886 41.9237 -87.7886 41.9237 -87.7888 ... 41.9505 -87.7148 41.9507 -87.7153 41.951" at="0x000003bca9e0"/></content>
                                        </ListOffsetArray64></content>
                                    </ListOffsetArray64></content>
                                </ListOffsetArray64>
                            </field>
                        </RecordArray>
                    </field>
                </RecordArray></content>
            </ListOffsetArray64>
        </field>
    </RecordArray>
</Record>
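
To read the layout above: each ListOffsetArray64 pairs an offsets index with a flat content buffer, and list i spans content[offsets[i]:offsets[i + 1]]. The innermost offsets advance by 2 because every point is a [longitude, latitude] pair. The following is a minimal NumPy sketch of that idea, not Awkward Array's own implementation. The values and the names content, offsets, lists, and counts are made up for illustration.

```python
import numpy as np

# Flat content buffer: the numbers from every list, stored contiguously.
content = np.array([1.1, 2.2, 3.3, 4.4, 5.5])

# offsets[i] and offsets[i + 1] bound list i inside the content buffer,
# so N lists need N + 1 offsets. Equal neighbors mean an empty list.
offsets = np.array([0, 3, 3, 5])

# Rebuild the variable-length lists by slicing between neighboring offsets.
lists = [
    content[offsets[i]:offsets[i + 1]].tolist()
    for i in range(len(offsets) - 1)
]
print(lists)  # [[1.1, 2.2, 3.3], [], [4.4, 5.5]]

# The length of each list is the difference between neighboring offsets.
counts = np.diff(offsets)
print(counts.tolist())  # [3, 0, 2]
```

Nesting one such structure inside another's content, as the coordinates field does three times, yields lists of lists of lists without allocating a separate Python object per list.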

Compatibility

The Awkward Array library is not intended to replace your data analysis tools. It adds one key feature: the ability to manipulate JSON-like data structures with NumPy-like idioms. It “plays well” with the scientific Python ecosystem, providing functions that convert arrays into forms recognized by other libraries and adhering to standard protocols for sharing data.

Awkward Arrays can be converted to and from Apache Arrow:

ak.to_arrow(bikeroutes.features).type
StructType(struct<type: string not null, properties: struct<STREET: string not null, TYPE: string not null, BIKEROUTE: string not null, F_STREET: string not null, T_STREET: string> not null, geometry: struct<type: string not null, coordinates: large_list<item: large_list<item: large_list<item: double not null> not null> not null> not null> not null>)

To and from Parquet files (through pyarrow):

ak.to_parquet(bikeroutes.features, "/tmp/bikeroutes.parquet")

To and from JSON:

ak.to_json(bikeroutes.features)[:100]
'[{"type":"Feature","properties":{"STREET":"W FULLERTON AVE","TYPE":"4","BIKEROUTE":"RECOMMENDED BIKE'

To Pandas:

ak.to_pandas(bikeroutes.features)
                                          type    properties                                                                geometry
                                                  STREET          TYPE BIKEROUTE              F_STREET      T_STREET        type            coordinates
entry subentry subsubentry subsubsubentry
0     0        0           0              Feature W FULLERTON AVE 4    RECOMMENDED BIKE ROUTE W GRAND AVE   W GRAND AVE     MultiLineString -87.788573
                           1              Feature W FULLERTON AVE 4    RECOMMENDED BIKE ROUTE W GRAND AVE   W GRAND AVE     MultiLineString 41.923652
               1           0              Feature W FULLERTON AVE 4    RECOMMENDED BIKE ROUTE W GRAND AVE   W GRAND AVE     MultiLineString -87.788646
                           1              Feature W FULLERTON AVE 4    RECOMMENDED BIKE ROUTE W GRAND AVE   W GRAND AVE     MultiLineString 41.923651
               2           0              Feature W FULLERTON AVE 4    RECOMMENDED BIKE ROUTE W GRAND AVE   W GRAND AVE     MultiLineString -87.788845
...   ...      ...         ...            ...     ...             ...  ...                    ...           ...             ...             ...
1060  0        8           1              Feature N ELSTON AVE    1    EXISTING BIKE LANE     N KIMBALL AVE N ST. LOUIS AVE MultiLineString 41.950493
               9           0              Feature N ELSTON AVE    1    EXISTING BIKE LANE     N KIMBALL AVE N ST. LOUIS AVE MultiLineString -87.714819
                           1              Feature N ELSTON AVE    1    EXISTING BIKE LANE     N KIMBALL AVE N ST. LOUIS AVE MultiLineString 41.950724
               10          0              Feature N ELSTON AVE    1    EXISTING BIKE LANE     N KIMBALL AVE N ST. LOUIS AVE MultiLineString -87.715284
                           1              Feature N ELSTON AVE    1    EXISTING BIKE LANE     N KIMBALL AVE N ST. LOUIS AVE MultiLineString 41.951042

96724 rows × 8 columns

And to NumPy, if the arrays are first padded to be rectilinear:

ak.to_numpy(
    ak.pad_none(
        ak.pad_none(
            bikeroutes.features.geometry.coordinates, 1980, axis=2
        ), 7, axis=1
    )
)
masked_array(
  data=[[[[-87.78857268239116, 41.92365204796192],
          [-87.7886455918368, 41.9236514059218],
          [-87.78884498837314, 41.923649881816345],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         ...,

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]]],


        [[[-87.74815752805499, 41.914431860310785],
          [-87.74816482757203, 41.91443315985752],
          [-87.74819817563908, 41.914438543841555],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         ...,

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]]],


        [[[-87.61671282963914, 41.80391856700042],
          [-87.61670796416884, 41.8037002018836],
          [-87.6166999837239, 41.803396031187475],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         ...,

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]]],


        ...,


        [[[-87.7479094892182, 41.972879322109975],
          [-87.74791172002767, 41.97288067814664],
          [-87.74822576170033, 41.973074647968],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         ...,

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]]],


        [[[-87.76121259508096, 41.98106473442933],
          [-87.76137581596201, 41.981163817649694],
          [-87.76162919929251, 41.98131861792552],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         ...,

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]]],


        [[[-87.71279113444339, 41.949330392505395],
          [-87.71333566982489, 41.949705971327184],
          [-87.71350481728275, 41.94982249839977],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         ...,

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]],

         [[--, --],
          [--, --],
          [--, --],
          ...,
          [--, --],
          [--, --],
          [--, --]]]],
  mask=[[[[False, False],
          [False, False],
          [False, False],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         ...,

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]]],


        [[[False, False],
          [False, False],
          [False, False],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         ...,

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]]],


        [[[False, False],
          [False, False],
          [False, False],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         ...,

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]]],


        ...,


        [[[False, False],
          [False, False],
          [False, False],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         ...,

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]]],


        [[[False, False],
          [False, False],
          [False, False],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         ...,

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]]],


        [[[False, False],
          [False, False],
          [False, False],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         ...,

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]],

         [[ True,  True],
          [ True,  True],
          [ True,  True],
          ...,
          [ True,  True],
          [ True,  True],
          [ True,  True]]]],
  fill_value=1e+20)

Where to go next

The rest of these tutorials show how to use Awkward Array with various libraries, as well as how to do things that only Awkward Array can do. They are organized by task: use the left sidebar (≡ button on mobile) to find what you’re trying to do. If, however, you’re looking for documentation on a specific function, see the Python and C++ references below.

Python
API reference

C++
API reference