ak.to_arrow#
Defined in awkward.operations.ak_to_arrow on line 15.
- ak.to_arrow(array, *, list_to32=False, string_to32=False, bytestring_to32=False, emptyarray_to=None, categorical_as_dictionary=False, extensionarray=True, count_nulls=True)#
- Parameters:
array – Array-like data (anything
ak.to_layoutrecognizes).list_to32 (bool) – If True, convert Awkward lists into 32-bit Arrow lists if they’re small enough, even if it means an extra conversion. Otherwise, signed 32-bit
ak.types.ListTypemaps to ArrowListType, signed 64-bitak.types.ListTypemaps to ArrowLargeListType, and unsigned 32-bitak.types.ListTypepicks whichever Arrow type its values fit into.string_to32 (bool) – Same as the above for Arrow
stringandlarge_string.bytestring_to32 (bool) – Same as the above for Arrow
binaryandlarge_binary.emptyarray_to (None or dtype) – If None,
ak.types.UnknownTypemaps to Arrow’s null type; otherwise, it is converted a given numeric dtype.categorical_as_dictionary (bool) – If True,
ak.contents.IndexedArrayandak.contents.IndexedOptionArraylabeled with__array__ = "categorical"are mapped to ArrowDictionaryArray; otherwise, the projection is evaluated before conversion (always the case without__array__ = "categorical").extensionarray (bool) – If True, this function returns extended Arrow arrays (at all levels of nesting), which preserve metadata so that Awkward → Arrow → Awkward preserves the array’s
ak.types.Type(though not theak.forms.Form). If False, this function returns generic Arrow arrays that might be needed for third-party tools that don’t recognize Arrow’s extensions. Even withextensionarray=False, the values produced by Arrow’sto_pylistmethod are the same as the values produced by Awkward’sak.to_list.count_nulls (bool) – If True, count the number of missing values at each level and include these in the resulting Arrow array, which makes some downstream applications faster. If False, skip the up-front cost of counting them.
Converts an Awkward Array into an Apache Arrow array.
This produces arrays of type
pyarrow.Array. You might need to further manipulations (using the pyarrow library) to build apyarrow.ChunkedArray, apyarrow.RecordBatch, or apyarrow.Table. For the latter, seeak.to_arrow_table.This function always preserves the values of a dataset; i.e. the Python objects returned by
ak.to_listare identical to the Python objects returned by Arrow’sto_pylistmethod. Withextensionarray=True, this function also preserves the data type (high-levelak.types.Type, though not the low-levelak.forms.Form), even through Parquet, making Parquet a good way to save Awkward Arrays for later use. If any third-party tools don’t recognize Arrow’s extension arrays, set this option to False for plain Arrow arrays.See also
ak.from_arrow,ak.to_arrow_table,ak.to_parquet,ak.from_arrow_schema.