ak.str.to_categorical
Defined in awkward.operations.str.akstr_to_categorical on line 13.
- ak.str.to_categorical(array, *, highlevel=True, behavior=None, attrs=None)
- Parameters:
array – Array-like data (anything ak.to_layout recognizes).
highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.contents.Content subclass.
behavior (None or dict) – Custom ak.behavior for the output array, if high-level.
attrs (None or dict) – Custom attributes for the output array, if high-level.
Returns a dictionary-encoded version of the given array of strings. Creates a categorical dataset, which has the following properties:
only distinct values (categories) are stored in their entirety,
pointers to those distinct values are represented by integers (an ak.contents.IndexedArray or ak.contents.IndexedOptionArray labeled with parameter "__array__" = "categorical").
This is equivalent to R’s “factor” and Pandas’s “categorical”. It differs from generic uses of ak.contents.IndexedArray and ak.contents.IndexedOptionArray in Awkward Arrays by the guarantee of no duplicate categories and the "categorical" parameter.
Unlike Arrow’s dictionary_encode, this function has no null_handling argument. This function’s behavior is like null_handling="mask" (Arrow’s default). It is not possible to encode null values in Awkward Array, as ak.contents.IndexedOptionArray cannot contain an option type node.
Note: this function does not raise an error if the array does not contain any string or bytestring data.
Requires the pyarrow library and calls pyarrow.compute.dictionary_encode.