ak.str.to_categorical
---------------------

.. py:module: ak.str.to_categorical

Defined in `awkward.operations.str.akstr_to_categorical `__ on `line 13 `__.

.. py:function:: ak.str.to_categorical(array, *, highlevel=True, behavior=None, attrs=None)

    :param array: Array-like data (anything :py:obj:`ak.to_layout` recognizes).
    :param highlevel: If True, return an :py:obj:`ak.Array`; otherwise, return
                      a low-level :py:obj:`ak.contents.Content` subclass.
    :type highlevel: bool
    :param behavior: Custom :py:obj:`ak.behavior` for the output array, if
                     high-level.
    :type behavior: None or dict
    :param attrs: Custom attributes for the output array, if high-level.
    :type attrs: None or dict

    Returns a dictionary-encoded version of the given array of strings.
    It creates a categorical dataset, which has the following properties:

    * only distinct values (categories) are stored in their entirety;
    * pointers to those distinct values are represented by integers (an
      :py:obj:`ak.contents.IndexedArray` or :py:obj:`ak.contents.IndexedOptionArray`
      labeled with the parameter ``"__array__" = "categorical"``).

    This is equivalent to R's "factor" and pandas's "categorical". It differs
    from generic uses of :py:obj:`ak.contents.IndexedArray` and
    :py:obj:`ak.contents.IndexedOptionArray` in Awkward Arrays by the guarantee
    that the categories contain no duplicates and by the ``"categorical"``
    parameter.

    Unlike Arrow's ``dictionary_encode``, this function has no ``null_handling``
    argument. Its behavior is like ``null_handling="mask"`` (Arrow's default):
    null values are kept as missing entries in the index rather than encoded
    as a category, because the content of an
    :py:obj:`ak.contents.IndexedOptionArray` cannot itself be an option type.

    Note: this function does not raise an error if ``array`` does not contain
    any string or bytestring data.

    Requires the pyarrow library and calls `pyarrow.compute.dictionary_encode `__.
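To make the split between categories and integer pointers concrete, here is a minimal pure-Python sketch of dictionary encoding. It does not call Awkward Array or pyarrow, and it is not their implementation; the helpers ``dictionary_encode`` and ``decode`` are hypothetical names used only for illustration. It assumes codes are assigned in order of first appearance (which we believe matches Arrow's ``dictionary_encode``) and mimics ``null_handling="mask"`` by leaving ``None`` in the index instead of adding it as a category.

```python
from typing import Dict, Hashable, List, Optional, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def dictionary_encode(
    values: List[Optional[T]],
) -> Tuple[List[T], List[Optional[int]]]:
    """Hypothetical helper: split values into distinct categories plus an index.

    None entries stay None in the index (like null_handling="mask") and are
    never added to the categories.
    """
    categories: List[T] = []          # each distinct value stored exactly once
    position: Dict[T, int] = {}       # value -> its code in `categories`
    index: List[Optional[int]] = []   # one integer pointer (or None) per element
    for value in values:
        if value is None:
            index.append(None)        # masked: no category for nulls
            continue
        if value not in position:
            # First time we see this value: give it the next code.
            position[value] = len(categories)
            categories.append(value)
        index.append(position[value])
    return categories, index


def decode(categories: List[T], index: List[Optional[int]]) -> List[Optional[T]]:
    """Hypothetical helper: rebuild the original values from the encoding."""
    return [None if i is None else categories[i] for i in index]


categories, index = dictionary_encode(["one", "two", "one", None, "three", "one"])
print(categories)  # ['one', 'two', 'three']
print(index)       # [0, 1, 0, None, 2, 0]
```

Roughly, ``categories`` plays the role of the content of the :py:obj:`ak.contents.IndexedArray` or :py:obj:`ak.contents.IndexedOptionArray`, and ``index`` plays the role of its integer index; in an :py:obj:`ak.contents.IndexedOptionArray`, a missing entry is represented by a negative index rather than by ``None``.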