ak.from_json
------------

.. py:module: ak.from_json

Defined in `awkward.operations.ak_from_json <https://github.com/scikit-hep/awkward-1.0/blob/21f485b45e31dcd53968a279e47e4df4596a194b/src/awkward/operations/ak_from_json.py>`__ on `line 16 <https://github.com/scikit-hep/awkward-1.0/blob/21f485b45e31dcd53968a279e47e4df4596a194b/src/awkward/operations/ak_from_json.py#L16>`__.

.. py:function:: ak.from_json(source)


    :param source: Data source of the
               JSON-formatted string(s). If bytes/str, the string is parsed. If a
               ``pathlib.Path``, a file with that name is opened, parsed, and closed.
               If that path has a URI protocol (like ``"https://"`` or ``"s3://"``), this
               function attempts to open the file with the fsspec library. If a
               file-like object with a ``read`` method, this function reads from the
               object, but does not close it.
    :type source: bytes/str, pathlib.Path, or file-like object
    :param line_delimited: If False, a single JSON document is read as an
                       entire array or record. If True, this function reads line-delimited
                       JSON documents into an array (regardless of how many documents
                       there are). The line delimiter is not actually checked, so it may
                       be ``"\n"``, ``"\r\n"``, or anything else.
    :type line_delimited: bool
    :param schema: If None, the data type
               is discovered while parsing. If a JSONSchema
               (`json-schema.org <https://json-schema.org/>`__), that schema is used to
               parse the JSON more quickly by skipping type-discovery.
    :type schema: None, JSON str or equivalent lists/dicts
    :param nan_string: If not None, strings with this value will be
                   interpreted as floating-point NaN values.
    :type nan_string: None or str
    :param posinf_string: If not None, strings with this value will
                      be interpreted as floating-point positive infinity values.
    :type posinf_string: None or str
    :param neginf_string: If not None, strings with this value
                      will be interpreted as floating-point negative infinity values.
    :type neginf_string: None or str
    :param complex_record_fields: If not None, defines a pair of
                              field names to interpret 2-field records as complex numbers.
    :type complex_record_fields: None or (str, str)
    :param buffersize: Number of bytes in each read from source: larger
                   values use more memory but read less frequently. (Python GIL is
                   released before and after read events.)
    :type buffersize: int
    :param initial: Initial size (in bytes) of buffers used by the ``ak::ArrayBuilder``.
    :type initial: int
    :param resize: Resize multiplier for buffers used by the ``ak::ArrayBuilder``;
               should be strictly greater than 1.
    :type resize: float
    :param highlevel: If True, return an :py:obj:`ak.Array`; otherwise, return
                  a low-level :py:obj:`ak.contents.Content` subclass.
    :type highlevel: bool
    :param behavior: Custom :py:obj:`ak.behavior` for the output array, if
                 high-level.
    :type behavior: None or dict

Converts a JSON string into an Awkward Array.

There are a few different dichotomies in JSON-reading; all of the combinations
are supported:

* Reading from in-memory str/bytes, on-disk or over-network file, or an
  arbitrary Python object with a ``read(num_bytes)`` method.
* Reading a single JSON document or a sequence of line-delimited documents.
* Unknown schema (slow and general) or with a provided JSONSchema (fast, but
  not all possible cases are supported).
* Conversion of strings representing not-a-number, plus and minus infinity
  into the appropriate floating-point numbers.
* Conversion of records with a real and imaginary part into complex numbers.

Non-JSON features are not allowed, including literals for not-a-number or infinite
numbers; these values must be quoted strings for ``nan_string``, ``posinf_string``,
and ``neginf_string`` to recognize them. The document or line-delimited documents
must adhere to the strict `JSON syntax <https://www.json.org/>`__.

Sources
=======

In-memory strings or bytes are simply passed as the first argument:

.. code-block:: python


    >>> ak.from_json("[[1.1, 2.2, 3.3], [], [4.4, 5.5]]")
    <Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>

File names/paths need to be wrapped in ``pathlib.Path``, and remote files are
recognized by URI protocol (like ``"https://"`` or ``"s3://"``) and handled by fsspec
(which must be installed).

.. code-block:: python


    >>> import pathlib
    >>> with open("tmp.json", "w") as file:
    ...     file.write("[[1.1, 2.2, 3.3], [], [4.4, 5.5]]")
    ...
    33
    >>> ak.from_json(pathlib.Path("tmp.json"))
    <Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>

And any object with a ``read(num_bytes)`` method can be used as the ``source``.

.. code-block:: python


    >>> class HasReadMethod:
    ...     def __init__(self, data):
    ...         self.bytes = data.encode()
    ...         self.pos = 0
    ...     def read(self, num_bytes):
    ...         start = self.pos
    ...         self.pos += num_bytes
    ...         return self.bytes[start:self.pos]
    ...
    >>> filelike_obj = HasReadMethod("[[1.1, 2.2, 3.3], [], [4.4, 5.5]]")
    >>> ak.from_json(filelike_obj)
    <Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>

If this function opens a file or network connection (because ``source`` is a
``pathlib.Path``), then this function will also close that file or connection.

If this function is provided a file-like object with a ``read(num_bytes)`` method,
this function will not close it. (It might not even have a ``close`` method.)

Data structures
===============

This function interprets JSON arrays and JSON objects in the same way that
:py:obj:`ak.from_iter` interprets Python lists and Python dicts. It could be used as a
synonym for Python's ``json.loads`` followed by :py:obj:`ak.from_iter`, but the direct
JSON-reading is faster (especially with a schema) and uses less memory.

Consider

.. code-block:: python


    >>> import json
    >>> json_data = "[[1.1, 2.2, 3.3], [], [4.4, 5.5]]"
    >>> ak.from_iter(json.loads(json_data))
    <Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>
    >>> ak.from_json(json_data)
    <Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>

and

.. code-block:: python


    >>> json_data = '{"x": 1.1, "y": [1, 2]}'
    >>> ak.from_iter(json.loads(json_data))
    <Record {x: 1.1, y: [1, 2]} type='{x: float64, y: var * int64}'>
    >>> ak.from_json(json_data)
    <Record {x: 1.1, y: [1, 2]} type='{x: float64, y: var * int64}'>

As shown above, reading JSON may result in :py:obj:`ak.Array` or :py:obj:`ak.Record`, but line-delimited
(``line_delimited=True``) only results in :py:obj:`ak.Array`:

.. code-block:: python


    >>> ak.from_json(
    ...     '{"x": 1.1, "y": [1]}\n{"x": 2.2, "y": [1, 2]}\n{"x": 3.3, "y": [1, 2, 3]}',
    ...     line_delimited=True,
    ... )
    <Array [{x: 1.1, y: [1]}, ..., {x: 3.3, ...}] type='3 * {x: float64, y: var...'>

Even arrays of length zero:

.. code-block:: python


    >>> ak.from_json("", line_delimited=True)
    <Array [] type='0 * unknown'>

Note that JSON interpreted with ``line_delimited`` doesn't actually need delimiters
between JSON documents or an absence of delimiters within each document. Parsing
with ``line_delimited=True`` continues to the end of a JSON document and starts
again with the next JSON document. It may be necessary to require actual delimiters
between and never within JSON documents to split a large source for
parallel-processing, but that consideration is beyond this function.

If a JSONSchema is provided, the schema describes the structure of the JSON
document, regardless of whether there's only one of them (may be an :py:obj:`ak.Record`)
or many of them (must be an :py:obj:`ak.Array`).

.. code-block:: python


    >>> schema = {
    ...     "type": "object",
    ...     "properties": {
    ...         "x": {"type": "number"},
    ...         "y": {"type": "array", "items": {"type": "integer"}},
    ...     },
    ...     "required": ["x", "y"],
    ... }

    >>> ak.from_json(
    ...     '{"x": 1.1, "y": [1, 2, 3]}',
    ...     schema=schema,
    ... )
    <Record {x: 1.1, y: [1, ..., 3]} type='{x: float64, y: var * int64}'>

    >>> ak.from_json(
    ...     '{"x": 1.1, "y": [1]}\n{"x": 2.2, "y": [1, 2]}\n{"x": 3.3, "y": [1, 2, 3]}',
    ...     schema=schema,
    ...     line_delimited=True,
    ... )
    <Array [{x: 1.1, y: [1]}, ..., {x: 3.3, ...}] type='3 * {x: float64, y: var...'>

All numbers in the final array are 64-bit: integers are signed ``int64`` and
floating-point numbers are ``float64``.

JSONSchemas
===========

This function supports a subset of JSONSchema (see the
`JSONSchema specification <https://json-schema.org/>`__). The schemas may be passed
as JSON text or as Python lists and dicts representing JSON, but the following
conditions apply:

* The root of the schema must be ``"type": "array"`` or ``"type": "object"``.
* Every level must have a ``"type"``, which can only name one type (as a string
  or length-1 list) or one type and ``"null"`` (as a length-2 list).
* ``"type": "boolean"`` → 1-byte boolean values.
* ``"type": "integer"`` → 8-byte integer values. If a part of the schema
  is declared to have integer type but the JSON numbers are expressed as
  floating-point, such as ``3.14``, ``3.0``, or ``3e0``, this function raises an
  error.
* ``"type": "number"`` → 8-byte floating-point values. If used with
  this function's ``nan_string``, ``posinf_string``, and/or ``neginf_string``, the
  value in the JSON could be a string, as long as it matches one of these
  three.
* ``"type": "string"`` → UTF-8 encoded strings. All JSON escape sequences are
  supported. Remember that the ``source`` data are ASCII; Unicode is derived from
  "``\uXXXX``" escape sequences. If an ``"enum"`` is given, strings are represented
  as categorical values (:py:obj:`ak.contents.IndexedArray` or :py:obj:`ak.contents.IndexedOptionArray`).
* ``"type": "array"`` → nested lists. The ``"items"`` must be specified. If
  ``"minItems"`` and ``"maxItems"`` are specified and equal to each other, the
  list has regular-type (:py:obj:`ak.types.RegularType`); otherwise, it has variable-length
  type (:py:obj:`ak.types.ListType`).
* ``"type": "object"`` → nested records. The ``"properties"`` must be specified,
  and any properties in the data not described by ``"properties"`` will not
  appear in the output.

Substitutions for non-finite and complex numbers
================================================

JSON doesn't support not-a-number values, infinite values, or complex number
types (as in numbers with a real and imaginary part). Some work-arounds use
non-JSON syntax, but this function converts valid JSON into these numbers with
user-specified rules.

The ``nan_string``, ``posinf_string``, and ``neginf_string`` arguments convert
matching quoted strings into floating-point numbers. You choose which strings
are recognized.

.. code-block:: python


    >>> ak.from_json(
    ...     '[1, 2, "nan", "inf", "-inf"]',
    ...     nan_string="nan",
    ...     posinf_string="inf",
    ...     neginf_string="-inf",
    ... )
    <Array [1, 2, nan, inf, -inf] type='5 * float64'>

Without these rules, the array would be interpreted as a union of numbers and
strings:

.. code-block:: python


    >>> ak.from_json(
    ...     '[1, 2, "nan", "inf", "-inf"]',
    ... )
    <Array [1, 2, 'nan', 'inf', '-inf'] type='5 * union[int64, string]'>

When combined with a JSONSchema, you need to say that these values have type
``"number"``, not a union of strings and numbers (i.e. the conversion is performed
*before* schema-validation). Note that they can't be ``"integer"``, since
not-a-number and infinite values are only possible for floating-point numbers.

.. code-block:: python


    >>> ak.from_json(
    ...     '[1, 2, "nan", "inf", "-inf"]',
    ...     nan_string="nan",
    ...     posinf_string="inf",
    ...     neginf_string="-inf",
    ...     schema={"type": "array", "items": {"type": "number"}}
    ... )
    <Array [1, 2, nan, inf, -inf] type='5 * float64'>

The ``complex_record_fields`` is a 2-tuple of field names (strings) of objects
to identify as the real and imaginary parts of complex numbers. Complex number
representations in JSON vary, though most are JSON objects with real and
imaginary parts and possibly other fields. Any other fields will be excluded
from the output array.

.. code-block:: python


    >>> ak.from_json(
    ...     '[{"r": 1, "i": 1.1, "other": ""}, {"r": 2, "i": 2.2, "other": ""}]',
    ...     complex_record_fields=("r", "i"),
    ... )
    <Array [1+1.1j, 2+2.2j] type='2 * complex128'>

Without this rule, the array would be interpreted as an array of records:

.. code-block:: python


    >>> ak.from_json(
    ...     '[{"r": 1, "i": 1.1, "other": ""}, {"r": 2, "i": 2.2, "other": ""}]',
    ... )
    <Array [{r: 1, i: 1.1, other: ''}, {...}] type='2 * {r: int64, i: float64, ...'>

When combined with a JSONSchema, you need to specify the object type (i.e. the
conversion is performed *after* schema-validation). Note that even the fields
that will be ignored by ``complex_record_fields`` need to be specified.

.. code-block:: python


    >>> ak.from_json(
    ...     '[{"r": 1, "i": 1.1, "other": ""}, {"r": 2, "i": 2.2, "other": ""}]',
    ...     complex_record_fields=("r", "i"),
    ...     schema={
    ...         "type": "array",
    ...         "items": {
    ...             "type": "object",
    ...             "properties": {
    ...                 "r": {"type": "number"},
    ...                 "i": {"type": "number"},
    ...                 "other": {"type": "string"},
    ...             },
    ...             "required": ["r", "i"],
    ...         },
    ...     },
    ... )
    <Array [1+1.1j, 2+2.2j] type='2 * complex128'>

See also :py:obj:`ak.to_json`.