ak.from_json
------------

.. py:module: ak.from_json

Defined in `awkward.operations.ak_from_json `__ on `line 29 `__.

.. py:function:: ak.from_json(source, *, line_delimited=False, schema=None, nan_string=None, posinf_string=None, neginf_string=None, complex_record_fields=None, buffersize=65536, initial=1024, resize=8, highlevel=True, behavior=None, attrs=None)

    :param source: Data source of the JSON-formatted string(s). If bytes/str,
        the string is parsed. If a ``pathlib.Path``, a file with that name is
        opened, parsed, and closed. If that path has a URI protocol (like
        ``"https://"`` or ``"s3://"``), this function attempts to open the
        file with the fsspec library. If a file-like object with a ``read``
        method, this function reads from the object, but does not close it.
    :type source: bytes/str, pathlib.Path, or file-like object
    :param line_delimited: If False, a single JSON document is read as an
        entire array or record. If True, this function reads line-delimited
        JSON into an array (regardless of how many documents there are). The
        line delimiter is not actually checked, so it may be ``"\n"``,
        ``"\r\n"``, or anything else.
    :type line_delimited: bool
    :param schema: If None, the data type is discovered while parsing. If a
        JSONSchema (`json-schema.org `__), that schema is used to parse the
        JSON more quickly by skipping type-discovery.
    :type schema: None, JSON str or equivalent lists/dicts
    :param nan_string: If not None, strings with this value will be
        interpreted as floating-point NaN values.
    :type nan_string: None or str
    :param posinf_string: If not None, strings with this value will be
        interpreted as floating-point positive infinity values.
    :type posinf_string: None or str
    :param neginf_string: If not None, strings with this value will be
        interpreted as floating-point negative infinity values.
    :type neginf_string: None or str
    :param complex_record_fields: If not None, defines a pair of field names
        (real part, imaginary part) used to interpret records as complex
        numbers; any other fields of those records are dropped.
    :type complex_record_fields: None or (str, str)
    :param buffersize: Number of bytes in each read from source: larger
        values use more memory but read less frequently. (The Python GIL is
        released before and after read events.)
    :type buffersize: int
    :param initial: Initial size (in bytes) of buffers used by the
        ``ak::ArrayBuilder``.
    :type initial: int
    :param resize: Resize multiplier for buffers used by the
        ``ak::ArrayBuilder``; should be strictly greater than 1.
    :type resize: float
    :param highlevel: If True, return an :py:obj:`ak.Array`; otherwise,
        return a low-level :py:obj:`ak.contents.Content` subclass.
    :type highlevel: bool
    :param behavior: Custom :py:obj:`ak.behavior` for the output array, if
        high-level.
    :type behavior: None or dict
    :param attrs: Custom attributes for the output array, if high-level.
    :type attrs: None or dict

Converts a JSON string into an Awkward Array.

There are a few different dichotomies in JSON-reading; all of the
combinations are supported:

* Reading from in-memory str/bytes, an on-disk or over-network file, or an
  arbitrary Python object with a ``read(num_bytes)`` method.
* Reading a single JSON document or a sequence of line-delimited documents.
* Unknown schema (slow and general) or a provided JSONSchema (fast, but not
  all possible cases are supported).
* Conversion of strings representing not-a-number, plus infinity, and minus
  infinity into the appropriate floating-point numbers.
* Conversion of records with a real and imaginary part into complex numbers.

Non-JSON features are not allowed, including literals for not-a-number or
infinite numbers; such values must be quoted strings for ``nan_string``,
``posinf_string``, and ``neginf_string`` to recognize them. The document or
line-delimited documents must adhere to the strict `JSON specification `__.

Sources
=======

In-memory strings or bytes are simply passed as the first argument:

.. code-block:: python

    >>> import awkward as ak
    >>> ak.from_json("[[1.1, 2.2, 3.3], [], [4.4, 5.5]]")

File names/paths need to be wrapped in ``pathlib.Path``, and remote files are
recognized by URI protocol (like ``"https://"`` or ``"s3://"``) and handled by
fsspec (which must be installed).

.. code-block:: python

    >>> import pathlib
    >>> with open("tmp.json", "w") as file:
    ...     file.write("[[1.1, 2.2, 3.3], [], [4.4, 5.5]]")
    ...
    33
    >>> ak.from_json(pathlib.Path("tmp.json"))

Any object with a ``read(num_bytes)`` method can also be used as the
``source``:

.. code-block:: python

    >>> class HasReadMethod:
    ...     def __init__(self, data):
    ...         self.bytes = data.encode()
    ...         self.pos = 0
    ...     def read(self, num_bytes):
    ...         start = self.pos
    ...         self.pos += num_bytes
    ...         return self.bytes[start:self.pos]
    ...
    >>> filelike_obj = HasReadMethod("[[1.1, 2.2, 3.3], [], [4.4, 5.5]]")
    >>> ak.from_json(filelike_obj)

If this function opens a file or network connection (because it is passed a
``pathlib.Path``), then this function will also close that file or
connection. If this function is provided a file-like object with a
``read(num_bytes)`` method, this function will not close it. (It might not
even have a ``close`` method.)

Data structures
===============

This function interprets JSON arrays and JSON objects in the same way that
:py:obj:`ak.from_iter` interprets Python lists and Python dicts. It could be
used as a synonym for Python's ``json.loads`` followed by
:py:obj:`ak.from_iter`, but the direct JSON-reading is faster (especially with
a schema) and uses less memory. Consider

.. code-block:: python

    >>> import json
    >>> json_data = "[[1.1, 2.2, 3.3], [], [4.4, 5.5]]"
    >>> ak.from_iter(json.loads(json_data))
    >>> ak.from_json(json_data)

and

.. code-block:: python

    >>> json_data = '{"x": 1.1, "y": [1, 2]}'
    >>> ak.from_iter(json.loads(json_data))
    >>> ak.from_json(json_data)

As these two examples illustrate, reading a single JSON document may result
in an :py:obj:`ak.Array` (from a JSON array) or an :py:obj:`ak.Record` (from a
JSON object), but line-delimited reading (``line_delimited=True``) only
results in :py:obj:`ak.Array`:

.. code-block:: python

    >>> ak.from_json(
    ...     '{"x": 1.1, "y": [1]}\n{"x": 2.2, "y": [1, 2]}\n{"x": 3.3, "y": [1, 2, 3]}',
    ...     line_delimited=True,
    ... )

This holds even for an empty source, which gives an array of length zero:

.. code-block:: python

    >>> ak.from_json("", line_delimited=True)

Note that with ``line_delimited=True``, the input does not actually need
delimiters between JSON documents, and delimiters are not forbidden within a
document. Parsing continues to the end of one JSON document and starts again
with the next. Splitting a large source for parallel processing may require
actual delimiters between (and never within) JSON documents, but that
consideration is beyond the scope of this function.

If a JSONSchema is provided, it describes the structure of each JSON document,
regardless of whether there is only one (the result may be an
:py:obj:`ak.Record`) or many (the result must be an :py:obj:`ak.Array`).

.. code-block:: python

    >>> schema = {
    ...     "type": "object",
    ...     "properties": {
    ...         "x": {"type": "number"},
    ...         "y": {"type": "array", "items": {"type": "integer"}},
    ...     },
    ...     "required": ["x", "y"],
    ... }
    >>> ak.from_json(
    ...     '{"x": 1.1, "y": [1, 2, 3]}',
    ...     schema=schema,
    ... )
    >>> ak.from_json(
    ...     '{"x": 1.1, "y": [1]}\n{"x": 2.2, "y": [1, 2]}\n{"x": 3.3, "y": [1, 2, 3]}',
    ...     schema=schema,
    ...     line_delimited=True,
    ... )

All numbers in the final array are 64-bit: integers are signed 64-bit
integers and floating-point numbers are 64-bit floats.

JSONSchemas
===========

This function supports a subset of JSONSchema (see the `JSONSchema specification `__).
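
A schema is itself a JSON value, so it can be spelled as Python dicts and lists or as JSON text. The short sketch below uses only Python's standard ``json`` module. It writes the schema for the nested-list data from the *Sources* section both ways and checks that the two agree. The names ``schema_dict`` and ``schema_text`` are illustrative and not part of Awkward:

.. code-block:: python

    import json

    # Illustrative schema for data like "[[1.1, 2.2, 3.3], [], [4.4, 5.5]]":
    # an array whose items are arrays of floating-point numbers.
    schema_dict = {
        "type": "array",
        "items": {"type": "array", "items": {"type": "number"}},
    }

    # The same schema written as JSON text.
    schema_text = '{"type": "array", "items": {"type": "array", "items": {"type": "number"}}}'

    # Parsing the text gives back the dict, and serializing the dict gives back
    # the text, so either spelling describes the same structure.
    assert json.loads(schema_text) == schema_dict
    assert json.dumps(schema_dict) == schema_text

Provided the conditions listed below are met, either spelling could be passed as ``schema=`` alongside that data.
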
The schemas may be passed as JSON text or as Python lists and dicts
representing JSON, but the following conditions apply:

* The root of the schema must be ``"type": "array"`` or ``"type": "object"``.
* Every level must have a ``"type"``, which can only name one type (as a
  string or length-1 list) or one type and ``"null"`` (as a length-2 list).
* ``"type": "boolean"`` → 1-byte boolean values.
* ``"type": "integer"`` → 8-byte integer values. If a part of the schema is
  declared to have integer type but the JSON numbers are expressed as
  floating-point, such as ``3.14``, ``3.0``, or ``3e0``, this function raises
  an error.
* ``"type": "number"`` → 8-byte floating-point values. If used with this
  function's ``nan_string``, ``posinf_string``, and/or ``neginf_string``, the
  value in the JSON may be a string, as long as it matches one of these three.
* ``"type": "string"`` → UTF-8 encoded strings. All JSON escape sequences are
  supported. Remember that the ``source`` data are ASCII; Unicode is derived
  from "``\uXXXX``" escape sequences. If an ``"enum"`` is given, strings are
  represented as categorical values (:py:obj:`ak.contents.IndexedArray` or
  :py:obj:`ak.contents.IndexedOptionArray`).
* ``"type": "array"`` → nested lists. The ``"items"`` must be specified. If
  ``"minItems"`` and ``"maxItems"`` are specified and equal to each other, the
  list has regular type (:py:obj:`ak.types.RegularType`); otherwise, it has
  variable-length type (:py:obj:`ak.types.ListType`).
* ``"type": "object"`` → nested records. The ``"properties"`` must be
  specified, and any properties in the data not described by
  ``"properties"`` will not appear in the output.

Substitutions for non-finite and complex numbers
================================================

JSON doesn't support not-a-number values, infinite values, or complex number
types (that is, numbers with a real and an imaginary part).
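
Python's standard ``json`` module shows both sides of this limitation. By default it writes non-finite floats as the non-JSON literals ``NaN`` and ``Infinity``. With ``allow_nan=False`` it refuses to write them at all. The minimal sketch below writes them as quoted strings instead, which keeps the output strict JSON. The helper ``quote_nonfinite`` is hypothetical and is not part of Awkward:

.. code-block:: python

    import json
    import math

    # By default, Python's json module writes non-finite floats as the
    # non-JSON literals NaN, Infinity, and -Infinity.
    print(json.dumps([1.5, float("nan"), float("inf")]))  # [1.5, NaN, Infinity]

    def quote_nonfinite(obj, nan="nan", posinf="inf", neginf="-inf"):
        """Hypothetical helper (not part of Awkward): replace non-finite
        floats with quoted strings, recursing into lists and dicts."""
        if isinstance(obj, float) and not math.isfinite(obj):
            if math.isnan(obj):
                return nan
            return posinf if obj > 0 else neginf
        if isinstance(obj, list):
            return [quote_nonfinite(x, nan, posinf, neginf) for x in obj]
        if isinstance(obj, dict):
            return {k: quote_nonfinite(v, nan, posinf, neginf) for k, v in obj.items()}
        return obj

    # allow_nan=False makes json.dumps raise ValueError on any leftover NaN or
    # infinity, so a successful call guarantees strict JSON.
    text = json.dumps(
        quote_nonfinite([1.5, float("nan"), float("inf"), float("-inf")]),
        allow_nan=False,
    )
    print(text)  # [1.5, "nan", "inf", "-inf"]

A string produced this way can be read back with ``nan_string="nan"``, ``posinf_string="inf"``, and ``neginf_string="-inf"``, as in the examples in this section.
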
Some work-arounds use non-JSON syntax, but this function converts valid JSON
into these numbers with user-specified rules.

The ``nan_string``, ``posinf_string``, and ``neginf_string`` parameters
convert quoted strings with the specified values into floating-point numbers:

.. code-block:: python

    >>> ak.from_json(
    ...     '[1, 2, "nan", "inf", "-inf"]',
    ...     nan_string="nan",
    ...     posinf_string="inf",
    ...     neginf_string="-inf",
    ... )

Without these rules, the array would be interpreted as a union of numbers and
strings:

.. code-block:: python

    >>> ak.from_json(
    ...     '[1, 2, "nan", "inf", "-inf"]',
    ... )

When combined with a JSONSchema, you need to declare these values as type
``"number"``, not as a union of strings and numbers (i.e. the conversion is
performed *before* schema-validation). Note that they can't be
``"integer"``, since not-a-number and infinite values are only possible for
floating-point numbers.

.. code-block:: python

    >>> ak.from_json(
    ...     '[1, 2, "nan", "inf", "-inf"]',
    ...     nan_string="nan",
    ...     posinf_string="inf",
    ...     neginf_string="-inf",
    ...     schema={"type": "array", "items": {"type": "number"}},
    ... )

The ``complex_record_fields`` parameter is a 2-tuple of field names (strings)
identifying the fields of JSON objects that hold the real and imaginary parts
of complex numbers. Complex number representations in JSON vary, though most
are JSON objects with real and imaginary parts and possibly other fields. Any
other fields will be excluded from the output array.

.. code-block:: python

    >>> ak.from_json(
    ...     '[{"r": 1, "i": 1.1, "other": ""}, {"r": 2, "i": 2.2, "other": ""}]',
    ...     complex_record_fields=("r", "i"),
    ... )

Without this rule, the array would be interpreted as an array of records:

.. code-block:: python

    >>> ak.from_json(
    ...     '[{"r": 1, "i": 1.1, "other": ""}, {"r": 2, "i": 2.2, "other": ""}]',
    ... )

When combined with a JSONSchema, you need to describe these values as
objects (i.e. the conversion is performed *after* schema-validation).
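
A pure-Python sketch makes this order of operations concrete. The helper ``records_to_complex`` is hypothetical and is not Awkward's implementation; it only mirrors the behavior described in this section. Each item is first checked to be an object with both named fields. Only after that check is it converted with ``complex()``, and any other fields are dropped:

.. code-block:: python

    import json

    def records_to_complex(records, fields=("r", "i")):
        """Hypothetical pure-Python analogue of complex_record_fields
        (not Awkward's implementation)."""
        real, imag = fields
        out = []
        for rec in records:
            # Structure check first: each item must still be an object
            # that contains both named fields.
            if not isinstance(rec, dict) or real not in rec or imag not in rec:
                raise ValueError(f"not a complex record: {rec!r}")
            # Conversion second: any other fields (like "other") are dropped.
            out.append(complex(rec[real], rec[imag]))
        return out

    data = '[{"r": 1, "i": 1.1, "other": ""}, {"r": 2, "i": 2.2, "other": ""}]'
    print(records_to_complex(json.loads(data)))  # [(1+1.1j), (2+2.2j)]
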
Note that even the fields that will be ignored by ``complex_record_fields``
need to be specified.

.. code-block:: python

    >>> ak.from_json(
    ...     '[{"r": 1, "i": 1.1, "other": ""}, {"r": 2, "i": 2.2, "other": ""}]',
    ...     complex_record_fields=("r", "i"),
    ...     schema={
    ...         "type": "array",
    ...         "items": {
    ...             "type": "object",
    ...             "properties": {
    ...                 "r": {"type": "number"},
    ...                 "i": {"type": "number"},
    ...                 "other": {"type": "string"},
    ...             },
    ...             "required": ["r", "i"],
    ...         },
    ...     },
    ... )

See also :py:obj:`ak.to_json`.
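
As a companion to :py:obj:`ak.to_json`, the two functions can be paired to round-trip non-finite values through strict JSON. This is a minimal sketch that assumes Awkward Array 2.x is installed, where :py:obj:`ak.to_json` also accepts ``nan_string`` and ``posinf_string`` and returns a string when no file is given:

.. code-block:: python

    import math

    import awkward as ak

    original = ak.Array([[1.1, float("nan")], [], [float("inf")]])

    # ak.to_json writes the non-finite values as the given quoted strings,
    # so the output is strict JSON.
    text = ak.to_json(original, nan_string="nan", posinf_string="inf")

    # The matching ak.from_json options turn those strings back into floats.
    restored = ak.from_json(text, nan_string="nan", posinf_string="inf")
    print(restored.to_list())
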