ak.metadata_from_parquet

Defined in awkward.operations.ak_metadata_from_parquet on line 22.

ak.metadata_from_parquet(path, *, storage_options=None, row_groups=None, ignore_metadata=False, scan_files=True)
Parameters:
  • path (str) – Local filename or remote URL, passed to fsspec for resolution. May contain glob patterns. A list of paths is also allowed, but they must be data files, not directories.

  • storage_options – Passed to fsspec.parquet.open_parquet_file.

  • row_groups (None or set of int) – Row groups to select; indices must be non-negative. Order is ignored: the selected row groups are presented in the order specified by the Parquet metadata. If None, all row groups (and therefore all rows) are selected.

  • ignore_metadata (bool) – Ignore the dedicated _metadata file if one is found, and instead derive the metadata from the first data file.

  • scan_files (bool) – TODO
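The row_groups rules above (None means everything, indices must be non-negative, the caller's order is ignored) can be modelled in a few lines. The sketch below is a hypothetical helper, not part of awkward: normalize_row_groups and the out-of-range check are assumptions made for illustration, and "Parquet metadata order" is taken to mean ascending row-group index.

```python
def normalize_row_groups(row_groups, num_row_groups):
    """Hypothetical helper (not part of awkward) modelling the documented
    row_groups semantics: None selects everything, indices must be
    non-negative, and the order the caller gives is ignored."""
    if row_groups is None:
        # None: every row group, and therefore every row, is selected
        return list(range(num_row_groups))
    selected = set(row_groups)  # caller's order (and any duplicates) is discarded
    for rg in selected:
        if not isinstance(rg, int) or rg < 0:
            raise ValueError(f"row group indices must be non-negative ints, got {rg!r}")
        # Assumption: indices past the last row group are rejected
        if rg >= num_row_groups:
            raise ValueError(f"row group {rg} does not exist ({num_row_groups} row groups)")
    # Row groups come back in Parquet metadata order, i.e. ascending index
    return sorted(selected)


print(normalize_row_groups({3, 0, 2}, num_row_groups=5))  # [0, 2, 3]
print(normalize_row_groups(None, num_row_groups=3))       # [0, 1, 2]
```

Passing {3, 0, 2} or [2, 3, 0] gives the same result, which is the point of "Order is ignored".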

This function differs from ak.from_parquet._metadata as follows:

  • this function will always use a _metadata file, if present (unless ignore_metadata=True)

  • if there is no _metadata, the schema comes from _common_metadata or the first data file

  • the total number of rows is always known
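The schema-source precedence listed above can be sketched as a small lookup over file names. This is an illustrative model only: schema_source is a hypothetical name, "first data file" is assumed to mean first in sorted order, and with ignore_metadata=True the sketch goes straight to the first data file, following the wording of the ignore_metadata parameter.

```python
def schema_source(paths, ignore_metadata=False):
    """Hypothetical sketch of the precedence described above:
    _metadata, then _common_metadata, then the first data file."""

    def basename(p):
        # fsspec-style paths use "/" as the separator
        return p.rsplit("/", 1)[-1]

    special = {"_metadata", "_common_metadata"}
    if not ignore_metadata:
        # A dedicated _metadata file wins, then _common_metadata
        for wanted in ("_metadata", "_common_metadata"):
            for p in paths:
                if basename(p) == wanted:
                    return p
    # Otherwise fall back to the first data file; "first" is assumed
    # here to mean first in sorted order
    data_files = sorted(p for p in paths if basename(p) not in special)
    if not data_files:
        raise ValueError("no Parquet data files among the given paths")
    return data_files[0]


files = ["data/part.1.parquet", "data/_common_metadata", "data/part.0.parquet"]
print(schema_source(files))                        # data/_common_metadata
print(schema_source(files, ignore_metadata=True))  # data/part.0.parquet
```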

Returns a dict containing:

  • form: an Awkward Form representing the low-level type of the data (use .type to get a high-level type),

  • fs: the fsspec filesystem object,

  • paths: a list of matching path names,

  • col_counts: the number of rows in each row group,

  • columns: the columns defined by the schema,

  • num_rows: the length of the array that would be read by ak.from_parquet,

  • num_row_groups: the number of row groups, i.e. the units that can be selected with the row_groups argument of ak.from_parquet.
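Because col_counts holds the row count of each row group, num_rows and num_row_groups follow from it directly, and it is enough to work out which row groups cover a given range of rows. The sketch below uses only the standard library; summarize and row_groups_for_range are hypothetical helpers, and the row counts are made-up values rather than output from a real file.

```python
from itertools import accumulate


def summarize(col_counts):
    """Derive the two summary fields from the per-row-group row counts."""
    return {"num_rows": sum(col_counts), "num_row_groups": len(col_counts)}


def row_groups_for_range(col_counts, start, stop):
    """Hypothetical helper: indices of the row groups that overlap the
    global row range [start, stop), as a set suitable for row_groups."""
    # offsets[i] is the first global row of row group i;
    # offsets[i + 1] is one past its last row
    offsets = [0, *accumulate(col_counts)]
    return {
        i
        for i, count in enumerate(col_counts)
        if count > 0 and offsets[i] < stop and offsets[i + 1] > start
    }


# Made-up row counts for a dataset with three row groups
col_counts = [100, 100, 50]
print(summarize(col_counts))                       # {'num_rows': 250, 'num_row_groups': 3}
print(row_groups_for_range(col_counts, 150, 220))  # {1, 2}
```

With real data, col_counts would come from the dict returned by ak.metadata_from_parquet, and the resulting set could be passed as row_groups to ak.from_parquet. The array read that way starts at the first row of the first selected row group, not at start, so a final slice may still be needed.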

See also ak.from_parquet, ak.to_parquet.