ak.str.extract_regex
Defined in awkward.operations.str.akstr_extract_regex on line 13.
- ak.str.extract_regex(array, pattern, *, highlevel=True, behavior=None)
- Parameters
array – Array-like data (anything ak.to_layout recognizes).
pattern (str or bytes) – Regular expression with named capture fields.
highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.contents.Content subclass.
behavior (None or dict) – Custom ak.behavior for the output array, if high-level.
Returns None for every string in array if it does not match pattern; otherwise, a record whose fields are named capture groups and whose contents are the substrings they’ve captured.
Uses Google RE2, and pattern must contain named groups. The syntax for a named group is (?P<...>...) in which the first ... is a name and the last ... is a regular expression.
For example,
>>> array = ak.Array([["one1", "two2", "three3"], [], ["four4", "five5"]])
>>> result = ak.str.extract_regex(array, "(?P<vowel>[aeiou])(?P<number>[0-9]+)")
>>> result.show(type=True)
type: 3 * var * ?{
    vowel: ?string,
    number: ?string
}
[[{vowel: 'e', number: '1'}, {vowel: 'o', number: '2'}, {vowel: 'e', number: '3'}],
[],
[None, {vowel: 'e', number: '5'}]]
(The string "four4" does not match because the vowel is not immediately before the number.)
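The matching behavior in the example above can be approximated in plain Python, which may help when reasoning about which strings yield None. This is only a rough sketch: it uses the standard library's re module rather than RE2, it works on nested Python lists rather than Awkward Arrays, and the helper name extract_named_groups is hypothetical. It assumes the pattern is searched for anywhere in each string, which is consistent with "one1" matching in the example.

```python
import re


def extract_named_groups(strings, pattern):
    """Hypothetical helper: return None for each non-matching string,
    otherwise a dict mapping named capture groups to captured substrings."""
    compiled = re.compile(pattern)
    results = []
    for s in strings:
        # search() looks for the pattern anywhere in the string
        match = compiled.search(s)
        results.append(match.groupdict() if match else None)
    return results


pattern = r"(?P<vowel>[aeiou])(?P<number>[0-9]+)"
nested = [["one1", "two2", "three3"], [], ["four4", "five5"]]

# Apply the helper to each inner list, preserving the nesting
result = [extract_named_groups(sublist, pattern) for sublist in nested]
print(result)
```

Here "four4" produces None because no vowel in it is directly followed by a digit, mirroring the Awkward output shown above.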
Regular expressions with unnamed groups or features not implemented by RE2 raise an error.
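One way to catch unnamed groups before calling the function is to inspect the pattern with Python's re module. This is a sketch under stated assumptions, not part of Awkward Array: the helper name has_only_named_groups is hypothetical, and because Python's re syntax differs from RE2, a pattern that passes this check may still be rejected by RE2 (for example, if it uses backreferences or lookarounds).

```python
import re


def has_only_named_groups(pattern):
    """Hypothetical pre-check: True if the pattern has at least one capture
    group and every capture group is named (per Python's re, not RE2)."""
    compiled = re.compile(pattern)
    # compiled.groups counts all capture groups;
    # compiled.groupindex maps only the named ones.
    return compiled.groups > 0 and compiled.groups == len(compiled.groupindex)


print(has_only_named_groups(r"(?P<vowel>[aeiou])(?P<number>[0-9]+)"))
print(has_only_named_groups(r"([aeiou])(?P<number>[0-9]+)"))
print(has_only_named_groups(r"[aeiou][0-9]+"))
```

The first pattern passes the check; the second contains an unnamed group and the third has no capture groups at all, so both fail it.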
Note: this function does not raise an error if the array does not contain any string or bytestring data.
Requires the pyarrow library and calls pyarrow.compute.extract_regex.