ak.str.extract_regex
Defined in awkward.operations.str.akstr_extract_regex on line 13.
- ak.str.extract_regex(array, pattern, *, highlevel=True, behavior=None, attrs=None)
- Parameters:
array – Array-like data (anything ak.to_layout recognizes).
pattern (str or bytes) – Regular expression with named capture fields.
highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.contents.Content subclass.
behavior (None or dict) – Custom ak.behavior for the output array, if high-level.
attrs (None or dict) – Custom attributes for the output array, if high-level.
Returns None for every string in array if it does not match pattern; otherwise, a record whose fields are named capture groups and whose contents are the substrings they’ve captured.

Uses Google RE2, and pattern must contain named groups. The syntax for a named group is (?P<...>...) in which the first ... is a name and the last ... is a regular expression.

For example,
>>> array = ak.Array([["one1", "two2", "three3"], [], ["four4", "five5"]])
>>> result = ak.str.extract_regex(array, "(?P<vowel>[aeiou])(?P<number>[0-9]+)")
>>> result.show(type=True)
type: 3 * var * ?{ vowel: ?string, number: ?string }
[[{vowel: 'e', number: '1'}, {vowel: 'o', number: '2'}, {vowel: 'e', number: '3'}],
 [],
 [None, {vowel: 'e', number: '5'}]]
(The string "four4" does not match because the vowel is not immediately before the number.)

Regular expressions with unnamed groups or features not implemented by RE2 raise an error.
Note: this function does not raise an error if the array does not contain any string or bytestring data.

Requires the pyarrow library and calls pyarrow.compute.extract_regex.
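
To make the matching rules described on this page concrete without needing pyarrow, the following is a minimal pure-Python sketch of the same behavior. It is not how Awkward Array implements the function. The helper name extract_named is hypothetical, and it relies on the standard-library re module rather than RE2, so patterns that depend on syntax differences between the two engines may behave differently. It assumes the pattern may match anywhere in the string (as the "one1" result in the example on this page suggests), and it uses ValueError as a stand-in for whatever error the real function raises on unnamed groups.

```python
import re


def extract_named(strings, pattern):
    """Hypothetical stand-in for ak.str.extract_regex on one flat list of strings.

    Assumption: Python's `re` is used instead of RE2, so this only approximates
    the real function for patterns both engines interpret the same way.
    """
    compiled = re.compile(pattern)
    # `groups` counts every capture group; `groupindex` maps only the named ones.
    if compiled.groups == 0 or compiled.groups != len(compiled.groupindex):
        # ValueError is this sketch's choice, not necessarily the real error type.
        raise ValueError("pattern must contain named capture groups only")
    out = []
    for s in strings:
        m = compiled.search(s)  # assumed: the pattern may match anywhere in the string
        # A match becomes a dict of captured substrings; a non-match becomes None.
        out.append(m.groupdict() if m else None)
    return out


data = [["one1", "two2", "three3"], [], ["four4", "five5"]]
pattern = "(?P<vowel>[aeiou])(?P<number>[0-9]+)"

# Apply the helper to each sublist to mirror the nested structure of the array.
result = [extract_named(sublist, pattern) for sublist in data]
print(result)
```

With the data above, each matching string yields a dict with the keys vowel and number, the empty sublist stays empty, and "four4" yields None. This mirrors the record and None layout of the ak.str.extract_regex output in the example, except that plain Python dicts and lists stand in for Awkward records and arrays.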