Classes

The Replus Class

class replus.Replus(patterns_dir_or_dict: Union[os.PathLike, Dict[str, Dict]], whitespace_noise: Optional[str] = None, flags: Optional[int] = RegexFlag.V0)

The Replus engine class builds and compiles regular expressions based on templates.

Variables
  • group_counter – a Counter object that counts how often each group name occurs in each template

  • patterns – a list of tuples made of [(key, pattern, template), …]

  • patterns_src – a dict combining the contents of all the patterns_dir/*.json files, excluding their “patterns” entries

  • patterns_all – all patterns that can be run, e.g. {“dates”: [pattern0, pattern1], …}

  • all_groups – a dict of lists keyed by template, e.g. {pattern_template_a: [group_0, group_1], pattern_template_b: [group_0, group_1]}
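The interplay of group_counter, patterns and all_groups can be illustrated with a small template expander built only on the standard library re module. This is a minimal sketch, not the replus implementation: the template layout (each key maps to a list of alternatives, and {{key}} placeholders refer to other keys), the TEMPLATES dict and the expand helper are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical template source: each key maps to a list of alternatives and
# {{key}} placeholders refer to other keys (assumed syntax, for illustration).
TEMPLATES: Dict[str, List[str]] = {
    "day": [r"3[01]", r"[12][0-9]", r"0?[1-9]"],
    "month": [r"1[012]", r"0?[1-9]"],
    "year": [r"\d{4}"],
    "date": [r"{{day}}/{{month}}/{{year}}"],
}

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand(template: str, templates: Dict[str, List[str]], counter: Counter) -> str:
    """Recursively replace each {{key}} with a uniquely named group (?P<key_N>...)."""

    def repl(m: re.Match) -> str:
        key = m.group(1)
        name = f"{key}_{counter[key]}"  # suffix = how many times key was seen before
        counter[key] += 1
        body = "|".join(expand(alt, templates, counter) for alt in templates[key])
        return f"(?P<{name}>{body})"

    # A callable replacement is inserted verbatim, so backslashes such as \d survive.
    return PLACEHOLDER.sub(repl, template)


group_counter: Counter = Counter()  # plays the role of Replus.group_counter
template = "{{date}}"
compiled = re.compile(expand(template, TEMPLATES, group_counter))
all_groups = {template: list(compiled.groupindex)}  # cf. Replus.all_groups

m = compiled.search("Due on 25/12/2023.")
print(m.group("date_0"), m.group("day_0"), m.group("month_0"), m.group("year_0"))
# 25/12/2023 25 12 2023
```

Because the suffix comes from the counter, a template that used {{date}} twice would produce date_0 and date_1 (along with day_1, month_1 and year_1), which keeps every named group unique within the compiled pattern.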

Instantiates the Replus engine.

Parameters
  • patterns_dir_or_dict (Union[os.PathLike, Dict[str, Dict]]) – the path to the directory where the *.json pattern templates are stored, or a dict of dicts containing the patterns

  • whitespace_noise (str, defaults to None) – a pattern to replace white space in the template

  • flags (int, defaults to regex.V0) – the regex flags to compile the patterns
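One plausible reading of whitespace_noise is that every literal space in a template is swapped for a more permissive pattern before compilation. The sketch below assumes exactly that; apply_whitespace_noise is a hypothetical helper, and replus may treat other whitespace characters or escaped spaces differently.

```python
import re
from typing import Optional


def apply_whitespace_noise(template: str, whitespace_noise: Optional[str] = None) -> str:
    """Replace every literal space in a template with the noise pattern."""
    if whitespace_noise is None:
        return template
    # str.replace copies the noise pattern verbatim; re.sub would try to
    # interpret backslashes in it as replacement escapes.
    return template.replace(" ", whitespace_noise)


noisy = re.compile(apply_whitespace_noise(r"chapter \d+", r"[\s\-_]*"), re.IGNORECASE)
for text in ["chapter 12", "Chapter-12", "chapter12", "chapter x12"]:
    print(f"{text!r}: {bool(noisy.fullmatch(text))}")  # True, True, True, False
```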

parse(string: str, filters: Optional[List[str]] = None, exclude: Optional[List[str]] = None, pos: Optional[int] = None, endpos: Optional[int] = None, flags: Optional[int] = 0, overlapped: Optional[bool] = False, partial: Optional[bool] = False, concurrent: Optional[bool] = None, timeout: Optional[float] = None, ignore_unused: Optional[bool] = False, **kwargs: Any) → List[replus.Match]

Returns a list of Match objects

Parameters
  • string (str) – the string to parse

  • filters (List[str], defaults to None) – one or more pattern types to parse; if none are provided, all are used

  • exclude (List[str], defaults to None) – a list of pattern types to exclude

  • pos (int, defaults to None) – starting position of the matching

  • endpos (int, defaults to None) – ending position of the matching

  • flags (int, defaults to 0) – flags to use while matching

  • overlapped (bool, defaults to False) – if True, overlapping matches are allowed

  • partial (bool, defaults to False) – if True, partial matches are allowed

  • concurrent (bool, defaults to None) – if True, runs concurrently

  • timeout (float, defaults to None) – the timeout for matching

  • ignore_unused (bool, defaults to False) – if True, unused keyword arguments are ignored

Returns

a list of Match objects

Return type

List[Match]
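The docstrings do not spell out how filters and exclude combine when both are given. The sketch below assumes the natural reading (start from filters, or from every pattern type when filters is empty, then drop anything in exclude) and, as a further assumption, orders results by start offset. PATTERNS, select_types and toy_parse are hypothetical names, and the standard library re module stands in for the regex module.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical compiled patterns keyed by pattern type (illustration only).
PATTERNS: Dict[str, re.Pattern] = {
    "dates": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "times": re.compile(r"\d{2}:\d{2}"),
    "emails": re.compile(r"[\w.]+@[\w.]+\.\w+"),
}


def select_types(all_types: Iterable[str], filters: Optional[List[str]] = None,
                 exclude: Optional[List[str]] = None) -> List[str]:
    """Use every type unless filters is given, then drop excluded types."""
    chosen = list(filters) if filters else list(all_types)
    excluded = set(exclude or [])
    return [t for t in chosen if t not in excluded]


def toy_parse(string: str, filters: Optional[List[str]] = None,
              exclude: Optional[List[str]] = None) -> List[Tuple[str, str, int, int]]:
    """Return (type, value, start, end) tuples ordered by start offset."""
    found = []
    for match_type in select_types(PATTERNS, filters, exclude):
        for m in PATTERNS[match_type].finditer(string):
            found.append((match_type, m.group(), m.start(), m.end()))
    return sorted(found, key=lambda item: item[2])


text = "Call at 09:30 on 2024-05-01 or mail ops@example.com"
print(toy_parse(text))
print(toy_parse(text, exclude=["emails"]))
print(toy_parse(text, filters=["times"]))  # [('times', '09:30', 8, 13)]
```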

static purge_overlaps(matches: Union[List[replus.Match], List[replus.Group]]) → Union[List[replus.Match], List[replus.Group]]

Removes overlapping instances from a list of Match or Group objects

Parameters

matches (Union[List[Match], List[Group]]) – a list of Match or Group objects

Returns

a list of Match or Group objects

Return type

Union[List[Match], List[Group]]
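The docs do not state which match wins when two overlap. A common policy, assumed here, is leftmost-longest: sort by start offset (longest first on ties) and keep each candidate that begins at or after the end of the last one kept. The hypothetical purge_overlaps below only needs start() and end(), so it works on standard library re.Match objects, and the same duck typing would apply to any object exposing those methods, as Match and Group do.

```python
import re
from typing import List


def purge_overlaps(matches: List[re.Match]) -> List[re.Match]:
    """Keep a non-overlapping subset: leftmost first, longest on ties."""
    ordered = sorted(matches, key=lambda m: (m.start(), -(m.end() - m.start())))
    kept: List[re.Match] = []
    last_end = -1
    for m in ordered:
        # Spans are half-open [start, end), so touching spans do not overlap.
        if m.start() >= last_end:
            kept.append(m)
            last_end = m.end()
    return kept


text = "on 2023-12-25 and 12-25"
candidates = list(re.finditer(r"\d{4}-\d{2}-\d{2}", text)) + list(re.finditer(r"\d{2}-\d{2}", text))
print([m.span() for m in candidates])  # [(3, 13), (5, 10), (18, 23)]
print([m.group() for m in purge_overlaps(candidates)])  # ['2023-12-25', '12-25']
```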

search(string: str, filters: Optional[List[str]] = None, exclude: Optional[List[str]] = None, pos: Optional[int] = None, endpos: Optional[int] = None, flags: Optional[int] = 0, overlapped: Optional[bool] = False, partial: Optional[bool] = False, concurrent: Optional[bool] = None, timeout: Optional[float] = None, ignore_unused: Optional[bool] = False, **kwargs: Any) → replus.Match

Returns a single Match object

Parameters
  • string (str) – the string to parse

  • filters (List[str], defaults to None) – one or more pattern types to search; if none are provided, all are used

  • exclude (List[str], defaults to None) – a list of pattern types to exclude

  • pos (int, defaults to None) – starting position of the matching

  • endpos (int, defaults to None) – ending position of the matching

  • flags (int, defaults to 0) – flags to use while matching

  • overlapped (bool, defaults to False) – if True, overlapping matches are allowed

  • partial (bool, defaults to False) – if True, partial matches are allowed

  • concurrent (bool, defaults to None) – if True, runs concurrently

  • timeout (float, defaults to None) – the timeout for matching

  • ignore_unused (bool, defaults to False) – if True, unused keyword arguments are ignored

Returns

a Match object

Return type

Match
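search() accepts the same pos and endpos parameters as parse(). Assuming they behave like the pos and endpos arguments of the standard library's Pattern.search (which the regex module mirrors), they restrict the scan to string[pos:endpos] while reported offsets stay relative to the full string. The sketch uses the standard library re module only.

```python
import re

year = re.compile(r"\d{4}")
text = "from 2023 until 2024"

first = year.search(text)            # scans the whole string
later = year.search(text, 10)        # pos=10 skips past "2023"
bounded = year.search(text, 10, 18)  # endpos=18 cuts "2024" short, so no match

print(first.group(), first.span())   # 2023 (5, 9)
print(later.group(), later.span())   # 2024 (16, 20)
print(bounded)                       # None
```

In re, ^ still anchors at the real start of the string (or after a newline in MULTILINE mode), not at pos.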

The Match Class

class replus.Match(match_type: str, match: _regex.Match, all_groups_names: List[str], pattern: _regex.Pattern)

A Match object is an abstract and expanded representation of a regex.regex.Match

Variables
  • type – the type of the match, corresponding to the stem of the pattern template’s file name

  • match – a regex.regex.Match object

  • partial – whether the match is partial

  • value – the string value of the match

  • offset – the offset of the match {"start": int, "end": int}

  • pattern – the string representation of the pattern that matched

  • length – the length of the match (no. of characters)

  • all_group_names – the names of all the groups in the pattern that produced this match

  • _start – the start offset of the Match

  • _end – the end offset of the Match

  • _span – the span of the Match (_start, _end)
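The variables above can all be derived from a single underlying match object. The sketch computes equivalent values from a standard library re.Match; describe is a hypothetical helper, and the assumption that length equals end minus start follows from the "no. of characters" wording rather than from the replus source.

```python
import re
from typing import Any, Dict


def describe(m: re.Match) -> Dict[str, Any]:
    """Derive Match-style attributes from a stdlib re.Match (illustrative)."""
    start, end = m.span()
    return {
        "value": m.group(),                      # the matched text
        "offset": {"start": start, "end": end},  # same shape as Match.offset
        "length": end - start,                   # number of characters matched
        "span": (start, end),                    # mirrors Match._span
        "pattern": m.re.pattern,                 # string form of the pattern
    }


info = describe(re.search(r"\d+ km", "a 42 km walk"))
print(info["value"], info["offset"], info["length"])  # 42 km {'start': 2, 'end': 7} 5
```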

Instantiates a Match object.

Parameters
  • match_type (str) – the type of the match, corresponding to the stem of the pattern template’s file name

  • match (regex.regex.Match) – a regex.regex.Match object

  • all_groups_names (List[str]) – the names of all the groups in the pattern that produced this match

  • pattern (regex.regex.Pattern) – the pattern that matched

end(group_name: Optional[str] = None, rep_index: int = 0) → int

Returns the end character index of this Match, or of the Group named group_name if one is given

Parameters
  • group_name (str, defaults to None) – the name of the group

  • rep_index (int, defaults to 0) – the repetition index of the group

Returns

the end index of the Match

Return type

int

first() → Optional[replus.Group]

Returns the first Group object or None

Returns

the first Group object

Return type

Union[Group, None]

group(group_name: str) → Optional[replus.Group]

Returns a Group object with the given group_name or None

Parameters

group_name (str) – the name of the group

Returns

a Group object

Return type

Union[Group, None]

groups(group_query: Optional[str] = None, root: bool = False) → List[replus.Group]

Returns a list of repeated Group objects that belong to the Match object

Parameters
  • group_query (str, defaults to None) – the name of the group to find repetitions of

  • root (bool, defaults to False) – includes the root if True

Returns

a list of Group objects

Return type

List[Group]

json(*args, **kwargs) → str

Returns a JSON string of the serialized object

Returns

a JSON string of the serialized object

Return type

str

last() → Optional[replus.Group]

Returns the last Group object or None

Returns

the last Group object

Return type

Union[Group, None]

serialize() → dict

Returns a dict representation of the Match object structured as follows:

{
    "type": self.type,
    "offset": self.offset,
    "value": self.value,
    "groups": [<serialized_groups>, ], # <- including root (itself)
}

Returns

a dict representation of the Match object

Return type

dict
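serialize() and json() appear to be two views of the same data. A minimal sketch, assuming json() passes its options on to json.dumps applied to serialize(), is shown below using a standard library re.Match. serialize_match and to_json are hypothetical; each named group is flattened into a simple dict, and the root entry that the real output includes is omitted for brevity.

```python
import json
import re
from typing import Any, Dict


def serialize_match(match_type: str, m: re.Match) -> Dict[str, Any]:
    """Build a dict shaped like Match.serialize() (simplified, hypothetical)."""
    groups = [
        {"name": name, "offset": {"start": m.start(name), "end": m.end(name)}, "value": value}
        for name, value in m.groupdict().items()
        if value is not None  # skip groups that did not participate in the match
    ]
    return {
        "type": match_type,
        "offset": {"start": m.start(), "end": m.end()},
        "value": m.group(),
        "groups": groups,
    }


def to_json(match_type: str, m: re.Match, **kwargs: Any) -> str:
    # json.dumps takes its options as keyword-only arguments (e.g. indent=2),
    # so only keyword arguments are forwarded here.
    return json.dumps(serialize_match(match_type, m), **kwargs)


m = re.search(r"(?P<hour_0>\d{2}):(?P<minute_0>\d{2})", "at 09:30")
print(to_json("times", m, indent=2))
```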

span(group_name: Optional[str] = None, rep_index: int = 0) → Tuple[int]

Returns the span of this Match, or of the Group named group_name if one is given

Parameters
  • group_name (str, defaults to None) – the name of the group

  • rep_index (int, defaults to 0) – the repetition index of the group

Returns

the span of the Match

Return type

Tuple[int]

start(group_name: Optional[str] = None, rep_index: int = 0) → int

Returns the start character index of this Match, or of the Group named group_name if one is given

Parameters
  • group_name (str, defaults to None) – the name of the group

  • rep_index (int, defaults to 0) – the repetition index of the group

Returns

the start index of the Match

Return type

int
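start(), end() and span() all dispatch on group_name: without it they describe the Match itself, with it they describe the named group. The hypothetical span_of below sketches that dispatch with the standard library re module. rep_index is not modelled: the standard library keeps only the last repetition of a quantified group, whereas the third-party regex module records every repetition (see its Match.captures() and Match.spans()), which is plausibly what rep_index selects from.

```python
import re
from typing import Optional, Tuple


def span_of(m: re.Match, group_name: Optional[str] = None) -> Tuple[int, int]:
    """Span of the whole match, or of the named group when one is given."""
    return m.span() if group_name is None else m.span(group_name)


m = re.search(r"(?P<key>\w+)=(?P<val>\w+)", "set mode=fast now")
print(span_of(m), span_of(m, "key"), span_of(m, "val"))  # (4, 13) (4, 8) (9, 13)

# The standard library keeps only the last repetition of a quantified group.
rep = re.fullmatch(r"(?P<digit>\d)+", "123")
print(rep.group("digit"), span_of(rep, "digit"))  # 3 (2, 3)
```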

The Group Class

class replus.Group(match: _regex.Match, group_name: str, root: replus.Match, rep_index: int = 0)

A Group object is an abstract and expanded representation of a named group within a regex.regex.Match

Variables
  • root – the root Match object

  • match – a regex.regex.Match object

  • name – the name of the group, including its rep_index, e.g. date_0

  • key – the key of the group, i.e. the name without the rep_index

  • value – the string value of the group

  • offset – the offset of the group {"start": int, "end": int}

  • length – the length of the group (no. of characters)

  • rep_index – the repetition index

  • _start – the start offset of the Group

  • _end – the end offset of the Group

  • _span – the span of the Group (_start, _end)
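Since name is described as key plus its rep_index (e.g. date_0), the two can be separated on the last underscore, which also allows repetitions to be gathered per key. split_name and reps_by_key are hypothetical helpers, and the assumption that the suffix always follows the final underscore is not confirmed by the docs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_name(name: str) -> Tuple[str, int]:
    """Split e.g. 'start_date_1' into ('start_date', 1) on the last underscore."""
    key, _, index = name.rpartition("_")
    return key, int(index)


def reps_by_key(names: List[str]) -> Dict[str, List[str]]:
    """Group names by key, each list ordered by rep_index."""
    grouped: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for name in names:
        key, index = split_name(name)
        grouped[key].append((index, name))
    return {key: [name for _, name in sorted(items)] for key, items in grouped.items()}


print(reps_by_key(["date_1", "day_0", "date_0", "start_date_0"]))
# {'date': ['date_0', 'date_1'], 'day': ['day_0'], 'start_date': ['start_date_0']}
```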

end(group_name: Optional[str] = None, rep_index: int = 0) → int

Returns the end character index of this Group, or of the Group named group_name if one is given

Parameters
  • group_name (str, defaults to None) – the name of the group

  • rep_index (int, defaults to 0) – the repetition index of the group

Returns

the end index of the Group

Return type

int

first() → Optional[replus.Group]

Returns the first Group object or None

Returns

the first Group object

Return type

Union[Group, None]

group(group_name) → replus.Group

Returns a Group object with the given group_name or None

Parameters

group_name (str) – the name of the group

Returns

a Group object

Return type

Union[Group, None]

groups(group_query: Optional[str] = None, root=False) → List[replus.Group]

Returns a list of repeated Group objects that belong to the Group object

Parameters
  • group_query (str, defaults to None) – the name of the group to find repetitions of

  • root (bool, defaults to False) – includes the root if True

Returns

a list of Group objects

Return type

List[Group]

json(*args, **kwargs) → str

Returns a JSON string of the serialized object

Returns

a JSON string of the serialized object

Return type

str

last() → Optional[replus.Group]

Returns the last Group object or None

Returns

the last Group object

Return type

Union[Group, None]

reps() → List[replus.Group]

Returns a list of the Group object’s repetitions

Returns

a list of the Group object’s repetitions

Return type

List[Group]

serialize() → dict

Returns a dict representation of the Group object, structured as follows:

o = {
    "key": self.key,
    "name": self.name,
    "offset": self.offset,
    "value": self.value,
    "groups": {subgroup_0: [group_object.serialize()]}
}

Returns

a dict representation of the Group object

Return type

dict

span(group_name: Optional[str] = None, rep_index: int = 0) → Tuple[int]

Returns the span of this Group, or of the Group named group_name if one is given

Parameters
  • group_name (str, defaults to None) – the name of the group

  • rep_index (int, defaults to 0) – the repetition index of the group

Returns

the span of the Group

Return type

Tuple[int]

start(group_name: Optional[str] = None, rep_index: int = 0) → int

Returns the start character index of this Group, or of the Group named group_name if one is given

Parameters
  • group_name (str, defaults to None) – the name of the group

  • rep_index (int, defaults to 0) – the repetition index of the group

Returns

the start index of the Group

Return type

int