Classes
The Replus Class
- class replus.Replus(patterns_dir_or_dict: Union[os.PathLike, Dict[str, Dict]], whitespace_noise: Optional[str] = None, flags: Optional[int] = RegexFlag.V0)
The Replus engine class builds and compiles regular expressions based on templates.
- Variables
group_counter – a Counter object that counts group name occurrences in each template
patterns – a list of tuples made of [(key, pattern, template), …]
patterns_src – a dict containing all of patterns_dir/*.json combined together, “patterns” excluded
patterns_all – all patterns that can be run, e.g. {“dates”: [pattern0, pattern1], …}
all_groups – a dict of lists with the templates as keys, e.g. {pattern_template_a: [group_0, group_1], pattern_template_b: [group_0, group_1]}
Instantiates the Replus engine
- Parameters
patterns_dir_or_dict (Union[os.PathLike, Dict[str, Dict]]) – the path to the directory where the *.json pattern templates are stored or a dict of dicts with the patterns
whitespace_noise (str, defaults to None) – a pattern to replace white space in the template
flags (int, defaults to regex.V0) – the regex flags to compile the patterns
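As a rough illustration of what building a regular expression from a template can involve, the sketch below expands named placeholders into named groups with the standard `re` module. The `{{name}}` placeholder syntax, the `expand` helper, and the sample dict are assumptions made for this example; they are not taken from the Replus template format, and this is not how the engine is implemented.

```python
import re

# Hypothetical template dict: every key except "patterns" is a reusable fragment.
TEMPLATES = {
    "year": r"\d{4}",
    "month": r"\d{2}",
    "patterns": ["{{year}} {{month}}"],
}


def expand(template, fragments, whitespace_noise=None):
    """Expand {{name}} placeholders into named groups (assumed syntax)."""
    if whitespace_noise is not None:
        # Replace each literal space in the template with the noise pattern.
        template = template.replace(" ", whitespace_noise)
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: "(?P<{0}>{1})".format(m.group(1), fragments[m.group(1)]),
        template,
    )


fragments = {k: v for k, v in TEMPLATES.items() if k != "patterns"}
expanded = expand(TEMPLATES["patterns"][0], fragments, whitespace_noise=r"[\s-]+")
pattern = re.compile(expanded)
match = pattern.search("released 2012 - 10")
```

Here the single space in the template is swapped for `[\s-]+`, so the compiled pattern tolerates spaces and dashes between the two groups, which is the role the `whitespace_noise` parameter is described as playing.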
- parse(string: str, filters: Optional[List[str]] = None, exclude: Optional[List[str]] = None, pos: Optional[int] = None, endpos: Optional[int] = None, flags: Optional[int] = 0, overlapped: Optional[bool] = False, partial: Optional[bool] = False, concurrent: Optional[bool] = None, timeout: Optional[float] = None, ignore_unused: Optional[bool] = False, **kwargs: Any) List[replus.Match]
Returns a list of Match objects
- Parameters
string (str) – the string to parse
filters (List[str]) – one or more pattern types to parse; if none is provided, all will be used
exclude (List[str], defaults to None) – a list of pattern types to exclude
pos (int, defaults to None) – starting position of the matching
endpos (int, defaults to None) – ending position of the matching
flags (int, defaults to 0) – flags to use while matching
overlapped (bool, defaults to False) – if True will allow overlapping matches
partial (bool, defaults to False) – if True will allow partial matches
concurrent (bool, defaults to None) – if True will run concurrently
timeout (float, defaults to None) – timeout for matching
ignore_unused (bool, defaults to False) – ignore unused
- Returns
a list of Match objects
- Return type
List[Match]
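The interaction between `filters` and `exclude` described above can be sketched as a plain selection over the available pattern types. The `select_pattern_types` helper and the sample type names are hypothetical, and the precedence shown (filters first, then exclusions) is an assumption rather than documented behaviour.

```python
def select_pattern_types(available, filters=None, exclude=None):
    """Pick the pattern types to run: all of them unless filters narrows the set."""
    if filters:
        chosen = [t for t in available if t in filters]
    else:
        # No filters given: every available pattern type is used.
        chosen = list(available)
    excluded = set(exclude or [])
    return [t for t in chosen if t not in excluded]


available_types = ["dates", "times", "amounts"]
everything = select_pattern_types(available_types)
only_dates = select_pattern_types(
    available_types, filters=["dates", "times"], exclude=["times"]
)
```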
- static purge_overlaps(matches: Union[List[replus.Match], List[replus.Group]]) Union[List[replus.Match], List[replus.Group]]
Removes overlapping instances from a list of Match or Group objects
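The reference does not say which of two overlapping instances is kept. One plausible strategy, shown here on bare `(start, end)` spans, is to prefer the longest span and drop anything that overlaps it. The `purge_overlapping_spans` function and the longest-first rule are assumptions for illustration, not the library's algorithm.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def purge_overlapping_spans(spans: List[Span]) -> List[Span]:
    """Keep the longest spans first and drop any span that overlaps a kept one."""
    kept: List[Span] = []
    # Longest first; ties are resolved by the earliest start.
    for start, end in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        # Two half-open spans are disjoint when one ends before the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


result = purge_overlapping_spans([(0, 10), (5, 8), (12, 15), (14, 20)])
```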
- search(string: str, filters: Optional[List[str]] = None, exclude: Optional[List[str]] = None, pos: Optional[int] = None, endpos: Optional[int] = None, flags: Optional[int] = 0, overlapped: Optional[bool] = False, partial: Optional[bool] = False, concurrent: Optional[bool] = None, timeout: Optional[float] = None, ignore_unused: Optional[bool] = False, **kwargs: Any) replus.Match
Returns a single Match object
- Parameters
string (str) – the string to parse
filters (List[str]) – one or more pattern types to parse; if none is provided, all will be used
exclude (List[str], defaults to None) – a list of pattern types to exclude
pos (int, defaults to None) – starting position of the matching
endpos (int, defaults to None) – ending position of the matching
flags (int, defaults to 0) – flags to use while matching
overlapped (bool, defaults to False) – if True will allow overlapping matches
partial (bool, defaults to False) – if True will allow partial matches
concurrent (bool, defaults to None) – if True will run concurrently
timeout (float, defaults to None) – timeout for matching
ignore_unused (bool, defaults to False) – ignore unused
- Returns
a Match object
- Return type
Match
The Match Class
- class replus.Match(match_type: str, match: _regex.Match, all_groups_names: List[str], pattern: _regex.Pattern)
A Match object is an abstract and expanded representation of a regex.regex.Match
- Variables
type – the type of the match, corresponding to the stem of the file of the pattern’s template
match – a regex.regex.Match object
partial – if it’s a partial match
value – the string value of the match
offset – the offset of the match, as {"start": int, "end": int}
pattern – the string representation of the pattern that matched
length – the length of the match (no. of characters)
all_group_names – all the names of all the groups for the corresponding pattern for this match
_start – the start offset of the Match
_end – the end offset of the Match
_span – the span of the Match (_start, _end)
Instantiates a Match object
- Parameters
match_type (str) – the type of the match, corresponding to the stem of the file of the pattern’s template
match (regex.regex.Match) – a regex.regex.Match object
all_groups_names (List[str]) – all the names of all the groups for the corresponding pattern for this match
pattern (regex.regex.Pattern) – the pattern that matched
- end(group_name: Optional[str] = None, rep_index: Optional[int] = 0) int
Returns the end character index of self or of Group with group_name
- Parameters
group_name (str, defaults to None) – the name of the group
rep_index (int, defaults to 0) – the repetition index of the group
- Returns
the end index of the Match
- Return type
int
- first() Optional[replus.Group]
Returns the first Group object or None
- Returns
the first Group object
- Return type
Union[Group, None]
- group(group_name: str) Optional[replus.Group]
Returns a Group object with the given group_name or None
- Parameters
group_name (str) – the name of the group
- Returns
a Group object
- Return type
Union[Group, None]
- groups(group_query: Optional[str] = None, root: bool = False) List[replus.Group]
Returns a list of repeated Group objects that belong to the Match object
- Parameters
group_query (str, defaults to None) – the name of the group to find repetitions of
root (bool, defaults to False) – includes the root if True
- Returns
a list of Group objects
- Return type
List[Group]
- json(*args, **kwargs) str
Returns a json-string of the serialized object
- Returns
a json-string of the serialized object
- Return type
str
- last() Optional[replus.Group]
Returns the last Group object or None
- Returns
the last Group object
- Return type
Union[Group, None]
- serialize() dict
Returns a dict representation of the Match object structured as follows:
{
    "type": self.type,
    "offset": self.offset,
    "value": self.value,
    "groups": [<serialized_groups>, ],  # <- including root (itself)
}
- Returns
a dict representation of the Match object
- Return type
dict
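To make the serialized shape concrete, the sketch below builds a dict with the same top-level keys from a standard-library `re.Match`. The `serialize_match` helper is hypothetical, the group entries are simplified (flat, with no root entry and no nested subgroups), and nothing here calls Replus itself.

```python
import json
import re


def serialize_match(match_type, match):
    """Build a dict with the documented top-level keys from an re.Match."""
    def offset(start, end):
        return {"start": start, "end": end}

    # Simplified group entries: one per named group that took part in the match.
    groups = [
        {"name": name, "value": value, "offset": offset(*match.span(name))}
        for name, value in match.groupdict().items()
        if value is not None
    ]
    return {
        "type": match_type,
        "offset": offset(*match.span()),
        "value": match.group(0),
        "groups": groups,
    }


m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "on 2012-10 we met")
serialized = serialize_match("dates", m)
as_json = json.dumps(serialized)
```

Passing the dict through `json.dumps` mirrors the relationship between `serialize()` and `json()` described in this section.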
- span(group_name: Optional[str] = None, rep_index: Optional[int] = 0) Tuple[int]
Returns the span of self or of Group with group_name
- Parameters
group_name (str, defaults to None) – the name of the group
rep_index (int, defaults to 0) – the repetition index of the group
- Returns
the span of the Match
- Return type
Tuple[int]
- start(group_name: Optional[str] = None, rep_index: Optional[int] = 0) int
Returns the start character index of self or of Group with group_name
- Parameters
group_name (str, defaults to None) – the name of the group
rep_index (int, defaults to 0) – the repetition index of the group
- Returns
the start index of the Match
- Return type
int
The Group Class
- class replus.Group(match: _regex.Match, group_name: str, root: replus.Match, rep_index: int = 0)
A Group object is an abstract and expanded representation of a regex.regex.Match
- Variables
root – the root Match object
match – a regex.regex.Match object
name – the name of the group, including its rep_index. E.g.: date_0
key – the key of the group, i.e. the name without the rep_index
value – the string value of the match
offset – the offset of the match, as {"start": int, "end": int}
length – the length of the match (no. of characters)
rep_index – the repetition index
_start – the start offset of the Match
_end – the end offset of the Match
_span – the span of the Match (_start, _end)
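The relationship between `name`, `key`, and `rep_index` can be illustrated with a small helper that splits a name such as `date_0` into its two parts. The `split_group_name` function is hypothetical and assumes the `<key>_<rep_index>` naming shown in the example above; it is not part of the Group class.

```python
import re


def split_group_name(name):
    """Split 'date_0' into ('date', 0); hypothetical helper for illustration."""
    m = re.fullmatch(r"(?P<key>.+)_(?P<rep_index>\d+)", name)
    if m is None:
        raise ValueError("expected a name ending in _<rep_index>: %r" % name)
    return m.group("key"), int(m.group("rep_index"))


key, rep_index = split_group_name("date_0")
nested_key, nested_index = split_group_name("start_date_12")
```

Only the trailing numeric suffix is treated as the repetition index, so keys that themselves contain underscores stay intact.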
- end(group_name: Optional[str] = None, rep_index: Optional[int] = None) int
Returns the end character index of self or of Group with group_name
- Parameters
group_name (str, defaults to None) – the name of the group
rep_index (int, defaults to None) – the repetition index of the group
- Returns
the end index of the Match
- Return type
int
- first() Optional[replus.Group]
Returns the first Group object or None
- Returns
the first Group object
- Return type
Union[Group, None]
- group(group_name) replus.Group
Returns a Group object with the given group_name or None
- Parameters
group_name (str) – the name of the group
- Returns
a Group object
- Return type
Union[Group, None]
- groups(group_query: Optional[str] = None, root=False) List[replus.Group]
Returns a list of repeated Group objects that belong to the Group object
- Parameters
group_query (str, defaults to None) – the name of the group to find repetitions of
root (bool, defaults to False) – includes the root if True
- Returns
a list of Group objects
- Return type
List[Group]
- json(*args, **kwargs) str
Returns a json-string of the serialized object
- Returns
a json-string of the serialized object
- Return type
str
- last() Optional[replus.Group]
Returns the last Group object or None
- Returns
the last Group object
- Return type
Union[Group, None]
- reps() List[replus.Group]
Returns a list of the Group object’s repetitions
- Returns
a list of the Group object’s repetitions
- Return type
List[Group]
- serialize() dict
Returns a dict representation of the Group object structured as follows:
{
    "key": self.key,
    "name": self.name,
    "offset": self.offset,
    "value": self.value,
    "groups": {subgroup_0: [group_object.serialize()]}
}
- Returns
a dict representation of the Group object
- Return type
dict
- span(group_name: Optional[str] = None, rep_index: Optional[int] = None) Tuple[int]
Returns the span of self or of Group with group_name
- Parameters
group_name (str, defaults to None) – the name of the group
rep_index (int, defaults to None) – the repetition index of the group
- Returns
the span of the Match
- Return type
Tuple[int]
- start(group_name: Optional[str] = None, rep_index: Optional[int] = None) int
Returns the start character index of self or of Group with group_name
- Parameters
group_name (str, defaults to None) – the name of the group
rep_index (int, defaults to None) – the repetition index of the group
- Returns
the start index of the Match
- Return type
int