Span class

Its main original methods are:

- `partition(start, stop)`: partitions the initial `Span` into three new `Span` instances, collected in a list of `Span` instances (see below);
- `split([range(start, end), range(start2, end2), ...])`: splits the `Span` into several instances grouped in a list of `Span` objects;
- `slice(start, stop, size, step)`: slices the initial string from position `start` to position `stop` by `step` into sub-strings of size `size`, all grouped in a list of `Span` objects.

In addition, one can compare two `Span` objects using the non-overlapping order `<` and `>`, or the overlapping order `<=` and `>=` when the two `Span` objects overlap.

Finally, since a `Span` can be seen as a selection of a set of character positions from the parent string, one can apply the basic set operations to two `Span` objects in order to construct more elaborate `Span` instances.
Span Objects
class Span()
A `Span` object is basically a collection of a parent string named `string` and a `ranges` collection (a list of `range` objects) of positions. Its basic usage is its `str` method, which returns a string extracted from all the intervals defined in the `ranges` list, joined by the `subtoksep` separator.
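The joining rule behind `str(Span)` can be mimicked with a plain function. This is a minimal sketch, assuming the ranges are Python `range` objects with step 1 and that `ranges=None` means one range over the whole string; `span_str` is a hypothetical helper, not part of the library.

```python
from typing import List, Optional


def span_str(string: str, ranges: Optional[List[range]] = None, subtoksep: str = " ") -> str:
    """Hypothetical helper mimicking str(Span): join the extracts selected
    by each range with the subtoksep separator."""
    if ranges is None:
        # Assumed default: a single range covering the whole parent string.
        ranges = [range(0, len(string))]
    return subtoksep.join(string[r.start:r.stop] for r in ranges)


text = "Hello wonderful world"
print(span_str(text, [range(0, 5), range(16, 21)]))       # Hello world
print(span_str(text, [range(0, 5), range(16, 21)], "_"))  # Hello_world
```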
__init__
| __init__(string='', ranges=None, subtoksep=chr(32), encoding='utf-8')
`Span` attributes are:
- `Span.string` -> a string, default is empty
- `Span.ranges` -> a list of ranges, default is `None`, in which case the ranges are calculated to contain the entire string
- `Span.subtoksep` -> a string, preferably of length 1, default is a white space `' '`
- `Span.encoding` -> a string representing the encoding, default is `'utf-8'`
append_range
| append_range(r)
Append a `range` object to `self.ranges`. The range `r` must be given in absolute coordinates.
Return `self` (append in place).
Raise a `ValueError` in case `r` is not a `range` object.
Raise a `BoundaryWarning` in case `r` has `start` or `stop` attributes outside the size of `Span.string`, in which case these parameters are recalculated to fit `Span.string` (0 for `start`, `len(Span.string)` for `stop`).
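The validation and clipping rules above can be sketched on a plain list of ranges. This assumes that `BoundaryWarning` is an ordinary `Warning` subclass emitted through `warnings.warn` (the class below is a hypothetical stand-in), and `append_range` here is an illustrative function rather than the library method.

```python
import warnings
from typing import List


class BoundaryWarning(UserWarning):
    """Hypothetical stand-in for the library's BoundaryWarning."""


def append_range(string: str, ranges: List[range], r: range) -> List[range]:
    """Sketch of the documented append_range rules on a plain list of ranges."""
    if not isinstance(r, range):
        raise ValueError(f"{r!r} is not a range object")
    start, stop = r.start, r.stop
    if start < 0 or stop > len(string):
        # Clip to the parent string: 0 for start, len(string) for stop.
        warnings.warn(f"{r!r} exceeds the boundaries of the string", BoundaryWarning)
        start, stop = max(start, 0), min(stop, len(string))
    ranges.append(range(start, stop))
    return ranges  # appended in place; the same list is returned


ranges = append_range("abcdef", [], range(1, 3))
ranges = append_range("abcdef", ranges, range(4, 10))  # warns, then clips to range(4, 6)
print(ranges)  # [range(1, 3), range(4, 6)]
```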
append_ranges
| append_ranges(ranges)
Append a list of `range` objects to `self.ranges`. This method applies `append_range` several times; see its documentation for more details.
remove_range
| remove_range(r)
Remove the range `r` from `Span.ranges`. The range `r` must be given in absolute coordinates. Return `self` (remove in place). In case the range `r` encompasses the complete string, the outcome of this method has no `Span.ranges` left.
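One way to picture the removal is to treat the ranges as a set of positions, subtract the positions of `r`, and regroup what is left into contiguous ranges. The sketch below follows that idea; `to_ranges` and `remove_range` are hypothetical helpers, and merging adjacent ranges into one is an assumption about the library's behaviour.

```python
from typing import Iterable, List


def to_ranges(positions: Iterable[int]) -> List[range]:
    """Group integer positions into sorted, contiguous, non-overlapping ranges."""
    result: List[range] = []
    for p in sorted(set(positions)):
        if result and result[-1].stop == p:
            result[-1] = range(result[-1].start, p + 1)  # extend the current run
        else:
            result.append(range(p, p + 1))  # start a new run
    return result


def remove_range(ranges: List[range], r: range) -> List[range]:
    """Remove every position of r (absolute coordinates) from ranges."""
    kept = {p for rg in ranges for p in rg} - set(r)
    return to_ranges(kept)


print(remove_range([range(0, 10)], range(3, 5)))   # [range(0, 3), range(5, 10)]
print(remove_range([range(0, 10)], range(0, 10)))  # [] : nothing left
```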
remove_ranges
| remove_ranges(ranges)
Remove a list of `range` objects from `self.ranges`. This method applies `remove_range` several times; see its documentation for more details.
__repr__
| __repr__()
Return the two main elements (namely the `string` and the `ranges` attributes) of a `Span` instance in a readable way.
__str__
| __str__()
`str(Span)` returns the recombination of the extracts of each `Span.subSpan` from the `Span.string` attribute corresponding to all its `Span.ranges`, joined by the `Span.subtoksep` character.
__hash__
| __hash__()
Make the `Span` object hashable, so that it can be used in a `set` or as a `dict` key. The hash relies on the uniqueness of the `Span` object: it is the hash of the string made of the parent string plus the string representation of the instance, including `subtoksep`. Everything is then passed through `hashlib.sha1` and converted with `hexdigest()`.
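The idea can be sketched with a small class. The exact key the library feeds to `hashlib.sha1` is not spelled out beyond the description above, so the concatenation below (parent string + `repr` + `subtoksep`) is an assumption, and `HashableSpan` is a hypothetical class. Since `__hash__` must return an integer, the hex digest is parsed in base 16.

```python
import hashlib
from typing import List


class HashableSpan:
    """Hypothetical minimal Span illustrating a sha1-based __hash__."""

    def __init__(self, string: str, ranges: List[range], subtoksep: str = " ", encoding: str = "utf-8"):
        self.string = string
        self.ranges = list(ranges)
        self.subtoksep = subtoksep
        self.encoding = encoding

    def __repr__(self) -> str:
        return f"HashableSpan({self.string!r}, {self.ranges!r})"

    def digest(self) -> str:
        # Assumed key: parent string + repr of the instance + subtoksep.
        key = self.string + repr(self) + self.subtoksep
        return hashlib.sha1(key.encode(self.encoding)).hexdigest()

    def __eq__(self, other: object) -> bool:
        return isinstance(other, HashableSpan) and self.digest() == other.digest()

    def __hash__(self) -> int:
        # __hash__ must return an int, so the hex digest is read in base 16.
        return int(self.digest(), 16)


a = HashableSpan("abcdef", [range(0, 3)])
b = HashableSpan("abcdef", [range(0, 3)])
c = HashableSpan("abcdef", [range(3, 6)])
print(len({a, b, c}))  # 2: a and b collapse to one set element
```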
__contains__
| __contains__(s)
If the object to be compared with is a `Span` related to the same string as this instance, check whether the ranges overlap. Otherwise, check whether the string `str(s)` (which transforms the other `Span` instance into a string in case `s` is not related to the same string) is a sub-string of the `Span` instance.
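The two branches of this rule can be sketched as follows. `TinySpan` is a hypothetical class, and "overlapping" is assumed to mean that the two spans share at least one character position.

```python
from typing import List, Union


class TinySpan:
    """Hypothetical minimal Span used to illustrate the __contains__ rule."""

    def __init__(self, string: str, ranges: List[range], subtoksep: str = " "):
        self.string, self.ranges, self.subtoksep = string, ranges, subtoksep

    def __str__(self) -> str:
        return self.subtoksep.join(self.string[r.start:r.stop] for r in self.ranges)

    def __contains__(self, s: Union["TinySpan", str]) -> bool:
        if isinstance(s, TinySpan) and s.string == self.string:
            # Same parent string: look for at least one shared position.
            mine = {p for r in self.ranges for p in r}
            return any(p in mine for r in s.ranges for p in r)
        # Otherwise fall back to a plain sub-string test.
        return str(s) in str(self)


text = "the cat sat"
first = TinySpan(text, [range(0, 7)])    # "the cat"
middle = TinySpan(text, [range(4, 11)])  # "cat sat"
last = TinySpan(text, [range(8, 11)])    # "sat"
print(middle in first, last in first, "cat" in first)  # True False True
```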
__getitem__
| __getitem__(n)
Allow slice and integer access to the characters of the string of the `Span`.
Return a string.
Note: as for a usual Python string, a slice with positions entirely outside `str(Span)` returns an empty string, whereas `Span[x]` with `x >= len(Span)` raises an `IndexError`.
get_subSpan
| get_subSpan(n)
Get the `Span` associated with the element(s) `n` of the ranges (`n` being an integer or a slice). Return a `Span` object. Raise an `IndexError` in case `n` is larger than the number of ranges in `self.ranges`.
subSpans
| @property
| subSpans()
Get the `Span` associated with each of the `Span.ranges` of a `Span` object. Return a `Span` object.
__eq__
| __eq__(span)
Verify whether the current `Span` instance and another one have the same attributes.
Return a boolean.
Raise a `ValueError` when the other object is not a `Span` instance.
__add__
| __add__(span)
If the two `Span` objects have the same string, return a new `Span` object with the combined ranges of the initial ones.
__sub__
| __sub__(span)
If the two `Span` objects have the same string, return a new `Span` object with the ranges of `self` from which the ranges of `span` are removed. Might return an empty `Span`.
__mul__
| __mul__(span)
If the two `Span` objects have the same string, return a new `Span` object whose ranges are the intersection of the ranges of `self` and of `span`. Might return an empty `Span`.
__truediv__
| __truediv__(span)
If the two `Span` objects have the same string, return a new `Span` object whose ranges are the symmetric difference of the ranges of `self` and of `span`. Might return an empty `Span`.
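Seen as sets of character positions, these four operators line up with Python's built-in set operators. The sketch below works on plain `set` objects rather than real `Span` instances and assumes the mapping `+` for union, `-` for difference, `*` for intersection and `/` for symmetric difference; `selected_text` is a hypothetical helper that ignores `subtoksep` for brevity.

```python
s = "abcdefghij"
a = set(range(0, 6))   # positions selected by a Span reading "abcdef"
b = set(range(4, 10))  # positions selected by a Span reading "efghij"

results = {
    "Span + span (union)": a | b,
    "Span - span (difference)": a - b,
    "Span * span (intersection)": a & b,
    "Span / span (symmetric difference)": a ^ b,
}


def selected_text(positions):
    """Concatenate the selected characters in order (subtoksep ignored here)."""
    return "".join(s[p] for p in sorted(positions))


for label, positions in results.items():
    print(label, "->", selected_text(positions))
# Span + span (union) -> abcdefghij
# Span - span (difference) -> abcd
# Span * span (intersection) -> ef
# Span / span (symmetric difference) -> abcdghij
```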
start
| @property
| start()
Return the starting position (an integer) of the first range. Makes sense only for a contiguous `Span`.
stop
| @property
| stop()
Return the ending position (an integer) of the last range. Makes sense only for a contiguous `Span`.
__lt__
| __lt__(span)
Return `True` if the `Span` is entirely on the left of `span` (the `Span` object given as parameter). Makes sense only for contiguous `Span` objects.
__gt__
| __gt__(span)
Return `True` if the `Span` is entirely on the right of `span` (the `Span` object given as parameter). Makes sense only for contiguous `Span` objects.
__le__
| __le__(span)
Return `True` if the `Span` is partly on the left of `span` (the `Span` object given as parameter). Makes sense only for contiguous `Span` objects.
__ge__
| __ge__(span)
Return `True` if the `Span` is partly on the right of `span` (the `Span` object given as parameter). Makes sense only for contiguous `Span` objects.
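For contiguous spans described only by `start` and `stop`, the four comparisons can be sketched as below. `Interval` is a hypothetical class, and the reading of "partly on the left/right" as "overlapping, and starting further left/right" is an assumption; it has the convenient property that `a <= b` holds exactly when `b >= a`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical contiguous span described only by its start and stop."""
    start: int
    stop: int

    def overlaps(self, other: "Interval") -> bool:
        return self.start < other.stop and other.start < self.stop

    def __lt__(self, other: "Interval") -> bool:  # entirely on the left
        return self.stop <= other.start

    def __gt__(self, other: "Interval") -> bool:  # entirely on the right
        return self.start >= other.stop

    def __le__(self, other: "Interval") -> bool:  # overlapping, starts further left
        return self.overlaps(other) and self.start < other.start

    def __ge__(self, other: "Interval") -> bool:  # overlapping, starts further right
        return self.overlaps(other) and self.start > other.start


left, right, middle = Interval(0, 3), Interval(5, 8), Interval(2, 6)
print(left < right, right > left)      # True True
print(left <= middle, middle >= left)  # True True
print(left <= right)                   # False: the two intervals do not overlap
```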
union
| union(span)
Takes a `Span` object as entry and returns a new `Span` instance whose `Span.ranges` are given by the union of the current `Span.ranges` with `span.ranges`, when one sees the `ranges` attributes as sets of positions of each instance.
| Parameters | Type | Details |
|---|---|---|
| `span` | `Span` | A `Span` object with the same mother string (`Span.string`) and possibly different ranges than the current instance. |

| Returns | Type | Details |
|---|---|---|
| | `Span` | A new `Span` instance. |

| Raises | Details |
|---|---|
| `ValueError` | in case the entry is not a `Span` instance. |
| `TypeError` | in case `span.string` is not the same as `Span.string`. |
difference
| difference(span)
Takes a `Span` object as entry and returns a new `Span` instance whose `Span.ranges` are given by the difference of the current `Span.ranges` with `span.ranges`, when one sees the `ranges` attributes as sets of positions of each instance.
| Parameters | Type | Details |
|---|---|---|
| `span` | `Span` | A `Span` object with the same mother string (`Span.string`) and possibly different ranges than the current instance. |

| Returns | Type | Details |
|---|---|---|
| | `Span` | A new `Span` instance. |

| Raises | Details |
|---|---|
| `ValueError` | in case the entry is not a `Span` instance. |
| `TypeError` | in case `span.string` is not the same as `Span.string`. |
intersection
| intersection(span)
Takes a `Span` object as entry and returns a new `Span` instance whose `Span.ranges` are given by the intersection of the current `Span.ranges` with `span.ranges`, when one sees the `ranges` attributes as sets of positions of each instance.
| Parameters | Type | Details |
|---|---|---|
| `span` | `Span` | A `Span` object with the same mother string (`Span.string`) and possibly different ranges than the current instance. |

| Returns | Type | Details |
|---|---|---|
| | `Span` | A new `Span` instance. |

| Raises | Details |
|---|---|
| `ValueError` | in case the entry is not a `Span` instance. |
| `TypeError` | in case `span.string` is not the same as `Span.string`. |
symmetric_difference
| symmetric_difference(span)
Takes a `Span` object as entry and returns a new `Span` instance whose `Span.ranges` are given by the symmetric difference of the current `Span.ranges` with `span.ranges`, when one sees the `ranges` attributes as sets of positions of each instance.
| Parameters | Type | Details |
|---|---|---|
| `span` | `Span` | A `Span` object with the same mother string (`Span.string`) and possibly different ranges than the current instance. |

| Returns | Type | Details |
|---|---|---|
| | `Span` | A new `Span` instance. |

| Raises | Details |
|---|---|
| `ValueError` | in case the entry is not a `Span` instance. |
| `TypeError` | in case `span.string` is not the same as `Span.string`. |
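The four methods share the same pattern: validate the entry, combine the two position sets, then regroup the result into contiguous ranges. The sketch below follows that pattern and the Raises tables above; `SetSpan` and `to_ranges` are hypothetical names, and normalising the output into merged, sorted ranges is an assumption about the library's behaviour.

```python
from typing import Callable, FrozenSet, Iterable, List


def to_ranges(positions: Iterable[int]) -> List[range]:
    """Group integer positions into sorted, contiguous ranges."""
    result: List[range] = []
    for p in sorted(set(positions)):
        if result and result[-1].stop == p:
            result[-1] = range(result[-1].start, p + 1)
        else:
            result.append(range(p, p + 1))
    return result


SetOp = Callable[[FrozenSet[int], FrozenSet[int]], FrozenSet[int]]


class SetSpan:
    """Hypothetical minimal Span: a parent string plus normalised ranges."""

    def __init__(self, string: str, ranges: List[range]):
        self.string = string
        self.ranges = to_ranges(p for r in ranges for p in r)

    def positions(self) -> FrozenSet[int]:
        return frozenset(p for r in self.ranges for p in r)

    def _combine(self, span: object, op: SetOp) -> "SetSpan":
        # Validation mirrors the Raises tables of the four methods.
        if not isinstance(span, SetSpan):
            raise ValueError("the entry is not a Span instance")
        if span.string != self.string:
            raise TypeError("span.string is not the same as Span.string")
        return SetSpan(self.string, to_ranges(op(self.positions(), span.positions())))

    def union(self, span: object) -> "SetSpan":
        return self._combine(span, frozenset.union)

    def difference(self, span: object) -> "SetSpan":
        return self._combine(span, frozenset.difference)

    def intersection(self, span: object) -> "SetSpan":
        return self._combine(span, frozenset.intersection)

    def symmetric_difference(self, span: object) -> "SetSpan":
        return self._combine(span, frozenset.symmetric_difference)


text = "Hello wonderful world"
a = SetSpan(text, [range(0, 15)])  # "Hello wonderful"
b = SetSpan(text, [range(6, 21)])  # "wonderful world"
print(a.symmetric_difference(b).ranges)  # [range(0, 6), range(15, 21)]
print(a.intersection(b).ranges)          # [range(6, 15)]
```

The symmetric difference here is non-contiguous, which is exactly the case where `str()` of the resulting `Span` would insert `subtoksep` between the two extracts.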
_prepareSpans
| _prepareSpans(ranges, remove_empty)
Utility that removes empty ranges and constructs a list of `Span` objects.
partition
| partition(start, stop, remove_empty=False)
Split the `Span.string` into three `Span` objects:
- `string[:start]`
- `string[start:stop]`
- `string[stop:]`

and put all non-empty `Span` objects in a list of `Span` instances.
It acts a bit like the `str.partition(s)` method of the Python `string` object, but `Span.partition` takes `start` and `stop` arguments instead of a string.
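| Parameters | Type | Details |
|---|---|---|
| `start` | int | Starting position of the splitting sequence. |
| `stop` | int | Ending position of the splitting sequence. |
| `remove_empty` | bool. Default is `False` | If `True`, empty `Span` objects are removed from the returned list. |

| Returns | Type | Details |
|---|---|---|
| | list | The list of `Span` objects. |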
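The three-way cut can be sketched for a `Span` that covers its whole parent string, so that relative and absolute positions coincide (an assumption). Each piece is returned as a `(range, text)` pair standing in for a `Span`; `partition` here is a hypothetical function, compared against the built-in `str.partition`.

```python
from typing import List, Tuple


def partition(string: str, start: int, stop: int, remove_empty: bool = False) -> List[Tuple[range, str]]:
    """Sketch of Span.partition on a Span covering its whole parent string."""
    pieces = [range(0, start), range(start, stop), range(stop, len(string))]
    if remove_empty:
        pieces = [r for r in pieces if len(r) > 0]
    return [(r, string[r.start:r.stop]) for r in pieces]


text = "Hello world"
print([t for _, t in partition(text, 5, 6)])  # ['Hello', ' ', 'world']
print(text.partition(" "))                    # ('Hello', ' ', 'world')
print(partition("abc", 0, 1, remove_empty=True))
# [(range(0, 1), 'a'), (range(1, 3), 'bc')]
```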
split
| split(cuts, remove_empty=False)
Split a text as many times as there are `range` entities in the `cuts` list.
Return a list of `Span` instances.
This is a bit like the `str.split(s)` method of the Python `string` object, except that one has to feed `Span.split` with a full list of `range(start, stop)` objects instead of the string `s` in `str.split(s)`.
If the `range(start, stop)` objects in `cuts` are given by a regex `re.finditer` search on `str(Span)`, the two methods give the same result.
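| Parameters | Type | Details |
|---|---|---|
| `cuts` | a list of `range` | Basic usage is to take these cuts from a regex `re.finditer` search on `str(Span)`. |
| `remove_empty` | bool. Default is `False` | If `True`, empty `Span` objects are removed from the returned list. |

| Returns | Type | Details |
|---|---|---|
| | list | The list of `Span` objects. |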
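The equivalence with regex splitting can be checked on a sketch that keeps the text between consecutive cuts and drops the cuts themselves, which is what `str.split` and `re.split` do. The hypothetical `split` function below returns plain strings instead of `Span` objects and assumes the cuts do not overlap.

```python
import re
from typing import List


def split(string: str, cuts: List[range], remove_empty: bool = False) -> List[str]:
    """Sketch of Span.split: keep the text between consecutive cuts, drop the cuts."""
    pieces, previous = [], 0
    for cut in sorted(cuts, key=lambda r: r.start):
        pieces.append(range(previous, cut.start))
        previous = cut.stop
    pieces.append(range(previous, len(string)))
    if remove_empty:
        pieces = [r for r in pieces if len(r) > 0]
    return [string[r.start:r.stop] for r in pieces]


text = "one  two three"
cuts = [range(m.start(), m.end()) for m in re.finditer(r"\s+", text)]
print(split(text, cuts))       # ['one', 'two', 'three']
print(re.split(r"\s+", text))  # ['one', 'two', 'three']
```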
slice
| slice(start=0, stop=None, size=1, step=1, remove_empty=False)
Cut the `Span.string` into overlapping sequences of strings of size `size` by `step`, put all these sequences in separate `Span` objects, and finally put all these objects in a list of `Span` instances.
| Parameters | Type | Details |
|---|---|---|
| `start` | int | The relative position where to start slicing the `Span`. |
| `stop` | int | The relative position where to stop slicing the `Span`. |
| `size` | int | The size of the string in each subsequent `Span` object. |
| `step` | int | The number of characters skipped from one `Span` object to the next one. A character is given by … |

| Returns | Type | Details |
|---|---|---|
| | list | The list of `Span` objects. |
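The sliding-window behaviour can be sketched with plain ranges. The hypothetical `slice_span` below assumes that only full windows ending at or before `stop` are kept (the handling of a shorter trailing window is not documented), and it omits `remove_empty`, which has no effect when every window is full.

```python
from typing import List, Optional


def slice_span(string: str, start: int = 0, stop: Optional[int] = None,
               size: int = 1, step: int = 1) -> List[range]:
    """Sketch of Span.slice: windows of `size` characters moved by `step`.
    Only full windows ending at or before `stop` are kept (assumption)."""
    stop = len(string) if stop is None else stop
    return [range(i, i + size) for i in range(start, stop - size + 1, step)]


text = "abcdef"
print([text[r.start:r.stop] for r in slice_span(text, size=3, step=2)])  # ['abc', 'cde']
print([text[r.start:r.stop] for r in slice_span(text, size=2)])          # ['ab', 'bc', 'cd', 'de', 'ef']
```

With `step < size`, consecutive windows overlap, which gives the character n-grams often used in tokenization.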