Tools module¶
Support functions for the Token and Tokens classes.
_startstop¶
_startstop(obj, start: int, stop: int) -> Tuple[int, int]
Helper function that keeps start and stop within the edges of the constructed string in the methods that follow. Works for both the Token and Tokens classes when obj is passed as self. Recasts start and stop into the Token.string coordinates in case obj is a Token.string.
Raises a BoundaryWarning if at least one of start or stop has been changed.
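A minimal sketch of this kind of boundary handling, assuming the constructed string spans positions 0 to its length and that out-of-range values are simply clamped to that interval. The names `BoundaryWarningSketch` and `clamp_start_stop` are hypothetical stand-ins, not the library's BoundaryWarning or _startstop, and the recasting into Token.string coordinates is not reproduced here.

```python
import warnings
from typing import Tuple


class BoundaryWarningSketch(UserWarning):
    """Hypothetical stand-in for the library's BoundaryWarning."""


def clamp_start_stop(length: int, start: int, stop: int) -> Tuple[int, int]:
    """Clamp start and stop to [0, length] and warn if either value changed."""
    new_start = min(max(start, 0), length)
    new_stop = min(max(stop, 0), length)
    if (new_start, new_stop) != (start, stop):
        warnings.warn("start or stop was moved to a boundary",
                      BoundaryWarningSketch)
    return new_start, new_stop
```

With a length of 10, `clamp_start_stop(10, -3, 50)` gives `(0, 10)` and emits one warning, while `clamp_start_stop(10, 2, 5)` returns its arguments unchanged and stays silent.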
_isSpan¶
_isSpan(span) -> bool
Returns True if span has all the attributes of a Span instance, False otherwise.
_isToken¶
_isToken(token)
Returns True if token has all the attributes of a Token instance, False otherwise.
_isTokens¶
_isTokens(tokens)
Returns True if all elements of tokens are Token instances according to _isToken, otherwise returns False.
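The three checks above are attribute-based (duck-typing) tests rather than isinstance tests. A minimal sketch of that idea follows. The helper names and the attribute list `TOKEN_LIKE` are hypothetical, since this page does not list the attributes that the library actually checks.

```python
from typing import Any, Iterable


def has_attributes(obj: Any, names: Iterable[str]) -> bool:
    """Return True if obj exposes every attribute listed in names."""
    return all(hasattr(obj, name) for name in names)


# Hypothetical attribute list; the real Token attributes may differ.
TOKEN_LIKE = ("string", "ranges", "subtoksep")


def all_token_like(items: Iterable[Any]) -> bool:
    """Return True if every element passes the attribute check."""
    return all(has_attributes(item, TOKEN_LIKE) for item in items)
```

Any object carrying the listed attributes passes, whatever its class. This is what allows such checks to accept subclasses and look-alike objects.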
_checkTokens¶
_checkTokens(tokens)
Raises a ValueError in case tokens is not a proper Tokens instance.
_checkRanges¶
_checkRanges(ranges)
Raises a ValueError in case one range in ranges is not suitable for the Span or Token classes.
_checkSameString¶
_checkSameString(obj1, obj2)
Raises a TypeError in case the two objects do not have the same .string attribute. Raises an AttributeError in case at least one of the objects has no .string attribute.
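A minimal sketch of a check with this error behavior, assuming the comparison is a plain equality test on .string. The name `check_same_string` is hypothetical.

```python
from typing import Any


def check_same_string(obj1: Any, obj2: Any) -> None:
    """Raise TypeError if the .string attributes differ."""
    # Reading .string raises AttributeError by itself when it is missing.
    s1, s2 = obj1.string, obj2.string
    if s1 != s2:
        raise TypeError("the two objects do not share the same .string")
```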
_areOverlapping¶
_areOverlapping(r1: range, r2: range) -> bool
Take two range objects, and return True if they overlap on some range, otherwise return False if they are disjoint.
_isInside¶
_isInside(r1: range, r2: range) -> bool
Take two range objects r1 and r2, and return True if r2 is in r1, otherwise return False.
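For range objects with step 1, both kinds of tests reduce to comparisons of start and stop. The sketch below uses hypothetical helpers with their own stated conventions: `ranges_overlap` returns True when the two ranges share at least one position, and `range_contains` returns True when r2 lies within r1. How the library treats touching or empty ranges is not stated on this page, so that part is an assumption.

```python
def ranges_overlap(r1: range, r2: range) -> bool:
    """True if the two step-1 ranges share at least one position."""
    return max(r1.start, r2.start) < min(r1.stop, r2.stop)


def range_contains(r1: range, r2: range) -> bool:
    """True if r2 starts and stops within the bounds of r1."""
    return r1.start <= r2.start and r2.stop <= r1.stop
```

Under these conventions `range(12, 26)` and `range(26, 40)` touch but do not overlap, and `range(15, 16)` is contained in `range(12, 25)`.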
_combineRanges¶
_combineRanges(ranges: List[range]) -> List[range]
Take a list of range objects, and transform it such that overlapping ranges and consecutive ranges are combined.
Example:
_combineRanges([(12,25),(35,40)]) # -> [(12,25),(35,40),]
_combineRanges([(12,25),(26,40)]) # -> [(12,25),(26,40),]
_combineRanges([(12,26),(26,40)]) # -> [(12,40),]
_combineRanges([(12,25),(15,40)]) # -> [(12,40),]
_combineRanges([(12,25),(15,16)]) # -> [(12,25),]
Here all range objects are written as tuples, (12,25) standing for range(12,25), for illustration purposes.
The overlapping information is lost in the process.
ranges is a list of range objects, all with range.step==1 (not verified by this function, but required for the algorithm to work properly).
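The merging rule illustrated by the examples above can be sketched as a single pass over the ranges sorted by start. This is an illustrative version, not the library's code. The name `combine_ranges` is hypothetical, and sorting the input first is an assumption of the sketch.

```python
from typing import List


def combine_ranges(ranges: List[range]) -> List[range]:
    """Merge overlapping or touching step-1 ranges into disjoint ranges."""
    merged: List[range] = []
    for r in sorted(ranges, key=lambda x: (x.start, x.stop)):
        if merged and r.start <= merged[-1].stop:
            # r overlaps or touches the last kept range: extend that range.
            last = merged[-1]
            merged[-1] = range(last.start, max(last.stop, r.stop))
        else:
            merged.append(r)
    return merged
```

With this rule `range(12, 26)` and `range(26, 40)` merge into `range(12, 40)`, while `range(12, 25)` and `range(26, 40)` stay separate because position 25 belongs to neither.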
_findCut¶
_findCut(ranges: List[range], cut: int, step: int = 0) -> Tuple[int, int, bool]
Find the index i_ and absolute position cut_ of the cutting of ranges at relative position cut. Handles the case where the cut is inside some separator of size given by the step parameter, in which case a flag_ is raised.
Returns a tuple (i_, cut_, flag_):
i_: index of the ranges element onto which the cut applies
cut_: absolute position where the cut applies, in the coordinates of the ranges
flag_: True if the cut appears inside the separator
Raises an IndexError in case the cut is outside the range of the ranges.
Examples:
ranges = [range(10,20), range(30,40)]
_findCut(ranges,5,step=0) # -> (0, 15, False)
_findCut(ranges,10,step=0) # -> (1, 30, False)
_findCut(ranges,10,step=1) # -> (0, 20, False)
_findCut(ranges,11,step=1) # -> (1, 30, False)
_findCut(ranges,10,step=2) # -> (0, 20, False)
_findCut(ranges,11,step=2) # -> (0, 20, True)
_findCut(ranges,12,step=2) # -> (1, 30, False)
_findCut(ranges,20,step=1) # -> (1, 39, False)
_findCut(ranges,21,step=1) # -> (1, 40, False)
_findCut(ranges,22,step=1) # -> IndexError
This function is the main tool to cut ranges in sub-ranges, see _cutRanges below.
Warning: this function does not sort the ranges, but they must be sorted for the algorithm to work properly.
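One way to reproduce the examples above is to walk the ranges while tracking the relative position at which each one starts, counting step extra positions for the separator between consecutive ranges. The sketch below is an illustration under that reading, not the library's implementation. The name `find_cut` is hypothetical, and raising IndexError for a negative cut is an assumption.

```python
from typing import List, Tuple


def find_cut(ranges: List[range], cut: int, step: int = 0) -> Tuple[int, int, bool]:
    """Locate relative position cut in sorted step-1 ranges joined by separators."""
    if cut < 0:
        raise IndexError("cut is before the first range")  # assumption
    offset = 0  # relative position where the current range starts
    last = len(ranges) - 1
    for i, r in enumerate(ranges):
        end = offset + len(r)  # relative position just after the current range
        if cut < end:
            return i, r.start + (cut - offset), False
        if i == last:
            if cut == end:
                return i, r.stop, False
            break
        if cut < end + step:
            # cut == end sits on the range boundary; past it, the cut is
            # strictly inside the separator and the flag is set.
            return i, r.stop, cut > end
        offset = end + step
    raise IndexError("cut is outside the ranges")
```

With step=0 the separator zone is empty, so a cut at the junction of two ranges is attributed to the start of the next range, which matches the (1, 30, False) result shown for cut=10.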
_cutRanges¶
_cutRanges(ranges: List[range], cut: int, step: int = 0) -> Tuple[List[range], List[range]]
Cut the ranges (given in absolute positions) at the cut (given in relative position). Handles the case where the cut is inside some separator of size given by the step parameter, in which case the entire separator is assigned to the left list of range objects.
Returns two lists of range objects. The first list will correspond to the left Token (before the cut), and the second one to the right Token (after the cut).
Warnings:
Some of the returned ranges might be empty. This is the case when the cut appears at a boundary of a range. It allows the Token instance to handle the size of the subtoksep separator in a smart way, and it is the reason for the flag_ value returned by the _findCut function.
The ranges must be sorted for this function to work properly.
Examples:
ranges = [range(10,20), range(30,40)]
_cutRanges(ranges,5,step=0) # -> ([range(10, 15)], [range(15, 20), range(30, 40)])
_cutRanges(ranges,10,step=0) # -> ([range(10, 20), range(30, 30)], [range(30, 40)])
_cutRanges(ranges,11,step=0) # -> ([range(10, 20), range(30, 31)], [range(31, 40)])
# note the position of the empty range in the next two examples
_cutRanges(ranges,10,step=1) # -> ([range(10, 20)], [range(20, 20), range(30, 40)])
_cutRanges(ranges,11,step=1) # -> ([range(10, 20), range(30, 30)], [range(30, 40)])
_cutRanges(ranges,10,step=2) # -> ([range(10, 20)], [range(20, 20), range(30, 40)])
# flag_ = True, separator is on the left :
_cutRanges(ranges,11,step=2) # -> ([range(10, 20), range(20, 20)], [range(30, 40)])
_cutRanges(ranges,12,step=2) # -> ([range(10, 20), range(30, 30)], [range(30, 40)])
_cutRanges(ranges,20,step=1) # -> ([range(10, 20), range(30, 39)], [range(39, 40)])
_cutRanges(ranges,21,step=1) # -> ([range(10, 20), range(30, 40)], [range(40, 40)])
_cutRanges(ranges,22,step=1) # -> IndexError
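The examples above are consistent with a simple splitting rule applied to the (i_, cut_, flag_) tuple documented for _findCut. The sketch below takes that tuple as input instead of recomputing it. The name `split_ranges` and its parameter names are hypothetical, and the rule is inferred from the examples rather than taken from the library's source.

```python
from typing import List, Tuple


def split_ranges(ranges: List[range], i: int, cut_abs: int,
                 in_separator: bool) -> Tuple[List[range], List[range]]:
    """Split ranges at element i and absolute position cut_abs."""
    if in_separator:
        # The whole separator stays on the left, marked by an empty range.
        left = ranges[:i + 1] + [range(cut_abs, cut_abs)]
        right = ranges[i + 1:]
    else:
        left = ranges[:i] + [range(ranges[i].start, cut_abs)]
        right = [range(cut_abs, ranges[i].stop)] + ranges[i + 1:]
    return left, right
```

Python compares all empty range objects as equal, so inspect .start and .stop when the position of an empty range matters.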
_removeRange¶
_removeRange(ranges: List[range], r: range) -> List[range]
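Remove the range r from the list of ranges. As the examples show, ranges that partially overlap r are trimmed, a range that strictly contains r is split in two, and ranges entirely covered by r are dropped.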
Examples:
ranges = [range(10,20), range(30,40)]
_removeRange(ranges,range(0,10)) # -> [range(10, 20), range(30, 40)]
_removeRange(ranges,range(0,11)) # -> [range(11, 20), range(30, 40)]
_removeRange(ranges,range(11,15)) # -> [range(10, 11), range(15, 20), range(30, 40)]
_removeRange(ranges,range(25,30)) # -> [range(10, 20), range(30, 40)]
_removeRange(ranges,range(0,25)) # -> [range(30, 40)]
_removeRange(ranges,range(11,11)) # -> [range(10, 20), range(30, 40)]
_removeRange(ranges,range(30,31)) # -> [range(10, 20), range(31, 40)]
_removeRange(ranges,range(100)) # -> []
_removeRange(ranges,range(15,35)) # -> [range(10, 15), range(35, 40)]
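The behavior shown in these examples can be sketched by keeping, for each range, the part lying before r and the part lying after r, and discarding whatever is empty. This is an illustrative version for step-1 ranges, not the library's code. The name `remove_range` is hypothetical.

```python
from typing import List


def remove_range(ranges: List[range], r: range) -> List[range]:
    """Remove the positions of r from each step-1 range."""
    if len(r) == 0:
        return list(ranges)  # an empty r removes nothing
    result: List[range] = []
    for x in ranges:
        before = range(x.start, min(x.stop, r.start))  # part of x left of r
        after = range(max(x.start, r.stop), x.stop)    # part of x right of r
        result.extend(part for part in (before, after) if len(part) > 0)
    return result
```

The early return for an empty r matters: without it, `range(11, 11)` would split `range(10, 20)` into two adjacent pieces instead of leaving it untouched.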
_fusionAttributesList¶
_fusionAttributesList(attributesList)
Take a list of dictionaries, and return a dictionary of lists.
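A minimal sketch of such a transformation, assuming each key collects the values found under it in order of appearance. The name `dicts_to_lists` is hypothetical, and the handling of keys missing from some dictionaries is an assumption, since it is not specified here.

```python
from collections import defaultdict
from typing import Any, Dict, List


def dicts_to_lists(attributes_list: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Gather the values of each key across a list of dictionaries."""
    fused: Dict[str, List[Any]] = defaultdict(list)
    for attributes in attributes_list:
        for key, value in attributes.items():
            fused[key].append(value)
    return dict(fused)
```

In this sketch a key absent from a dictionary contributes nothing, so the resulting lists may differ in length.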
_fusionAttributesDict¶
_fusionAttributesDict(attributesDict)
Take a dictionary of lists, and return a dictionary of dictionaries of lists.