Tools module

Support functions for the Token and Tokens classes.

_startstop

_startstop(obj, start: int, stop: int) -> Tuple[int, int]

Helper that catches the edges of the constructed string. It works for both the Token and Tokens classes when obj is passed as self, and recasts start and stop into the Token.string coordinates in case obj is a Token.string.

Raises a BoundaryWarning if at least one of start or stop has been changed.
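The boundary handling described above can be pictured with a minimal sketch. It assumes that "catching the edge" means clamping both positions into the interval [0, length]. The name clamp_start_stop and the local BoundaryWarning class are hypothetical stand-ins, not the module's own code.

```python
import warnings


class BoundaryWarning(UserWarning):
    """Local stand-in for the module's BoundaryWarning (assumed Warning subclass)."""


def clamp_start_stop(length: int, start: int, stop: int):
    """Clamp start and stop into [0, length]; warn if either one moved."""
    new_start = min(max(start, 0), length)
    new_stop = min(max(stop, 0), length)
    if (new_start, new_stop) != (start, stop):
        warnings.warn("start or stop moved to a string boundary",
                      BoundaryWarning)
    return new_start, new_stop
```

With a string of length 10, clamp_start_stop(10, 2, 7) leaves both values untouched, whereas clamp_start_stop(10, -3, 25) moves them to the string edges and emits the warning.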

_isSpan

_isSpan(span) -> bool

Returns True if span has all the attributes of a Span instance, False otherwise.

_checkSpan

_checkSpan(span)

Raises a ValueError if span is not a proper Span instance.

_isToken

_isToken(token)

Returns True if token has all the attributes of a Token instance, False otherwise.

_checkToken

_checkToken(token)

Raises a ValueError if token is not a proper Token instance.

_isTokens

_isTokens(tokens)

Returns True if all elements of tokens are Token instances according to _isToken, otherwise returns False.

_checkTokens

_checkTokens(tokens)

Raises a ValueError in case tokens is not a proper Tokens instance.

_checkRange

_checkRange(r)

Raises a ValueError in case r is not a compatible range object.

_checkRanges

_checkRanges(ranges)

Raises a ValueError in case one range in ranges is not suitable for the Span or Token classes.

_checkSameString

_checkSameString(obj1, obj2)

Raises a TypeError if the two objects do not have the same .string attribute. Raises an AttributeError if at least one of the objects has no .string attribute.
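A minimal sketch of this check, assuming that "the same .string" means equality of the two attributes. The name check_same_string is hypothetical, and SimpleNamespace objects stand in for Token or Span instances.

```python
from types import SimpleNamespace


def check_same_string(obj1, obj2) -> None:
    """Raise TypeError when the two objects carry different strings."""
    # Accessing .string raises AttributeError by itself when it is missing.
    if obj1.string != obj2.string:
        raise TypeError("the two objects do not share the same string")


a = SimpleNamespace(string="hello world")
b = SimpleNamespace(string="hello world")
check_same_string(a, b)  # passes silently
```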

_areOverlapping

_areOverlapping(r1: range, r2: range) -> bool

Take two range objects, and return True if they overlap on some range, otherwise return False (the ranges are disjoint).
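As an illustration, an overlap test for ranges with step 1 can be sketched as follows. It assumes that True means the two ranges share at least one position, as the function name suggests. The name are_overlapping is hypothetical.

```python
def are_overlapping(r1: range, r2: range) -> bool:
    """True when two step-1 ranges share at least one position."""
    # The shared part runs from the larger start to the smaller stop.
    return max(r1.start, r2.start) < min(r1.stop, r2.stop)


are_overlapping(range(12, 25), range(15, 40))  # True, positions 15..24 are shared
are_overlapping(range(12, 26), range(26, 40))  # False, consecutive but disjoint
```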

_isInside

_isInside(r1: range, r2: range) -> bool

Take two range objects r1 and r2, and return True if r2 is in r1, otherwise return False.
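A containment test for ranges with step 1 can be sketched in the same spirit. The name is_inside is hypothetical, and "r2 is in r1" is assumed to mean that every position of r2 lies within the bounds of r1.

```python
def is_inside(r1: range, r2: range) -> bool:
    """True when the step-1 range r2 lies entirely within r1."""
    return r1.start <= r2.start and r2.stop <= r1.stop


is_inside(range(10, 20), range(12, 15))  # True
is_inside(range(10, 20), range(15, 25))  # False, r2 extends past r1
```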

_combineRanges

_combineRanges(ranges: List[range]) -> List[range]

Take a list of range objects, and transform it such that overlapping ranges and consecutive ranges are combined.

Examples:

_combineRanges([(12,25),(35,40)]) # -> [(12,25),(35,40),]
_combineRanges([(12,25),(26,40)]) # -> [(12,25),(26,40),]
_combineRanges([(12,26),(26,40)]) # -> [(12,40),]
_combineRanges([(12,25),(15,40)]) # -> [(12,40),]
_combineRanges([(12,25),(15,16)]) # -> [(12,25),]

Here all range objects are written as tuples, (12,25)==range(12,25), for illustration purposes. The overlapping information is lost in the process.

ranges is a list of range objects, all with range.step==1 (not verified by this function, but required for the algorithm to work properly).
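The merging behaviour shown in the examples can be reproduced with a short sketch. The name combine_ranges is hypothetical, and sorting the input first is an assumption of this sketch, not something the text above states about _combineRanges.

```python
from typing import List


def combine_ranges(ranges: List[range]) -> List[range]:
    """Merge overlapping and consecutive step-1 ranges."""
    merged: List[range] = []
    for r in sorted(ranges, key=lambda x: (x.start, x.stop)):
        if merged and r.start <= merged[-1].stop:
            # Overlapping or touching: extend the last merged range.
            last = merged[-1]
            merged[-1] = range(last.start, max(last.stop, r.stop))
        else:
            merged.append(r)
    return merged


combine_ranges([range(12, 26), range(26, 40)])  # [range(12, 40)]
combine_ranges([range(12, 25), range(26, 40)])  # unchanged, position 25 is a gap
```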

_findCut

_findCut(ranges: List[range], cut: int, step: int = 0) -> Tuple[int, int, bool]

Find the index i_ and absolute position cut_ of the cutting of ranges at relative position cut. Handle the case where the cut is inside some separator of size given by the step parameter, in which case a flag_ is raised.

Returns a tuple (i_, cut_, flag_):

  • i_: index of the ranges element onto which the cut applies

  • cut_: absolute position where the cut applies, in the coordinates of the ranges

  • flag_: True if the cut appears inside the separator

Raises an IndexError in case the cut is outside the range of the ranges.

Examples:

ranges = [range(10,20), range(30,40)]
_findCut(ranges,5,step=0)   # -> (0, 15, False)
_findCut(ranges,10,step=0)  # -> (1, 30, False)
_findCut(ranges,10,step=1)  # -> (0, 20, False)
_findCut(ranges,11,step=1)  # -> (1, 30, False)
_findCut(ranges,10,step=2)  # -> (0, 20, False)
_findCut(ranges,11,step=2)  # -> (0, 20, True)
_findCut(ranges,12,step=2)  # -> (1, 30, False)
_findCut(ranges,20,step=1)  # -> (1, 39, False)
_findCut(ranges,21,step=1)  # -> (1, 40, False)
_findCut(ranges,22,step=1)  # -> IndexError

This function is the main tool to cut ranges in sub-ranges, see _cutRanges below.

Warning: this function does not sort the ranges; they must already be sorted for the algorithm to work properly.
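One way to obtain the results listed in the examples is sketched below. The name find_cut is hypothetical, and the logic is reconstructed from the examples alone, so it may differ from the module's own algorithm. It assumes sorted ranges with step 1 and treats a negative cut as out of range.

```python
from typing import List, Tuple


def find_cut(ranges: List[range], cut: int, step: int = 0) -> Tuple[int, int, bool]:
    """Locate relative position cut inside sorted step-1 ranges."""
    if cut < 0:
        raise IndexError("cut is outside the ranges")
    offset = 0  # relative position where the current range starts
    last = len(ranges) - 1
    for i, r in enumerate(ranges):
        end = offset + len(r)  # relative position where it stops
        # Inside the range, or exactly at its end when a separator
        # follows (step > 0) or when it is the last range.
        if cut < end or (cut == end and (step > 0 or i == last)):
            return i, r.start + (cut - offset), False
        # Strictly inside the separator that follows this range.
        if i < last and cut < end + step:
            return i, r.stop, True
        offset = end + step
    raise IndexError("cut is outside the ranges")
```

With step=0 a cut that falls exactly between two ranges is attributed to the start of the next range, whereas with step>0 it is attributed to the end of the previous one, which matches the examples above.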

_cutRanges

_cutRanges(ranges: List[range], cut: int, step: int = 0) -> Tuple[List[range], List[range]]

Cut the ranges (given in absolute positions) at the cut (given in relative position). Handle the case where the cut is inside some separator of size given by the step parameter, in which case the entire separator length is assigned to the left list of range objects.

Returns two lists of range objects. The first list will correspond to the left Token (before the cut), and the second one to the right Token (after the cut).

Warnings:

  • some of the returned ranges might be empty. This is the case when the cut appears at a boundary of the range. It allows the Token instance to handle the size of the subtoksep separator in a smart way. This is the reason for the flag_ value returned by the _findCut function.

  • the ranges must be sorted for this function to work properly.

Examples:

ranges = [range(10,20), range(30,40)]
_cutRanges(ranges,5,step=0)   # -> ([range(10, 15)], [range(15, 20), range(30, 40)])
_cutRanges(ranges,10,step=0)  # -> ([range(10, 20), range(30, 30)], [range(30, 40)])
_cutRanges(ranges,11,step=0)  # -> ([range(10, 20), range(30, 31)], [range(31, 40)])

# note the position of the empty range in the next two examples
_cutRanges(ranges,10,step=1)  # -> ([range(10, 20)], [range(20, 20), range(30, 40)])
_cutRanges(ranges,11,step=1)  # -> ([range(10, 20), range(30, 30)], [range(30, 40)])

_cutRanges(ranges,10,step=2)  # -> ([range(10, 20)], [range(20, 20), range(30, 40)])
# flag_ = True, separator is on the left:
_cutRanges(ranges,11,step=2)  # -> ([range(10, 20), range(20, 20)], [range(30, 40)])
_cutRanges(ranges,12,step=2)  # -> ([range(10, 20), range(30, 30)], [range(30, 40)])

_cutRanges(ranges,20,step=1)  # -> ([range(10, 20), range(30, 39)], [range(39, 40)])
_cutRanges(ranges,21,step=1)  # -> ([range(10, 20), range(30, 40)], [range(40, 40)])
_cutRanges(ranges,22,step=1)  # -> IndexError
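The splitting rule visible in these examples can be sketched as follows. The names cut_ranges and find_cut are hypothetical, and both functions are reconstructed from the documented examples, not taken from the module.

```python
from typing import List, Tuple


def find_cut(ranges: List[range], cut: int, step: int = 0) -> Tuple[int, int, bool]:
    """Locate relative position cut inside sorted step-1 ranges."""
    if cut < 0:
        raise IndexError("cut is outside the ranges")
    offset, last = 0, len(ranges) - 1
    for i, r in enumerate(ranges):
        end = offset + len(r)
        if cut < end or (cut == end and (step > 0 or i == last)):
            return i, r.start + (cut - offset), False
        if i < last and cut < end + step:
            return i, r.stop, True  # cut falls inside the separator
        offset = end + step
    raise IndexError("cut is outside the ranges")


def cut_ranges(ranges: List[range], cut: int, step: int = 0):
    """Split ranges into a left and a right list at relative position cut."""
    i, pos, in_separator = find_cut(ranges, cut, step)
    r = ranges[i]
    left = list(ranges[:i]) + [range(r.start, pos)]
    right = list(ranges[i + 1:])
    if in_separator:
        # The empty range stays on the left, so the separator does too.
        left.append(range(pos, pos))
    else:
        right.insert(0, range(pos, r.stop))
    return left, right
```

Note that Python treats all empty range objects as equal, so comparing the start and stop attributes is the safer way to check where an empty range sits.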

_removeRange

_removeRange(ranges: List[range], r: range) -> List[range]

Remove the range r from the list of ranges.

Examples:

ranges = [range(10,20), range(30,40)]
_removeRange(ranges,range(0,10))   # -> [range(10, 20), range(30, 40)]
_removeRange(ranges,range(0,11))  # -> [range(11, 20), range(30, 40)]
_removeRange(ranges,range(11,15))  # -> [range(10, 11), range(15, 20), range(30, 40)]

_removeRange(ranges,range(25,30))  # -> [range(10, 20), range(30, 40)]
_removeRange(ranges,range(0,25))  # -> [range(30, 40)]

_removeRange(ranges,range(11,11))  # -> [range(10, 20), range(30, 40)]

_removeRange(ranges,range(30,31))  # -> [range(10, 20), range(31, 40)]
_removeRange(ranges,range(100))  # -> []

_removeRange(ranges,range(15,35))  # -> [range(10, 15), range(35, 40)]
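The behaviour shown in these examples can be sketched as a single pass over the list. The name remove_range is hypothetical. The sketch assumes ranges with step 1, treats an empty r as removing nothing, and drops any piece that would be empty, which is what the examples suggest.

```python
from typing import List


def remove_range(ranges: List[range], r: range) -> List[range]:
    """Remove the positions covered by r from a list of step-1 ranges."""
    if len(r) == 0:
        return list(ranges)  # nothing to remove
    result: List[range] = []
    for cur in ranges:
        if r.stop <= cur.start or cur.stop <= r.start:
            result.append(cur)  # no overlap, keep as is
            continue
        if cur.start < r.start:
            result.append(range(cur.start, r.start))  # piece left of r
        if r.stop < cur.stop:
            result.append(range(r.stop, cur.stop))  # piece right of r
    return result
```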

_fusionAttributesList

_fusionAttributesList(attributesList)

Take a list of dictionaries, and return a dictionary of lists.
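The list-of-dictionaries to dictionary-of-lists transformation can be sketched as follows. The name fuse_attributes_list and the sample keys are hypothetical. How the module treats keys that are missing from some dictionaries is not stated here; this sketch simply collects the values that exist.

```python
from collections import defaultdict
from typing import Any, Dict, List


def fuse_attributes_list(attributes_list: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Collect the values of each key across a list of dictionaries."""
    fused: Dict[str, List[Any]] = defaultdict(list)
    for attributes in attributes_list:
        for key, value in attributes.items():
            fused[key].append(value)
    return dict(fused)


fuse_attributes_list([{"pos": "NOUN", "lang": "en"}, {"pos": "VERB"}])
# -> {'pos': ['NOUN', 'VERB'], 'lang': ['en']}
```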

_fusionAttributesDict

_fusionAttributesDict(attributesDict)

Take a dictionary of lists, and return a dictionary of dictionaries of lists.

_fusionAttributes

_fusionAttributes(attributes)

Apply the two functions above in a row.