strops

This module gathers text operations to be run on a string

StrOp

class textops.StrOp

cut

class textops.cut(sep=None, col=None, default='')
Extract columns from a string or a list of strings

This works like the Unix shell command ‘cut’. It uses the str.split() function.

If the input is a simple string, cut() returns a list of strings: the fields of the split input string.

If the input is a list of strings or a string with newlines, cut() returns a list of lists of strings: each line of the input is split and put in its own list.

If only one column is extracted, one level of list is removed.

Parameters:

sep (str) – a string used as the column separator. The default is None, which means ‘any run of whitespace’

col (int or list of int or str) –
Specifies the column or columns to return. You can give:

an int as a single column number (starting with 0)

a list of int as the list of column numbers

a string containing a comma-separated list of int

None (default value) for all columns

default (str) – A string to display when requesting a column that does not exist

Returns:
A string, a list of strings or a list of lists of strings

Examples
>>> s='col1 col2 col3'
>>> s | cut()
['col1', 'col2', 'col3']
>>> s | cut(col=1)
'col2'
>>> s | cut(col='1,2,10',default='N/A')
['col2', 'col3', 'N/A']
>>> s='col1.1 col1.2 col1.3\ncol2.1 col2.2 col2.3'
>>> s | cut()
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
>>> s | cut(col=1)
['col1.2', 'col2.2']
>>> s | cut(col='0,1')
[['col1.1', 'col1.2'], ['col2.1', 'col2.2']]
>>> s | cut(col=[1,2])
[['col1.2', 'col1.3'], ['col2.2', 'col2.3']]
>>> s='col1.1 | col1.2 |  col1.3\ncol2.1 | col2.2 | col2.3'
>>> s | cut()
[['col1.1', '|', 'col1.2', '|', 'col1.3'], ['col2.1', '|', 'col2.2', '|', 'col2.3']]
>>> s | cut(sep=' | ')
[['col1.1', 'col1.2', ' col1.3'], ['col2.1', 'col2.2', 'col2.3']]
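
The column selection described above can be sketched with plain str.split(). This is a minimal illustration under stated assumptions, not the textops implementation: pick_columns is a hypothetical helper that handles a single line only.

```python
def pick_columns(line, sep=None, col=None, default=''):
    """Hypothetical helper: split one line and select columns."""
    fields = line.split(sep)          # sep=None splits on any run of whitespace
    if col is None:                   # no selection: return every column
        return fields
    if isinstance(col, str):          # '1,2,10' -> [1, 2, 10]
        col = [int(c) for c in col.split(',')]
    if isinstance(col, int):          # a single column yields a plain string
        return fields[col] if col < len(fields) else default
    # several columns: substitute `default` for the ones that do not exist
    return [fields[c] if c < len(fields) else default for c in col]

line = 'col1 col2 col3'
print(pick_columns(line))                                # ['col1', 'col2', 'col3']
print(pick_columns(line, col=1))                         # col2
print(pick_columns(line, col='1,2,10', default='N/A'))   # ['col2', 'col3', 'N/A']
print(pick_columns('a | b |  c', sep=' | '))             # ['a', 'b', ' c']
```

The last call shows why an explicit separator can leave stray spaces in a field: ' | ' is matched literally, so the extra space before 'c' stays in the result.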

cutca

class textops.cutca(sep=None, col=None, default='')
Extract columns from a string or a list of strings through pattern capture

This works like textops.cutre except that it needs a pattern with capturing parentheses to capture the columns. It uses re.match() for the capture, which means the pattern must match from the beginning of the line.

If the input is a simple string, cutca() returns a list of strings: the captured columns.

If the input is a list of strings or a string with newlines, cutca() returns a list of lists of strings: each line of the input is matched and its captured columns are put in their own list.

If only one column is extracted, one level of list is removed.

Parameters:

sep (str or re.RegexObject) – a regular expression string or object having capture parenthesis

col (int or list of int or str) –
Specifies the column or columns to return. You can give:

an int as a single column number (starting with 0)

a list of int as the list of column numbers

a string containing a comma-separated list of int

None (default value) for all columns

default (str) – A string to display when requesting a column that does not exist

Returns:
A string, a list of strings or a list of lists of strings

Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutca(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_')
['col1', 'col2', 'col3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutca(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_','0,2,4','not present')
[['col1', 'col3', 'not present'], ['col11', 'col33', 'not present']]
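
The capture mechanism can be illustrated with the standard re module alone. This is a sketch under stated assumptions: capture_columns is a hypothetical helper, not the textops code, and it returns an empty list when the pattern does not match.

```python
import re

def capture_columns(line, pattern):
    """Hypothetical helper: captured groups of one line, or [] if no match."""
    m = re.match(pattern, line)       # re.match only matches at the start of the line
    return list(m.groups()) if m else []

line = '-col1- =col2= _col3_'
full = r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_'

print(capture_columns(line, full))           # ['col1', 'col2', 'col3']
print(capture_columns(line, r'=([^=]*)='))   # [] because the line does not start with '='
```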

cutdct

class textops.cutdct(sep=None, col=None, default='')
Extract columns from a string or a list of strings through pattern capture

This works like textops.cutca except that it needs a pattern with named capturing parentheses to capture the columns.

If the input is a simple string, cutdct() returns a dict mapping each capture name to the captured string.

If the input is a list of strings or a string with newlines, cutdct() returns a list of dicts, one per input line.

If only one column is extracted, one level of list is removed.

Parameters:

sep (str or re.RegexObject) – a regular expression string or object having named capture parenthesis

col (int or list of int or str) –
Specifies the column or columns to return. You can give:

an int as a single column number (starting with 0)

a list of int as the list of column numbers

a string containing a comma-separated list of int

None (default value) for all columns

default (str) – A string to display when requesting a column that does not exist

Returns:
A dict or a list of dicts

Examples
>>> s='item="col1" count="col2" price="col3"'
>>> s | cutdct(r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"')
{'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}
>>> s='item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
>>> s | cutdct(r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"') # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
[{'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'},...
{'item': 'col11', 'i_price': 'col33', 'i_count': 'col22'}]
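
Named captures map directly onto the groupdict() method of a match object in the standard re module. The sketch below assumes that behaviour is what the examples rely on; capture_dict is a hypothetical helper, not the textops implementation.

```python
import re

PATTERN = re.compile(
    r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
)

def capture_dict(line):
    """Hypothetical helper: named captures of one line as a dict ({} if no match)."""
    m = PATTERN.match(line)
    return m.groupdict() if m else {}

text = 'item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
rows = [capture_dict(line) for line in text.splitlines()]

print(rows[0]['item'], rows[1]['i_price'])   # col1 col33
```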

cutkv

class textops.cutkv(sep=None, col=None, default='')
Extract columns from a string or a list of strings through pattern capture

This works like textops.cutdct except that it returns a dict whose key is the value captured under the name given in the ‘key_name’ parameter, and whose value is the full dict of captured values. This is useful for merging information into a bigger dict: see merge_dicts()

Parameters:

sep (str or re.RegexObject) – a regular expression string or object having named capture parenthesis

key_name (str) – specifies the named capture to use as the key of the returned dict. The default value is ‘key’

Note

key_name= must be given as a keyword argument (it is not a positional parameter)

Returns: A dict or a list of dicts

Examples
>>> s='item="col1" count="col2" price="col3"'
>>> pattern=r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
>>> s | cutkv(pattern,key_name='item')
{'col1': {'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}}
>>> s='item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
>>> s | cutkv(pattern,key_name='item')                                                         # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
[{'col1': {'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}},...
{'col11': {'item': 'col11', 'i_price': 'col33', 'i_count': 'col22'}}]
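
The point of keying each result by a captured value is that results from several lines can then be folded into one dict. A minimal sketch, assuming plain dict.update() as the merge step (merge_dicts() itself is not shown here); capture_keyed is a hypothetical helper.

```python
import re

PATTERN = re.compile(
    r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
)

def capture_keyed(line, key_name):
    """Hypothetical helper: {value captured as key_name: all captured values}."""
    m = PATTERN.match(line)
    if not m:
        return {}
    values = m.groupdict()
    return {values[key_name]: values}

text = 'item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'

merged = {}
for line in text.splitlines():
    merged.update(capture_keyed(line, key_name='item'))   # one entry per item

print(sorted(merged))                # ['col1', 'col11']
print(merged['col11']['i_price'])    # col33
```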

cutm

class textops.cutm(sep=None, col=None, default='')
Extract exactly one column by using re.match()

It returns the text matched by the pattern. Beware: the pattern must match from the beginning of the line. You can use capturing parentheses to return only a part of the matched text.

If the input is a simple string, textops.cutm returns a string: the captured substring.

If the input is a list of strings or a string with newlines, textops.cutm returns a list of captured substrings.

Parameters:

sep (str or re.RegexObject) – a regular expression string or object having capture parenthesis

col (int or list of int or str) –
Specifies the column or columns to return. You can give:

an int as a single column number (starting with 0)

a list of int as the list of column numbers

a string containing a comma-separated list of int

None (default value) for all columns

default (str) – A string to display when requesting a column that does not exist

Returns:
A string or a list of strings

Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutm(r'[^=]*=[^=]*=')
'-col1- =col2='
>>> s | cutm(r'=[^=]*=')
''
>>> s | cutm(r'[^=]*=([^=]*)=')
'col2'
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutm(r'[^-]*-([^-]*)-')
['col1', 'col11']
>>> s | cutm(r'[^-]*-(badpattern)-',default='-')
['-', '-']
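
The three single-string results above can be reproduced with re.match() directly. This is a sketch, not the textops code: match_column is a hypothetical helper that returns the first capture group when the pattern has one and the whole match otherwise.

```python
import re

def match_column(line, pattern, default=''):
    """Hypothetical helper: text matched at the start of the line, or default."""
    m = re.match(pattern, line)
    if m is None:
        return default
    # With a capture group, keep only the captured part; otherwise the whole match.
    return m.group(1) if m.groups() else m.group(0)

line = '-col1- =col2= _col3_'
print(repr(match_column(line, r'[^=]*=[^=]*=')))     # '-col1- =col2='
print(repr(match_column(line, r'=[^=]*=')))          # '' (no match at the line start)
print(repr(match_column(line, r'[^=]*=([^=]*)=')))   # 'col2'
```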

cutmi

class textops.cutmi(sep=None, col=None, default='')
Extract exactly one column by using re.match() (case insensitive)

This works like textops.cutm except it is case insensitive.

If the input is a simple string, textops.cutmi returns a string: the captured substring.

If the input is a list of strings or a string with newlines, textops.cutmi returns a list of captured substrings.

Parameters:

sep (str or re.RegexObject) – a regular expression string or object having capture parenthesis

col (int or list of int or str) –
Specifies the column or columns to return. You can give:

an int as a single column number (starting with 0)

a list of int as the list of column numbers

a string containing a comma-separated list of int

None (default value) for all columns

default (str) – A string to display when requesting a column that does not exist

Returns:
A string or a list of strings

Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutm(r'.*(COL\d+)',default='no found')
'no found'
>>> s='-col1- =col2= _col3_'
>>> s | cutmi(r'.*(COL\d+)',default='no found')  # '.*' is greedy, so only the last column is extracted
'col3'
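
Why only the last column comes back can be checked with the standard re module. The sketch below assumes that case insensitivity corresponds to the re.IGNORECASE flag; the non-greedy variant is an added illustration, not something taken from textops.

```python
import re

line = '-col1- =col2= _col3_'

# Without re.IGNORECASE, 'COL' does not match the lowercase text.
case_sensitive = re.match(r'.*(COL\d+)', line)
print(case_sensitive)    # None

# With re.IGNORECASE it matches. The greedy '.*' consumes as much as it can,
# so the group ends up capturing the last column.
greedy = re.match(r'.*(COL\d+)', line, re.IGNORECASE).group(1)
print(greedy)            # col3

# A non-greedy '.*?' stops at the first place where a column can match.
lazy = re.match(r'.*?(COL\d+)', line, re.IGNORECASE).group(1)
print(lazy)              # col1
```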

cutre

class textops.cutre(sep=None, col=None, default='')
Extract columns from a string or a list of strings with re.split()

This works like the Unix shell command ‘cut’. It uses the re.split() function.

If the input is a simple string, cutre() returns a list of strings: the fields of the split input string.

If the input is a list of strings or a string with newlines, cutre() returns a list of lists of strings: each line of the input is split and put in its own list.

If only one column is extracted, one level of list is removed.

Parameters:

sep (str or re.RegexObject) – a regular expression string or object as a column separator

col (int or list of int or str) –
Specifies the column or columns to return. You can give:

an int as a single column number (starting with 0)

a list of int as the list of column numbers

a string containing a comma-separated list of int

None (default value) for all columns

default (str) – A string to display when requesting a column that does not exist

Returns:
A string, a list of strings or a list of lists of strings

Examples
>>> s='col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'
>>> print(s)
col1.1 | col1.2 | col1.3
col2.1 | col2.2 | col2.3
>>> s | cutre(r'\s+')
[['col1.1', '|', 'col1.2', '|', 'col1.3'], ['col2.1', '|', 'col2.2', '|', 'col2.3']]
>>> s | cutre(r'[\s|]+')
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
>>> s | cutre(r'[\s|]+','0,2,4','-')
[['col1.1', 'col1.3', '-'], ['col2.1', 'col2.3', '-']]
>>> import re
>>> mysep = re.compile(r'[\s|]+')
>>> s | cutre(mysep)
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
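
The same splitting can be reproduced with re.split() on each line. This is a minimal sketch using only the standard library; the column selection at the end is a plain list comprehension, not the textops col parameter.

```python
import re

text = 'col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'
sep = re.compile(r'[\s|]+')          # one or more whitespace or pipe characters

# Split every line on the separator pattern.
rows = [sep.split(line) for line in text.splitlines()]
print(rows)
# [['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]

# Keep only columns 0 and 2 of each row.
selected = [[row[0], row[2]] for row in rows]
print(selected)
# [['col1.1', 'col1.3'], ['col2.1', 'col2.3']]
```

Note that re.split() yields empty strings when a line starts or ends with the separator, so real input may need stripping first.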

cuts

class textops.cuts(sep=None, col=None, default='')
Extract exactly one column by using re.search()

This works like textops.cutm except that it searches for the first occurrence of the pattern anywhere in the string. You can use capturing parentheses to return only a part of the matched text.

If the input is a simple string, textops.cuts returns a string: the captured substring.

If the input is a list of strings or a string with newlines, textops.cuts returns a list of captured substrings.

Parameters:

sep (str or re.RegexObject) – a regular expression string or object having capture parenthesis

col (int or list of int or str) –
Specifies the column or columns to return. You can give:

an int as a single column number (starting with 0)

a list of int as the list of column numbers

a string containing a comma-separated list of int

None (default value) for all columns

default (str) – A string to display when requesting a column that does not exist

Returns:
A string or a list of strings

Examples
>>> s='-col1- =col2= _col3_'
>>> s | cuts(r'_[^_]*_')
'_col3_'
>>> s | cuts(r'_([^_]*)_')
'col3'
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cuts(r'_([^_]*)_')
['col3', 'col33']
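
With re.search() the pattern no longer has to match at the start of the line, and only the first occurrence is reported. A sketch under those assumptions; search_column is a hypothetical helper, not the textops implementation.

```python
import re

def search_column(line, pattern, default=''):
    """Hypothetical helper: first occurrence of pattern anywhere in the line."""
    m = re.search(pattern, line)     # scans the whole line, not only its start
    if m is None:
        return default
    return m.group(1) if m.groups() else m.group(0)

lines = ['-col1- =col2= _col3_', '-col11- =col22= _col33_']
print([search_column(l, r'_([^_]*)_') for l in lines])   # ['col3', 'col33']
print(search_column(lines[0], r'col\d+'))                # col1 (first occurrence only)
```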

cutsa

class textops.cutsa(sep=None, col=None, default='')
Extract all columns having the specified pattern.

It uses re.finditer() to find all occurrences of the pattern.

If the input is a simple string, textops.cutsa returns a list of the strings found.

If the input is a list of strings or a string with newlines, textops.cutsa returns a list of lists of the strings found.

Parameters:

sep (str or re.RegexObject) – a regular expression string or object having capture parenthesis

col (int or list of int or str) –
Specifies the column or columns to return. You can give:

an int as a single column number (starting with 0)

a list of int as the list of column numbers

a string containing a comma-separated list of int

None (default value) for all columns

default (str) – A string to display when requesting a column that does not exist

Returns:
A list of strings or a list of lists of strings

Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutsa(r'col\d+')
['col1', 'col2', 'col3']
>>> s | cutsa(r'col(\d+)')
['1', '2', '3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutsa(r'col\d+')
[['col1', 'col2', 'col3'], ['col11', 'col22', 'col33']]
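
Collecting every occurrence can be sketched with re.finditer(). find_columns is a hypothetical helper: it keeps the first capture group when the pattern defines one, which is an assumption drawn from the examples above rather than from the textops source.

```python
import re

def find_columns(line, pattern):
    """Hypothetical helper: every occurrence of pattern in one line."""
    return [m.group(1) if m.groups() else m.group(0)
            for m in re.finditer(pattern, line)]

line = '-col1- =col2= _col3_'
print(find_columns(line, r'col\d+'))     # ['col1', 'col2', 'col3']
print(find_columns(line, r'col(\d+)'))   # ['1', '2', '3']
print(find_columns(line, r'COL\d+'))     # [] (the search is case sensitive)
```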

cutsai

class textops.cutsai(sep=None, col=None, default='')
Extract all columns having the specified pattern. (case insensitive)

It works like textops.cutsa but is case insensitive if the pattern is given as a string.

Parameters:

sep (str or re.RegexObject) – a regular expression string or object having capture parenthesis

col (int or list of int or str) –
Specifies the column or columns to return. You can give:

an int as a single column number (starting with 0)

a list of int as the list of column numbers

a string containing a comma-separated list of int

None (default value) for all columns

default (str) – A string to display when requesting a column that does not exist

Returns:
A list of strings or a list of lists of strings

Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutsa(r'COL\d+')
[]
>>> s | cutsai(r'COL\d+')
['col1', 'col2', 'col3']
>>> s | cutsai(r'COL(\d+)')
['1', '2', '3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutsai(r'COL\d+')
[['col1', 'col2', 'col3'], ['col11', 'col22', 'col33']]
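
The restriction "if the pattern is given as a string" suggests that an already compiled pattern keeps its own flags. The sketch below illustrates that reading, which is an assumption; compile_ignorecase is a hypothetical helper, not a textops function.

```python
import re

def compile_ignorecase(pattern):
    """Hypothetical helper: add re.IGNORECASE only when given a string."""
    if isinstance(pattern, str):
        return re.compile(pattern, re.IGNORECASE)
    return pattern                   # already compiled: its flags are left alone

line = '-col1- =col2= _col3_'
from_string = compile_ignorecase(r'COL\d+')
precompiled = compile_ignorecase(re.compile(r'COL\d+'))

print(from_string.findall(line))   # ['col1', 'col2', 'col3']
print(precompiled.findall(line))   # []
```

To get case-insensitive matching with a compiled pattern in this sketch, the flag has to be passed when compiling it, for example re.compile(r'COL\d+', re.IGNORECASE).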

cutsi

class textops.cutsi(sep=None, col=None, default='')
Extract exactly one column by using re.search() (case insensitive)

This works like textops.cuts except it is case insensitive.

Parameters:

sep (str or re.RegexObject) – a regular expression string or object having capture parenthesis

col (int or list of int or str) –
Specifies the column or columns to return. You can give:

an int as a single column number (starting with 0)

a list of int as the list of column numbers

a string containing a comma-separated list of int

None (default value) for all columns

default (str) – A string to display when requesting a column that does not exist

Returns:
A string or a list of strings

Examples
>>> s='-col1- =col2= _col3_'
>>> s | cuts(r'_(COL[^_]*)_')
''
>>> s='-col1- =col2= _col3_'
>>> s | cutsi(r'_(COL[^_]*)_')
'col3'

echo

class textops.echo
Identity operation

It returns the same text, except that it uses the textops extended classes (StrExt, ListExt …). This can be useful for accessing str methods (upper, replace, …) right after a pipe.

Returns: the input text, unchanged, as an instance of a textops extended class

Return type: StrExt, ListExt, …

Examples
>>> s='this is a string'
>>> type(s)
<type 'str'>
>>> t=s | echo()
>>> type(t)
<class 'textops.base.StrExt'>
>>> s.upper()
'THIS IS A STRING'
>>> s | upper()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'upper' is not defined
>>> s | echo().upper()
'THIS IS A STRING'
>>> s | strop.upper()
'THIS IS A STRING'
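
One way a plain str can appear on the left of a pipe is Python's reflected operator __ror__. The sketch below only illustrates that general mechanism: StrExtSketch and EchoSketch are hypothetical stand-ins, and nothing here is taken from the textops source.

```python
class StrExtSketch(str):
    """Hypothetical stand-in for an extended string class."""
    def length(self):
        return len(self)

class EchoSketch(object):
    """Hypothetical identity operation usable on the right of a pipe."""
    def __ror__(self, text):         # called for: text | EchoSketch()
        return StrExtSketch(text)

t = 'this is a string' | EchoSketch()
print(type(t).__name__)   # StrExtSketch
print(t.upper())          # THIS IS A STRING
print(t.length())         # 16
```

Because str does not define the | operator, Python falls back to the right-hand object's __ror__, which is what lets the wrapper decide what the pipe returns.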

length

class textops.length
Returns the length of a string, list or generator

Returns: length of the input (string, list or generator)

Return type: int

Examples
>>> s='this is a string'
>>> s | length()
16
>>> s=StrExt(s)
>>> s.length()
16
>>> ['a','b','c'] | length()
3
>>> def mygenerator():yield 3; yield 2
>>> mygenerator() | length()
2
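
Generators have no len(), so their length has to be obtained by counting items. A minimal sketch of one way to cover strings, lists and generators alike; length_of is a hypothetical helper, and counting a generator consumes it.

```python
def length_of(obj):
    """Hypothetical helper: length of a sized object or of any iterable."""
    try:
        return len(obj)                  # strings, lists, ...
    except TypeError:
        return sum(1 for _ in obj)       # generators: count the items one by one

def mygenerator():
    yield 3
    yield 2

print(length_of('this is a string'))   # 16
print(length_of(['a', 'b', 'c']))      # 3
print(length_of(mygenerator()))        # 2
```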

matches

class textops.matches(pattern)
Tests whether a pattern is present or not

Uses re.match() to match a pattern against the string.

Parameters: pattern (str) – a regular expression string

Returns: The match object for the pattern found, or None when the pattern does not match

Return type: re.MatchObject

Note

Be careful: the pattern is tested from the beginning of the string. It is NOT searched for somewhere in the middle of the string.

Examples
>>> state=StrExt('good')
>>> print('OK' if state.matches(r'good|not_present|charging') else 'CRITICAL')
OK
>>> state=StrExt('looks like all is good')
>>> print('OK' if state.matches(r'good|not_present|charging') else 'CRITICAL')
CRITICAL
>>> print('OK' if state.matches(r'.*(good|not_present|charging)') else 'CRITICAL')
OK
>>> state=StrExt('Error')
>>> print('OK' if state.matches(r'good|not_present|charging') else 'CRITICAL')
CRITICAL

searches

class textops.searches(pattern)

Search a pattern

Uses re.search() to find a pattern in the string.

Parameters:	pattern (str) – a regular expression string
Returns:	The match object for the pattern found, or None when the pattern is not found
Return type:	re.MatchObject

Examples

>>> state=StrExt('good')
>>> print('OK' if state.searches(r'good|not_present|charging') else 'CRITICAL')
OK
>>> state=StrExt('looks like all is good')
>>> print('OK' if state.searches(r'good|not_present|charging') else 'CRITICAL')
OK
>>> print('OK' if state.searches(r'.*(good|not_present|charging)') else 'CRITICAL')
OK
>>> state=StrExt('Error')
>>> print('OK' if state.searches(r'good|not_present|charging') else 'CRITICAL')
CRITICAL
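
The difference between searching anywhere and matching from the start shows up directly in this kind of status check. A sketch with the standard re module only; status and its anywhere flag are hypothetical names introduced for the illustration.

```python
import re

def status(state, pattern=r'good|not_present|charging', anywhere=True):
    """Hypothetical helper: 'OK' when the pattern is found, else 'CRITICAL'."""
    finder = re.search if anywhere else re.match
    # Both functions return None on failure, which is falsy in a condition.
    return 'OK' if finder(pattern, state) else 'CRITICAL'

print(status('looks like all is good'))                   # OK
print(status('looks like all is good', anywhere=False))   # CRITICAL
print(status('Error'))                                    # CRITICAL
```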

splitln

class textops.splitln
Transforms a string with newlines into a list of lines

It uses Python’s str.splitlines(): the line separator can be \n, \r or \r\n. The separators are removed in the process.

Returns: The split text

Return type: list

Example
>>> s='this is\na multi-line\nstring'
>>> s | splitln()
['this is', 'a multi-line', 'string']
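
The handling of mixed line endings can be checked with str.splitlines() itself. A small sketch comparing it with a naive split on '\n'; the sample text is made up for the illustration.

```python
text = 'this is\r\na multi-line\rstring\n'

# splitlines() recognises \n, \r and \r\n, and drops the trailing separator.
lines = text.splitlines()
print(lines)    # ['this is', 'a multi-line', 'string']

# Splitting on '\n' alone leaves '\r' behind and adds an empty last item.
naive = text.split('\n')
print(naive)    # ['this is\r', 'a multi-line\rstring', '']
```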