strops

This module gathers text operations to be run on a string

StrOp

class textops.StrOp
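Every example on this page applies an operation with the `|` operator, as in `s | cut()`. Here is a rough plain-Python sketch of how such a pipe can work. It assumes nothing about the real StrOp internals; `TextOpSketch`, `upper_sketch` and `replace_sketch` are hypothetical names. Neither str nor list implements `|` for a foreign object, so Python falls back to the right operand's reflected method `__ror__`:

```python
class TextOpSketch:
    """Hypothetical base class for pipeable text operations (not textops' StrOp)."""

    def __init__(self, *args, **kwargs):
        # Remember the arguments given at construction time, e.g. cut(col=1)
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, text):
        # `text | op` lands here because the left operand (str, list, ...)
        # does not implement `|` for this type.
        return self.op(text, *self.args, **self.kwargs)

    def op(self, text, *args, **kwargs):
        raise NotImplementedError


class upper_sketch(TextOpSketch):
    """Hypothetical operation: upper-case the text."""

    def op(self, text):
        return text.upper()


class replace_sketch(TextOpSketch):
    """Hypothetical operation taking arguments: str.replace(old, new)."""

    def op(self, text, old, new):
        return text.replace(old, new)
```

With these definitions, `'a-b' | replace_sketch('-', '+')` evaluates to `'a+b'`. The constructor stores the arguments, and `__ror__` receives the left-hand text.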

cut

class textops.cut(sep=None, col=None, default='')

Extract columns from a string or a list of strings

This works like the unix shell command ‘cut’. It uses the str.split() function.

  • if the input is a simple string, cut() returns a list of strings: the fields of the split input string.
  • if the input is a list of strings or a string with newlines, cut() returns a list of lists of strings: each line of the input is split and put in its own list.
  • if only one column is extracted, one level of list is removed.
Parameters:
  • sep (str) – the column separator string. The default, None, means any run of whitespace
  • col (int or list of int or str) –

    the column(s) to return. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string, a list of strings or a list of lists of strings

Examples

>>> s='col1 col2 col3'
>>> s | cut()
['col1', 'col2', 'col3']
>>> s | cut(col=1)
'col2'
>>> s | cut(col='1,2,10',default='N/A')
['col2', 'col3', 'N/A']
>>> s='col1.1 col1.2 col1.3\ncol2.1 col2.2 col2.3'
>>> s | cut()
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
>>> s | cut(col=1)
['col1.2', 'col2.2']
>>> s | cut(col='0,1')
[['col1.1', 'col1.2'], ['col2.1', 'col2.2']]
>>> s | cut(col=[1,2])
[['col1.2', 'col1.3'], ['col2.2', 'col2.3']]
>>> s='col1.1 | col1.2 |  col1.3\ncol2.1 | col2.2 | col2.3'
>>> s | cut()
[['col1.1', '|', 'col1.2', '|', 'col1.3'], ['col2.1', '|', 'col2.2', '|', 'col2.3']]
>>> s | cut(sep=' | ')
[['col1.1', 'col1.2', ' col1.3'], ['col2.1', 'col2.2', 'col2.3']]
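To make the column-selection rules above concrete, here is a plain-Python sketch that reproduces the documented examples. `cut_sketch` and `parse_cols` are hypothetical names, and the sketch only assumes the rules stated above (split each line, pick columns, fill missing ones with `default`, unwrap a single column or a single-line input). It is not the textops source:

```python
def parse_cols(col):
    """Normalise `col`: None, an int, a list of int, or a string like '1,2,10'."""
    if col is None or isinstance(col, int):
        return col
    if isinstance(col, str):
        return [int(c) for c in col.split(',')]
    return list(col)


def cut_sketch(text, sep=None, col=None, default=''):
    """Hypothetical re-implementation of the documented cut() behaviour."""
    cols = parse_cols(col)

    def cut_line(line):
        fields = line.split(sep)  # sep=None: split on runs of whitespace
        if cols is None:
            return fields
        if isinstance(cols, int):
            # A single column removes one level of list
            return fields[cols] if cols < len(fields) else default
        # Columns that do not exist are replaced by `default`
        return [fields[c] if c < len(fields) else default for c in cols]

    # A single-line string gives one result; several lines give one per line
    if isinstance(text, str) and '\n' not in text:
        return cut_line(text)
    lines = text.splitlines() if isinstance(text, str) else text
    return [cut_line(line) for line in lines]
```

For instance, `cut_sketch('col1 col2 col3', col='1,2,10', default='N/A')` gives `['col2', 'col3', 'N/A']`, like the third example.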

cutca

class textops.cutca(sep=None, col=None, default='')

Extract columns from a string or a list of strings through pattern capture

This works like textops.cutre except that it needs a pattern with capturing parentheses to capture the columns. It uses re.match() for the capture, which means the pattern must match from the beginning of the line.

  • if the input is a simple string, cutca() returns a list of the captured strings.
  • if the input is a list of strings or a string with newlines, cutca() returns a list of lists of strings: the strings captured on each line are put in their own list.
  • if only one column is extracted, one level of list is removed.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
  • col (int or list of int or str) –

    the column(s) to return. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

a list of strings or a list of lists of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cutca(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_')
['col1', 'col2', 'col3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutca(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_','0,2,4','not present')
[['col1', 'col3', 'not present'], ['col11', 'col33', 'not present']]
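In plain `re` terms, the columns are simply the groups of an anchored re.match(). A minimal sketch under that reading; `cutca_sketch` is a hypothetical name, and returning no columns when a line does not match is an assumption, since that case is not documented here:

```python
import re


def cutca_sketch(text, pattern, col=None, default=''):
    """Hypothetical sketch of cutca(): the columns are the groups of re.match()."""
    regex = re.compile(pattern)  # accepts a string or an already compiled pattern
    if isinstance(col, str):
        col = [int(c) for c in col.split(',')]

    def cut_line(line):
        m = regex.match(line)  # anchored at the beginning of the line
        groups = list(m.groups()) if m else []  # assumption: no match -> no columns
        if col is None:
            return groups
        if isinstance(col, int):
            return groups[col] if col < len(groups) else default
        return [groups[c] if c < len(groups) else default for c in col]

    if isinstance(text, str) and '\n' not in text:
        return cut_line(text)
    lines = text.splitlines() if isinstance(text, str) else text
    return [cut_line(line) for line in lines]
```

The pattern has three groups, so asking for column 4 falls back to `default`, which is what the second example shows.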

cutdct

class textops.cutdct(sep=None, col=None, default='')

Extract columns from a string or a list of strings through pattern capture

This works like textops.cutca except that it needs a pattern with named capturing groups ((?P<name>...)) and returns a dict mapping each group name to its captured value.

  • if the input is a simple string, cutdct() returns a dict of the captured values.
  • if the input is a list of strings or a string with newlines, cutdct() returns a list of dicts, one per line.
  • if only one column is extracted, one level of list is removed.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object with named capturing groups
  • col (int or list of int or str) –

    the column(s) to return. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A dict or a list of dicts

Examples

>>> s='item="col1" count="col2" price="col3"'
>>> s | cutdct(r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"')
{'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}
>>> s='item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
>>> s | cutdct(r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"') # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
[{'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'},...
{'item': 'col11', 'i_price': 'col33', 'i_count': 'col22'}]
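The dicts above are what the standard `groupdict()` method of a match object returns. A minimal sketch of the same behaviour with the stdlib; `cutdct_sketch` is a hypothetical name, and returning `{}` for a non-matching line is an assumption:

```python
import re


def cutdct_sketch(text, pattern):
    """Hypothetical sketch of cutdct(): named groups of re.match() as a dict."""
    regex = re.compile(pattern)

    def cut_line(line):
        m = regex.match(line)
        return m.groupdict() if m else {}  # assumption: {} when a line does not match

    if isinstance(text, str) and '\n' not in text:
        return cut_line(text)
    lines = text.splitlines() if isinstance(text, str) else text
    return [cut_line(line) for line in lines]
```

Dict key order differs between Python versions, which is why the examples above print the keys in a different order from the pattern.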

cutkv

class textops.cutkv(sep=None, col=None, default='')

Extract columns from a string or a list of strings through pattern capture

This works like textops.cutdct except that it returns a dict whose key is the value captured by the named group given in the ‘key_name’ parameter, and whose value is the full dict of captured values. This is useful for merging information into a bigger dict: see merge_dicts()

Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object having named capture parenthesis
  • key_name (str) – the named group whose value is used as the key of the returned dict. Default value is ‘key’

Note

key_name must be passed as a keyword argument (key_name=...), not as a positional parameter.

Returns:A dict or a list of dicts

Examples

>>> s='item="col1" count="col2" price="col3"'
>>> pattern=r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
>>> s | cutkv(pattern,key_name='item')
{'col1': {'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}}
>>> s='item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
>>> s | cutkv(pattern,key_name='item')                                                         # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
[{'col1': {'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}},...
{'col11': {'item': 'col11', 'i_price': 'col33', 'i_count': 'col22'}}]
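The point of keying each line is that the per-line dicts can then be merged into one lookup table. A minimal stdlib sketch, where `cutkv_sketch` is a hypothetical name. The merge uses plain `dict.update()` only as a stand-in: textops' merge_dicts() is not reimplemented here and may behave differently, for instance on nested dicts:

```python
import re

# Pattern from the examples above
PATTERN = r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'


def cutkv_sketch(text, pattern, key_name='key'):
    """Hypothetical sketch of cutkv(): key each line's captures by one group."""
    regex = re.compile(pattern)

    def cut_line(line):
        m = regex.match(line)
        if m is None:
            return {}  # assumption: no-match behaviour is not documented here
        captured = m.groupdict()
        return {captured[key_name]: captured}

    if isinstance(text, str) and '\n' not in text:
        return cut_line(text)
    lines = text.splitlines() if isinstance(text, str) else text
    return [cut_line(line) for line in lines]


s = 'item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
# Merge the per-line dicts into one table keyed by 'item'
table = {}
for d in cutkv_sketch(s, PATTERN, key_name='item'):
    table.update(d)
```

After the loop, `table['col11']['i_price']` is `'col33'`.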

cutm

class textops.cutm(sep=None, col=None, default='')

Extract exactly one column by using re.match()

It returns the matched text. Beware: the pattern must match at the beginning of the line. Use capturing parentheses to return only part of the matched text.

  • if the input is a simple string, textops.cutm returns a string: the captured substring.
  • if the input is a list of strings or a string with newlines, textops.cutm returns a list of captured substrings.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object, optionally with capturing parentheses
  • col (int or list of int or str) –

    the column(s) to return. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string or a list of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cutm(r'[^=]*=[^=]*=')
'-col1- =col2='
>>> s | cutm(r'=[^=]*=')
''
>>> s | cutm(r'[^=]*=([^=]*)=')
'col2'
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutm(r'[^-]*-([^-]*)-')
['col1', 'col11']
>>> s | cutm(r'[^-]*-(badpattern)-',default='-')
['-', '-']
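The rule "whole match, or the capture group if there is one" can be written with the `groups` attribute of a compiled pattern, which counts its capturing groups. A minimal sketch that reproduces the examples above; `cutm_sketch` is a hypothetical name, and using the first group only is an assumption:

```python
import re


def cutm_sketch(text, pattern, default=''):
    """Hypothetical sketch of cutm(): one value per line via re.match()."""
    regex = re.compile(pattern)

    def cut_line(line):
        m = regex.match(line)  # anchored at the beginning of the line
        if not m:
            return default
        # With a capture group, return its content; otherwise the whole match
        return m.group(1) if regex.groups else m.group(0)

    if isinstance(text, str) and '\n' not in text:
        return cut_line(text)
    lines = text.splitlines() if isinstance(text, str) else text
    return [cut_line(line) for line in lines]
```

`cutm_sketch('-col1- =col2= _col3_', r'=[^=]*=')` returns the default `''`, because the string does not start with `=`.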

cutmi

class textops.cutmi(sep=None, col=None, default='')

Extract exactly one column by using re.match() (case insensitive)

This works like textops.cutm except it is case insensitive.

  • if the input is a simple string, textops.cutmi returns a string: the captured substring.
  • if the input is a list of strings or a string with newlines, textops.cutmi returns a list of captured substrings.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object, optionally with capturing parentheses
  • col (int or list of int or str) –

    the column(s) to return. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string or a list of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cutm(r'.*(COL\d+)',default='not found')
'not found'
>>> s='-col1- =col2= _col3_'
>>> s | cutmi(r'.*(COL\d+)',default='not found')  # .* is greedy, so only the last column is captured
'col3'
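The comment in the last example relies on two standard `re` behaviours: the re.IGNORECASE flag and the greediness of `.*`. A small stdlib check, assuming that cutmi() passes that flag to re.match():

```python
import re

s = '-col1- =col2= _col3_'

# re.IGNORECASE lets 'COL' match 'col'. The greedy '.*' first consumes the
# whole line, then backtracks only as far as needed, so the group captures
# the last 'colN' on the line.
last = re.match(r'.*(COL\d+)', s, re.IGNORECASE).group(1)   # 'col3'

# A non-greedy '.*?' stops at the first occurrence instead.
first = re.match(r'.*?(COL\d+)', s, re.IGNORECASE).group(1)  # 'col1'
```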

cutre

class textops.cutre(sep=None, col=None, default='')

Extract columns from a string or a list of strings with re.split()

This works like the unix shell command ‘cut’. It uses the re.split() function.

  • if the input is a simple string, cutre() returns a list of strings: the fields of the split input string.
  • if the input is a list of strings or a string with newlines, cutre() returns a list of lists of strings: each line of the input is split and put in its own list.
  • if only one column is extracted, one level of list is removed.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object as a column separator
  • col (int or list of int or str) –

    the column(s) to return. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string, a list of strings or a list of lists of strings

Examples

>>> s='col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'
>>> print s
col1.1 | col1.2 | col1.3
col2.1 | col2.2 | col2.3
>>> s | cutre(r'\s+')
[['col1.1', '|', 'col1.2', '|', 'col1.3'], ['col2.1', '|', 'col2.2', '|', 'col2.3']]
>>> s | cutre(r'[\s|]+')
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
>>> s | cutre(r'[\s|]+','0,2,4','-')
[['col1.1', 'col1.3', '-'], ['col2.1', 'col2.3', '-']]
>>> import re
>>> mysep = re.compile(r'[\s|]+')
>>> s | cutre(mysep)
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
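The examples above map onto one re.split() call per line followed by column selection. A stdlib sketch of the same steps, assuming cutre() splits line by line as described. The last line shows a property of re.split() itself: capturing parentheses in the separator make the separators appear in the result. Whether cutre() keeps that behaviour is not documented here:

```python
import re

s = 'col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'
sep = re.compile(r'[\s|]+')

# One re.split() per line
rows = [sep.split(line) for line in s.splitlines()]

# Columns '0,2,4', with '-' for columns that do not exist
cols = [0, 2, 4]
picked = [[row[c] if c < len(row) else '-' for c in cols] for row in rows]

# re.split() with a capturing group keeps the separators: ['a', ' | ', 'b']
with_seps = re.split(r'(\s*\|\s*)', 'a | b')
```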

cuts

class textops.cuts(sep=None, col=None, default='')

Extract exactly one column by using re.search()

This works like textops.cutm except that it searches for the first occurrence of the pattern anywhere in the string. Use capturing parentheses to return only part of the matched text.

  • if the input is a simple string, textops.cuts returns a string: the captured substring.
  • if the input is a list of strings or a string with newlines, textops.cuts returns a list of captured substrings.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object, optionally with capturing parentheses
  • col (int or list of int or str) –

    the column(s) to return. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string or a list of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cuts(r'_[^_]*_')
'_col3_'
>>> s | cuts(r'_([^_]*)_')
'col3'
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cuts(r'_([^_]*)_')
['col3', 'col33']
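The only difference from cutm is re.search() versus re.match(). Here it is in plain `re`, on the string from the examples:

```python
import re

s = '-col1- =col2= _col3_'

# re.match() is anchored at the start of the string: no match here (None)
anchored = re.match(r'_([^_]*)_', s)

# re.search() scans the string and stops at the first occurrence
found = re.search(r'_([^_]*)_', s)
whole = found.group(0)  # '_col3_': the whole matched text
part = found.group(1)   # 'col3': only the capture group
```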

cutsa

class textops.cutsa(sep=None, col=None, default='')

Extract all columns having the specified pattern.

It uses re.finditer() to find all occurrences of the pattern.

  • if the input is a simple string, textops.cutsa returns a list of the strings found.
  • if the input is a list of strings or a string with newlines, textops.cutsa returns a list of lists of the strings found.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object, optionally with capturing parentheses
  • col (int or list of int or str) –

    the column(s) to return. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

a list of strings or a list of lists of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cutsa(r'col\d+')
['col1', 'col2', 'col3']
>>> s | cutsa(r'col(\d+)')
['1', '2', '3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutsa(r'col\d+')
[['col1', 'col2', 'col3'], ['col11', 'col22', 'col33']]
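A minimal re.finditer() sketch that reproduces the examples above. `cutsa_sketch` is a hypothetical name, and returning the first capture group when the pattern has one is an assumption drawn from the `col(\d+)` example:

```python
import re


def cutsa_sketch(text, pattern):
    """Hypothetical sketch of cutsa(): every occurrence via re.finditer()."""
    regex = re.compile(pattern)

    def cut_line(line):
        # First capture group when the pattern has one, else the whole match
        return [m.group(1) if regex.groups else m.group(0)
                for m in regex.finditer(line)]

    if isinstance(text, str) and '\n' not in text:
        return cut_line(text)
    lines = text.splitlines() if isinstance(text, str) else text
    return [cut_line(line) for line in lines]
```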

cutsai

class textops.cutsai(sep=None, col=None, default='')

Extract all columns having the specified pattern. (case insensitive)

It works like textops.cutsa but is case insensitive if the pattern is given as a string.

Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object, optionally with capturing parentheses
  • col (int or list of int or str) –

    the column(s) to return. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

a list of strings or a list of lists of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cutsa(r'COL\d+')
[]
>>> s | cutsai(r'COL\d+')
['col1', 'col2', 'col3']
>>> s | cutsai(r'COL(\d+)')
['1', '2', '3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutsai(r'COL\d+')
[['col1', 'col2', 'col3'], ['col11', 'col22', 'col33']]
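The phrase "case insensitive if the pattern is given as a string" matches a restriction of the standard library: re.compile() raises ValueError when a flags argument is combined with an already compiled pattern. A sketch of one way to handle this; `compile_ci` is a hypothetical helper, not the textops code:

```python
import re


def compile_ci(pattern):
    """Hypothetical helper: add re.IGNORECASE only to string patterns.

    A compiled pattern keeps its own flags; passing flags together with a
    compiled pattern to re.compile() would raise ValueError.
    """
    if isinstance(pattern, str):
        return re.compile(pattern, re.IGNORECASE)
    return pattern


s = '-col1- =col2= _col3_'
from_str = [m.group(0) for m in compile_ci(r'COL\d+').finditer(s)]
from_obj = [m.group(0) for m in compile_ci(re.compile(r'COL\d+')).finditer(s)]
```

Here `from_str` is `['col1', 'col2', 'col3']`, while `from_obj` stays empty because the precompiled pattern is case sensitive. To get a case-insensitive search with a compiled pattern, compile it with `re.IGNORECASE` yourself.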

cutsi

class textops.cutsi(sep=None, col=None, default='')

Extract exactly one column by using re.search() (case insensitive)

This works like textops.cuts except it is case insensitive.

Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object, optionally with capturing parentheses
  • col (int or list of int or str) –

    the column(s) to return. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string or a list of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cuts(r'_(COL[^_]*)_')
''
>>> s='-col1- =col2= _col3_'
>>> s | cutsi(r'_(COL[^_]*)_')
'col3'

echo

class textops.echo

Identity operation

It returns the same text, except that it uses the textops extended classes (StrExt, ListExt, …). This can be useful for accessing str methods (upper, replace, …) right after a pipe.

Returns:the input text, wrapped in a textops extended class
Return type:StrExt, ListExt, …

Examples

>>> s='this is a string'
>>> type(s)
<type 'str'>
>>> t=s | echo()
>>> type(t)
<class 'textops.base.StrExt'>
>>> s.upper()
'THIS IS A STRING'
>>> s | upper()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'upper' is not defined
>>> s | echo().upper()
'THIS IS A STRING'
>>> s | strop.upper()
'THIS IS A STRING'

length

class textops.length

Returns the length of a string, list or generator

Returns:length of the string, list or generator
Return type:int

Examples

>>> s='this is a string'
>>> s | length()
16
>>> s=StrExt(s)
>>> s.length()
16
>>> ['a','b','c'] | length()
3
>>> def mygenerator():yield 3; yield 2
>>> mygenerator() | length()
2
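A generator has no len(), so the only way to count its items is to iterate over it, which also exhausts it. A minimal sketch of that idea; `length_sketch` is a hypothetical name and not the textops code:

```python
def length_sketch(obj):
    """Hypothetical sketch: len() of a str/list, or item count of a generator."""
    try:
        return len(obj)
    except TypeError:
        # Generators have no len(): count by consuming them (this exhausts them)
        return sum(1 for _ in obj)


def mygenerator():
    yield 3
    yield 2
```

Because the generator is consumed, a second `length_sketch(g)` on the same generator object `g` returns 0.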

matches

class textops.matches(pattern)

Tests whether a pattern is present or not

Uses re.match() to match a pattern against the string.

Parameters:pattern (str) – a regular expression string
Returns:The match object, or None if the pattern does not match
Return type:re.MatchObject

Note

Be careful: the pattern is matched from the beginning of the string; it is NOT searched for elsewhere in the string. Prefix the pattern with .* to allow leading text.

Examples

>>> state=StrExt('good')
>>> print 'OK' if state.matches(r'good|not_present|charging') else 'CRITICAL'
OK
>>> state=StrExt('looks like all is good')
>>> print 'OK' if state.matches(r'good|not_present|charging') else 'CRITICAL'
CRITICAL
>>> print 'OK' if state.matches(r'.*(good|not_present|charging)') else 'CRITICAL'
OK
>>> state=StrExt('Error')
>>> print 'OK' if state.matches(r'good|not_present|charging') else 'CRITICAL'
CRITICAL
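The 'OK'/'CRITICAL' tests above work because a match object is truthy and None is falsy. A stdlib sketch with the same three states, comparing re.match() (used here) with re.search() (used by textops.searches):

```python
import re

states = ['good', 'looks like all is good', 'Error']
pattern = r'good|not_present|charging'

# For each state: (result with re.match(), result with re.search())
results = {
    state: ('OK' if re.match(pattern, state) else 'CRITICAL',
            'OK' if re.search(pattern, state) else 'CRITICAL')
    for state in states
}
```

Only 'looks like all is good' differs: ('CRITICAL', 'OK'), because 'good' is not at the start of the string.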

searches

class textops.searches(pattern)

Searches for a pattern

Uses re.search() to find the first occurrence of a pattern anywhere in the string.

Parameters:pattern (str) – a regular expression string
Returns:The match object, or None if the pattern is not found
Return type:re.MatchObject

Examples

>>> state=StrExt('good')
>>> print 'OK' if state.searches(r'good|not_present|charging') else 'CRITICAL'
OK
>>> state=StrExt('looks like all is good')
>>> print 'OK' if state.searches(r'good|not_present|charging') else 'CRITICAL'
OK
>>> print 'OK' if state.searches(r'.*(good|not_present|charging)') else 'CRITICAL'
OK
>>> state=StrExt('Error')
>>> print 'OK' if state.searches(r'good|not_present|charging') else 'CRITICAL'
CRITICAL

splitln

class textops.splitln

Transforms a string with newlines into a list of lines

It uses Python's str.splitlines(): the line separator can be \n, \r or \r\n, and separators are removed from the result.

Returns:The list of lines
Return type:list

Example

>>> s='this is\na multi-line\nstring'
>>> s | splitln()
['this is', 'a multi-line', 'string']
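For comparison, here is str.splitlines() against a naive split on '\n', using mixed line endings:

```python
text = 'this is\r\na multi-line\rstring\n'

# \r\n, \r and \n all end a line; separators are dropped and a trailing
# newline does not add an empty item: ['this is', 'a multi-line', 'string']
lines = text.splitlines()

# A naive split keeps '\r' and yields a trailing '':
# ['this is\r', 'a multi-line\rstring', '']
naive = text.split('\n')
```

str.splitlines() also treats a few other characters as line boundaries (for example form feed \x0c), which is worth knowing when processing arbitrary text.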