strops

This module gathers text operations that run on a string.

cut

class textops.cut(sep=None, col=None, default='')

Extract columns from a string or a list of strings

This works like the Unix shell command ‘cut’. It uses the str.split() function.

  • if the input is a simple string, cut() will return a list of strings representing the split input string.
  • if the input is a list of strings or a string with newlines, cut() will return a list of lists of strings: each line of the input will be split and put in a list.
  • if only one column is extracted, one level of list is removed.
Parameters:
  • sep (str) – a string used as the column separator. The default is None, which means ‘any kind of spaces’
  • col (int or list of int or str) –

    specify one or more columns you want to get back. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string, a list of strings or a list of lists of strings

Examples

>>> s='col1 col2 col3'
>>> s | cut()
['col1', 'col2', 'col3']
>>> s | cut(col=1)
'col2'
>>> s | cut(col='1,2,10',default='N/A')
['col2', 'col3', 'N/A']
>>> s='col1.1 col1.2 col1.3\ncol2.1 col2.2 col2.3'
>>> s | cut()
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
>>> s | cut(col=1)
['col1.2', 'col2.2']
>>> s | cut(col='0,1')
[['col1.1', 'col1.2'], ['col2.1', 'col2.2']]
>>> s | cut(col=[1,2])
[['col1.2', 'col1.3'], ['col2.2', 'col2.3']]
>>> s='col1.1 | col1.2 |  col1.3\ncol2.1 | col2.2 | col2.3'
>>> s | cut()
[['col1.1', '|', 'col1.2', '|', 'col1.3'], ['col2.1', '|', 'col2.2', '|', 'col2.3']]
>>> s | cut(sep=' | ')
[['col1.1', 'col1.2', ' col1.3'], ['col2.1', 'col2.2', 'col2.3']]
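
The column selection rules above can be sketched with plain str.split(). This is only an illustration, not the textops implementation: cut_line is a hypothetical helper, and treating negative indexes as missing columns is an assumption.

```python
def cut_line(line, sep=None, col=None, default=''):
    """Hypothetical helper: split one line and pick columns as described above."""
    fields = line.split(sep)  # sep=None splits on any run of whitespace
    if col is None:
        return fields  # all columns
    if isinstance(col, int):
        # A single column: return a bare string, not a one-item list.
        return fields[col] if 0 <= col < len(fields) else default
    if isinstance(col, str):
        # A comma separated list of ints, for example '1,2,10'.
        col = [int(c) for c in col.split(',')]
    # Assumption: out-of-range (and negative) indexes yield the default.
    return [fields[c] if 0 <= c < len(fields) else default for c in col]


print(cut_line('col1 col2 col3'))
print(cut_line('col1 col2 col3', col=1))
print(cut_line('col1 col2 col3', col='1,2,10', default='N/A'))
```

For a multi-line input, the same helper would be applied to each item of text.splitlines(), which gives the list of lists shown in the examples.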

cutca

class textops.cutca(sep, col=None, default='')

Extract columns from a string or a list of strings through pattern capture

This works like textops.cutre except that it needs a pattern with capturing parentheses, one group per column. It uses re.match() for the capture, which means the pattern must match from the beginning of the line.

  • if the input is a simple string, cutca() will return a list of strings holding the captured groups.
  • if the input is a list of strings or a string with newlines, cutca() will return a list of lists of strings: the groups captured on each line of the input are put in a list.
  • if only one column is extracted, one level of list is removed.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
  • col (int or list of int or str) –

    specify one or more columns you want to get back. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A list of strings or a list of lists of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cutca(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_')
['col1', 'col2', 'col3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutca(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_','0,2,4','not present')
[['col1', 'col3', 'not present'], ['col11', 'col33', 'not present']]
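
As a rough illustration of capture-based extraction, the sketch below uses only the standard re module. capture_columns is a hypothetical helper, not part of textops, and returning an empty list when nothing matches is an assumption.

```python
import re

pattern = re.compile(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_')


def capture_columns(line, regex):
    """Hypothetical helper: return the captured groups as a list of columns."""
    m = regex.match(line)  # re.match() is anchored at the beginning of the line
    return list(m.groups()) if m else []


line = '-col1- =col2= _col3_'
print(capture_columns(line, pattern))

# A pattern that only fits in the middle of the line does not match at all.
middle_only = re.compile(r'=([^=]*)=')
print(capture_columns(line, middle_only))
```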

cutm

class textops.cutm(sep, col=None, default='')

Extract exactly one column by using re.match()

This is a shortcut for cutca('mypattern',col=0)

  • if the input is a simple string, textops.cutm will return a string representing the captured substring.
  • if the input is a list of strings or a string with newlines, textops.cutm will return a list of captured substrings.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
  • col (int or list of int or str) –

    specify one or more columns you want to get back. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string or a list of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cutm(r'[^-]*-([^-]*)-')
'col1'
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutm(r'[^-]*-([^-]*)-')
['col1', 'col11']
>>> s | cutm(r'[^-]*-(badpattern)-',default='-')
['-', '-']

cutmi

class textops.cutmi(sep, col=None, default='')

Extract exactly one column by using re.match() (case insensitive)

This works like textops.cutm except that it is case insensitive.

  • if the input is a simple string, textops.cutmi will return a string representing the captured substring.
  • if the input is a list of strings or a string with newlines, textops.cutmi will return a list of captured substrings.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
  • col (int or list of int or str) –

    specify one or more columns you want to get back. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string or a list of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cutm(r'.*(COL\d+)',default='no found')
'no found'
>>> s='-col1- =col2= _col3_'
>>> s | cutmi(r'.*(COL\d+)',default='no found')
'col3'
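
The difference between the two calls above can be reproduced with the standard re module alone. This sketch assumes that the case insensitive variant behaves like passing re.IGNORECASE, which this page does not state explicitly.

```python
import re

s = '-col1- =col2= _col3_'

# Case sensitive: 'COL' never appears in upper case, so there is no match.
strict = re.match(r'.*(COL\d+)', s)

# Case insensitive: the greedy '.*' backtracks to the last 'col<digits>'.
relaxed = re.match(r'.*(COL\d+)', s, re.IGNORECASE)

print(strict)
print(relaxed.group(1))
```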

cutdct

class textops.cutdct(sep=None, col=None, default='')

Extract columns from a string or a list of strings through pattern capture

This works like textops.cutca except that it needs a pattern with named capturing groups, one per column.

  • if the input is a simple string, cutdct() will return a dict whose keys are the group names and whose values are the captured strings.
  • if the input is a list of strings or a string with newlines, cutdct() will return a list of dicts, one per line of the input.
  • if only one column is extracted, one level of list is removed.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object with named capturing parentheses
  • col (int or list of int or str) –

    specify one or more columns you want to get back. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A dict or a list of dicts

Examples

>>> s='item="col1" count="col2" price="col3"'
>>> s | cutdct(r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"')
{'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}
>>> s='item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
>>> s | cutdct(r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"') 
[{'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'},...
{'item': 'col11', 'i_price': 'col33', 'i_count': 'col22'}]
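
Named groups map naturally onto dict keys through the match object's groupdict() method. The following is a minimal sketch with the standard re module, not the textops implementation. Skipping lines that do not match is an assumption.

```python
import re

pattern = re.compile(
    r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
)
text = ('item="col1" count="col2" price="col3"\n'
        'item="col11" count="col22" price="col33"')

# One dict per line; lines that do not match are skipped (an assumption).
rows = [m.groupdict() for m in map(pattern.match, text.splitlines()) if m]

for row in rows:
    print(row)
```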

cutkv

class textops.cutkv(sep=None, col=None, default='', key_name = 'key')

Extract columns from a string or a list of strings through pattern capture

This works like textops.cutdct except that it returns a dict where the key is the value captured by the group named in the ‘key_name’ parameter, and where the value is the full dict of captured values. This is useful for merging information into a bigger dict: see merge_dicts()

Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object with named capturing parentheses
  • key_name (str) – the named capture to use as the key of the returned dict. The default value is ‘key’

Note

key_name= must be passed as a keyword argument (it is not a positional parameter)

Returns:A dict or a list of dicts

Examples

>>> s='item="col1" count="col2" price="col3"'
>>> pattern=r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
>>> s | cutkv(pattern,key_name='item')
{'col1': {'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}}
>>> s='item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
>>> s | cutkv(pattern,key_name='item')                                                         
[{'col1': {'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}},...
{'col11': {'item': 'col11', 'i_price': 'col33', 'i_count': 'col22'}}]
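
To show why keyed dicts are convenient for merging, the sketch below builds one keyed dict per line and folds them into a single dict. keyed is a hypothetical helper, and plain dict.update() stands in for merge_dicts(), whose exact behaviour is not described on this page.

```python
import re

pattern = re.compile(
    r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
)
text = ('item="col1" count="col2" price="col3"\n'
        'item="col11" count="col22" price="col33"')


def keyed(line, regex, key_name='key'):
    """Hypothetical helper: {captured[key_name]: all captured values}."""
    m = regex.match(line)
    if m is None:
        return {}  # assumption: a non-matching line contributes nothing
    captured = m.groupdict()
    return {captured[key_name]: captured}


merged = {}
for line in text.splitlines():
    merged.update(keyed(line, pattern, key_name='item'))

print(sorted(merged))
print(merged['col11']['i_price'])
```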

cutre

class textops.cutre(sep=None, col=None, default='')

Extract columns from a string or a list of strings with re.split()

This works like the Unix shell command ‘cut’. It uses the re.split() function.

  • if the input is a simple string, cutre() will return a list of strings representing the split input string.
  • if the input is a list of strings or a string with newlines, cutre() will return a list of lists of strings: each line of the input will be split and put in a list.
  • if only one column is extracted, one level of list is removed.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object as a column separator
  • col (int or list of int or str) –

    specify one or more columns you want to get back. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string, a list of strings or a list of lists of strings

Examples

>>> s='col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'
>>> print s
col1.1 | col1.2 | col1.3
col2.1 | col2.2 | col2.3
>>> s | cutre(r'\s+')
[['col1.1', '|', 'col1.2', '|', 'col1.3'], ['col2.1', '|', 'col2.2', '|', 'col2.3']]
>>> s | cutre(r'[\s|]+')
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
>>> s | cutre(r'[\s|]+','0,2,4','-')
[['col1.1', 'col1.3', '-'], ['col2.1', 'col2.3', '-']]
>>> import re
>>> mysep = re.compile(r'[\s|]+')
>>> s | cutre(mysep)
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
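
The effect of the separator pattern can be checked directly with re.split(). This is a sketch using only the standard library. It shows the splitting step, not the column selection.

```python
import re

s = 'col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'

# Splitting on whitespace only keeps the '|' characters as columns.
spaces_only = [re.split(r'\s+', line) for line in s.splitlines()]

# Treating '|' as part of the separator removes them.
sep = re.compile(r'[\s|]+')
spaces_and_pipes = [sep.split(line) for line in s.splitlines()]

print(spaces_only[0])
print(spaces_and_pipes[0])
```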

cuts

class textops.cuts(sep, col=None, default='')

Extract exactly one column by using re.search()

This works like textops.cutm except that it searches for the first occurrence of the pattern anywhere in the string.

  • if the input is a simple string, textops.cuts will return a string representing the captured substring.
  • if the input is a list of strings or a string with newlines, textops.cuts will return a list of captured substrings.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
  • col (int or list of int or str) –

    specify one or more columns you want to get back. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string or a list of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cuts(r'_([^_]*)_')
'col3'
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cuts(r'_([^_]*)_')
['col3', 'col33']
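
The practical difference between the re.match() based and re.search() based operations can be seen with the standard library alone. The sketch below is illustrative and does not call textops.

```python
import re

s = '-col1- =col2= _col3_'
pattern = r'_([^_]*)_'

# re.match() only succeeds if the pattern fits at the very start of the string.
at_start = re.match(pattern, s)

# re.search() scans the string and stops at the first occurrence.
anywhere = re.search(pattern, s)

print(at_start)
print(anywhere.group(1))
```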

cutsi

class textops.cutsi(sep, col=None, default='')

Extract exactly one column by using re.search() (case insensitive)

This works like textops.cuts except that it is case insensitive.

  • if the input is a simple string, textops.cutsi will return a string representing the captured substring.
  • if the input is a list of strings or a string with newlines, textops.cutsi will return a list of captured substrings.
Parameters:
  • sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
  • col (int or list of int or str) –

    specify one or more columns you want to get back. You can specify:

    • an int as a single column number (starting with 0)
    • a list of int as the list of columns
    • a string containing a comma separated list of int
    • None (default value) for all columns
  • default (str) – A string to display when requesting a column that does not exist
Returns:

A string or a list of strings

Examples

>>> s='-col1- =col2= _col3_'
>>> s | cuts(r'_(COL[^_]*)_')
''
>>> s='-col1- =col2= _col3_'
>>> s | cutsi(r'_(COL[^_]*)_')
'col3'

echo

class textops.echo()

Identity operation

It returns the same text, except that it uses the textops extended classes (StrExt, ListExt ...). This can be useful in some cases to access str methods (upper, replace, ...) just after a pipe.

Returns:the same text, as a textops extended object
Return type:a textops extended class (StrExt, ListExt ...)

Examples

>>> s='this is a string'
>>> type(s)
<type 'str'>
>>> t=s | echo()
>>> type(t)
<class 'textops.base.StrExt'>
>>> s.upper()
'THIS IS A STRING'
>>> s | upper()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'upper' is not defined
>>> s | echo().upper()
'THIS IS A STRING'
>>> s | strop.upper()
'THIS IS A STRING'

length

class textops.length()

Returns the length of a string, list or generator

Returns:length of the string, list or generator
Return type:int

Examples

>>> s='this is a string'
>>> s | length()
16
>>> s=StrExt(s)
>>> s.length()
16
>>> ['a','b','c'] | length()
3
>>> def mygenerator():yield 3; yield 2
>>> mygenerator() | length()
2
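
Generators have no len(), so counting their items means consuming them. The sketch below shows one way to handle all three input kinds. count_items is a hypothetical helper and not necessarily how textops does it.

```python
def count_items(obj):
    """Hypothetical helper: length of a string, list or generator."""
    try:
        return len(obj)  # strings and lists
    except TypeError:
        # Generators: iterate and count. This exhausts the generator.
        return sum(1 for _ in obj)


def mygenerator():
    yield 3
    yield 2


print(count_items('this is a string'))
print(count_items(['a', 'b', 'c']))
print(count_items(mygenerator()))
```

Note that a generator counted this way cannot be iterated again afterwards.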

matches

class textops.matches(pattern)

Tests whether a pattern is present or not

Uses re.match() to match a pattern against the string.

Parameters:pattern (str) – a regular expression string
Returns:The pattern found
Return type:re.RegexObject

Note

Be careful: the pattern is tested from the beginning of the string; it is NOT searched for in the middle of the string.

Examples

>>> state=StrExt('good')
>>> print 'OK' if state.matches(r'good|not_present|charging') else 'CRITICAL'
OK
>>> state=StrExt('looks like all is good')
>>> print 'OK' if state.matches(r'good|not_present|charging') else 'CRITICAL'
CRITICAL
>>> print 'OK' if state.matches(r'.*(good|not_present|charging)') else 'CRITICAL'
OK
>>> state=StrExt('Error')
>>> print 'OK' if state.matches(r'good|not_present|charging') else 'CRITICAL'
CRITICAL

searches

class textops.searches(pattern)

Searches for a pattern

Uses re.search() to find a pattern in the string.

Parameters:pattern (str) – a regular expression string
Returns:The pattern found
Return type:re.RegexObject

Examples

>>> state=StrExt('good')
>>> print 'OK' if state.searches(r'good|not_present|charging') else 'CRITICAL'
OK
>>> state=StrExt('looks like all is good')
>>> print 'OK' if state.searches(r'good|not_present|charging') else 'CRITICAL'
OK
>>> print 'OK' if state.searches(r'.*(good|not_present|charging)') else 'CRITICAL'
OK
>>> state=StrExt('Error')
>>> print 'OK' if state.searches(r'good|not_present|charging') else 'CRITICAL'
CRITICAL

splitln

class textops.splitln()

Transforms a string with newlines into a list of lines

It uses Python's str.splitlines(): the newline separator can be \n, \r or both. Separators are removed during the process.

Returns:The split text
Return type:list

Example

>>> s='this is\na multi-line\nstring'
>>> s | splitln()
['this is', 'a multi-line', 'string']
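
The handling of mixed newline styles comes from str.splitlines() itself, as this small standard-library sketch shows. The comparison with str.split('\n') is added for illustration only.

```python
s = 'this is\r\na multi-line\rstring\n'

# splitlines() understands \n, \r and \r\n, and drops the separators.
lines = s.splitlines()

# A plain split on '\n' leaves '\r' in place and adds a trailing empty item.
naive = s.split('\n')

print(lines)
print(naive)
```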