strops
This module gathers text operations to be run on a string
cut
- class textops.cut(sep=None, col=None, default='')
Extract columns from a string or a list of strings
This works like the Unix shell command ‘cut’. It uses the str.split() function.
- if the input is a simple string, cut() returns a list of strings representing the split input string.
- if the input is a list of strings or a string with newlines, cut() returns a list of lists of strings: each line of the input is split and put in a list.
- if only one column is extracted, one level of list is removed.
Parameters:
- sep (str) – a string used as the column separator; the default is None, which means ‘any kind of whitespace’
- col (int or list of int or str) –
specify one or more columns you want to get back. You can specify:
- an int as a single column number (starting with 0)
- a list of int as the list of columns
- a string containing a comma-separated list of int
- None (default value) for all columns
- default (str) – a string to use when a requested column does not exist
Returns: A string, a list of strings or a list of lists of strings
Examples
>>> s='col1 col2 col3'
>>> s | cut()
['col1', 'col2', 'col3']
>>> s | cut(col=1)
'col2'
>>> s | cut(col='1,2,10',default='N/A')
['col2', 'col3', 'N/A']
>>> s='col1.1 col1.2 col1.3\ncol2.1 col2.2 col2.3'
>>> s | cut()
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
>>> s | cut(col=1)
['col1.2', 'col2.2']
>>> s | cut(col='0,1')
[['col1.1', 'col1.2'], ['col2.1', 'col2.2']]
>>> s | cut(col=[1,2])
[['col1.2', 'col1.3'], ['col2.2', 'col2.3']]
>>> s='col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'
>>> s | cut()
[['col1.1', '|', 'col1.2', '|', 'col1.3'], ['col2.1', '|', 'col2.2', '|', 'col2.3']]
>>> s | cut(sep=' | ')
[['col1.1', 'col1.2', ' col1.3'], ['col2.1', 'col2.2', 'col2.3']]
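The column rules described for cut() can be sketched with plain str.split(). This is only an illustration under stated assumptions, not the textops implementation: cut_sketch is a hypothetical name, and it handles only a string or a list of strings as input.

```python
def cut_sketch(text, sep=None, col=None, default=''):
    """Hypothetical stand-in for cut(); not the textops implementation."""
    def pick(fields):
        if col is None:
            return fields                       # all columns
        if isinstance(col, int):
            return fields[col] if col < len(fields) else default
        # accept '1,2,10' as well as [1, 2, 10]
        wanted = [int(c) for c in col.split(',')] if isinstance(col, str) else list(col)
        return [fields[i] if i < len(fields) else default for i in wanted]

    lines = text.splitlines() if isinstance(text, str) else list(text)
    rows = [pick(line.split(sep)) for line in lines]
    # a single-line string yields one row: drop the outer list
    if isinstance(text, str) and len(lines) == 1:
        return rows[0]
    return rows


single = cut_sketch('col1 col2 col3', col='1,2,10', default='N/A')
multi = cut_sketch('col1.1 col1.2 col1.3\ncol2.1 col2.2 col2.3', col=1)
```

With sep=None, str.split() splits on any run of whitespace, which is why the default separator behaves as ‘any kind of whitespace’.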
cutca
- class textops.cutca(sep, col=None, default='')
Extract columns from a string or a list of strings through pattern capture
This works like textops.cutre except that it needs a pattern with capturing parentheses to capture the columns. It uses re.match() for the capture, which means the pattern must match from the beginning of the line.
- if the input is a simple string, cutca() returns a list of strings holding the captured columns.
- if the input is a list of strings or a string with newlines, cutca() returns a list of lists of strings: each line of the input is matched and its captured columns are put in a list.
- if only one column is extracted, one level of list is removed.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
- col (int or list of int or str) –
specify one or more columns you want to get back. You can specify:
- an int as a single column number (starting with 0)
- a list of int as the list of columns
- a string containing a comma-separated list of int
- None (default value) for all columns
- default (str) – a string to use when a requested column does not exist
Returns: a list of strings or a list of lists of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutca(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_')
['col1', 'col2', 'col3']
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutca(r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_','0,2,4','not present')
[['col1', 'col3', 'not present'], ['col11', 'col33', 'not present']]
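The capture mechanism can be illustrated with the standard re module alone. The sketch below is an assumption about the general approach, not textops code: capture_columns is a hypothetical helper, and it accepts only the comma-separated string form of col.

```python
import re


def capture_columns(line, pattern, col=None, default=''):
    """Hypothetical helper mimicking the documented behaviour."""
    m = re.match(pattern, line)          # anchored at the start of the line
    groups = list(m.groups()) if m else []
    if col is None:
        return groups
    wanted = [int(c) for c in col.split(',')]
    return [groups[i] if i < len(groups) else default for i in wanted]


line = '-col1- =col2= _col3_'
full = r'[^-]*-([^-]*)-[^=]*=([^=]*)=[^_]*_([^_]*)_'
all_cols = capture_columns(line, full)
some_cols = capture_columns(line, full, '0,2,4', 'not present')
# Without a leading [^-]*, re.match() fails on a line that does not start with '-'
no_match = capture_columns('x ' + line, r'-([^-]*)-')
```

This is why the patterns in the examples start with [^-]*: it absorbs whatever precedes the first captured column.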
cutm
- class textops.cutm(sep, col=None, default='')
Extract exactly one column by using re.match()
This is a shortcut for cutca('mypattern',col=0)
- if the input is a simple string, textops.cutm returns a string representing the captured substring.
- if the input is a list of strings or a string with newlines, textops.cutm returns a list of captured substrings.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
- col (int or list of int or str) –
specify one or more columns you want to get back. You can specify:
- an int as a single column number (starting with 0)
- a list of int as the list of columns
- a string containing a comma-separated list of int
- None (default value) for all columns
- default (str) – a string to use when a requested column does not exist
Returns: a string or a list of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutm(r'[^-]*-([^-]*)-')
'col1'
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cutm(r'[^-]*-([^-]*)-')
['col1', 'col11']
>>> s | cutm(r'[^-]*-(badpattern)-',default='-')
['-', '-']
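Taking only the first captured group, with a fallback when the pattern does not match, might look like the following with the standard re module. first_group is a hypothetical name used for illustration; it is not part of textops.

```python
import re


def first_group(lines, pattern, default=''):
    """Hypothetical helper: first capture of re.match() per line, or default."""
    regex = re.compile(pattern)
    result = []
    for line in lines:
        m = regex.match(line)
        result.append(m.group(1) if m else default)
    return result


lines = ['-col1- =col2= _col3_', '-col11- =col22= _col33_']
found = first_group(lines, r'[^-]*-([^-]*)-')
missing = first_group(lines, r'[^-]*-(badpattern)-', default='-')
```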
cutmi
- class textops.cutmi(sep, col=None, default='')
Extract exactly one column by using re.match() (case insensitive)
This works like textops.cutm except that it is case-insensitive.
- if the input is a simple string, textops.cutmi returns a string representing the captured substring.
- if the input is a list of strings or a string with newlines, textops.cutmi returns a list of captured substrings.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
- col (int or list of int or str) –
specify one or more columns you want to get back. You can specify:
- an int as a single column number (starting with 0)
- a list of int as the list of columns
- a string containing a comma-separated list of int
- None (default value) for all columns
- default (str) – a string to use when a requested column does not exist
Returns: a string or a list of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cutm(r'.*(COL\d+)',default='no found')
'no found'
>>> s='-col1- =col2= _col3_'
>>> s | cutmi(r'.*(COL\d+)',default='no found')
'col3'
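The difference between the two calls above corresponds to the re.IGNORECASE flag of the standard library. The snippet below reproduces it with re.match() directly; that textops uses this exact flag internally is an assumption.

```python
import re

s = '-col1- =col2= _col3_'
pattern = r'.*(COL\d+)'

sensitive = re.match(pattern, s)                    # no match: 'COL' != 'col'
insensitive = re.match(pattern, s, re.IGNORECASE)   # matches

# The greedy '.*' consumes as much as possible, so the LAST 'colN' is captured
captured = insensitive.group(1)
```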
cutdct
- class textops.cutdct(sep=None, col=None, default='')
Extract columns from a string or a list of strings through pattern capture
This works like textops.cutca except that it needs a pattern with named capturing parentheses to capture the columns.
- if the input is a simple string, cutdct() returns a dict mapping each capture name to the captured string.
- if the input is a list of strings or a string with newlines, cutdct() returns a list of dicts: one dict per line of the input.
- if only one column is extracted, one level of list is removed.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object with named capturing parentheses
- col (int or list of int or str) –
specify one or more columns you want to get back. You can specify:
- an int as a single column number (starting with 0)
- a list of int as the list of columns
- a string containing a comma-separated list of int
- None (default value) for all columns
- default (str) – a string to use when a requested column does not exist
Returns: A dict or a list of dicts
Examples
>>> s='item="col1" count="col2" price="col3"'
>>> s | cutdct(r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"')
{'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}
>>> s='item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
>>> s | cutdct(r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"')
[{'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'},...
{'item': 'col11', 'i_price': 'col33', 'i_count': 'col22'}]
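Named captures map directly onto the groupdict() method of a standard-library match object. The sketch below shows that mechanism on the same sample data; whether textops builds its dicts this way is an assumption.

```python
import re

pattern = re.compile(
    r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
)
text = ('item="col1" count="col2" price="col3"\n'
        'item="col11" count="col22" price="col33"')

# One dict per line; lines that do not match are skipped in this sketch
rows = []
for line in text.splitlines():
    m = pattern.match(line)
    if m:
        rows.append(m.groupdict())
```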
cutkv
- class textops.cutkv(sep=None, col=None, default='', key_name = 'key')
Extract columns from a string or a list of strings through pattern capture
This works like textops.cutdct except that it returns a dict where the key is the value captured under the name given in the ‘key_name’ parameter, and where the value is the full dict of captured values. The point is to merge the results into a bigger dict: see merge_dicts()
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object with named capturing parentheses
- key_name (str) – the named capture to use as the key of the returned dict. The default value is ‘key’
Note
key_name= must be passed as a keyword argument (it is not a positional parameter)
Returns: A dict or a list of dicts
Examples
>>> s='item="col1" count="col2" price="col3"'
>>> pattern=r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
>>> s | cutkv(pattern,key_name='item')
{'col1': {'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}}
>>> s='item="col1" count="col2" price="col3"\nitem="col11" count="col22" price="col33"'
>>> s | cutkv(pattern,key_name='item')
[{'col1': {'item': 'col1', 'i_price': 'col3', 'i_count': 'col2'}},...
{'col11': {'item': 'col11', 'i_price': 'col33', 'i_count': 'col22'}}]
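To show why keying each record by one of its captures is convenient for merging, here is a standard-library sketch that builds the merged dict in a single pass. It does not use merge_dicts(); a plain dict assignment stands in for it, and the variable names are illustrative only.

```python
import re

pattern = re.compile(
    r'item="(?P<item>[^"]*)" count="(?P<i_count>[^"]*)" price="(?P<i_price>[^"]*)"'
)
text = ('item="col1" count="col2" price="col3"\n'
        'item="col11" count="col22" price="col33"')

key_name = 'item'   # same role as the key_name parameter in the examples
merged = {}
for line in text.splitlines():
    m = pattern.match(line)
    if m:
        fields = m.groupdict()
        # the captured 'item' value becomes the key of the whole record
        merged[fields[key_name]] = fields
```

Each record can then be looked up by its key, for example merged['col11']['i_price'].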
cutre
- class textops.cutre(sep=None, col=None, default='')
Extract columns from a string or a list of strings with re.split()
This works like the Unix shell command ‘cut’. It uses the re.split() function.
- if the input is a simple string, cutre() returns a list of strings representing the split input string.
- if the input is a list of strings or a string with newlines, cutre() returns a list of lists of strings: each line of the input is split and put in a list.
- if only one column is extracted, one level of list is removed.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object used as the column separator
- col (int or list of int or str) –
specify one or more columns you want to get back. You can specify:
- an int as a single column number (starting with 0)
- a list of int as the list of columns
- a string containing a comma-separated list of int
- None (default value) for all columns
- default (str) – a string to use when a requested column does not exist
Returns: A string, a list of strings or a list of lists of strings
Examples
>>> import re
>>> s='col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'
>>> print(s)
col1.1 | col1.2 | col1.3
col2.1 | col2.2 | col2.3
>>> s | cutre(r'\s+')
[['col1.1', '|', 'col1.2', '|', 'col1.3'], ['col2.1', '|', 'col2.2', '|', 'col2.3']]
>>> s | cutre(r'[\s|]+')
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
>>> s | cutre(r'[\s|]+','0,2,4','-')
[['col1.1', 'col1.3', '-'], ['col2.1', 'col2.3', '-']]
>>> mysep = re.compile(r'[\s|]+')
>>> s | cutre(mysep)
[['col1.1', 'col1.2', 'col1.3'], ['col2.1', 'col2.2', 'col2.3']]
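The underlying re.split() behaviour can be checked on its own. The snippet below uses only the standard library and the same sample string; it also shows one property of re.split() worth keeping in mind, namely that a separator at the very start of a line produces an empty first field. How textops treats that case is not covered here.

```python
import re

s = 'col1.1 | col1.2 | col1.3\ncol2.1 | col2.2 | col2.3'

# Splitting on whitespace only keeps the '|' characters as columns
by_space = [re.split(r'\s+', line) for line in s.splitlines()]

# Treating runs of whitespace and '|' as one separator removes them
by_space_or_pipe = [re.split(r'[\s|]+', line) for line in s.splitlines()]

# A leading separator yields an empty first field
leading = re.split(r'[\s|]+', '| a | b')
```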
cuts
- class textops.cuts(sep, col=None, default='')
Extract exactly one column by using re.search()
This works like textops.cutm except that it searches for the first occurrence of the pattern anywhere in the string.
- if the input is a simple string, textops.cuts returns a string representing the captured substring.
- if the input is a list of strings or a string with newlines, textops.cuts returns a list of captured substrings.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
- col (int or list of int or str) –
specify one or more columns you want to get back. You can specify:
- an int as a single column number (starting with 0)
- a list of int as the list of columns
- a string containing a comma-separated list of int
- None (default value) for all columns
- default (str) – a string to use when a requested column does not exist
Returns: a string or a list of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cuts(r'_([^_]*)_')
'col3'
>>> s=['-col1- =col2= _col3_','-col11- =col22= _col33_']
>>> s | cuts(r'_([^_]*)_')
['col3', 'col33']
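The phrase "first occurrence" matters when a line contains several candidates. The standard-library snippet below uses a made-up sample string with two underscore-delimited fields to contrast re.search() with an re.match() prefixed by a greedy '.*'.

```python
import re

s = '-col1- _col3_ _col4_'   # illustrative input with two '_..._' fields

first = re.search(r'_([^_]*)_', s).group(1)    # first occurrence in the string
last = re.match(r'.*_([^_]*)_', s).group(1)    # greedy '.*' pushes to the last one
nothing = re.match(r'_([^_]*)_', s)            # anchored at the start: no match
```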
cutsi
- class textops.cutsi(sep, col=None, default='')
Extract exactly one column by using re.search() (case insensitive)
This works like textops.cuts except that it is case-insensitive.
- if the input is a simple string, textops.cutsi returns a string representing the captured substring.
- if the input is a list of strings or a string with newlines, textops.cutsi returns a list of captured substrings.
Parameters:
- sep (str or re.RegexObject) – a regular expression string or object with capturing parentheses
- col (int or list of int or str) –
specify one or more columns you want to get back. You can specify:
- an int as a single column number (starting with 0)
- a list of int as the list of columns
- a string containing a comma-separated list of int
- None (default value) for all columns
- default (str) – a string to use when a requested column does not exist
Returns: a string or a list of strings
Examples
>>> s='-col1- =col2= _col3_'
>>> s | cuts(r'_(COL[^_]*)_')
''
>>> s='-col1- =col2= _col3_'
>>> s | cutsi(r'_(COL[^_]*)_')
'col3'
echo
- class textops.echo()
identity operation
It returns the same text, except that it uses textops Extended classes (StrExt, ListExt ...). This can be useful in some cases to access str methods (upper, replace, ...) just after a pipe.
Returns: the input text, unchanged, as a textops Extended class (StrExt, ListExt ...)
Examples
>>> s='this is a string'
>>> type(s)
<type 'str'>
>>> t=s | echo()
>>> type(t)
<class 'textops.base.StrExt'>
>>> s.upper()
'THIS IS A STRING'
>>> s | upper()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'upper' is not defined
>>> s | echo().upper()
'THIS IS A STRING'
>>> s | strop.upper()
'THIS IS A STRING'
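For readers wondering how a plain str can appear on the left of a pipe, one possible mechanism is Python's reflected operator method __ror__. The sketch below is purely an assumption for illustration: StrExtSketch and EchoSketch are hypothetical classes, and nothing here is taken from the textops source.

```python
class StrExtSketch(str):
    """Hypothetical str subclass standing in for an extended string class."""


class EchoSketch:
    """Hypothetical identity operation usable on the right side of a pipe."""

    def __ror__(self, text):
        # str defines no __or__, so Python falls back to this reflected
        # method when evaluating `text | EchoSketch()`
        return StrExtSketch(text)


t = 'this is a string' | EchoSketch()
shouted = t.upper()   # str methods remain available on the subclass
```

Unlike the `s | echo().upper()` form shown in the examples, this sketch applies the method after the pipe has been evaluated; it does not model any chaining of operations.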
length
- class textops.length()
Returns the length of a string, list or generator
Returns: length of the string, list or generator
Return type: int
Examples
>>> s='this is a string'
>>> s | length()
16
>>> s=StrExt(s)
>>> s.length()
16
>>> ['a','b','c'] | length()
3
>>> def mygenerator():yield 3; yield 2
>>> mygenerator() | length()
2
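Generators have no len(), so measuring one means iterating over it. A minimal sketch of that idea follows; length_sketch is a hypothetical helper, not the textops implementation.

```python
def length_sketch(obj):
    """Hypothetical helper: len() when available, otherwise count items."""
    try:
        return len(obj)
    except TypeError:
        # generators do not support len(); counting consumes them
        return sum(1 for _ in obj)


def mygenerator():
    yield 3
    yield 2


gen = mygenerator()
count = length_sketch(gen)
leftover = list(gen)   # the generator is exhausted after counting
```

In this sketch the generator cannot be reused once it has been counted, which is worth remembering when the items themselves are needed afterwards.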
matches
- class textops.matches(pattern)
Tests whether a pattern is present or not
Uses re.match() to match a pattern against the string.
Parameters: pattern (str) – a regular expression string
Returns: The pattern found
Return type: re.RegexObject
Note
Be careful: the pattern is tested from the beginning of the string; it is NOT searched for somewhere in the middle of the string.
Examples
>>> state=StrExt('good')
>>> print('OK' if state.matches(r'good|not_present|charging') else 'CRITICAL')
OK
>>> state=StrExt('looks like all is good')
>>> print('OK' if state.matches(r'good|not_present|charging') else 'CRITICAL')
CRITICAL
>>> print('OK' if state.matches(r'.*(good|not_present|charging)') else 'CRITICAL')
OK
>>> state=StrExt('Error')
>>> print('OK' if state.matches(r'good|not_present|charging') else 'CRITICAL')
CRITICAL
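The anchoring described in the note is a property of re.match() itself and can be reproduced without textops. In the sketch below, status is a hypothetical helper written for this illustration; it switches between re.match() and re.search() to show the difference on the same inputs.

```python
import re

PATTERN = r'good|not_present|charging'


def status(state, anchored=True):
    """Hypothetical helper: 'OK' when PATTERN is found, else 'CRITICAL'."""
    test = re.match if anchored else re.search
    return 'OK' if test(PATTERN, state) else 'CRITICAL'


at_start = status('good')
in_middle = status('looks like all is good')
anywhere = status('looks like all is good', anchored=False)
```

Here at_start is 'OK', in_middle is 'CRITICAL' because the string does not begin with one of the alternatives, and anywhere is 'OK' because re.search() scans the whole string.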
searches
- class textops.searches(pattern)
Search a pattern
Uses re.search() to find a pattern in the string.
Parameters: pattern (str) – a regular expression string
Returns: The pattern found
Return type: re.RegexObject
Examples
>>> state=StrExt('good')
>>> print('OK' if state.searches(r'good|not_present|charging') else 'CRITICAL')
OK
>>> state=StrExt('looks like all is good')
>>> print('OK' if state.searches(r'good|not_present|charging') else 'CRITICAL')
OK
>>> print('OK' if state.searches(r'.*(good|not_present|charging)') else 'CRITICAL')
OK
>>> state=StrExt('Error')
>>> print('OK' if state.searches(r'good|not_present|charging') else 'CRITICAL')
CRITICAL
splitln
- class textops.splitln()
Transforms a string with newlines into a list of lines
It uses Python's str.splitlines(): the newline separator can be \n, \r or both. The separators are removed in the process.
Returns: The split text
Return type: list
Example
>>> s='this is\na multi-line\nstring'
>>> s | splitln()
['this is', 'a multi-line', 'string']
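The handling of mixed newline styles comes from str.splitlines() and can be seen with the standard library alone. The sample string below is made up for the illustration and mixes \r\n, \r and a trailing \n.

```python
text = 'this is\r\na multi-line\rstring\n'

# splitlines() recognises \r\n, \r and \n, and drops the trailing separator
lines = text.splitlines()

# split('\n') only knows about '\n': stray '\r' and an empty last item remain
naive = text.split('\n')
```

With this input, lines is ['this is', 'a multi-line', 'string'], while naive still carries the '\r' characters and ends with an empty string.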