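A Discussion of Python Strings
This article is based on Python 3.
Almost every useful program involves some text processing, whether it is parsing data or producing output. Strings are therefore among the most important things to learn, and this section focuses on common text operations: extracting substrings, searching, replacing, and parsing. Most of these problems can be solved simply by calling the built-in string methods. Some more complex operations, however, may call for regular expressions or a full parser.
In Python, everything is an object!
So str is a class, just as int, dict, list, tuple and so on are classes. The class str describes strings in general; to work with one particular string you need an instance of it, for example str1 = ''. The object str1 then has the attributes of str and can use the methods defined on it.
String declaration:
An object instantiated from the string class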
>>> str1 = 'jdq'
>>> str2 = str('jdq')
>>> type(str2)
<class 'str'>
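To see all the members of a class, call dir(ClassName). The call below prints every member of str.
Class members fall into three kinds: fields, properties, and methods.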
>>> dir(str)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
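The full dir() output mixes special methods with the everyday ones, so it can help to filter it. The following is a minimal sketch; `public_methods` is a name introduced here purely for illustration.

```python
# Keep only the member names that do not start with an underscore.
public_methods = [name for name in dir(str) if not name.startswith('_')]

print(len(public_methods))   # the exact count depends on the Python version
print(public_methods)
```

Each of these names can then be looked up with help(), for example help(str.strip).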
class str(object):
    """
    str(object='') -> str
    Return a nice string representation of the object.
    If the argument is a string, the return value is the same object.
    """
    def capitalize(self):
        """ Capitalize the first letter. S.capitalize() -> str. Return a copy of S with only its first character capitalized. """
    def center(self, width, fillchar=' '):
        """ Center the content. width: total length; fillchar: padding character (default is a space). S.center(width[, fillchar]) -> str """
    def count(self, sub, start=None, end=None):
        """ Number of occurrences. S.count(sub[, start[, end]]) -> int. Return the number of non-overlapping occurrences of sub in S[start:end]. """
    def encode(self, encoding='utf-8', errors='strict'):
        """ Encode to bytes. S.encode(encoding='utf-8', errors='strict') -> bytes. errors may be 'strict' (raise UnicodeEncodeError), 'ignore', 'replace', 'xmlcharrefreplace', or any name registered with codecs.register_error. """
    def endswith(self, suffix, start=None, end=None):
        """ Does S end with suffix? S.endswith(suffix[, start[, end]]) -> bool. suffix can also be a tuple of strings to try. """
    def expandtabs(self, tabsize=8):
        """ Expand tabs to spaces using tab stops every tabsize columns (default 8). S.expandtabs(tabsize=8) -> str """
    def find(self, sub, start=None, end=None):
        """ Position of a substring, or -1 if not found. S.find(sub[, start[, end]]) -> int. Return the lowest index in S where sub is found within S[start:end]. """
    def format(self, *args, **kwargs):
        """ String formatting with variable arguments. S.format(*args, **kwargs) -> str. Substitutions are identified by braces ('{' and '}'). """
    def index(self, sub, start=None, end=None):
        """ Position of a substring; raises an error if not found. S.index(sub[, start[, end]]) -> int. Like S.find() but raise ValueError when the substring is not found. """
    def isalnum(self):
        """ Letters and digits only? S.isalnum() -> bool. True if all characters are alphanumeric and there is at least one character. """
    def isalpha(self):
        """ Letters only? S.isalpha() -> bool. True if all characters are alphabetic and there is at least one character. """
    def isdigit(self):
        """ Digits only? S.isdigit() -> bool. True if all characters are digits and there is at least one character. """
    def islower(self):
        """ Lowercase? S.islower() -> bool. True if all cased characters are lowercase and there is at least one cased character. """
    def isspace(self):
        """ S.isspace() -> bool. True if all characters are whitespace and there is at least one character. """
    def istitle(self):
        """ S.istitle() -> bool. True if S is titlecased, i.e. uppercase characters may only follow uncased characters and lowercase characters only cased ones. """
    def isupper(self):
        """ S.isupper() -> bool. True if all cased characters are uppercase and there is at least one cased character. """
    def join(self, iterable):
        """ Join. S.join(iterable) -> str. Return the concatenation of the strings in the iterable, with S as the separator between elements. """
    def ljust(self, width, fillchar=' '):
        """ Left-justify the content and pad on the right. S.ljust(width[, fillchar]) -> str """
    def lower(self):
        """ Convert to lowercase. S.lower() -> str """
    def lstrip(self, chars=None):
        """ Remove leading whitespace. S.lstrip([chars]) -> str. If chars is given and not None, remove characters in chars instead. """
    def partition(self, sep):
        """ Split into three parts: before, separator, after. S.partition(sep) -> (head, sep, tail). If sep is not found, return S and two empty strings. """
    def replace(self, old, new, count=-1):
        """ Replace. S.replace(old, new[, count]) -> str. Replace all occurrences of old with new; if count is given, only the first count occurrences. """
    def rfind(self, sub, start=None, end=None):
        """ S.rfind(sub[, start[, end]]) -> int. Return the highest index in S where sub is found within S[start:end], or -1 on failure. """
    def rindex(self, sub, start=None, end=None):
        """ S.rindex(sub[, start[, end]]) -> int. Like S.rfind() but raise ValueError when the substring is not found. """
    def rjust(self, width, fillchar=' '):
        """ S.rjust(width[, fillchar]) -> str. Return S right-justified in a string of length width. """
    def rpartition(self, sep):
        """ S.rpartition(sep) -> (head, sep, tail). Search for sep starting at the end of S. If sep is not found, return two empty strings and S. """
    def rsplit(self, sep=None, maxsplit=-1):
        """ S.rsplit(sep=None, maxsplit=-1) -> list of strings. Like split(), but splitting starts at the end of the string and works to the front. """
    def rstrip(self, chars=None):
        """ S.rstrip([chars]) -> str. Return a copy of S with trailing whitespace removed. If chars is given and not None, remove characters in chars instead. """
    def split(self, sep=None, maxsplit=-1):
        """ Split; maxsplit is the maximum number of splits. S.split(sep=None, maxsplit=-1) -> list of strings. If sep is not specified or is None, any whitespace string is a separator and empty strings are removed from the result. """
    def splitlines(self, keepends=False):
        """ Split at line boundaries. S.splitlines(keepends=False) -> list of strings. Line breaks are not included unless keepends is given and true. """
    def startswith(self, prefix, start=None, end=None):
        """ Does S start with prefix? S.startswith(prefix[, start[, end]]) -> bool. prefix can also be a tuple of strings to try. """
    def strip(self, chars=None):
        """ Remove whitespace from both ends. S.strip([chars]) -> str. If chars is given and not None, remove characters in chars instead. """
    def swapcase(self):
        """ Uppercase becomes lowercase and vice versa. S.swapcase() -> str """
    def title(self):
        """ S.title() -> str. Return a titlecased version of S: words start with uppercase characters, all remaining cased characters are lowercase. """
    def translate(self, table):
        """
        Map each character through a translation table, which is built first with
        str.maketrans(); the optional third argument of maketrans lists characters to delete.
            intab = "aeiou"
            outtab = "12345"
            trantab = str.maketrans(intab, outtab, 'xm')
            s = "this is string example....wow!!!"
            print(s.translate(trantab))
        S.translate(table) -> str
        """
    def upper(self):
        """ S.upper() -> str. Return a copy of S converted to uppercase. """
    def zfill(self, width):
        """ Return a string of the given width with S right-aligned and zeros padded on the left. S.zfill(width) -> str. S is never truncated. """
    def __add__(self, y): """ x.__add__(y) <==> x+y """
    def __contains__(self, y): """ x.__contains__(y) <==> y in x """
    def __eq__(self, y): """ x.__eq__(y) <==> x==y """
    def __format__(self, format_spec): """ S.__format__(format_spec) -> str. Return a formatted version of S as described by format_spec. """
    def __getattribute__(self, name): """ x.__getattribute__('name') <==> x.name """
    def __getitem__(self, y): """ x.__getitem__(y) <==> x[y] """
    def __ge__(self, y): """ x.__ge__(y) <==> x>=y """
    def __gt__(self, y): """ x.__gt__(y) <==> x>y """
    def __hash__(self): """ x.__hash__() <==> hash(x) """
    def __len__(self): """ x.__len__() <==> len(x) """
    def __le__(self, y): """ x.__le__(y) <==> x<=y """
    def __lt__(self, y): """ x.__lt__(y) <==> x<y """
    def __mod__(self, y): """ x.__mod__(y) <==> x%y """
    def __mul__(self, n): """ x.__mul__(n) <==> x*n """
    def __ne__(self, y): """ x.__ne__(y) <==> x!=y """
    def __repr__(self): """ x.__repr__() <==> repr(x) """
    def __rmod__(self, y): """ x.__rmod__(y) <==> y%x """
    def __rmul__(self, n): """ x.__rmul__(n) <==> n*x """
    def __sizeof__(self): """ S.__sizeof__() -> size of S in memory, in bytes """
    def __str__(self): """ x.__str__() <==> str(x) """
    # Python 2 only, not present on the Python 3 str: decode() (now a method of bytes),
    # __getslice__(), _formatter_field_name_split() and _formatter_parser().
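Common methods:
str.strip()   # remove whitespace or the given characters from both ends
str.lstrip()  # from the start only
str.rstrip()  # from the end only
Removing whitespace and symbols from both sides
>>> l = ' jdq '
>>> l.strip()
'jdq'
>>> l1 = '(jdq)'
>>> l1.strip('()')  # remove the parentheses on both sides of the string
'jdq'
>>> l3 = 'jdq'
>>> l3.strip('q')
'jd'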
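One detail of the strip family is easy to miss: the argument is treated as a set of characters, not as a prefix or suffix. The sketch below uses made-up sample values to show the difference.

```python
filename = 'report.txt'

# rstrip('.txt') removes any trailing run of the characters '.', 't' and 'x',
# so it also eats the final 't' of 'report'.
surprising = filename.rstrip('.txt')
print(surprising)                 # repor

# To drop an exact suffix, test for it and slice instead.
suffix = '.txt'
if filename.endswith(suffix):
    exact = filename[:-len(suffix)]
else:
    exact = filename
print(exact)                      # report

# lstrip() and rstrip() each touch only one side.
print(repr('  jdq  '.lstrip()))   # 'jdq  '
print(repr('  jdq  '.rstrip()))   # '  jdq'
```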
String concatenation
Method 1: use the string join method
a = ['a','b','c','d']
content = ''.join(a)
print(content)
Method 2: use % placeholders
a = ['a','b','c','d']
content = '%s%s%s%s' % tuple(a)
print(content)
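join() only accepts strings, so a list of other values has to be converted first. A small sketch with made-up data:

```python
numbers = [1, 2, 3]

# Convert each item to str before joining.
joined = ','.join(str(n) for n in numbers)
print(joined)    # 1,2,3

# Joining the integers directly raises TypeError.
try:
    ','.join(numbers)
except TypeError as exc:
    error_message = str(exc)
    print('join failed:', error_message)
```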
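This leads on to formatted string output
There are three ways to build a formatted string:
1 % placeholders with a value or tuple
m = 'python'
astr = 'i love %s' % m
print(astr)
2 The string format method
m = 'python'
astr = "i love {python}".format(python=m)
print(astr)
3 % placeholders with a dictionary
m = 'python'
astr = "i love %(python)s " % {'python':m}
print(astr)
Choose whichever fits your situation. The format method and dictionary-based placeholders are recommended, because they do not depend on the order of the arguments; only the names have to match.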
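Since Python 3.6 there is also a fourth option not covered above: formatted string literals (f-strings), which place the expression directly inside the braces. A brief sketch; the `width` variable is only an illustrative value.

```python
m = 'python'
width = 10   # illustrative value

astr = f'i love {m}'
print(astr)                      # i love python

# Any expression and any format spec can go inside the braces.
shout = f'i love {m.upper()}'
padded = f'[{m:>{width}}]'       # right-align m in a field of `width` characters
print(shout)                     # i love PYTHON
print(padded)                    # [    python]
```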
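String slicing
We can use an index to pull out the characters we want; it helps to think of a Python string as a list of characters.
There are two ways to index into a string.
1 From left to right, starting at 0; the largest valid index is the string length minus 1
s = 'ilovepython'
s[0] gives i
2 From right to left, starting at -1; the range ends at the start of the string, at index -len(s)
s = 'ilovepython'
s[-1] gives n
The above retrieves a single character. To get a substring, use variable[start:end]. Indexes count from 0 and may be positive or negative; either one may be left empty to mean "from the beginning" or "to the end".
For example
s = 'ilovepython'
s[1:5] gives love
Slicing with a colon returns a new object containing the contiguous characters identified by the pair of offsets. The lower bound is inclusive: the result above includes s[1], which is l. The upper bound is exclusive: the result does not include s[5], which is p.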
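The indexing and slicing rules above can be checked in one short script, using the same sample string:

```python
s = 'ilovepython'

first_char = s[0]     # 'i'
last_char = s[-1]     # 'n'
middle = s[1:5]       # 'love'   (s[1] included, s[5] excluded)
head = s[:5]          # 'ilove'  (empty start means "from the beginning")
tail = s[5:]          # 'python' (empty end means "to the end")
last_six = s[-6:]     # 'python' (negative offsets count from the right)

# Unlike single-character indexing, slices never raise IndexError.
clipped = s[5:100]    # 'python'

print(first_char, last_char, middle, head, tail, last_six, clipped)
```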
String replacement
String replacement can be done with the built-in method or with regular expressions.
1 Using the string's own replace method:
a = 'hello word'
b = a.replace('word','python')
print(b)
2 Using a regular expression:
import re
a = 'hello word'
strinfo = re.compile('word')
b = strinfo.sub('python',a)
print(b)
String searching
Python has four methods for searching strings: 1 find, 2 index, 3 rfind, 4 rindex.
1 The find() method:
info = 'abca'
print(info.find('a'))  # search from index 0 for the first occurrence of the substring; result: 0
print(info.find('a',1))  # search from index 1 for the first occurrence; result: 3
print(info.find('333'))  # not found, so the result is -1
2 The index() method:
index() also finds the position of the first occurrence of a substring, like find(). The difference is that if the substring is not found it raises an exception instead of returning -1.
info = 'abca'
print(info.index('a',1))  # 3
print(info.index('33'))  # ValueError: substring not found
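The remaining two methods, rfind() and rindex(), work the same way but report the last occurrence. A short sketch with the same sample string:

```python
info = 'abca'

last_a = info.rfind('a')           # 3: the highest index where 'a' occurs
bounded = info.rfind('a', 0, 3)    # 0: only info[0:3] == 'abc' is searched
missing = info.rfind('x')          # -1: not found

# rindex() raises ValueError instead of returning -1.
try:
    info.rindex('x')
except ValueError as exc:
    rindex_error = str(exc)

print(last_a, bounded, missing, rindex_error)
```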
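String splitting
Strings can be split with the split and rsplit methods, which cut the string at a separator and return a list.
Remember: the separator itself is removed from the result.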
info = 'name:haha,age:20$name:python,age:30$name:fef,age:55'
content = info.split('$')
print(content) #['name:haha,age:20', 'name:python,age:30', 'name:fef,age:55']
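As a possible next step, the records above can be split further into dictionaries, and the maxsplit argument shows where split and rsplit differ. This is only a sketch; the `path` value is a made-up example.

```python
info = 'name:haha,age:20$name:python,age:30$name:fef,age:55'

records = []
for chunk in info.split('$'):
    # 'name:haha,age:20' -> {'name': 'haha', 'age': '20'}
    fields = dict(item.split(':', 1) for item in chunk.split(','))
    records.append(fields)
print(records)

# With maxsplit=1, split cuts at the first separator and rsplit at the last.
path = 'a/b/c.txt'
from_left = path.split('/', 1)     # ['a', 'b/c.txt']
from_right = path.rsplit('/', 1)   # ['a/b', 'c.txt']
print(from_left, from_right)
```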
String reversal
Reverse a string by slicing with a step of -1
a = 'abcd'
b = a[::-1]  # a step of -1 walks the string backwards
print(b)
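String encoding
Python 3 removed the separate unicode type. The string type (str) now holds Unicode characters and is the basic text type, while encoded data is of the bytes type. The two methods are used as before, with decode() called on bytes and encode() called on str:
        decode                 encode
bytes ------> str (unicode) ------> bytes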
a = '你好'
print(a.encode('utf-8').decode('utf-8'))
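To make the two types visible, the round trip can be taken apart step by step. A minimal sketch:

```python
a = '你好'

data = a.encode('utf-8')        # str -> bytes
print(type(data), len(a), len(data))   # 2 characters become 6 bytes in UTF-8

restored = data.decode('utf-8')  # bytes -> str
print(restored == a)             # True

# Decoding with the wrong codec fails under the default errors='strict'.
try:
    data.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# errors='ignore' silently drops the bytes that cannot be decoded.
lenient = data.decode('ascii', errors='ignore')
print(decode_failed, repr(lenient))
```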
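String case
Use upper(), lower() and the related methods below to convert or test case
S.upper()  # convert the letters in S to uppercase
S.lower()  # convert the letters in S to lowercase
S.capitalize()  # capitalize the first letter
S.istitle()  # is S title-cased (each word starts with an uppercase letter)?
S.isupper()  # are all the letters in S uppercase?
S.islower()  # are all the letters in S lowercase?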
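A quick illustration of these methods on a sample value (the string itself is arbitrary):

```python
S = 'hello World'

upper = S.upper()              # 'HELLO WORLD'
lower = S.lower()              # 'hello world'
capitalized = S.capitalize()   # 'Hello world' (the rest is lowercased)
titled = S.title()             # 'Hello World'

print(upper, lower, capitalized, titled)
print(S.istitle())             # False: 'hello' starts with a lowercase letter
print(titled.istitle())        # True
print(upper.isupper(), lower.islower())   # True True
```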
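Other string methods
Other string-related methods: count(), join() and so on.
S.center(width, [fillchar])  # center S
S.count(substr, [start, [end]])  # count the occurrences of substr in S
S.expandtabs([tabsize])  # replace the tab characters in S with spaces, using tab stops every tabsize columns (default 8)
S.isalnum()  # are all characters letters or digits, with at least one character?
S.isalpha()  # are all characters letters, with at least one character?
S.isspace()  # are all characters whitespace, with at least one character?
S.join(iterable)  # join the strings of a list (or other iterable) into one string, with S as the separator
S.ljust(width,[fillchar])  # output width characters with S left-aligned; pad the rest with fillchar, a space by default
S.rjust(width,[fillchar])  # right-aligned
S.splitlines([keepends])  # split S at line breaks into a list; keepends is a bool, and if true each line keeps its line break
S.swapcase()  # swap uppercase and lowercase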
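A few of the methods listed above, run on small made-up inputs:

```python
centered = 'jdq'.center(9, '*')     # '***jdq***'
left = 'jdq'.ljust(6, '-')          # 'jdq---'
right = 'jdq'.rjust(6)              # '   jdq'
print(centered, left, repr(right))

occurrences = 'banana'.count('an')  # 2 (non-overlapping matches)
swapped = 'Ab'.swapcase()           # 'aB'
print(occurrences, swapped)

text = 'one\ntwo\r\nthree'
lines = text.splitlines()                # ['one', 'two', 'three']
lines_kept = text.splitlines(True)       # ['one\n', 'two\r\n', 'three']
print(lines, lines_kept)

# expandtabs pads up to the next tab stop rather than inserting a fixed count.
expanded = 'a\tb'.expandtabs(4)     # 'a   b'
print(repr(expanded))
```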
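The partition(sep) method splits a string at the first occurrence of the given separator.
If the string contains the separator, it returns a 3-tuple: the substring to the left of the separator, the separator itself, and the substring to its right. If the separator is not found, it returns the original string followed by two empty strings.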
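A short sketch of partition() and its right-to-left counterpart rpartition(), using an invented key=value string:

```python
line = 'key=value=more'

first = line.partition('=')     # split at the first '='
last = line.rpartition('=')     # split at the last '='
print(first)                    # ('key', '=', 'value=more')
print(last)                     # ('key=value', '=', 'more')

# When the separator is absent, the original string comes back
# together with two empty strings.
absent = 'novalue'.partition('=')
print(absent)                   # ('novalue', '', '')
```

Because the result always has three items, it can be unpacked directly, for example `name, _, value = line.partition('=')`.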
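Problem: you need to split a string into fields, but the delimiters (and the whitespace around them) are not consistent.
The split() method of string objects only suits very simple cases; it does not allow for multiple delimiters or for a variable amount of whitespace around them. When you need more flexible splitting, it is best to use re.split():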
>>> line = 'asdf fjdk; afed, fjek,asdf, foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
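The re.split() function is very useful because it lets you specify multiple patterns for the separator. In the example above, the separator can be a comma, a semicolon, or whitespace, followed by any amount of extra whitespace. Wherever that pattern is found, the text on either side of the match becomes an element of the result. The result is a list of fields, the same type that str.split() returns.
When using re.split(), pay particular attention to whether the regular expression contains a capture group enclosed in parentheses. If capture groups are used, the matched text is also included in the result list. For example, look at the result of running this code: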
>>> fields = re.split(r'(;|,|\s)\s*', line)
>>> fields
['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']
>>>
Getting the split characters can also be useful in some cases. For example, you may want to keep them so that you can rebuild an output string later:
>>> values = fields[::2]
>>> delimiters = fields[1::2] + ['']
>>> values
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
>>> delimiters
[' ', ';', ',', ',', ',', '']
>>> ''.join(v+d for v,d in zip(values, delimiters))
'asdf fjdk;afed,fjek,asdf,foo'
>>>
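Problem: you need to check the start or end of a string against specific text patterns, such as filename extensions, URL schemes, and so on.
A simple way to check the beginning or end of a string is to use the str.startswith() or str.endswith() methods. For example: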
>>> filename = 'spam.txt'
>>> filename.endswith('.txt')
True
>>> filename.startswith('file:')
False
>>> url = 'http://www.python.org'
>>> url.startswith('http:')
True
If you need to check against multiple choices, simply put them all in a tuple and pass it to startswith() or endswith():
>>> import os
>>> filenames = os.listdir('.')
>>> filenames
[ 'Makefile', 'foo.c', 'bar.py', 'spam.c', 'spam.h' ]
>>> [name for name in filenames if name.endswith(('.c', '.h')) ]
['foo.c', 'spam.c', 'spam.h']
>>> any(name.endswith('.py') for name in filenames)
True
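Here is another example:
from urllib.request import urlopen
def read_data(name):
    if name.startswith(('http:', 'https:', 'ftp:')):
        return urlopen(name).read()
    else:
        with open(name) as f:
            return f.read()
Oddly, when several choices are given, this method requires them as a tuple. If you happen to have the choices in a list or set, make sure you convert them with tuple() before passing them in. For example: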
>>> choices = ['http:', 'ftp:']
>>> url = 'http://www.python.org'
>>> url.startswith(choices)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str or a tuple of str, not list
>>> url.startswith(tuple(choices))
True
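Finally, startswith() and endswith() combine nicely with other operations, such as common data reductions. For example, this statement checks a directory for the presence of certain kinds of files:
if any(name.endswith(('.c', '.h')) for name in os.listdir(dirname)):
    ...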
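Problem: you want to match text strings using the wildcard patterns commonly used in Unix shells (for example *.py, Dat[0-9]*.csv, and so on).
The fnmatch module provides two functions, fnmatch() and fnmatchcase(), that can be used to perform such matching. They are used as follows: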
>>> from fnmatch import fnmatch, fnmatchcase
>>> fnmatch('foo.txt', '*.txt')
True
>>> fnmatch('foo.txt', '?oo.txt')
True
>>> fnmatch('Dat45.csv', 'Dat[0-9]*')
True
>>> names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py']
>>> [name for name in names if fnmatch(name, 'Dat*.csv')]
['Dat1.csv', 'Dat2.csv']
>>>
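fnmatch() matches patterns using the case-sensitivity rules of the underlying operating system (which vary from system to system).
If this distinction matters to you, use fnmatchcase() instead. It matches exactly according to the upper- and lowercase characters in the pattern you supply. For example: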
>>> fnmatchcase('foo.txt', '*.TXT')
False
>>>
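An often overlooked feature of these two functions is that they are also useful for strings that are not filenames. For example, suppose you have a list of street addresses: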
addresses = [
'5412 N CLARK ST',
'1060 W ADDISON ST',
'1039 W GRANVILLE AVE',
'2122 N CLARK ST',
'4802 N BROADWAY',
]
You could write list comprehensions like this:
>>> from fnmatch import fnmatchcase
>>> [addr for addr in addresses if fnmatchcase(addr, '* ST')]
['5412 N CLARK ST', '1060 W ADDISON ST', '2122 N CLARK ST']
>>> [addr for addr in addresses if fnmatchcase(addr, '54[0-9][0-9] *CLARK*')]
['5412 N CLARK ST']
>>>
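The matching power of fnmatch() sits somewhere between simple string methods and full regular expressions. If simple wildcards are all you need in a data-processing task, it is often a reasonable choice.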
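The same module also offers filter(), which applies a pattern to a whole list, and translate(), which converts a wildcard pattern into a regular expression. A small sketch reusing the earlier file names; the alias `fnmatch_filter` is chosen here only to avoid shadowing the built-in filter.

```python
import re
from fnmatch import filter as fnmatch_filter, translate

names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py']

# filter() is shorthand for the list comprehension over fnmatch().
csv_files = fnmatch_filter(names, 'Dat*.csv')
print(csv_files)      # ['Dat1.csv', 'Dat2.csv']

# translate() returns a regular-expression string for the pattern;
# its exact text varies between Python versions.
regex = re.compile(translate('Dat[0-9]*.csv'))
matched = [name for name in names if regex.match(name)]
print(matched)        # ['Dat1.csv', 'Dat2.csv']
```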
Searching and replacing text
For simple literal patterns, use the str.replace() method directly. For example:
>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> text.replace('yeah', 'yep')
'yep, but no, but yep, but no, but yep'
>>>
For more complicated patterns, use the sub() function in the re module. To illustrate, suppose you want to rewrite dates of the form 11/27/2012 as 2012-11-27. Here is an example:
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> import re
>>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'
>>>
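The first argument to sub() is the pattern to match and the second argument is the replacement pattern. Backslashed digits such as \3 refer to capture group numbers in the pattern.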
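When the replacement needs some computation, sub() also accepts a function in place of the replacement string. The sketch below zero-pads the month and day; the helper name `to_iso` is an assumption made for this example.

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

def to_iso(match):
    """Build the replacement text from a month/day/year match."""
    month, day, year = match.groups()
    # '{:0>2}' right-aligns the text in two columns, padding with '0'.
    return '{}-{:0>2}-{:0>2}'.format(year, month, day)

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# subn() works like sub() but also reports how many substitutions were made.
newtext, count = datepat.subn(to_iso, text)
print(newtext)   # Today is 2012-11-27. PyCon starts 2013-03-13.
print(count)     # 2
```

Compiling the pattern once with re.compile() is worthwhile when the same substitution is applied repeatedly.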
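Shortest-match patterns
This problem usually arises when matching the text between a pair of delimiters (for example, a quoted string). To illustrate, consider the following example: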
>>> str_pat = re.compile(r'\"(.*)\"')
>>> text1 = 'Computer says "no."'
>>> str_pat.findall(text1)
['no.']
>>> text2 = 'Computer says "no." Phone says "yes."'
>>> str_pat.findall(text2)
['no." Phone says "yes.']
>>>
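In this example, the pattern r'\"(.*)\"' is intended to match text enclosed in double quotes. However, the * operator in a regular expression is greedy, so matching looks for the longest possible match. As a result, the search of text2 in the second example does not return what we want. To fix this, add the ? modifier after the * operator in the pattern, like this: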
>>> str_pat = re.compile(r'\"(.*?)\"')
>>> str_pat.findall(text2)
['no.', 'yes.']
>>>
This makes the matching non-greedy, so it produces the shortest match, which is the result we want.
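One caveat is that the dot does not match newlines by default, so a non-greedy pattern still misses quoted text that spans several lines. The sketch below compares it with two common alternatives, a negated character class and the re.DOTALL flag; the `multiline` sample text is made up.

```python
import re

text2 = 'Computer says "no." Phone says "yes."'
multiline = 'He said "line one\nline two" then left'

lazy = re.compile(r'"(.*?)"')                  # non-greedy dot
negated = re.compile(r'"([^"]*)"')             # anything except a quote
lazy_dotall = re.compile(r'"(.*?)"', re.DOTALL)  # dot also matches newlines

print(lazy.findall(text2))           # ['no.', 'yes.']
print(negated.findall(text2))        # ['no.', 'yes.']

print(lazy.findall(multiline))       # []
print(negated.findall(multiline))    # ['line one\nline two']
print(lazy_dotall.findall(multiline))  # ['line one\nline two']
```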