Python's Regular Expression Module: re


Author: Yin Zhengjie (尹正杰)

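Python provides regular expression support through the re module. If you have forgotten regular expression syntax entirely, spend a few minutes skimming the basics online first, or see my earlier notes: https://www.cnblogs.com/yinzhengjie/p/11112046.html.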

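I. Constants

Multi-line mode ("^" and "$" also match at the start and end of every line):

    re.M
    re.MULTILINE

Single-line mode ("." also matches the newline character):

    re.S
    re.DOTALL

Ignore case:

    re.I
    re.IGNORECASE

Ignore whitespace inside the pattern (and allow comments in it):

    re.X
    re.VERBOSE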
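
As a small illustration of the four flags listed above, the sketch below applies each of them to a made-up three-line string. The sample text and variable names are assumptions chosen for this example only; flags can be combined with `|`.

```python
import re

text = "Bottle\nbag\nBIG"   # hypothetical sample text

# re.M: "^" also matches right after each newline, not only at index 0.
multiline_hits = re.findall(r"^b\w+", text, re.M | re.I)

# re.S: "." also matches the newline character.
dotall_hit = re.search(r"Bottle.bag", text, re.S)

# re.I: letter case is ignored.
ignorecase_hits = re.findall(r"b", text, re.I)

# re.X: whitespace and "#" comments inside the pattern are ignored.
verbose_regex = re.compile(r"""
    b      # a literal b
    \w+    # followed by one or more word characters
""", re.X | re.I)
verbose_hits = verbose_regex.findall(text)

print(multiline_hits)
print(dotall_hit)
print(ignorecase_hits)
print(verbose_hits)
```

Without re.S the second search would return None, because "." does not cross the line break by default.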

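II. Single match

1>. The match method

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # @author: yinzhengjie
 # blog: http://www.cnblogs.com/yinzhengjie

 import re

 src = """bottle\nbag\nbig\napple"""

 # Print every character with its index, ten per line, so the spans below are easy to check.
 for i,c in enumerate(src,1):
     print((i-1,c),end="\n" if i % 10 == 0 else " ")
 print()

 result = re.match("b",src)          # match only tries the start of the string; on success it returns a Match object
 print(1,result)

 result = re.match("a",src)          # src does not start with "a", so this returns None
 print(2,result)

 result = re.match("^a",src,re.M)    # still anchored at the start; multi-line mode makes no difference
 print(3,result)

 result = re.match("^a",src,re.S)    # still anchored at the start
 print(4,result)

 """
     re.compile takes the pattern string as its first argument and optional flags, and returns
     a compiled regular expression object (regex). A pattern has to be compiled before it can be used;
     the compiled result is cached, so using the same pattern again does not compile it a second time.
     The other functions in re call the compile step internally for the same reason: speed.
 """
 regex = re.compile("a",flags=0)     # compile first, then use the regex object
 result = regex.match(src)           # still starts at index 0 (None here); regex.match also accepts a start and an end position
 print(5,regex)                      # prints the compiled pattern object itself

 result = regex.match(src,15)        # start matching at index 15
 print(6,result)

 # Output of the code above:
 (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b') (8, 'a') (9, 'g')
 (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a') (16, 'p') (17, 'p') (18, 'l') (19, 'e')

 1 <re.Match object; span=(0, 1), match='b'>
 2 None
 3 None
 4 None
 5 re.compile('a')
 6 <re.Match object; span=(15, 16), match='a'>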
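
Passing a start position to a compiled pattern's match is not quite the same as slicing the string first. The sketch below compares the two on the same src; the variable names are assumptions for illustration. The point is that spans stay relative to the original string when a position is passed, and that "^" (without re.M) still refers to the real start of the string.

```python
import re

src = "bottle\nbag\nbig\napple"

plain = re.compile("a")
anchored = re.compile("^a")

by_pos = plain.match(src, 15)        # start matching at index 15 of src
by_slice = plain.match(src[15:])     # match against a new, shorter string

anchored_pos = anchored.match(src, 15)      # "^" still means index 0 of src
anchored_slice = anchored.match(src[15:])   # the slice itself starts with "a"

print(by_pos.span())       # span is expressed in src coordinates
print(by_slice.span())     # span is expressed in slice coordinates
print(anchored_pos)
print(anchored_slice.span())
```

Using the position argument avoids copying the string and keeps indices comparable with the index table printed above.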

2>. The search method

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # @author: yinzhengjie
 # blog: http://www.cnblogs.com/yinzhengjie

 import re

 src = """bottle\nbag\nbig\napple"""

 for i,c in enumerate(src,1):
     print((i-1,c),end="\n" if i % 10 == 0 else " ")
 print()

 result = re.search("a",src)     # scan from the start until the first match; returns a Match object
 print(1,result)

 regex = re.compile("b")
 result = regex.search(src,1)    # start scanning at index 1
 print(2,result)

 regex = re.compile("^b",re.M)
 result = regex.search(src)      # regex.search also accepts a start and an end position; returns a Match object
 print(3,result)

 result = regex.search(src,8)
 print(4,result)

 # Output of the code above:
 (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b') (8, 'a') (9, 'g')
 (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a') (16, 'p') (17, 'p') (18, 'l') (19, 'e')

 1 <re.Match object; span=(8, 9), match='a'>
 2 <re.Match object; span=(7, 8), match='b'>
 3 <re.Match object; span=(0, 1), match='b'>
 4 <re.Match object; span=(11, 12), match='b'>
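
To make the difference between match and search concrete, here is a minimal side-by-side sketch on the same src. It also uses the optional end position of a compiled pattern's search to restrict the scan to a window; the variable names are assumptions for this example.

```python
import re

src = "bottle\nbag\nbig\napple"

regex_a = re.compile("a")
matched = regex_a.match(src)      # only index 0 is tried, and src[0] is "b"
searched = regex_a.search(src)    # scans forward until the first "a"

regex_b = re.compile("b")
# Scan only the window src[1:7], which is "ottle\n" and contains no "b".
windowed = regex_b.search(src, 1, 7)

print(matched)
print(searched.span())
print(windowed)
```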

3>. The fullmatch method

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # @author: yinzhengjie
 # blog: http://www.cnblogs.com/yinzhengjie

 import re

 src = """bottle\nbag\nbig\napple"""

 for i,c in enumerate(src,1):
     print((i-1,c),end="\n" if i % 10 == 0 else " ")
 print()

 result = re.fullmatch("bag",src)
 print(1,result)

 regex = re.compile("bag")
 result = regex.fullmatch(src)
 print(2,result)

 result = regex.fullmatch(src,7)
 print(3,result)

 # The whole string (or the whole start/end window) must match the pattern: no extra and no missing characters.
 # A match returns a Match object; otherwise the result is None.
 result = regex.fullmatch(src,7,10)
 print(4,result)

 # Output of the code above:
 (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b') (8, 'a') (9, 'g')
 (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a') (16, 'p') (17, 'p') (18, 'l') (19, 'e')

 1 None
 2 None
 3 None
 4 <re.Match object; span=(7, 10), match='bag'>
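
A common use of fullmatch is input validation, where a partial match must not count. The sketch below is one possible shape for such a check; `B_WORD` and `is_b_word` are hypothetical names introduced for this example. It also shows why fullmatch is stricter than match with a trailing "$".

```python
import re

B_WORD = re.compile(r"b\w+")   # hypothetical pattern: a word starting with "b"

def is_b_word(text):
    """Return True only if the whole of text matches B_WORD."""
    return B_WORD.fullmatch(text) is not None

checks = [is_b_word("bag"), is_b_word("bag\n"), is_b_word("a bag")]
print(checks)

# "$" also matches just before a trailing newline, so this check is looser.
loose = re.match(r"b\w+$", "bag\n") is not None
print(loose)
```

If an explicit anchor is preferred over fullmatch, `\Z` matches only at the very end of the string and does not have the trailing-newline exception.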

III. Searching the whole text

1>. The findall method

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # @author: yinzhengjie
 # blog: http://www.cnblogs.com/yinzhengjie

 import re

 src = """bottle\nbag\nbig\napple"""

 for i,c in enumerate(src,1):
     print((i-1,c),end="\n" if i % 10 == 0 else " ")
 print()

 result = re.findall("b",src)        # scan the whole string from left to right and return a list of all matches
 print(1,result)

 regex = re.compile("^b")
 result = regex.findall(src)         # same as above, but on a precompiled pattern; without re.M, "^" only matches at index 0
 print(2,result)

 regex = re.compile("^b",re.M)
 result = regex.findall(src,7)       # multi-line mode, starting at index 7
 print(3,result)

 result = regex.findall(src,7,10)    # multi-line mode, limited to indices 7 to 9
 print(4,result)

 regex = re.compile("^b",re.S)
 result = regex.findall(src)         # re.S only changes ".", so "^" still matches at index 0 only
 print(5,result)

 # Output of the code above:
 (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b') (8, 'a') (9, 'g')
 (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a') (16, 'p') (17, 'p') (18, 'l') (19, 'e')

 1 ['b', 'b', 'b']
 2 ['b']
 3 ['b', 'b']
 4 ['b']
 5 ['b']

2>. The finditer method

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # @author: yinzhengjie
 # blog: http://www.cnblogs.com/yinzhengjie

 import re

 src = """bottle\nbag\nbig\napple"""

 for i,c in enumerate(src,1):
     print((i-1,c),end="\n" if i % 10 == 0 else " ")
 print()

 regex = re.compile("^b",re.M)
 result = regex.finditer(src)            # scan the whole string from left to right; returns an iterator that yields one Match object per match
 print(type(result))

 r = next(result)
 print(type(r),r)
 print(r.start(),r.end(),src[r.start():r.end()])

 r = next(result)
 print(type(r),r)
 print(r.start(),r.end(),src[r.start():r.end()])

 # Output of the code above:
 (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b') (8, 'a') (9, 'g')
 (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a') (16, 'p') (17, 'p') (18, 'l') (19, 'e')

 <class 'callable_iterator'>
 <class 're.Match'> <re.Match object; span=(0, 1), match='b'>
 0 1 b
 <class 're.Match'> <re.Match object; span=(7, 8), match='b'>
 7 8 b
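
Calling next by hand is fine for a demonstration, but in practice the iterator is usually consumed in a loop or a comprehension. A minimal sketch, assuming we want the position and text of every line that starts with "b" (the pattern `^b\w*` is chosen for this example):

```python
import re

src = "bottle\nbag\nbig\napple"

regex = re.compile(r"^b\w*", re.M)

# Each item yielded by finditer is a Match object, so start(), end()
# and group() are all available inside the comprehension.
spans = [(m.start(), m.end(), m.group()) for m in regex.finditer(src)]

for start, end, word in spans:
    print(start, end, word)
```

Compared with findall, this keeps the position of every match, and matches are produced lazily, which helps with large inputs.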

IV. Match and replace

1>. The sub method

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # @author: yinzhengjie
 # blog: http://www.cnblogs.com/yinzhengjie

 import re

 src = """bottle\nbag\nbig\napple"""

 for i,c in enumerate(src,1):
     print((i-1,c),end="\n" if i % 10 == 0 else " ")
 print()

 regex = re.compile(r"b\wg")                 # raw string, so "\w" reaches the regex engine unchanged
 # Replace every match of the pattern in src with "yinzhengjie".
 # The replacement may be a str, bytes (when the pattern is bytes), or a function.
 result = regex.sub("yinzhengjie",src)
 print(1,result)

 print("*" * 20 + "divider" + "*" * 20)

 result = regex.sub("jason",src,count=1)     # count=1 replaces only the first match
 print(2,result)

 # Output of the code above:
 (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b') (8, 'a') (9, 'g')
 (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a') (16, 'p') (17, 'p') (18, 'l') (19, 'e')

 1 bottle
 yinzhengjie
 yinzhengjie
 apple
 ********************divider********************
 2 bottle
 jason
 big
 apple
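
The replacement passed to sub does not have to be a fixed string. The sketch below shows the two other common forms: a function that receives each Match object, and a replacement string with backreferences to groups. The helper name `shout` and both patterns are assumptions made for this example.

```python
import re

src = "bottle\nbag\nbig\napple"

def shout(match):
    """Replacement function: receives a Match object, returns the new text."""
    return match.group().upper()

# Function replacement: every three-letter "b?g" word is upper-cased.
upper_result = re.sub(r"b\wg", shout, src)

# Backreferences: \1, \2, \3 refer to the captured groups, here reversed.
reversed_result = re.sub(r"(b)(\w)(g)", r"\3\2\1", src)

print(upper_result)
print(reversed_result)
```

A function is the more flexible option when the replacement depends on the matched text, for example a dictionary lookup or a computed value.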

2>. The subn method

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # @author: yinzhengjie
 # blog: http://www.cnblogs.com/yinzhengjie

 import re

 src = """bottle\nbag\nbig\napple"""

 for i,c in enumerate(src,1):
     print((i-1,c),end="\n" if i % 10 == 0 else " ")
 print()

 regex = re.compile(r"\s+")
 result = regex.subn("\t",src)       # like sub, but returns a tuple: (new string, number of replacements made)
 print(result)

 # Output of the code above:
 (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b') (8, 'a') (9, 'g')
 (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a') (16, 'p') (17, 'p') (18, 'l') (19, 'e')

 ('bottle\tbag\tbig\tapple', 3)

V. Splitting strings

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # @author: yinzhengjie
 # blog: http://www.cnblogs.com/yinzhengjie

 import re

 src = """
 os.path.abspath(path)
 normpath(join(os.getcwd(),path)).
 """

 print(src.split())                      # str.split takes a single separator (whitespace by default), so it cannot split on several delimiters at once
 print(re.split(r"[.()\s,]+",src))       # re.split splits on any run of the characters listed in the class

 # Output of the code above:
 ['os.path.abspath(path)', 'normpath(join(os.getcwd(),path)).']
 ['', 'os', 'path', 'abspath', 'path', 'normpath', 'join', 'os', 'getcwd', 'path', '']
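
The empty strings at both ends of the result above appear because src begins and ends with separator characters. The sketch below shows one way to drop them, plus two further options of re.split: `maxsplit` to limit the number of splits, and a capturing group to keep the separators in the result. The sample strings are assumptions made for this example.

```python
import re

line = "os.path.abspath(path)"   # hypothetical one-line sample

# Filter out the empty strings produced by leading or trailing separators.
parts = [p for p in re.split(r"[.()\s,]+", line) if p]

# maxsplit=1 stops after the first split; the remainder is left untouched.
first_only = re.split(r"[.()\s,]+", line, maxsplit=1)

# A capturing group in the pattern keeps the separators in the result list.
with_separators = re.split(r"([.,])", "a.b,c")

print(parts)
print(first_only)
print(with_separators)
```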

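VI. Groups

1>. Overview

    Text captured by a parenthesized part of the pattern is stored in a group.
    match and search return a Match object; findall returns a list of strings (or of group contents when the pattern has groups); finditer yields Match objects one at a time.
    If the pattern uses groups and there is a match, the groups are available on the Match object:
        1>. group(N) returns the corresponding group: 1 to N are the groups, 0 is the whole matched string, and N defaults to 0 when omitted;
        2>. with named groups, group('name') returns the group by name;
        3>. groups() returns all groups as a tuple;
        4>. groupdict() returns all named groups as a dict.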

2>. Group example

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # @author: yinzhengjie
 # blog: http://www.cnblogs.com/yinzhengjie

 import re

 src = """bottle\nbag\nbig\napple"""

 for i,c in enumerate(src,1):
     print((i-1,c),end="\n" if i % 10 == 0 else " ")
 print()

 regex = re.compile(r"(b\w+)")
 result = regex.match(src)                   # match once, at the start of the string
 print(type(result))
 print(1,"match",result.groups())

 result = regex.search(src,1)                # search once, starting at index 1
 print(2,"search",result.groups())

 # Output of the code above:
 (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b') (8, 'a') (9, 'g')
 (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a') (16, 'p') (17, 'p') (18, 'l') (19, 'e')

 <class 're.Match'>
 1 match ('bottle',)
 2 search ('bag',)

3>. Named groups

 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # @author: yinzhengjie
 # blog: http://www.cnblogs.com/yinzhengjie

 import re

 src = """bottle\nbag\nbig\napple"""

 for i,c in enumerate(src,1):
     print((i-1,c),end="\n" if i % 10 == 0 else " ")
 print()

 regex = re.compile(r"(b\w+)\n(?P<name2>b\w+)\n(?P<name3>b\w+)")
 result = regex.match(src)
 print(1,"match",result)
 print(2,result.group(3),result.group(2),result.group(1))    # named groups are still numbered
 print(3,result.group(0).encode())                           # group(0) works with or without groups: it is the whole matched string
 print(4,result.group("name2"),result.group("name3"))
 print(5,result.groups())
 print(6,result.groupdict())                                 # only the named groups appear here

 print("*" * 20 + "divider" + "*" * 20)

 result = regex.findall(src)             # with groups in the pattern, findall returns the group contents, not the whole matched string
 for item in result:
     print(type(item),item)

 regex = re.compile(r"(?P<head>b\w+)")
 result = regex.finditer(src)

 print("*" * 20 + "divider" + "*" * 20)

 for item in result:
     print(type(item),item,item.group(),item.group("head"))

 # Output of the code above:
 (0, 'b') (1, 'o') (2, 't') (3, 't') (4, 'l') (5, 'e') (6, '\n') (7, 'b') (8, 'a') (9, 'g')
 (10, '\n') (11, 'b') (12, 'i') (13, 'g') (14, '\n') (15, 'a') (16, 'p') (17, 'p') (18, 'l') (19, 'e')

 1 match <re.Match object; span=(0, 14), match='bottle\nbag\nbig'>
 2 big bag bottle
 3 b'bottle\nbag\nbig'
 4 bag big
 5 ('bottle', 'bag', 'big')
 6 {'name2': 'bag', 'name3': 'big'}
 ********************divider********************
 <class 'tuple'> ('bottle', 'bag', 'big')
 ********************divider********************
 <class 're.Match'> <re.Match object; span=(0, 6), match='bottle'> bottle bottle
 <class 're.Match'> <re.Match object; span=(7, 10), match='bag'> bag bag
 <class 're.Match'> <re.Match object; span=(11, 14), match='big'> big big
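
How findall shapes its result depends on the number of capturing groups in the pattern, which is easy to trip over. The sketch below runs four variants of the same pattern against src; the patterns are assumptions chosen to isolate that one effect.

```python
import re

src = "bottle\nbag\nbig\napple"

# No group: each item is the whole matched string.
no_group = re.findall(r"b\w+", src)

# One group: each item is the content of that group only.
one_group = re.findall(r"b(\w+)", src)

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(b)(\w+)", src)

# (?:...) groups without capturing, so whole matches are returned again.
non_capturing = re.findall(r"(?:b)\w+", src)

print(no_group)
print(one_group)
print(two_groups)
print(non_capturing)
```

When both the groups and the whole match are needed, finditer is usually the more convenient choice, since every Match object exposes group(0) alongside the individual groups.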
