Python re: regular expressions, simple and sufficient

1. Reference

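string	re	Notes
	re.match(pattern, string, flags=0)	Matches only at the start of the string
S.find(sub [,start [,end]]) -> int	re.search(pattern, string, flags=0)	Scans through the string looking for a match
	re.findall(pattern, string, flags=0)	Returns all matches as a list; see also re.finditer
S.replace(old, new[, count]) -> string	re.sub(pattern, repl, string, count=0, flags=0)	Replaces matches; see section 4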
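
The correspondences in the table can be sketched roughly as follows. The sample string and variable names are chosen here for illustration only.

```python
import re

text = "pro---gram-files"  # sample string, chosen for illustration

# str.find returns an index (or -1); re.search returns a match object (or None)
idx = text.find("gram")
m = re.search(r"gram", text)
search_start = m.start() if m else -1

# re.match only succeeds when the pattern matches at the start of the string
at_start = re.match(r"pro", text) is not None
not_at_start = re.match(r"gram", text) is not None

# re.findall returns a list of strings; re.finditer yields match objects
dashes = re.findall(r"-+", text)
dash_spans = [mo.span() for mo in re.finditer(r"-+", text)]

print(idx, search_start, at_start, not_at_start, dashes, dash_spans)
```

Because re.search and re.match return None on failure, check the result before calling methods on it.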

2. Groups: m.group()

In [560]: m.group?

Docstring:

group([group1, ...]) -> str or tuple.

Return subgroup(s) of the match by indices or names.

For 0 returns the entire match.

Type:      builtin_function_or_method

In [542]: m=re.search(r'(-{1,2}(gr))','pro---gram-files')

In [543]: m.group()  # always available: with no argument it is the same as group(0)

Out[543]: '--gr'

In [544]: m.group(0)  # always available: returns the entire matched string. Note that m.string is the full text that was searched.

Out[544]: '--gr'

In [545]: m.group(1)

Out[545]: '--gr'

In [546]: m.group(2)

Out[546]: 'gr'

In [547]: m.group(3)  # error: the pattern has only two opening parentheses, so there is no group 3

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

<ipython-input-547-71a2c7935517> in <module>()

----> 1 m.group(3)

IndexError: no such group

In [548]: m.group(1,2)  # select several groups; returns a tuple

Out[548]: ('--gr', 'gr')

In [549]: m.groups()  # select all groups

Out[549]: ('--gr', 'gr')
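
The session above can be condensed into a small script. This is a sketch: the guard around group 3 and the use of m.re.groups and m.span() are additions for illustration, not part of the original session.

```python
import re

# Groups are numbered by the position of their opening parenthesis, left to right.
m = re.search(r"(-{1,2}(gr))", "pro---gram-files")
assert m is not None

whole = m.group(0)      # entire match
outer = m.group(1)      # first "(": the outer group
inner = m.group(2)      # second "(": the nested group
n_groups = m.re.groups  # number of capturing groups in the pattern

# Avoid IndexError by checking the group count before asking for group 3.
third = m.group(3) if n_groups >= 3 else None

# m.string is the full searched text; m.span() locates the match inside it.
start, end = m.span()
print(whole, outer, inner, n_groups, third, m.string[start:end])
```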

m.groupdict() is used for named groups

In [557]: m.groupdict?

Docstring:

groupdict([default=None]) -> dict.

Return a dictionary containing all the named subgroups of the match,

keyed by the subgroup name. The default argument is used for groups

that did not participate in the match

Type:      builtin_function_or_method

In [558]: m=re.search(r'(-{1,2}(?P<GR>gr))','pro---gram-files')

In [559]: m.groupdict()

Out[559]: {'GR': 'gr'}
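
The docstring mentions a default for groups that did not participate in the match. Here is a minimal sketch of that behaviour, using a hypothetical pattern with an optional named group (the names sign and num are illustrative).

```python
import re

# Hypothetical pattern: an optional named sign followed by a named number.
pattern = r"(?P<sign>[+-])?(?P<num>\d+)"

m1 = re.search(pattern, "x=-42")
m2 = re.search(pattern, "x=42")

d1 = m1.groupdict()             # both named groups took part in the match
d2 = m2.groupdict()             # 'sign' did not participate, so it maps to None
d3 = m2.groupdict(default="+")  # substitute a default instead of None

# Named groups are also reachable by name or by number.
same = m1.group("num") == m1.group(2)
print(d1, d2, d3, same)
```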

3. Extraction: re.findall()

re.findall(pattern, string, flags=0)

In [97]: text = "He was carefully disguised but captured quickly by police."

In [98]: re.findall(r"\w+ly", text)  # no groups in the pattern: each item is equivalent to m.group(0)

Out[98]: ['carefully', 'quickly']

In [99]: re.findall(r"(\w+)ly", text)  # one pair of parentheses limits what is returned: each item is equivalent to m.group(1)

Out[99]: ['careful', 'quick']

In [100]: re.findall(r"((\w+)(ly))", text)  # several groups, numbered by their opening "(" from left to right: each item is equivalent to m.groups()

Out[100]: [('carefully', 'careful', 'ly'), ('quickly', 'quick', 'ly')]

In [102]: re.findall(r"((1\w+)(ly))", text)
Out[102]: []
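
When parentheses are needed only for grouping, or when positions matter, the following sketch may help. It reuses the same text; the non-capturing group and re.finditer are additions that the session above does not show.

```python
import re

text = "He was carefully disguised but captured quickly by police."

# A non-capturing group (?:...) groups without changing what findall returns.
words = re.findall(r"\w+(?:ly)", text)

# finditer yields match objects, so start and end positions are available too.
found = [(m.start(), m.end(), m.group(0)) for m in re.finditer(r"\w+ly", text)]
print(words, found)
```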

4. Replacement: re.sub()

re.sub(pattern, repl, string, count=0, flags=0)

Backreferences in repl, such as \6, are replaced with the substring matched by group 6 in the pattern. The same result can be achieved by passing a function as repl.

Note that MySQL REGEXP does not support \1.

https://stackoverflow.com/questions/4122393/negative-backreferences-in-mysql-regexp notes that this holds unless you can install/use LIB_MYSQLUDF_PREG.

https://stackoverflow.com/questions/7058209/reference-to-groups-in-a-mysql-regex

In [158]: def func(m):

     ...:     return m.group('DEF')+' '+m.group(2)  # group referenced by name

     ...:

In [159]: re.sub(r'(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):', func, 'def func(): def f():')

Out[159]: 'def func def f'

In [160]: re.sub(r'(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):', r'\1 \2', 'def func(): def f():')  # \DEF is not supported in repl; a named group is written \g<DEF>

Out[160]: 'def func def f'
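
A short sketch of the replacement-string forms, using the same pattern and input as above. The \g<name> and \g<number> syntax and the count argument are standard re.sub features; the variable names are illustrative.

```python
import re

pattern = r"(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):"
source = "def func(): def f():"

# Numbered backreferences in repl.
by_number = re.sub(pattern, r"\1 \2", source)

# Named groups use \g<name>; \g<2> is the unambiguous form of \2.
by_name = re.sub(pattern, r"\g<DEF> \g<2>", source)

# count=1 replaces only the first match.
first_only = re.sub(pattern, r"\g<DEF> \2", source, count=1)
print(by_number, by_name, first_only)
```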

5. Backreferences in the pattern

5.1 Finding a pair in a poker hand

In [204]: re.search(r'(.).*\1','ab123')

In [205]: re.search(r'(.).*\1','ab121')

Out[205]: <_sre.SRE_Match at 0x71ca120>

In [206]: _.group()

Out[206]: '121'
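
The pair search can be wrapped in a small helper. This is a sketch; the function name find_pair is hypothetical.

```python
import re

def find_pair(hand):
    """Return the leftmost character that appears again later in hand, or None."""
    # (.) captures one character; .* skips anything; \1 requires the same character again.
    m = re.search(r"(.).*\1", hand)
    return m.group(1) if m else None

no_pair = find_pair("ab123")
pair = find_pair("ab121")
print(no_pair, pair)
```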

5.2 Several identical characters in a row

In [207]: re.search(r'.{3}','')  # wrong: .{3} matches any three characters

Out[207]: <_sre.SRE_Match at 0x71b94a8>

In [208]: re.search(r'(.){3}','') # wrong: (.){3} also matches any three characters

Out[208]: <_sre.SRE_Match at 0x71ca198>

In [209]: re.search(r'(.)\1\1','') # correct: requires the same character three times in a row

In [210]: re.search(r'(.)\1\1','')

Out[210]: <_sre.SRE_Match at 0x71ca210>

In [211]: re.search(r'(.)\1{2}','')

Out[211]: <_sre.SRE_Match at 0x71ca288>

In [212]: _.group()

Out[212]: ''
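
To make the difference concrete, here is a sketch with illustrative sample strings; they are assumptions chosen for this example, not the inputs used in the session above.

```python
import re

samples = ["aab", "abc", "aaab"]  # illustrative inputs

# .{3} accepts any three characters, so every sample matches.
any_three = [bool(re.search(r".{3}", s)) for s in samples]

# (.)\1{2} needs one character followed by two more copies of itself.
same_three = [bool(re.search(r"(.)\1{2}", s)) for s in samples]

# The matched run itself is available through group().
run = re.search(r"(.)\1{2}", "xaaab").group()
print(any_three, same_three, run)
```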
