[Translation] LPeg Programming Guide

Original: http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html

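Translator's preface:

This is the official LPeg documentation. While learning LPeg I found very few articles about it in Chinese, so I decided to translate the manual.

The translation is incomplete and covers only the most commonly used parts. I will keep adding to it over time, and I would be grateful to anyone who helps fill in the rest.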

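Introduction:

LPeg is a new pattern-matching library for Lua, based on Parsing Expression Grammars (PEGs). This text is a reference manual for the library. For a more formal treatment, including a discussion of the implementation, see A Text Pattern-Matching Tool based on Parsing Expression Grammars.

Following the Snobol tradition, LPeg defines patterns as first-class objects. That is, patterns are regular Lua values (represented by userdata). The library offers several functions to create and compose patterns. Through metamethods, several of these functions are available as infix or prefix operators. On the one hand, LPeg patterns are usually more verbose than the equivalent regular expressions. On the other hand, first-class patterns are much easier to document and extend, because we can define our own functions to create and compose patterns.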

Operator	Description
`lpeg.P(string)`	Matches `string` literally
`lpeg.P(n)`	Matches exactly `n` characters
`lpeg.S(string)`	Matches any single character in `string` (Set)
`lpeg.R("xy")`	Matches any single character between `x` and `y` (Range)
`patt^n`	Matches at least `n` repetitions of `patt`
`patt^-n`	Matches at most `n` repetitions of `patt`
`patt1 * patt2`	Matches `patt1` followed by `patt2`
`patt1 + patt2`	Matches `patt1` or `patt2` (ordered choice)
`patt1 - patt2`	Matches `patt1` if `patt2` does not match
`-patt`	Equivalent to `("" - patt)`
`#patt`	Matches `patt` but consumes no input
`lpeg.B(patt)`	Matches `patt` behind the current position, consuming no input

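As a very simple example, lpeg.R("09")^1 creates a pattern that matches a non-empty sequence of digits. As a slightly less simple example, -lpeg.P(1) matches the empty string only when no character is left to match, so it succeeds only at the end of the subject. It is typically placed at the end of a pattern.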
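The two patterns just mentioned can be tried out directly. The following is a minimal sketch, assuming the `lpeg` module is installed and loadable with `require`; the variable names `digits` and `only_digits` are illustrative and not part of the library.

```lua
local lpeg = require("lpeg")

-- One or more ASCII digits.
local digits = lpeg.R("09")^1
-- The same digits, but only when nothing follows them.
local only_digits = digits * -lpeg.P(1)

print(lpeg.match(digits, "2024abc"))       --> 5
print(lpeg.match(only_digits, "2024abc"))  --> nil
print(lpeg.match(only_digits, "2024"))     --> 5
```

The value 5 is the index of the first character after the four matched digits.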

Functions

lpeg.match (pattern, subject [, init])

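The matching function. It attempts to match the given pattern against the subject string. If the match succeeds, it returns the index in the subject of the first character after the match, or the captured values (if the pattern captured any value).

An optional numeric argument init makes the match start at that position in the subject string. As usual in Lua libraries, a negative value counts from the end of the subject.

Unlike typical pattern-matching functions, match works only in anchored mode; that is, it tries to match the pattern with a prefix of the subject string (starting at position init), not with an arbitrary substring. So, to find a pattern anywhere in a string, we must either write a loop in Lua that tries each starting position or write a pattern that itself matches anywhere. The second approach is easy and quite efficient.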
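One way to build a pattern that matches anywhere is to skip characters until the target pattern matches. This is only a sketch under stated assumptions: `anywhere` is a hypothetical helper name, not an LPeg function, and the position captures (`lpeg.Cp`) are added just to show where the match was found.

```lua
local lpeg = require("lpeg")

-- Hypothetical helper: turns an anchored pattern into a searching one.
local function anywhere(patt)
  patt = lpeg.P(patt)
  -- Skip single characters while patt does not match here,
  -- then match patt and capture its start and end positions.
  return (1 - patt)^0 * lpeg.Cp() * patt * lpeg.Cp()
end

local find_world = anywhere("world")
print(find_world:match("hello world"))  --> 7  12
print(find_world:match("hello there"))  --> nil
```

The second position is the index just past the match, mirroring what lpeg.match returns for a pattern without captures.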

lpeg.type (value)

If the given value is a pattern, returns the string "pattern". Otherwise returns nil.

lpeg.version ()

Returns a string with the running version of LPeg.

lpeg.setmaxstack (max)

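Sets a limit for the size of the backtrack stack that LPeg uses to track calls and choices. The default limit is 400.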

Basic Constructions

lpeg.P (value)

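Converts the given value into a proper pattern, according to the following rules:

If the argument is a pattern, it is returned unmodified.
If the argument is a string, it is translated to a pattern that matches the string literally.
If the argument is a non-negative number n, the result is a pattern that matches exactly n characters.
If the argument is a negative number -n, the result is a pattern that succeeds only if the input string has fewer than n characters left. lpeg.P(-n) is equivalent to -lpeg.P(n) (see the unary minus operation).
If the argument is a boolean, the result is a pattern that always succeeds or always fails (according to the boolean value), without consuming any input.
If the argument is a table, it is interpreted as a grammar (see Grammars).
If the argument is a function, the result is a pattern equivalent to a match-time capture over the empty string.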
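The conversion rules for strings, numbers, and booleans can be checked with a few calls. This is a small sketch, assuming `lpeg` is installed; the local aliases `P` and `match` are just shorthand.

```lua
local lpeg = require("lpeg")
local P, match = lpeg.P, lpeg.match

print(match(P("ab"), "abc"))   --> 3    literal string
print(match(P(2), "abc"))      --> 3    exactly two characters
print(match(P(4), "abc"))      --> nil  fewer than four characters available
print(match(P(-2), "a"))       --> 1    fewer than two characters left
print(match(P(-2), "abc"))     --> nil  two or more characters left
print(match(P(true), "abc"))   --> 1    always succeeds, consumes nothing
print(match(P(false), "abc"))  --> nil  always fails
```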

lpeg.B(patt)

Returns a pattern that matches only if the input string at the current position is preceded by patt. Pattern patt must match only strings with some fixed length, and it cannot contain captures.

Like the and predicate, this pattern never consumes any input, independently of success or failure.
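As an illustration, the sketch below accepts a "b" only when the character just before it is an "a". It assumes an LPeg version that provides lpeg.B; the name `b_after_a` is made up for this example.

```lua
local lpeg = require("lpeg")

-- Consume any one character, then require that the character just
-- consumed was "a" (look-behind), then match "b".
local b_after_a = lpeg.P(1) * lpeg.B(lpeg.P("a")) * lpeg.P("b")

print(b_after_a:match("ab"))  --> 3
print(b_after_a:match("cb"))  --> nil
```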

lpeg.R ({range})

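Returns a pattern that matches any single character belonging to one of the given ranges. Each range is a string xy of length 2, representing all characters with codes between the codes of x and y (both inclusive).

As an example, the pattern lpeg.R("09") matches any digit, and lpeg.R("az", "AZ") matches any ASCII letter.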

lpeg.S (string)

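Returns a pattern that matches any single character that appears in the given string. (The S stands for Set.)

As an example, the pattern lpeg.S("+-*/") matches any arithmetic operator.

Note that, if s is a single character (that is, a string of length 1), then lpeg.P(s) is equivalent to lpeg.S(s).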
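Ranges and sets can be combined freely. A brief sketch, assuming `lpeg` is installed; `letter`, `operator`, and `ident_char` are illustrative names only.

```lua
local lpeg = require("lpeg")

local letter = lpeg.R("az", "AZ")   -- any ASCII letter
local operator = lpeg.S("+-*/")     -- any one arithmetic operator
-- A letter, a digit, or an underscore.
local ident_char = letter + lpeg.R("09") + lpeg.S("_")

print(letter:match("x1"))      --> 2
print(operator:match("*"))     --> 2
print(operator:match("="))     --> nil
print(ident_char:match("_a"))  --> 2
```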

lpeg.V (v)

This operation creates a non-terminal (a variable) for a grammar. The created non-terminal refers to the rule indexed by v in the enclosing grammar. (See Grammars for details.)

lpeg.locale ([table])

Returns a table with patterns for matching some character classes according to the current locale. The table has fields named alnum, alpha, cntrl, digit, graph, lower, print, punct, space, upper, and xdigit, each one containing a correspondent pattern. Each pattern matches any single character that belongs to its class.

If called with an argument table, then it creates those fields inside the given table and returns that table.
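Both calling styles can be sketched as follows. The example assumes `lpeg` is installed; the table `env` and its `name` field are made up for illustration. Which characters belong to each class depends on the current locale, although ASCII letters count as alpha in the usual locales.

```lua
local lpeg = require("lpeg")

-- Without an argument, a fresh table of class patterns is returned.
local classes = lpeg.locale()
local word = classes.alpha^1
print(word:match("hello world"))  -- index just past "hello"

-- With a table argument, the fields are added to that same table.
local env = { name = "demo" }
local same = lpeg.locale(env)
print(same == env, lpeg.type(env.digit))
```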

#patt

Returns a pattern that matches only if the input string matches patt, but without consuming any input, independently of success or failure. (This pattern is called an and predicate and it is equivalent to &patt in the original PEG notation.)

This pattern never produces any capture.
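A short sketch of the and predicate, assuming `lpeg` is installed: the pattern below accepts a number only when it is followed by "px", yet leaves "px" unconsumed. The name `px_number` is illustrative.

```lua
local lpeg = require("lpeg")

local digits = lpeg.R("09")^1
-- Digits that must be followed by "px"; the lookahead consumes nothing.
local px_number = digits * #lpeg.P("px")

print(px_number:match("12px"))  --> 3
print(px_number:match("12em"))  --> nil
```

The result 3 shows that the match stopped right after the digits.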

-patt

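Returns a pattern that matches only if the input string does not match patt. It does not consume any input, independently of success or failure. (This pattern is equivalent to !patt in the original PEG notation.)

As an example, the pattern -lpeg.P(1) matches only the end of the string.

This pattern never produces any captures, because either patt fails or -patt fails. (A failing pattern never produces captures.)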
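A common use of the not predicate is to make sure a keyword is not merely the prefix of a longer word. The sketch below assumes `lpeg` is installed; `kw_if` is a hypothetical name.

```lua
local lpeg = require("lpeg")

-- "if" not immediately followed by another lowercase letter.
local kw_if = lpeg.P("if") * -lpeg.R("az")

print(kw_if:match("if x"))  --> 3
print(kw_if:match("if"))    --> 3
print(kw_if:match("iffy"))  --> nil
```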

patt1 + patt2

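Returns a pattern equivalent to an ordered choice of patt1 and patt2: it matches patt1 and tries patt2 only if patt1 fails.

If both patt1 and patt2 are character sets, this operation is equivalent to set union.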

lower = lpeg.R("az")

upper = lpeg.R("AZ")

letter = lower + upper

patt1 - patt2

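Returns a pattern equivalent to !patt2 patt1 in PEG notation. It asserts that the input does not match patt2 and then matches patt1.

When it succeeds, this pattern produces all captures from patt1. It never produces any capture from patt2 (as either patt2 fails or patt1 - patt2 fails).

If both patt1 and patt2 are character sets, this operation is equivalent to set difference. Note that -patt is equivalent to "" - patt (or 0 - patt). If patt is a character set, 1 - patt is its complement.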
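Set complement and the "anything except" idiom can be sketched like this, assuming `lpeg` is installed. The names `non_digit` and `quoted` are illustrative, and the quoted-string pattern deliberately ignores escape sequences.

```lua
local lpeg = require("lpeg")

local any = lpeg.P(1)
local digit = lpeg.R("09")

-- Complement of a character set: any character that is not a digit.
local non_digit = any - digit
-- A double-quoted string: every character up to the closing quote.
local quoted = lpeg.P('"') * (any - lpeg.P('"'))^0 * lpeg.P('"')

print(non_digit:match("a"))       --> 2
print(non_digit:match("5"))       --> nil
print(quoted:match('"hi" rest'))  --> 5
```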

patt1 * patt2

Returns a pattern that matches patt1 and then matches patt2, starting where patt1 finished. The identity element for this operation is the pattern lpeg.P(true), which always succeeds.

(LPeg uses the * operator [instead of the more obvious ..] both because it has the right priority and because in formal languages it is common to use a dot for denoting concatenation.)

patt^n

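If n is non-negative, this pattern is equivalent to n copies of patt followed by patt* (in PEG notation): it matches n or more occurrences of patt.

Otherwise, when n is negative, this pattern is equivalent to patt? repeated |n| times: it matches at most |n| occurrences of patt.

In particular, patt^0 is equivalent to patt*, patt^1 is equivalent to patt+, and patt^-1 is equivalent to patt? in the original PEG notation.

In all cases, the resulting pattern is greedy with no backtracking (also called a possessive repetition). That is, patt^n matches only the longest possible sequence of matches for patt.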
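The repetition rules, including the possessive behaviour, can be seen in a few calls. A minimal sketch, assuming `lpeg` is installed; `a` is just a local name for the pattern matching the letter "a".

```lua
local lpeg = require("lpeg")
local a = lpeg.P("a")

print(lpeg.match(a^2, "aaaa"))   --> 5    at least two, takes all four
print(lpeg.match(a^2, "a"))      --> nil  fewer than two available
print(lpeg.match(a^-2, "aaaa"))  --> 3    at most two
print(lpeg.match(a^0, "bbb"))    --> 1    zero repetitions is fine

-- Possessive repetition: a^0 takes every "a" and never gives one back,
-- so the trailing "a" can never match.
print(lpeg.match(a^0 * a, "aaa"))  --> nil
```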

Grammar

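With the use of Lua variables, it is possible to define patterns incrementally, with each new pattern using previously defined ones. However, this technique does not allow the definition of recursive patterns. For recursive patterns, we need real grammars.

LPeg represents grammars with tables, where each entry is a rule.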
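As a sketch of a recursive pattern, the grammar below matches a string of balanced parentheses. It relies on a detail of the LPeg manual not covered in this translation: the entry at index 1 of the table names the initial rule. The rule name `S` and the variable `balanced` are illustrative.

```lua
local lpeg = require("lpeg")

local balanced = lpeg.P{
  "S",  -- name of the initial rule
  -- An opening parenthesis, then any mix of non-parenthesis characters
  -- and nested balanced groups, then a closing parenthesis.
  S = "(" * ((1 - lpeg.S("()")) + lpeg.V("S"))^0 * ")",
}

print(balanced:match("(a(b)c)"))  --> 8
print(balanced:match("(a(b c"))   --> nil
```

The rule refers to itself through lpeg.V("S"), which is what plain Lua variables cannot express.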

Captures

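A capture is a pattern that produces values (the so-called semantic information) when it matches. LPeg offers several kinds of captures, which produce values based on matches and on combinations of those values.

The following table summarizes the basic captures: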

Operation	What it Produces
`lpeg.C(patt)`	the match for `patt` plus all captures made by `patt`
`lpeg.Carg(n)`	the value of the n^th extra argument to `lpeg.match` (matches the empty string)
`lpeg.Cb(name)`	the values produced by the previous group capture named `name` (matches the empty string)
`lpeg.Cc(values)`	the given values (matches the empty string)
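`lpeg.Cf(patt, func)`	a folding of the captures from `patt`
`lpeg.Cg(patt [, name])`	the values produced by `patt`, optionally tagged with `name`
`lpeg.Cp()`	the current position (matches the empty string)
`lpeg.Cs(patt)`	the match for `patt` with the values from nested captures replacing their matches
`lpeg.Ct(patt)`	a table with all captures from `patt`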
`patt / string`	`string`, with some marks replaced by captures of `patt`
`patt / number`	the n-th value captured by `patt`, or no value when `number` is zero.
`patt / table`	`table[c]`, where `c` is the (first) capture of `patt`
`patt / function`	the returns of `function` applied to the captures of `patt`
`lpeg.Cmt(patt, function)`	the returns of `function` applied to the captures of `patt`; the application is done at match time

lpeg.C (patt)

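Creates a simple capture, which captures the substring of the subject that matches patt. The captured value is a string. If patt has other captures, their values are returned after this one.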
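The ordering of nested captures can be sketched as follows, assuming `lpeg` is installed; `word` and `pair` are illustrative names.

```lua
local lpeg = require("lpeg")

local word = lpeg.C(lpeg.R("az")^1)
print(word:match("hello world"))  --> hello

-- The outer capture comes first, then the captures nested inside it.
local pair = lpeg.C(lpeg.C(lpeg.R("az")^1) * "=" * lpeg.C(lpeg.R("09")^1))
print(pair:match("x=42"))  --> x=42  x  42
```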

lpeg.Carg (n)

Creates an argument capture. This pattern matches the empty string and produces the value given as the nth extra argument given in the call to lpeg.match.
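Extra arguments are passed to lpeg.match after the init position. A small sketch, assuming `lpeg` is installed; the pattern name `tagged` and the value "noun" are made up for the example.

```lua
local lpeg = require("lpeg")

-- Produce the first extra argument, then the matched word.
local tagged = lpeg.Carg(1) * lpeg.C(lpeg.R("az")^1)

-- Arguments: pattern, subject, init, then the extra arguments.
local tag, text = lpeg.match(tagged, "abc", 1, "noun")
print(tag, text)  --> noun  abc
```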

lpeg.Cb (name)

Creates a back capture. This pattern matches the empty string and produces the values produced by the most recent group capture named name (where name can be any Lua value).

Most recent means the last complete outermost group capture with the given name. A Complete capture means that the entire pattern corresponding to the capture has matched. An Outermost capture means that the capture is not inside another complete capture.

lpeg.Cc ([value, ...])

Creates a constant capture. This pattern matches the empty string and produces all given values as its captured values.
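Constant captures are handy for attaching a fixed value to whichever alternative matched. A sketch, assuming `lpeg` is installed; `answer` is an illustrative name.

```lua
local lpeg = require("lpeg")

-- Map the words "yes" and "no" to Lua booleans.
local answer = lpeg.P("yes") * lpeg.Cc(true)
             + lpeg.P("no") * lpeg.Cc(false)

print(answer:match("yes"))    --> true
print(answer:match("no"))     --> false
print(answer:match("maybe"))  --> nil

-- Several constants can be produced at once.
print(lpeg.match(lpeg.Cc("a", 1), ""))  --> a  1
```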

lpeg.Cf (patt, func)

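Creates a fold capture. If patt produces a list of captures C1 C2 ... Cn, this capture produces the value func(...func(func(C1, C2), C3)..., Cn). That is, it folds (or accumulates, or reduces) the captures from patt using the function func.

As an example, the following pattern matches a list of numbers separated by commas and returns their sum: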

-- matches a numeral and captures its numerical value

number = lpeg.R"09"^1 / tonumber

-- matches a list of numbers, capturing their values

list = number * ("," * number)^0

-- auxiliary function to add two numbers

function add (acc, newvalue) return acc + newvalue end

-- folds the list of numbers adding them

sum = lpeg.Cf(list, add)

-- example of use

print(sum:match("10,30,43"))  --> 83

lpeg.Cg (patt [, name])

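Creates a group capture. It groups all values returned by patt into a single capture. The group may be anonymous (if no name is given) or named with the given name (which can be any non-nil Lua value).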

lpeg.Cp ()

Creates a position capture. It matches the empty string and captures the position in the subject where the match occurs. The captured value is a number.
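Position captures can bracket a pattern to report where it started and ended. A sketch, assuming `lpeg` is installed; `word_at` is an illustrative name.

```lua
local lpeg = require("lpeg")

-- Start position, the word itself, and the position just after it.
local word_at = lpeg.Cp() * lpeg.C(lpeg.R("az")^1) * lpeg.Cp()

print(word_at:match("abc def"))  --> 1  abc  4
```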

lpeg.Cs (patt)

Creates a substitution capture, which captures the substring of the subject that matches patt, with substitutions. For any capture inside patt with a value, the substring that matched the capture is replaced by the capture value (which should be a string). The final captured value is the string resulting from all replacements.
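A substitution capture gives a simple search-and-replace. The sketch below, assuming `lpeg` is installed, replaces every "cat" with "dog" and copies all other characters unchanged; `swap` is an illustrative name.

```lua
local lpeg = require("lpeg")

-- Either "cat" (captured as the string "dog") or any single character.
local swap = lpeg.Cs((lpeg.P("cat") / "dog" + 1)^0)

print(swap:match("cat and cat"))  --> dog and dog
```

Characters matched by the plain `1` alternative carry no capture, so they pass through as they are.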

lpeg.Ct (patt)

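Creates a table capture. This capture returns a table holding the values of all anonymous captures made by patt in successive integer keys, starting at 1. In addition, for each named capture group created by patt, the first value of the group is stored in the table with the group name as its key. The captured value is only the table.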
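Named groups and table captures are often used together. The sketch below, assuming `lpeg` is installed, parses an input such as "x=1,2,3" into a table with a `name` field and the numbers in the array part; the input format and all variable names are invented for the example.

```lua
local lpeg = require("lpeg")

local key = lpeg.C(lpeg.R("az")^1)
local num = lpeg.R("09")^1 / tonumber

-- The named group becomes the field "name";
-- the anonymous number captures fill indices 1, 2, 3, ...
local entry = lpeg.Ct(lpeg.Cg(key, "name") * "=" * num * ("," * num)^0)

local t = entry:match("x=1,2,3")
print(t.name, #t, t[1], t[2], t[3])  --> x  3  1  2  3
```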

Note: the remaining sections are not translated here, because OSChina has already published a complete translation: http://www.oschina.net/translate/lpeg-syntax
