Extended Backus–Naur Form
From Wikipedia, the free encyclopedia
In computer science, Extended Backus–Naur Form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language which can be a computer programming language. They are extensions of the basic Backus–Naur Form (BNF) metasyntax notation.
The earliest EBNF was originally developed by Niklaus Wirth incorporating some of the concepts (with a different syntax and notation) from Wirth syntax notation. However, many variants of EBNF are in use. The International Organization for Standardization has adopted an EBNF standard (ISO/IEC 14977). This article uses EBNF as specified by the ISO for examples applying to all EBNFs. Other EBNF variants use somewhat different syntactic conventions.
Contents
[hide]
- 1 Basics
- 2 Advantages over BNF
- 3 Conventions
- 4 Extensibility
- 5 Related work
- 6 See also
- 7 References
- 8 External links
Basics[edit]
EBNF is a code that expresses the grammar of a formal language. An EBNF consists of terminal symbols and non-terminal production rules which are the restrictions governing how terminal symbols can be combined into a legal sequence. Examples of terminal symbols include alphanumeric characters, punctuation marks, and white space characters.
The EBNF defines production rules where sequences of symbols are respectively assigned to a nonterminal:
digit excluding zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit = "0" | digit excluding zero ;
This production rule defines the nonterminal digit which is on the left side of the assignment. The vertical bar represents an alternative and the terminal symbols are enclosed with quotation marks followed by a semicolon as terminating character. Hence a Digit is a 0 or a digit excluding zero that can be 1 or 2 or3 and so forth until 9.
A production rule can also include a sequence of terminals or nonterminals, each separated by a comma:
twelve = "1", "2" ;
two hundred one = "2", "0", "1" ;
three hundred twelve = "3", twelve ;
twelve thousand two hundred one = twelve, two hundred one ;
Expressions that may be omitted or repeated can be represented through curly braces { ... }:
natural number = digit excluding zero, { digit } ;
In this case, the strings 1, 2, ...,10,...,12345,... are correct expressions. To represent this, everything that is set within the curly braces may be repeated arbitrarily often, including not at all.
An option can be represented through squared brackets [ ... ]. That is, everything that is set within the square brackets may be present just once, or not at all:
integer = "0" | [ "-" ], natural number ;
Therefore an integer is a zero (0) or a natural number that may be preceded by an optional minus sign.
EBNF also provides, among other things, the syntax to describe repetitions of a specified number of times, to exclude some part of a production, or to insert comments in an EBNF grammar.
Table of symbols[edit]
The following represents a proposed ISO/IEC 14977 standard, by R. S. Scowen, page 7, table 1.
Usage
Notation
definition
=
termination
;
termination
. [1]
option
[ ... ]
repetition
{ ... }
grouping
( ... )
terminal string
" ... "
terminal string
' ... '
comment
(* ... *)
special sequence
? ... ?
exception
-
Examples[edit]
Even EBNF can be described using EBNF. Consider the sketched grammar below:
letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
symbol = "[" | "]" | "{" | "}" | "(" | ")" | "<" | ">"
| "'" | '"' | "=" | "|" | "." | "," | ";" ;
character = letter | digit | symbol | "_" ; identifier = letter , { letter | digit | "_" } ;
terminal = "'" , character , { character } , "'"
| '"' , character , { character } , '"' ; lhs = identifier ;
rhs = identifier
| terminal
| "[" , rhs , "]"
| "{" , rhs , "}"
| "(" , rhs , ")"
| rhs , "|" , rhs
| rhs , "," , rhs ; rule = lhs , "=" , rhs , ";" ;
grammar = { rule } ;
A Pascal-like programming language that allows only assignments can be defined in EBNF as follows:
(* a simple program syntax in EBNF − Wikipedia *)
program = 'PROGRAM', white space, identifier, white space,
'BEGIN', white space,
{ assignment, ";", white space },
'END.' ;
identifier = alphabetic character, { alphabetic character | digit } ;
number = [ "-" ], digit, { digit } ;
string = '"' , { all characters - '"' }, '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;
A syntactically correct program then would be:
PROGRAM DEMO1
BEGIN
A:=3;
B:=45;
H:=-100023;
C:=A;
D123:=B34A;
BABOON:=GIRAFFE;
TEXT:="Hello world!";
END.
The language can easily be extended with control flows, arithmetical expressions, and Input/Output instructions. Then a small, usable programming language would be developed.
Advantages over BNF[edit]
Any grammar defined in EBNF can also be represented in BNF, though representations in the latter are generally lengthier. E.g., options and repetitions cannot be directly expressed in BNF and require the use of an intermediate rule or alternative production defined to be either nothing or the optional production for option, or either the repeated production of itself, recursively, for repetition. The same constructs can still be used in EBNF.
The BNF uses the symbols (<, >, |, ::=) for itself, but does not include quotes around terminal strings. This prevents these characters from being used in the languages, and requires a special symbol for the empty string. In EBNF, terminals are strictly enclosed within quotation marks ("..." or '...'). The angle brackets ("<...>") for nonterminals can be omitted.
BNF syntax can only represent a rule in one line, whereas in EBNF a terminating character, the semicolon, marks the end of a rule.
Furthermore, EBNF includes mechanisms for enhancements, defining the number of repetitions, excluding alternatives, comments, etc.
Conventions[edit]
1. The following conventions are used:
- Each meta-identifier of Extended BNF is written as one or more words joined together by hyphens
- A meta-identifier ending with -symbol is the name of a terminal symbol of Extended BNF.
2. The normal character representing each operator of Extended BNF and its implied precedence is (highest precedence at the top):
* repetition-symbol
- except-symbol
, concatenate-symbol
| definition-separator-symbol
= defining-symbol
; terminator-symbol
. terminator-symbol
3. The normal precedence is overridden by the following bracket pairs:
' first-quote-symbol first-quote-symbol '
" second-quote-symbol second-quote-symbol "
(* start-comment-symbol end-comment-symbol *)
( start-group-symbol end-group-symbol )
[ start-option-symbol end-option-symbol ]
{ start-repeat-symbol end-repeat-symbol }
? special-sequence-symbol special-sequence-symbol ?
The first-quote-symbol is the apostrophe as defined by ISO/IEC 646:1991, that is to say Unicode U+0027 ('); the font used in ISO/IEC 14977:1996(E) renders it very much like the acute, Unicode U+00B4 (´), so confusion sometimes arises. However, the ISO Extended BNF standard invokes ISO/IEC 646:1991, "ISO 7-bit coded character set for information interchange", as a normative reference and makes no mention of any other character sets, so formally, there is no confusion with Unicode characters outside the 7-bit ASCII range.
As examples, the following syntax rules illustrate the facilities for expressing repetition:
aa = "A";
bb = 3 * aa, "B";
cc = 3 * [aa], "C";
dd = {aa}, "D";
ee = aa, {aa}, "E";
ff = 3 * aa, 3 * [aa], "F";
gg = {3 * aa}, "G";
Terminal strings defined by these rules are as follows:
aa: A
bb: AAAB
cc: C AC AAC AAAC
dd: D AD AAD AAAD AAAAD etc.
ee: AE AAE AAAE AAAAE AAAAAE etc.
ff: AAAF AAAAF AAAAAF AAAAAAF
gg: G AAAG AAAAAAG etc.
Extensibility[edit]
According to the ISO 14977 standard EBNF is meant to be extensible, and two facilities are mentioned. The first is part of EBNF grammar, the special sequence, which is arbitrary text enclosed with question marks. The interpretation of the text inside a special sequence is beyond the scope of the EBNF standard. For example, the space character could be defined by the following rule:
space = ? US-ASCII character 32 ?;
The second facility for extension is using the fact that parentheses cannot in EBNF be placed next to identifiers (they must be concatenated with them). The following is valid EBNF:
something = foo, ( bar );
The following is not valid EBNF:
something = foo ( bar );
Therefore, an extension of EBNF could use that notation. For example, in a Lisp grammar, function application could be defined by the following rule:
function application = list( symbol, { expression } );
Related work[edit]
- The W3C used a different EBNF to specify the XML syntax.
- The British Standards Institution published a standard for an EBNF: BS 6154 in 1981.
- The IETF uses Augmented BNF (ABNF), specified in RFC 5234.
See also[edit]
- Regular expression
- Spirit Parser Framework
- Phrase structure rules, the direct equivalent of EBNF in natural languages.
References[edit]
- Niklaus Wirth: What can we do about the unnecessary diversity of notation for syntactic definitions? CACM, Vol. 20, Issue 11, November 1977, pp. 822–823.
- Roger S. Scowen: Extended BNF — A generic base standard. Software Engineering Standards Symposium 1993.
- The International standard (ISO 14977) that defines the EBNF is now freely available as Zip-compressed PDF file.
- Jump up^ See page 8 of http://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf
External links[edit]
- Article "EBNF: A Notation to Describe Syntax (PDF)" by Richard E. Pattis describing the functions and syntax of EBNF
- Article "BNF and EBNF: What are they and how do they work?" by Lars Marius Garshol
- Article "The Naming of Parts" by John E. Simpson
- ISO/IEC 14977 : 1996(E)
- RFC 4234 – Augmented BNF for Syntax Specifications: ABNF (obsoleted by RFC 5234 — Augmented BNF for Syntax Specifications: ABNF)
- BNF/EBNF variants – a table by Pete Jinks comparing several syntaxes.
- Create syntax diagrams from EBNF
- EBNF Parser & Renderer
- Parser & Renderer in PHP5
- Writing Parsers with PetitParser (in Smalltalk)
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
Extended Backus–Naur Form的更多相关文章
- boost之词法解析器spirit
摘要:解析器就是编译原理中的语言的词法分析器,可以按照文法规则提取字符或者单词.功能:接受扫描器的输入,并根据语法规则对输入流进行匹配,匹配成功后执行语义动作,进行输入数据的处理. C++ 程序员需要 ...
- The DOT Language
CSDN新首页上线啦,邀请你来立即体验! 立即体验 博客 学院 下载 更多 登录注册 The DOT Language 翻译 2014年04月15日 11:27:07 标签: EBNF / 语言 / ...
- 如何编译和调试Python内核源码?
目录 写在前面 获取源代码 源代码的组织 windows下编译CPython 调试CPython 小结 参考 博客:blog.shinelee.me | 博客园 | CSDN 写在前面 如果对Pyth ...
- C#中反射接受的字符串需要满足的Backus-Naur Form语法
MSDN的Specifying Fully Qualified Type Names指明了C#中反射接受的字符串需要满足如下的Backus-Naur Form语法. BNF grammar of fu ...
- Yacc 与 Lex 快速入门
Yacc 与 Lex 快速入门 Lex 与 Yacc 介绍 Lex 和 Yacc 是 UNIX 两个非常重要的.功能强大的工具.事实上,如果你熟练掌握 Lex 和 Yacc 的话,它们的强大功能使创建 ...
- 正则语言引擎:一个简单LEX和YACC结合运用的实例
本文先描述了LEX与YACC的书写方法.然后利用LEX与YACC编写了一个简单正则语言的引擎(暂时不支持闭包与或运算),生成的中间语言为C语言. 正则引擎应直接生成NFA或DFA模拟器的输入文件,但在 ...
- Yacc 与 Lex 快速入门(词法分析和语法分析)
我们知道,高级语言,一般的如c,Java等是不能直接运行的,它们需要经过编译成机器认识的语言.即编译器的工作. 编译器工作流程:词法分析.语法分析.语义分析.IR(中间代码,intermediate ...
- 软件理论基础—— 第一章命题逻辑系统L
逻辑 语法 语义 推理系统 公理 推理规则 MP A,A->B =>B HS A->B,B->C => A->C 命题逻辑公式 ::= BNF backus ...
- Formal Grammars of English -10 chapter(Speech and Language Processing)
determiner 限定词 DET propernoun 专有名词 NP (or noun phrase) mass noun 不可数名词 Det Nouns 限定词名词 relative pro ...
随机推荐
- vconsole h5应用ajax请求抓包
<!DOCTYPE html> <html> <head> <meta charset="utf-8" /> <meta co ...
- 6.requests编写企查查爬虫
(为编写完善能拿下来数据) 企查查代码数据如下: #encoding:utf-8 import requests from lxml import etree import random import ...
- 2. select下拉框获取选中的值
1.获取select选中的value值: $("#select1ID").find("option:selected").val(); --select1ID ...
- C# ADO.NET 封装的增删改查
using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.T ...
- 列表(ul ol dl)
Title 1 2 3 1 2 3 a 1 2 b 1 2 <!DOCTYPE html> <html lang="en"> <head> &l ...
- 使用innodb_force_recovery解决MySQL崩溃无法重启问题
因为日志已经损坏,这里采用非常规手段,首先修改innodb_force_recovery参数,使mysqld跳过恢复步骤,将mysqld 启动,将数据导出来然后重建数据库.innodb_force_r ...
- LeetCode关于数组中求和的题型
15. 3Sum:三数之和为0的组合 Given an array S of n integers, are there elements a, b, c in S such that a + b + ...
- spring ioc xml配置
一个完整的spring xml配置:是把action,service,dao以及其它的资源性配置(如basedao)和公共性配置(如连接数据库)配置在resource.xml中,这样就有四个xml配置 ...
- 实现Action的三种方式
实现Action的三种方式: 1.普通类 一般采用此种方法 2.实现Action接口 3.继承ActionSupport类
- conductor FAQ
在一段时间后(如1小时,1天等),您如何安排将任务放入队列中? 轮询任务后,更新任务的状态IN_PROGRESS并将其callbackAfterSeconds设置为所需的时间.任务将保留在队列中,直到 ...