4.8 Using Ambiguous Grammars
It is a fact that every ambiguous grammar fails to be LR and thus is not in any of the classes of grammars discussed in the previous two sections. However, certain types of ambiguous grammars are quite useful in the specification and implementation of languages. For language constructs like expressions, an ambiguous grammar provides a shorter, more natural specification than any equivalent unambiguous grammar. Another use of ambiguous grammars is in isolating commonly occurring syntactic constructs for special-case optimization. With an ambiguous grammar, we can specify the special-case constructs by carefully adding new productions to the grammar.
Although the grammars we use are ambiguous, in all cases we specify disambiguating rules that allow only one parse tree for each sentence. In this way, the overall language specification becomes unambiguous, and sometimes it becomes possible to design an LR parser that follows the same ambiguity-resolving choices. We stress that ambiguous constructs should be used sparingly and in a strictly controlled fashion; otherwise, there can be no guarantee as to what language is recognized by a parser.
4.8.1 Precedence and Associativity to Resolve Conflicts
Consider the ambiguous grammar (4.3) for expressions with operators + and *, repeated here for convenience:
E→E + E | E * E | (E) | id
This grammar is ambiguous because it does not specify the associativity or precedence of the operators + and *. The unambiguous grammar (4.1), which includes productions E→E + T and T→T * F, generates the same language, but gives + lower precedence than *, and makes both operators left associative.
There are two reasons why we might prefer to use the ambiguous grammar. First, as we shall see, we can easily change the associativity and precedence of the operators + and * without disturbing the productions of (4.3) or the number of states in the resulting parser. Second, the parser for the unambiguous grammar will spend a substantial fraction of its time reducing by the productions E→T and T→F, whose sole function is to enforce associativity and precedence. The parser for the ambiguous grammar (4.3) will not waste time reducing by these single productions (productions whose body consists of a single nonterminal).
The sets of LR(0) items for the ambiguous expression grammar (4.3) augmented by E′→E are shown in Fig. 4.48. Since grammar (4.3) is ambiguous, there will be parsing-action conflicts when we try to produce an LR parsing table from the sets of items. The states corresponding to sets of items I7 and I8 generate these conflicts. Suppose we use the SLR approach to constructing the parsing action table. The conflict generated by I7 between reduction by E→E + E and shift on + or * cannot be resolved, because + and * are each in FOLLOW(E). Thus both actions would be called for on inputs + and *. A similar conflict is generated by I8, between reduction by E→E * E and shift on inputs + and *. In fact, each of our LR parsing table-construction methods will generate these conflicts.
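These conflicts can be found mechanically. The Python sketch below builds the canonical LR(0) collection for the augmented grammar (4.3), computes FOLLOW(E), and lists every state in which an SLR table would call for both a shift and a reduction on the same terminal. It is a minimal illustration under stated assumptions, not a parser generator: it assumes the grammar has no ε-productions (true of (4.3)), it checks only shift/reduce conflicts, and it numbers states in the order it discovers them, which need not match the numbering of Fig. 4.48. The names GRAMMAR, closure, goto, follow_sets, and slr_conflicts are chosen for this example.

```python
# Grammar (4.3) augmented with E' -> E; each production is (head, body).
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "E")),
    ("E", ("E", "*", "E")),
    ("E", ("(", "E", ")")),
    ("E", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """Close a set of LR(0) items; an item is (production index, dot position)."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for prod, dot in list(result):
            body = GRAMMAR[prod][1]
            if dot < len(body) and body[dot] in NONTERMINALS:
                for i, (head, _) in enumerate(GRAMMAR):
                    if head == body[dot] and (i, 0) not in result:
                        result.add((i, 0))
                        changed = True
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def canonical_collection():
    states = [closure({(0, 0)})]
    symbols = sorted({s for _, body in GRAMMAR for s in body})
    i = 0
    while i < len(states):
        for x in symbols:
            target = goto(states[i], x)
            if target is not None and target not in states:
                states.append(target)
        i += 1
    return states


def follow_sets():
    """FIRST and FOLLOW by fixpoint iteration (assumes no epsilon-productions)."""
    first = {n: set() for n in NONTERMINALS}
    follow = {n: set() for n in NONTERMINALS}
    follow["E'"].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            x = body[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[head]:
                first[head] |= new
                changed = True
            for k, sym in enumerate(body):
                if sym not in NONTERMINALS:
                    continue
                if k + 1 < len(body):
                    nxt = body[k + 1]
                    add = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    add = follow[head]
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True
    return follow


def slr_conflicts():
    """Return (state, terminal, production) for each shift/reduce clash."""
    follow = follow_sets()
    conflicts = []
    for n, items in enumerate(canonical_collection()):
        shiftable = {GRAMMAR[p][1][d] for p, d in items
                     if d < len(GRAMMAR[p][1])
                     and GRAMMAR[p][1][d] not in NONTERMINALS}
        for p, d in sorted(items):
            head, body = GRAMMAR[p]
            if d == len(body) and p != 0:  # completed item, not the accept item
                for a in sorted(follow[head] & shiftable):
                    conflicts.append((n, a, p))
    return conflicts


for state, symbol, prod in slr_conflicts():
    head, body = GRAMMAR[prod]
    print(f"state {state}: shift {symbol} or reduce {head} -> {' '.join(body)}")
```

Exactly two states are reported, one holding the completed item for E + E and one for E * E, each clashing on both + and *, because FOLLOW(E) = {+, *, ), $}.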
However, these problems can be resolved using the precedence and associativity information for + and *. Consider the input id + id * id, which causes a parser based on Fig. 4.48 to enter state 7 after processing id + id; in particular, the parser reaches the configuration
PREFIX     STACK       INPUT
E + E      0 1 4 7     * id $
For convenience, the symbols corresponding to the states 1, 4, and 7 are also shown under PREFIX.
If * takes precedence over +, we know the parser should shift * onto the stack, preparing to reduce the * and its surrounding id symbols to an expression.
This choice was made by the SLR parser of Fig. 4.37, based on an unambiguous grammar for the same language. On the other hand, if + takes precedence over *, we know the parser should reduce E + E to E. Thus the relative precedence of + followed by * uniquely determines how the parsing action conflict between reducing E→E + E and shifting on * in state 7 should be resolved.
I0: E′ → ·E
    E → ·E + E
    E → ·E * E
    E → ·(E)
    E → ·id
I1: E′ → E·
    E → E· + E
    E → E· * E
I2: E → (·E)
    E → ·E + E
    E → ·E * E
    E → ·(E)
    E → ·id
I3: E → id·
I4: E → E + ·E
    E → ·E + E
    E → ·E * E
    E → ·(E)
    E → ·id
I5: E → E * ·E
    E → ·E + E
    E → ·E * E
    E → ·(E)
    E → ·id
I6: E → (E·)
    E → E· + E
    E → E· * E
I7: E → E + E·
    E → E· + E
    E → E· * E
I8: E → E * E·
    E → E· + E
    E → E· * E
I9: E → (E)·
Figure 4.48: Sets of LR(0) items for an augmented expression grammar
If the input had been id + id + id instead, the parser would still reach a configuration in which it had stack 0 1 4 7 after processing input id + id. On input + there is again a shift/reduce conflict in state 7. Now, however, the associativity of the + operator determines how this conflict should be resolved.
If + is left associative, the correct action is to reduce by E→E + E. That is, the id symbols surrounding the first + must be grouped first. Again this choice coincides with what the SLR parser for the unambiguous grammar would do.
In summary, assuming + is left associative, the action of state 7 on input + should be to reduce by E→E + E, and assuming that * takes precedence over +, the action of state 7 on input * should be to shift. Similarly, assuming that * is left associative and takes precedence over +, we can argue that state 8, which can appear on top of the stack only when E * E are the top three grammar symbols, should have the action reduce E→E * E on both + and * inputs. In the case of input +, the reason is that * takes precedence over +, while in the case of input *, the rationale is that * is left associative. Proceeding in this way, we obtain the LR parsing table shown in Fig. 4.49. Productions 1 through 4 are E→E + E, E→E * E, E→(E), and E→id, respectively. It is interesting that a similar parsing action table would be produced by eliminating the reductions by the single productions E→T and T→F from the SLR table for the unambiguous expression grammar (4.1) shown in Fig. 4.37. Ambiguous grammars like the one for expressions can be handled in a similar way in the context of LALR and canonical LR parsing.
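The case analysis above condenses into one rule, essentially the one that Yacc-style parser generators apply to operator precedence declarations: compare the precedence of the production (here, of its operator) with that of the lookahead token, and fall back on associativity when the two are equal. The Python sketch below assumes each conflicting production is identified by its single operator and that the table PRECEDENCE is written by hand; both PRECEDENCE and resolve are illustrative names, not part of any tool.

```python
# Hypothetical precedence declarations: a higher level binds more tightly.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}


def resolve(rule_operator, lookahead):
    """Decide a shift/reduce conflict between reducing a production whose
    operator is `rule_operator` and shifting the token `lookahead`."""
    rule_level, _ = PRECEDENCE[rule_operator]
    look_level, assoc = PRECEDENCE[lookahead]
    if rule_level > look_level:
        return "reduce"   # the operator already on the stack binds tighter
    if rule_level < look_level:
        return "shift"    # the incoming operator binds tighter
    if assoc == "left":
        return "reduce"   # equal level: group from the left
    if assoc == "right":
        return "shift"    # equal level: group from the right
    return "error"        # equal level, non-associative


# State 7 has E + E on top of the stack; state 8 has E * E.
for state, op in ((7, "+"), (8, "*")):
    for token in ("+", "*"):
        print(f"state {state} on {token}: {resolve(op, token)}")
```

This prints reduce and shift for state 7 and reduce twice for state 8, matching the entries in rows 7 and 8 of Fig. 4.49. Giving + a higher level than * would instead make state 7 reduce on * and state 8 shift on +, without touching the productions of (4.3).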
        ACTION                              GOTO
STATE   id    +     *     (     )     $     E
  0     s3                s2                1
  1           s4    s5                acc
  2     s3                s2                6
  3           r4    r4          r4    r4
  4     s3                s2                7
  5     s3                s2                8
  6           s4    s5          s9
  7           r1    s5          r1    r1
  8           r2    r2          r2    r2
  9           r3    r3          r3    r3
Figure 4.49: Parsing table for grammar (4.3)
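To watch the table at work, the Python sketch below drives Fig. 4.49 with the usual LR parsing loop. The dictionaries ACTION and GOTO_E transcribe the figure (blank entries are simply absent), PRODUCTION_LENGTH records the body lengths of productions 1 through 4, and parse returns the production numbers in the order they are reduced. These names, and raising SyntaxError on a blank entry, are assumptions of this example.

```python
# Fig. 4.49; a missing key is a blank (error) entry.
ACTION = {
    0: {"id": "s3", "(": "s2"},
    1: {"+": "s4", "*": "s5", "$": "acc"},
    2: {"id": "s3", "(": "s2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s3", "(": "s2"},
    5: {"id": "s3", "(": "s2"},
    6: {"+": "s4", "*": "s5", ")": "s9"},
    7: {"+": "r1", "*": "s5", ")": "r1", "$": "r1"},
    8: {"+": "r2", "*": "r2", ")": "r2", "$": "r2"},
    9: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
}
GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}             # GOTO[s, E]
PRODUCTION_LENGTH = {1: 3, 2: 3, 3: 3, 4: 1}  # E+E, E*E, (E), id


def parse(tokens):
    """Run the LR loop; return production numbers in reduction order."""
    tokens = list(tokens) + ["$"]
    stack, pos, reductions = [0], 0, []
    while True:
        a = tokens[pos]
        action = ACTION[stack[-1]].get(a)
        if action is None:
            raise SyntaxError(f"unexpected {a!r} in state {stack[-1]}")
        if action == "acc":
            return reductions
        if action[0] == "s":
            stack.append(int(action[1:]))  # shift: push the new state
            pos += 1
        else:
            n = int(action[1:])            # reduce by production n
            del stack[-PRODUCTION_LENGTH[n]:]
            stack.append(GOTO_E[stack[-1]])
            reductions.append(n)


print(parse(["id", "+", "id", "*", "id"]))  # [4, 4, 4, 2, 1]
print(parse(["id", "+", "id", "+", "id"]))  # [4, 4, 1, 4, 1]
```

In the first run, production 2 (E→E * E) is reduced before production 1, so * groups first; in the second, the early reduction by production 1 groups the leftmost + first, as left associativity requires.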
4.8.2 The “Dangling-Else” Ambiguity
Consider again the following grammar for conditional statements:
stmt→if expr then stmt else stmt
| if expr then stmt
| other
As we noted in Section 4.3.2, this grammar is ambiguous because it does not resolve the dangling-else ambiguity. To simplify the discussion, let us consider an abstraction of this grammar, where i stands for if expr then, e stands for else, and a stands for “all other productions.” We can then write the grammar, with augmenting production S′→S, as
S′ → S
S → i S e S | i S | a          (4.67)
The sets of LR(0) items for grammar (4.67) are shown in Fig. 4.50. The ambiguity in (4.67) gives rise to a shift/reduce conflict in I4. There, item S→iS·eS calls for a shift of e and, since FOLLOW(S) = {e, $}, item S→iS· calls for reduction by S→iS on input e.
I0: S′ → ·S
    S → ·iSeS
    S → ·iS
    S → ·a
I1: S′ → S·
I2: S → i·SeS
    S → i·S
    S → ·iSeS
    S → ·iS
    S → ·a
I3: S → a·
I4: S → iS·eS
    S → iS·
I5: S → iSe·S
    S → ·iSeS
    S → ·iS
    S → ·a
I6: S → iSeS·
Figure 4.50: LR(0) states for augmented grammar (4.67)
Translating back to the if-then-else terminology, given if expr then stmt on the stack and else as the first input symbol, should we shift else onto the stack (i.e., shift e) or reduce if expr then stmt (i.e., reduce by S→iS)? The answer is that we should shift else, because it is “associated” with the previous then. In the terminology of grammar (4.67), the e on the input, standing for else, can only form part of the body beginning with the iS now on the top of the stack. If what follows e on the input cannot be parsed as an S, completing body iSeS, then it can be shown that there is no other parse possible.
We conclude that the shift/reduce conflict in I4 should be resolved in favor of shift on input e. The SLR parsing table constructed from the sets of items of Fig. 4.50, using this resolution of the parsing-action conflict in I4 on input e, is shown in Fig. 4.51. Productions 1 through 3 are S→iSeS, S→iS, and S→a, respectively.
        ACTION                GOTO
STATE   i     e     a     $     S
  0     s2          s3          1
  1                       acc
  2     s2          s3          4
  3           r3          r3
  4           s5          r2
  5     s2          s3          6
  6           r1          r1
Figure 4.51: LR parsing table for the “dangling-else” grammar
For example, on input iiaea, the parser makes the moves shown in Fig. 4.52, corresponding to the correct resolution of the “dangling-else.” At line (5), state 4 selects the shift action on input e, whereas at line (9), state 4 calls for reduction by S→iS on input $.
      STACK         SYMBOLS     INPUT     ACTION
(1)   0                         iiaea$    shift
(2)   0 2           i           iaea$     shift
(3)   0 2 2         i i         aea$      shift
(4)   0 2 2 3       i i a       ea$       reduce by S→a
(5)   0 2 2 4       i i S       ea$       shift
(6)   0 2 2 4 5     i i S e     a$        shift
(7)   0 2 2 4 5 3   i i S e a   $         reduce by S→a
(8)   0 2 2 4 5 6   i i S e S   $         reduce by S→iSeS
(9)   0 2 4         i S         $         reduce by S→iS
(10)  0 1           S           $         accept
Figure 4.52: Parsing actions on input iiaea
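The trace above can be reproduced by running the LR loop over Fig. 4.51. In the Python sketch below (the names ACTION, GOTO_S, BODY_LENGTH, and parse are chosen for this example), each stack state is paired with a value so that the parser also builds a small tree: "a" for S→a, ("if", then) for S→iS, and ("if", then, else) for S→iSeS. The tree it builds for iiaea attaches the else to the inner if, which is exactly the effect of resolving the conflict in state 4 in favor of shift.

```python
# Fig. 4.51; a missing key is a blank (error) entry.
ACTION = {
    0: {"i": "s2", "a": "s3"},
    1: {"$": "acc"},
    2: {"i": "s2", "a": "s3"},
    3: {"e": "r3", "$": "r3"},
    4: {"e": "s5", "$": "r2"},  # conflict on e resolved in favor of shift
    5: {"i": "s2", "a": "s3"},
    6: {"e": "r1", "$": "r1"},
}
GOTO_S = {0: 1, 2: 4, 5: 6}        # GOTO[s, S]
BODY_LENGTH = {1: 4, 2: 2, 3: 1}   # S -> iSeS, S -> iS, S -> a


def parse(text):
    """Parse a string over {i, e, a}; return (trace, tree)."""
    tokens = list(text) + ["$"]
    stack, values, pos, trace = [0], [], 0, []
    while True:
        a = tokens[pos]
        action = ACTION[stack[-1]].get(a)
        if action is None:
            raise SyntaxError(f"unexpected {a!r} in state {stack[-1]}")
        trace.append((tuple(stack), action))
        if action == "acc":
            return trace, values[-1]
        if action[0] == "s":
            stack.append(int(action[1:]))
            values.append(a)               # a terminal is its own value
            pos += 1
            continue
        n = int(action[1:])
        children = values[-BODY_LENGTH[n]:]
        del stack[-BODY_LENGTH[n]:]
        del values[-BODY_LENGTH[n]:]
        if n == 1:
            node = ("if", children[1], children[3])  # S -> i S e S
        elif n == 2:
            node = ("if", children[1])               # S -> i S
        else:
            node = "a"                               # S -> a
        stack.append(GOTO_S[stack[-1]])
        values.append(node)


trace, tree = parse("iiaea")
for stack, action in trace:
    print(" ".join(map(str, stack)).ljust(14), action)
print(tree)  # ('if', ('if', 'a', 'a'))
```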
By way of comparison, if we are unable to use an ambiguous grammar to specify conditional statements, then we would have to use a bulkier grammar along the lines of Example 4.16.
4.8.3 Error Recovery in LR Parsing
An LR parser will detect an error when it consults the parsing action table and finds an error entry. Errors are never detected by consulting the goto table. An LR parser will announce an error as soon as there is no valid continuation for the portion of the input thus far scanned. A canonical LR parser will not make even a single reduction before announcing an error. SLR and LALR parsers may make several reductions before announcing an error, but they will never shift an erroneous input symbol onto the stack.
In LR parsing, we can implement panic-mode error recovery as follows. We scan down the stack until a state s with a goto on a particular nonterminal A is found. Zero or more input symbols are then discarded until a symbol a is found that can legitimately follow A. The parser then stacks the state GOTO(s, A) and resumes normal parsing. There might be more than one choice for the nonterminal A. Normally these would be nonterminals representing major program pieces, such as an expression, statement, or block. For example, if A is the nonterminal stmt, a might be a semicolon or }, which marks the end of a statement sequence.
This method of recovery attempts to eliminate the phrase containing the syntactic error. The parser determines that a string derivable from A contains an error. Part of that string has already been processed, and the result of this processing is a sequence of states on top of the stack. The remainder of the string is still in the input, and the parser attempts to skip over the remainder of this string by looking for a symbol on the input that can legitimately follow A. By removing states from the stack, skipping over the input, and pushing GOTO(s, A) on the stack, the parser pretends that it has found an instance of A and resumes normal parsing.
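The recovery loop just described can be sketched in a few lines of Python. The function below assumes the parser keeps only states on its stack, that goto_table maps (state, nonterminal) pairs to states, that follow maps each nonterminal to its FOLLOW set, and that the input ends with $. The state numbers and the tokens x, =, and ; in the usage example are hypothetical, chosen only to mimic a damaged statement; when several nonterminals qualify, this sketch simply takes the first one it finds in goto_table.

```python
def panic_mode_recover(stack, tokens, pos, goto_table, follow):
    """Return (new_stack, new_pos) after panic-mode recovery.

    Pops states until one, s, has a goto on some nonterminal A, discards
    input until a symbol in FOLLOW(A) appears, then pushes GOTO(s, A)."""
    stack = list(stack)
    while stack:
        s = stack[-1]
        for (state, nonterminal), target in goto_table.items():
            if state != s:
                continue
            p = pos
            # Skip input symbols that cannot follow the nonterminal.
            while tokens[p] != "$" and tokens[p] not in follow[nonterminal]:
                p += 1
            if tokens[p] in follow[nonterminal]:
                return stack + [target], p
        stack.pop()  # no usable goto from this state: discard it
    raise SyntaxError("no state on the stack allows recovery")


# Hypothetical configuration: states 7 and 9 sit above state 2, which has a
# goto on stmt; the error was detected at the second "=".
stack, pos = panic_mode_recover(
    stack=[0, 2, 7, 9],
    tokens=["x", "=", "=", ";", "y", "$"],
    pos=2,
    goto_table={(0, "stmt"): 5, (2, "stmt"): 6},
    follow={"stmt": {";", "}"}},
)
print(stack, pos)  # [0, 2, 6] 3
```

Parsing would resume in state 6 with ; as the next input symbol, as if a complete stmt had been recognized.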
Phrase-level recovery is implemented by examining each error entry in the LR parsing table and deciding, on the basis of language usage, the most likely programmer error that would give rise to that error. An appropriate recovery procedure can then be constructed; presumably the top of the stack and/or the first input symbols would be modified in a way deemed appropriate for each error entry.
In designing specific error-handling routines for an LR parser, we can fill in each blank entry in the action field with a pointer to an error routine that will take the appropriate action selected by the compiler designer. The actions may include insertion or deletion of symbols from the stack or the input or both, or alteration and transposition of input symbols. We must make our choices so that the LR parser will not get into an infinite loop. A safe strategy will assure that at least one input symbol will be removed or shifted eventually, or that the stack will eventually shrink if the end of the input has been reached.
Popping a stack state that covers a nonterminal should be avoided, because this modification eliminates from the stack a construct that has already been successfully parsed.
Example 4.68: Consider again the expression grammar
E→E + E | E * E | (E) | id
Figure 4.53 shows the LR parsing table from Fig. 4.49 for this grammar, modified for error detection and recovery. We have changed each state that calls for a particular reduction on some input symbols by replacing error entries in that state by the reduction. This change has the effect of postponing the error detection until one or more reductions are made, but the error will still be caught before any shift move takes place. The remaining blank entries from Fig. 4.49 have been replaced by calls to error routines.
The error routines are as follows.
e1: This routine is called from states 0, 2, 4 and 5, all of which expect the beginning of an operand, either an id or a left parenthesis. Instead, +, * or the end of the input was found.
push state 3 (the goto of states 0, 2, 4 and 5 on id),
issue diagnostic “missing operand.”
e2: Called from states 0, 1, 2, 4 and 5 on finding a right parenthesis.
remove the right parenthesis from the input,
issue diagnostic “unbalanced right parenthesis.”
        ACTION                              GOTO
STATE   id    +     *     (     )     $     E
  0     s3    e1    e1    s2    e2    e1    1
  1     e3    s4    s5    e3    e2    acc
  2     s3    e1    e1    s2    e2    e1    6
  3     r4    r4    r4    r4    r4    r4
  4     s3    e1    e1    s2    e2    e1    7
  5     s3    e1    e1    s2    e2    e1    8
  6     e3    s4    s5    e3    s9    e4
  7     r1    r1    s5    r1    r1    r1
  8     r2    r2    r2    r2    r2    r2
  9     r3    r3    r3    r3    r3    r3
Figure 4.53: LR parsing table with error routines
e3: Called from states 1 or 6 when expecting an operator, and an id or left parenthesis is found.
push state 4 (corresponding to symbol +) onto the stack, issue diagnostic “missing operator.”
e4: Called from state 6 when the end of the input is found.
push state 9 (for a right parenthesis) onto the stack, issue diagnostic “missing right parenthesis.”
On the erroneous input id + ), the sequence of configurations entered by the parser is shown in Fig. 4.54.
□
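Example 4.68 can be exercised end to end with the Python sketch below, which transcribes Fig. 4.53 and implements e1 through e4 as listed above: e1, e3, and e4 push a state, while e2 deletes the offending right parenthesis. The compact row encoding and the names ACTION, GOTO_E, ERRORS, and parse are choices made for this example. On the input id + ), the run first deletes the right parenthesis, then supplies the missing operand by pushing state 3, and finally accepts.

```python
COLUMNS = ["id", "+", "*", "(", ")", "$"]
ROWS = [  # Fig. 4.53, states 0 through 9
    "s3 e1 e1 s2 e2 e1",
    "e3 s4 s5 e3 e2 acc",
    "s3 e1 e1 s2 e2 e1",
    "r4 r4 r4 r4 r4 r4",
    "s3 e1 e1 s2 e2 e1",
    "s3 e1 e1 s2 e2 e1",
    "e3 s4 s5 e3 s9 e4",
    "r1 r1 s5 r1 r1 r1",
    "r2 r2 r2 r2 r2 r2",
    "r3 r3 r3 r3 r3 r3",
]
ACTION = [dict(zip(COLUMNS, row.split())) for row in ROWS]
GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}
PRODUCTION_LENGTH = {1: 3, 2: 3, 3: 3, 4: 1}

# Error routines: (state to push or None, delete current input?, diagnostic).
ERRORS = {
    "e1": (3, False, "missing operand"),
    "e2": (None, True, "unbalanced right parenthesis"),
    "e3": (4, False, "missing operator"),
    "e4": (9, False, "missing right parenthesis"),
}


def parse(tokens):
    """Parse with error recovery; return (configurations, diagnostics)."""
    tokens = list(tokens) + ["$"]
    stack, pos = [0], 0
    configurations, diagnostics = [], []
    while True:
        action = ACTION[stack[-1]][tokens[pos]]
        configurations.append((tuple(stack), tuple(tokens[pos:]), action))
        if action == "acc":
            return configurations, diagnostics
        if action in ERRORS:
            push, delete, message = ERRORS[action]
            if push is not None:
                stack.append(push)   # pretend the missing symbol was seen
            if delete:
                pos += 1             # drop the offending input symbol
            diagnostics.append(message)
        elif action[0] == "s":
            stack.append(int(action[1:]))
            pos += 1
        else:
            n = int(action[1:])
            del stack[-PRODUCTION_LENGTH[n]:]
            stack.append(GOTO_E[stack[-1]])


configs, messages = parse(["id", "+", ")"])
for stack, rest, action in configs:
    print(" ".join(map(str, stack)).ljust(10), " ".join(rest).ljust(10), action)
print(messages)  # ['unbalanced right parenthesis', 'missing operand']
```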