Next: LR Parsers Up: Parsing (II) Previous: Parsing (II)

## Bottom-up parsing

KEY IDEAS.

• The weakness of top-down LL(k) parsing techniques is that they must predict which production to use, having seen only the first k tokens in the right side.
• The more powerful techniques of bottom-up LR(k) parsing is able to postpone the decision until it has seen
• input tokens corresponding to the entire right side of the production
• and k lookahead tokens beyond

Example 21   Consider the grammar (with terminals a, b, c, d, e and nonterminals S, A, B)
 S aABe A Abc  |  b B d
Consider the input word w = abbcde. We reduce w to S by repeating the following algorithm.
1. Choose a string such that
• is the leftmost factor of w such that
• there exists a production A .
2. Replace by A in w.
Warning! This algorithm is a first approach. (True algorithms for botton-up parsing do not follow exactly this scheme.) Well, our first-approach algorithms leads to the following sequence of transformations in the first column of the table below. The second column gives the production that could be used in order to rewrite the current word backwards. The last column gives the chosen production according to the above algorithm.
 Current string Qualifying productions Chosen production abbcde A b B d A b aAbcde A Abc A b B d A Abc aAde B d B d aABe S aABe S aABe S
Observe that we have

 S  aABe  aAde  aAbcde  abbcde (44)

where    means that derives from by a rightmost derivation.

BOTTOM-UP PARSING constructs a parse tree for an input string beginning at the leaves and working up towards the root.

• To do so, bottom-up parsing tries to find a rightmost derivation of a given string backwards.
• Bottom-up parsing is also called shift-and-reduce parsing where
• shift means read the next token,
• reduce means that a substring matching the right side of a production A is replaced by A.

HANDLE. Let G = (VT, VN, S, P) be a context-free grammar. Let n be a positive integer, let w, be two strings of symbols and let A be a nonterminal such that A is a production of G. We say that (A , n) is a handle for w if there exists a string of symbols such that

 S  A    =  w (45)

and such that n = | | + 1.

More informally, a handle of w is a substring that matches a right side of a production A such that the reduction of the handle corresponds to a step of the reverse of a rightmost derivation.

Theorem 4   Let G = (VT, VN, S, P) be a context-free grammar. If G is unambiguous then for each string of symbols w such that S  w there exists exactly one handle.

Example 22   Consider the grammar
 S aSb  |
• The handle of aabb is (S , 3)
• The handle of aaSbb is (S aSb, 2)
• The handle of aSb is (S aSb, 1)

Example 23   Consider the grammar
 S AC  |  BC A a B a C c
The string ac has handles (A a, 1) and (B a, 1). So the grammar is ambiguous.

Example 24   Consider the grammar
 S AB  |  A A a B b
N.B. In the sentential form Ab the handle is (B b, 2) and not (S A, 1).

HANDLE PRUNING is the general approach used in shift-and-reduce parsing. The implementation of handle pruning involves the following data-structures:

- a stack
to hold the grammar symbols,
- an input buffer
containing the remaining input,
- a table
to decide handles.
Initially the situation is
 Stack Input $w$
where w is the sentence to parse. The possible actions of a shift-and-reduce parser are:
- shift
where the next symbol is shifted onto the top of the stack
- reduce
where the parser
• knows that the right end of the handle is at the top of the stack,
• locates the left end of the handle within the stack and
• decide with what nonterminal to replace the handle.
- accept
where the parser announces successful completion of parsing.
- error
where the parser discovers that a syntax error has occured and calls an error recovery routine.
The final configuration must be
 Stack Input $S$

Example 25   Consider the grammar G
 E E + E E E*E E (E) E
A rigthmost derivation of the sentence w = + * is

 S E + E E + E*E E + E* E + * + *
(46)

Let us sketch the actions a shift-and-reduce parser might take in parsing w to obtain the above rigthmost derivation. (Another shift-and-reduce parser might obtain another rigthmost derivation of w w.r.t. G.)
 Stack Input Action to take $+ *$ shift $+ *$ reduce with E $E + *$ shift $E + *$ shift $E + *$ reduce with E $E + E *$ shift $E + E*$ shift $E + E*$ reduce with E $E + E*E$ reduce with E E*E $E + E$ reduce with E E + E $E$ accept
Of course the decisions (especially for the 4th shift) made in the above table are not explained at this point.

Handle pruning can be implemented using a stack because of the following fact.

Proposition 1   Let G = (VT, VN, S, P) be an unambiguous context-free grammar. In a shift-and-reduce parser for G implemented with a stack, the right end of the handle will always appear on top the stack.

Proof. This fact becomes clear when considering the possible forms of two successive steps in any rightmost derivation. These two steps can be of the form (type 1)

 S   A z    B y z     y z (47)

or the form (type 2)

 S   B x A z   B x y z    x y z (48)

• In type 1, first the rightmost nonterminal A is replaced by the right side of the production A  B y and then the rightmost nonterminal B is replaced by the right side of the production B .
• In type 2, the rightmost nonterminal A is replaced first by use of the rule A y (where y is a string of one or more terminals) and then B is replaced by using B .
We proceed by induction. Let us assume that the handle B appears on top of the stack. Then let us show that the handle A  B y in type 1 (and the handle A y in type 2) appears on top also. Let us consider type 1 in reverse, where a shift-and-reduce parser has just reached the following configuration.
 Stack Input y z
The parser now reduces the handle.
 Stack Input B y z
Since B is the rightmost nonterminal, the right end of the handle of w =   Byz cannot be to the left of B. Therefore the parser can shift y.
 Stack Input B y z
For case 2 in reverse, assume the following configuration has been reached
 Stack Input x y z