Next: Non-recursive implementation of predictive parsing Up: Parsing Previous: Top-down parsing

## Predictive parsing

THE IDEA. Predictive parsing relies on information about what first symbols can be generated by the right side of a production. The lookahead symbol guides the selection of the production A to be used:
• if starts with a token, then the production can be used when the lookahead symbol matches this token
• if starts with a nonterminal B, then the production can be used if the lookahead symbol can be generated from B.

THE METHOD.

(1)
For each nonterminal A for each production A let FIRST() be the subset of VT   {} defined by the following rules
• for every a VT we have

 a     ( (VT VN) * )  a (40)

• FIRST(
In other words the token a belongs to FIRST() iff there exists a string deriving from such that a is the first symbol of . Moreover belongs to FIRST() iff derives from .
(2)
Consider the following procedure in pseudo-code of Algorithm 1.

Algorithm 1

So match(a) moves the cursor lookahead one symbol forward iff lookahead points to a. Otherwise an error is produced.
(3)
A procedure proc (given by Algorithm 2) is associated with every nonterminal A.

Algorithm 2

In other words, given a nonterminal A the call proc chooses an A-production A    such that lookahead belomgs to FIRST(). If such a production exists then for each symbol X in (reading from left to right)
• if X is a terminal, then call match(X)
• if X is a nonterminal, then call proc
If no A-production A    satisfies lookahead FIRST() then either A    is a production and the procedure terminates normally, or an error is produced.
(4)
Initially lookahead is the first symol of the input string and we run procS where S is the start symbol.
Observations:
• In Algorithm 2 return is an escape statement without error whereas error is an interruption with error.
• The grammar must have no left recursive derivations, otherwise the parsing may lead to an infinite loop.
• Back-tracking is avoided provided that for every nonterminal A and for every string of symbols ,

 FIRST()  FIRST()  =  . (41)

• The last statement of Algorithm 2 means that the A-production A can be chosen if the lookahead symbol is not in any of the FIRST() for A with .
Hint: You should understand the production A as a way to say that the nonterminal A can be canceled (if no other A-production is convenient to use).

Example 15   Consider the following grammar (with terminals a, b, c, d and nonterminals S, A)
 S cAd A ab  | a
Here we have FIRST(cAd) = {c}, FIRST(ab) = {a} and FIRST(a) = {a}. Unfortunately, if we have lookahead = a we cannot tell each A-production to use. So left factoring would be needed to process this example further (what we will not do).

Example 16   The following grammar generates a subset of the types in the PASCAL language.
 type simple | id | array[simple] of type simple integer | char | num dotdot num
where
• all bold names and the characters ,, [, ] are tokens,
• type and simple are nonterminals,
• dotdot stands for ...
• and num for a type of small integers.
Here's a string of tokens generated by the above grammar.
array [ num dotdot num ] of integer
We give now the FIRST sets of every left side of the above productions.
 FIRST(simple) = { integer, char, num } FIRST(id) = { } FIRST(array[simple] of type) = { array } FIRST(integer) = { integer } FIRST(char) = { char } FIRST(num dotdot num) = { num }
Algorithms 3 and 4 define the procedures proc_simple and proc_type.

Algorithm 3

Algorithm 4

Next: Non-recursive implementation of predictive parsing Up: Parsing Previous: Top-down parsing
Marc Moreno Maza
2004-12-02