next up previous
Next: Non-recursive implementation of predictive parsing Up: Parsing Previous: Top-down parsing

Predictive parsing

THE IDEA. Predictive parsing relies on information about what first symbols can be generated by the right side of a production. The lookahead symbol guides the selection of the production A $ \longmapsto$ $ \alpha$ to be used:

THE METHOD.

(1)
For each nonterminal A for each production A $ \longmapsto$ $ \alpha$ let FIRST($ \alpha$) be the subset of VT  $ \cup$  {$ \varepsilon$} defined by the following rules In other words the token a belongs to FIRST($ \alpha$) iff there exists a string $ \gamma$ deriving from $ \alpha$ such that a is the first symbol of $ \gamma$. Moreover $ \varepsilon$ belongs to FIRST($ \alpha$) iff $ \varepsilon$ derives from $ \alpha$.
(2)
Consider the following procedure in pseudo-code of Algorithm 1.

Algorithm 1  

\fbox{
\begin{minipage}{10 cm}
\begin{tabbing}
\quad \= \quad \= \quad \= \quad ...
...{\em nexttoken} \\
\> \> \> {\bf else} {\bf error}
\end{tabbing}\end{minipage}}

So match(a) moves the cursor lookahead one symbol forward iff lookahead points to a. Otherwise an error is produced.
(3)
A procedure proc$ \_A$ (given by Algorithm 2) is associated with every nonterminal A.

Algorithm 2  

\fbox{
\begin{minipage}{12 cm}
\begin{tabbing}
\quad \= \quad \= \quad \= \quad ...
...n}$\ {\bf then} {\bf return} {\bf else} {\bf error}
\end{tabbing}\end{minipage}}

In other words, given a nonterminal A the call proc$ \_A$ chooses an A-production A  $ \longmapsto$  $ \alpha$ such that lookahead belomgs to FIRST($ \alpha$). If such a production exists then for each symbol X in $ \alpha$ (reading $ \alpha$ from left to right) If no A-production A  $ \longmapsto$  $ \alpha$ satisfies lookahead $ \in$ FIRST($ \alpha$) then either A  $ \longmapsto$  $ \varepsilon$ is a production and the procedure terminates normally, or an error is produced.
(4)
Initially lookahead is the first symol of the input string and we run procS where S is the start symbol.
Observations: Hint: You should understand the production A $ \longmapsto$ $ \varepsilon$ as a way to say that the nonterminal A can be canceled (if no other A-production is convenient to use).

Example 15   Consider the following grammar (with terminals a, b, c, d and nonterminals S, A)
S $ \longmapsto$ cAd
A $ \longmapsto$ ab  | a
Here we have FIRST(cAd) = {c}, FIRST(ab) = {a} and FIRST(a) = {a}. Unfortunately, if we have lookahead = a we cannot tell each A-production to use. So left factoring would be needed to process this example further (what we will not do).

Example 16   The following grammar generates a subset of the types in the PASCAL language.
type $ \longmapsto$ simple | $ \uparrow$id | array[simple] of type
simple $ \longmapsto$ integer | char | num dotdot num
where Here's a string of tokens generated by the above grammar.
array [ num dotdot num ] of integer
We give now the FIRST sets of every left side of the above productions.
FIRST(simple) = { integer, char, num }
FIRST($ \uparrow$id) = { $ \uparrow$ }
FIRST(array[simple] of type) = { array }
FIRST(integer) = { integer }
FIRST(char) = { char }
FIRST(num dotdot num) = { num }
Algorithms 3 and 4 define the procedures proc_simple and proc_type.

Algorithm 3  

\fbox{
\begin{minipage}{13 cm}
\begin{tabbing}
\quad \= \quad \= \quad \= \quad ...
...\> \> {\bf else} \\
\> \> \> \> \> \> {\bf error}
\end{tabbing}\end{minipage}}

Algorithm 4  

\fbox{
\begin{minipage}{12 cm}
\begin{tabbing}
\quad \= \quad \= \quad \= \quad ...
... \> \> \> {\bf else} \\
\> \> \> \> \> {\bf error}
\end{tabbing}\end{minipage}}


next up previous
Next: Non-recursive implementation of predictive parsing Up: Parsing Previous: Top-down parsing
Marc Moreno Maza
2004-12-02