Next: Left factoring Up: Context-free grammars Previous: Elimination of ambiguity.

## Elimination of left recursion

LEFT RECURSION. Let G be a context-free grammar. A production of G is said left recursive if it has the form

 A   A (20)

where A is a nonterminal and is a string of grammar symbols. A nonterminal A of G is said left recursive if there exists a string of grammar symbols such that we have

 A    A (21)

Such derivation is also said left recursive. The grammar G is left recursive if it has at least one left recursive nonterminal.

Remark 4   Top-down parsing is one of the methods that we will study for generating parse trees. This method cannot handle left recursive grammars. We present now an algorithm for transforming a left recursive grammar G into a grammar G' which is not left recursive and which generates the same language as G.

THE BASIC TRICK is to replace the production

 A   A  |      by (22)

where is assumed. Observe that this trick eliminates left recursive productions. Applying this trick is not enough to remove all derivations of the form A  A. However this trick is sufficient for many grammars.

Example 10   The following grammar which generates arithmetic expressions

 E E + T  |  T T T*F  |  F F (E)  |
(23)

has two left recursive productions. Applying the above trick leads to

 E TE' E' + TE'  | T FT' T' *FT'  | F (E)  |
(24)

THE CASE OF SEVERAL LEFT RECURSIVE A-PRODUCTIONS. Assume that the set of all A-productions has the form

 A   A  |  A  |   ... A  |    |    |   ... (25)

where no begins with an A and where holds for every i = 1 ... m. Then we repalce these A-productions by

 A A'  |  A'  |   ...  A' A' A'  |  A'  |   ...  A'  |
(26)

Remark 5   Let us consider the following grammar.

 S Aa  |  b A Ac  |   Sd  |
(27)

The nonterminal S is left recursive since we have

 S   Aa   Sda (28)

However there is no left recursive S-productions. We show now how to deal with these left recursive derivations.

REMOVING ALL LEFT RECURSIVE DERIVATIONS. Let us assume that non-terminals are numbered: A1, A2,..., An. The goal is to remove any possible circuit for all 1 j < i n

 AiAjAi. (29)

The idea is for i = 1 ... n
• To make sure that every Ai-production does not have a form Ai Aj for some j < i.
• To remove any left recursive Ai-production.
The method in more detail:
• remove all left recursive A1-productions (by the above trick)
• remove A1 from the right-hand side of each A2-production of the form A2 A1 (by applying all A1-productions)
• remove all left recursive A2-productions
• remove Aj from the right-hand side of each A3-production of the form A3 Aj for j = 1, 2.
• remove all left recursive A3-productions
• ...
• remove Aj from the right-hand side of each Ai-production of the form Ai Aj (for every j such that 1 j < i n)
• remove all left recursive Ai-productions
• ...
Observations:
• To remove Aj from the right-hand side of the Ai-production Ai Aj (where 1 j < i n) replace

 Ai Aj  by  Ai   |   ...   |    where  Aj   |   ...   | (30)

are all Aj-productions and for i = 1 ... k.
• This construction may not work if G constains initially a production of the form A   . See Example 11. However we can always remove such productions. Explain how!
• Also, this construction may fail if the grammar G has a cycle, that is, if there exists a nonterminal A such that A  A. This can happen with the following grammar fragment

 A1 A2  | A2 A1  |
(31)

where
• is a string of grammar symbols not starting with A1 (what we can assume by application of the basic trick given in Relation 22)
• is a string of grammar symbols not starting with A1 (what we can assume by application of the A1-productions).
Then we can replace the above fragment by

 A1 | A2 |
(32)

without changing the generated language. Unfortunately a cycle may be hidden in a more subtle manner. Indeed may write A2 and may write A1 with  ,  ,   and  . However this may only happen if there are productions of the form A .
• In conclusion, the above construction for removing all left recursive derivations should be performed after
• removing all productions of the form A ,
• removing cycles, that is derivations of the form A  A.
Now observe that the basic trick given in Relation 22 introduces productions of the form A . However one can show that the above construction cannot reintroduce cycles.

Example 11   Consider the following grammar.

 A3 A2A1 A2 A1 A2A1
(33)

Applying the previous algorithm for removing all possible left-recursive derivations fail on this grammar, because of the -production.

Example 12   Consider again the following grammar.

 S Aa  |  b A Ac  |  Sd  |
(34)

We order the nonterminals: S < A. There is no left recursive S-productions. So the next step is to remove S from the right-hand side of the A-productions of the form A S. Hence we obtain

 S Aa  |  b A Ac  |  Aad  |  bd  |
(35)

Then we remove the left recursive A-productions.

 S Aa  |  b A bdA'  |  A' A' cA'  |  adA'  |
(36)

Next: Left factoring Up: Context-free grammars Previous: Elimination of ambiguity.
Marc Moreno Maza
2004-12-02