Next: Chomsky classification Up: Grammars and Parsing Previous: An introduction to the notion

## The notion of a grammar

A GRAMMAR is quadruple (VT, VN, S, P) such that

• VT is a finite set of symbols called terminals,
• VN is a finite set of symbols called non-terminals,
• S is a distinguished non-terminal called start-symbol,
• P is a finite set of couples (,) called productions where
• and belong to (VT   VN) * and
• does not belong (VT) * .
In other words a production is made of two words over the alphabet VT   VN such that the first word contains at least one non-terminal symbol. The set VT   VN is called the set of grammar symbols.

Notation 1   Here are some usual notational conventions about grammars.
• It is convenient to denote a production instead of (,).
• By default, the following symbols are terminals: lower-case letters, arithmetic operators, punctuation symbols, digits and boldface strings (if, id, ...).
• By default, the following symbols are non-terminals: upper-case letters early in the alphabet such as A, B, C,..., the letter S (for the start-symbol), lower-case italic names such as stmt, expr.
• By default, the following symbols are grammar symbols: upper-case letters late in the alphabet such as X, Y, Z
• By default, strings of terminals are denoted by lower-case letters late in the alphabet such as u, v, w.
• By default, strings of grammar symbols are denoted by lower-case Greek letters early in the alphabet such as ,,.
• By default, the start symbol is the left side of the first production.
• If A , A , ... A are all the productions with A as their left side (these productions are called A-productions) then we can write A | | ... | . The sequence | | ... | is called the alternatives of A.

Example 2   Using the conventions of Notation 1 the grammar of Example 1 can be stated as follows
 E E A E | (E) | -E | id A + | - | * | / |

Remark 2   Grammars offer significant advantages to both language designers and compiler writers. Among them
• A grammar gives a precise and easy to understand syntactic specification of a programming language.
• From certain classes of grammars we can automatically construct an efficient parser that determines if a source program is syntactically well formed.
• Developing a language given by a grammar (adding or removing constructs) is easy.

DERIVATIONS. Let G = (VT, VN, S, P) be a grammar and let and be two strings of grammar symbols for G. We say that derives from in one step and we write if there exist strings of grammar symbols , , , such that

• = ,
• = ,
• is a production of G.
The transitive and reflexive closure of the map (,) over the sets of grammar symbols is denoted by (,) . Thus for two grammar symbols and we have

 (4)

Intuitively, means that derives from in a finite number of steps. A sequence of strings of grammar symbols (,,...,) is a DERIVATION if we have

 ...   . (5)

THE LANGUAGE GENERATED BY A GRAMMAR. Let G = (VT, VN, S, P) be a grammar. The language over VT generated by G and denoted by L(G) is the set of the words w of VT * such that Sw. The words of L(G) are called the sentences of G. More generally any string of grammar symbols such that S is called a sentential form of G.

A language L over the alphabet = VT is said formal if there exists a grammar G such that L is generated by G. Observe that

• Some languages are not formal like natural languages.
• Two different sequences of derivations-in-one-step can lead to the same sentential form.
• Moreover, two different grammars can generate the same language.
These last two observations are illustrated by Example 3.

Example 3   Let us consider again arithmetic expressions. For simplicity we consider only two operations + and *. Our language L can be generated by the grammar G1:
 expr expr + expr | expr * expr | (expr) | id
And also by the grammar G2:
 expr expr + term | term term term * factor | factor factor (expr) | id

Consider now the arithmetic expression + *. Two different sequences of derivations-in-one-step of G1 can lead to it:

(6)

This is sketched by the trees of Figure 2, called parse trees.
With G2 several derivations-in-one-step can lead to + *. However they all correspond to the same tree shown on Figure 3.
This notion of a parse tree is formalized later.

NON-TERMINALS can be seen as syntactic variables that denote the sets of strings that can be derived from them. They also impose a hierarchical structure on the language that is useful for both syntax analysis and translation. Moreover Example 3 shows that an accurate use of non-terminals can reduce ambiguity.

Next: Chomsky classification Up: Grammars and Parsing Previous: An introduction to the notion
Marc Moreno Maza
2004-12-02