Next: Chomsky classification
Up: Grammars and Parsing
Previous: An introduction to the notion
A GRAMMAR is quadruple
(V_{T}, V_{N}, S, P) such that
 V_{T} is a finite set of symbols called terminals,
 V_{N} is a finite set of symbols called nonterminals,
 S is a distinguished nonterminal called startsymbol,
 P is a finite set of couples
(,)
called productions where
 and belong
to
(V_{T} V_{N})^{ * } and
 does not belong
(V_{T})^{ * }.
In other words a production is made of two words over
the alphabet
V_{T} V_{N} such that
the first word contains at least one nonterminal symbol.
The set
V_{T} V_{N} is called the set
of grammar symbols.
Notation 1
Here are some usual notational conventions about grammars.
 It is convenient to denote a production
instead of
(,).
 By default, the following symbols are terminals:
lowercase letters, arithmetic operators, punctuation symbols,
digits and boldface strings (if, id, ...).
 By default, the following symbols are nonterminals:
uppercase letters early in the alphabet such as
A, B, C,...,
the letter S (for the startsymbol), lowercase italic names
such as stmt, expr.
 By default, the following symbols are grammar symbols:
uppercase letters late in the alphabet such as X, Y, Z
 By default, strings of terminals are denoted by
lowercase letters late in the alphabet such as u, v, w.
 By default, strings of grammar symbols are denoted by
lowercase Greek letters early in the alphabet
such as
,,.
 By default, the start symbol is the left side of the first production.
 If
A ,
A , ...
A are all the productions with A
as their left side (these productions are called Aproductions)
then we can write
A   ^{ ... }  .
The sequence
  ^{ ... } 
is called the alternatives of A.
Example 2
Using the conventions of Notation
1
the grammar of Example
1
can be stated as follows
E 

E A E  (E)  E  id 
A 

+    *  /  
Remark 2
Grammars offer significant advantages to both language
designers and compiler writers. Among them
 A grammar gives a precise and easy to understand syntactic specification
of a programming language.
 From certain classes of grammars we can automatically
construct an efficient parser that determines if a source
program is syntactically well formed.
 Developing a language given by a grammar
(adding or removing constructs) is easy.
DERIVATIONS. Let
G = (V_{T}, V_{N}, S, P) be a grammar
and let and be two strings of grammar symbols for G.
We say that derives from in one step
and we write
if there exist strings of grammar symbols
,
, , such that

= ,

= ,

is a production of G.
The transitive and reflexive closure of the
map
(,)
over the sets of grammar symbols is denoted by
(,) .
Thus for two grammar symbols and we have
Intuitively,
means that derives from in a finite number of steps.
A sequence of strings of grammar symbols
(,,...,)
is a DERIVATION if we have
THE LANGUAGE GENERATED BY A GRAMMAR. Let
G = (V_{T}, V_{N}, S, P) be a grammar.
The language over V_{T} generated by G and denoted by L(G)
is the set of the words w of
V_{T}^{ * } such that
Sw.
The words of L(G) are called the sentences of G.
More generally any string of grammar symbols
such that
S is called
a sentential form of G.
A language L over the alphabet
= V_{T} is said
formal if there exists a grammar G such that
L is generated by G.
Observe that
 Some languages are not formal like natural languages.
 Two different sequences of derivationsinonestep can
lead to the same sentential form.
 Moreover, two different grammars can generate the same language.
These last two observations are illustrated by
Example 3.
Example 3
Let us consider again arithmetic expressions.
For simplicity we consider only two operations + and *.
Our language L can be generated by the grammar G_{1}:
expr 

expr + expr  expr * expr  (expr)  id 
And also by the grammar G_{2}:
expr 

expr + term  term 
term 

term * factor  factor 
factor 

(expr)  id 
Consider now the arithmetic expression
+ *.
Two different sequences of derivationsinonestep of G_{1} can lead to it:
This is sketched
by the trees of Figure 2, called parse trees.
Figure 2:
Two parse trees for the same sentence.

With G_{2} several derivationsinonestep can lead to
+ *.
However they all correspond to the same tree
shown on Figure 3.
Figure 3:
The parse tree of
+ * with G_{2}.

This notion of a parse tree is formalized later.
NONTERMINALS can be seen as syntactic variables
that denote the sets of strings that can be derived from them.
They also impose a hierarchical structure on the language
that is useful for both syntax analysis and translation.
Moreover Example 3
shows that an accurate use of nonterminals can reduce
ambiguity.
Next: Chomsky classification
Up: Grammars and Parsing
Previous: An introduction to the notion
Marc Moreno Maza
20041202