next up previous
Next: Chomsky classification Up: Grammars and Parsing Previous: An introduction to the notion

The notion of a grammar

A GRAMMAR is quadruple (VT, VN, S, P) such that

In other words a production is made of two words over the alphabet VT  $ \cup$  VN such that the first word contains at least one non-terminal symbol. The set VT  $ \cup$  VN is called the set of grammar symbols.

Notation 1   Here are some usual notational conventions about grammars.

Example 2   Using the conventions of Notation 1 the grammar of Example 1 can be stated as follows
E $ \longmapsto$ E A E | (E) | -E | id
A $ \longmapsto$ + | - | * | / | $ \uparrow$

Remark 2   Grammars offer significant advantages to both language designers and compiler writers. Among them

DERIVATIONS. Let G = (VT, VN, S, P) be a grammar and let $ \alpha$ and $ \beta$ be two strings of grammar symbols for G. We say that $ \beta$ derives from $ \alpha$ in one step and we write $ \alpha$ $ \Longrightarrow$ $ \beta$ if there exist strings of grammar symbols $ \alpha_{{1}}^{{}}$, $ \alpha_{{2}}^{{}}$ $ \alpha_{{3}}^{{}}$, $ \beta_{{2}}^{{}}$, such that

The transitive and reflexive closure of the map ($ \alpha$,$ \beta$) $ \longmapsto$ $ \alpha$ $ \Longrightarrow$ $ \beta$ over the sets of grammar symbols is denoted by ($ \alpha$,$ \beta$) $ \longmapsto$ $ \alpha$$ \;\stackrel{{\ast}}{{\Longrightarrow}}\;$$ \beta$. Thus for two grammar symbols $ \alpha$ and $ \beta$ we have

$\displaystyle \alpha$$\displaystyle \;\stackrel{{\ast}}{{\Longrightarrow}}\;$$\displaystyle \beta$    $\displaystyle \iff$    $\displaystyle \left\{\vphantom{ \begin{array}{c} {\alpha} = {\beta} \\  {\rm or...} \ {\rm and} \ {\gamma} {\Longrightarrow} {\beta}) \\  \end{array} }\right.$$\displaystyle \begin{array}{c} {\alpha} = {\beta} \\  {\rm or} \\  (\exists \, ...
...row} {\gamma} \ {\rm and} \ {\gamma} {\Longrightarrow} {\beta}) \\  \end{array}$ (4)

Intuitively, $ \alpha$$ \;\stackrel{{\ast}}{{\Longrightarrow}}\;$$ \beta$ means that $ \beta$ derives from $ \alpha$ in a finite number of steps. A sequence of strings of grammar symbols ($ \alpha_{{1}}^{{}}$,$ \alpha_{{2}}^{{}}$,...,$ \alpha_{{n}}^{{}}$) is a DERIVATION if we have

$\displaystyle \alpha_{{1}}^{{}}$  $\displaystyle \Longrightarrow$  $\displaystyle \alpha_{{2}}^{{}}$  $\displaystyle \Longrightarrow$   ...   $\displaystyle \Longrightarrow$ $\displaystyle \alpha_{{n}}^{{}}$. (5)

THE LANGUAGE GENERATED BY A GRAMMAR. Let G = (VT, VN, S, P) be a grammar. The language over VT generated by G and denoted by L(G) is the set of the words w of VT * such that S$ \;\stackrel{{\ast}}{{\Longrightarrow}}\;$w. The words of L(G) are called the sentences of G. More generally any string of grammar symbols $ \alpha$ such that S$ \;\stackrel{{\ast}}{{\Longrightarrow}}\;$$ \alpha$ is called a sentential form of G.

A language L over the alphabet $ \Sigma$ = VT is said formal if there exists a grammar G such that L is generated by G. Observe that

These last two observations are illustrated by Example 3.

Example 3   Let us consider again arithmetic expressions. For simplicity we consider only two operations + and *. Our language L can be generated by the grammar G1:
expr $ \longmapsto$ expr + expr | expr * expr | (expr) | id
And also by the grammar G2:
expr $ \longmapsto$ expr + term | term
term $ \longmapsto$ term * factor | factor
factor $ \longmapsto$ (expr) | id

Consider now the arithmetic expression $ \bf id$ + $ \bf id$*$ \bf id$. Two different sequences of derivations-in-one-step of G1 can lead to it:

$\displaystyle \begin{array}{lll} expr & \Rightarrow & expr + expr \\  & \Righta...
...f id} * expr \\  & \Rightarrow & {\bf id} + {\bf id} * {\bf id} \\  \end{array}$ $\displaystyle \begin{array}{lll} expr & \Rightarrow & expr * expr \\  & \Righta...
...f id} * expr \\  & \Rightarrow & {\bf id} + {\bf id} * {\bf id} \\  \end{array}$

This is sketched by the trees of Figure 2, called parse trees.
Figure 2: Two parse trees for the same sentence.
With G2 several derivations-in-one-step can lead to $ \bf id$ + $ \bf id$*$ \bf id$. However they all correspond to the same tree shown on Figure 3.
Figure 3: The parse tree of $ \bf id$ + $ \bf id$*$ \bf id$ with G2.
This notion of a parse tree is formalized later.

NON-TERMINALS can be seen as syntactic variables that denote the sets of strings that can be derived from them. They also impose a hierarchical structure on the language that is useful for both syntax analysis and translation. Moreover Example 3 shows that an accurate use of non-terminals can reduce ambiguity.

next up previous
Next: Chomsky classification Up: Grammars and Parsing Previous: An introduction to the notion
Marc Moreno Maza