next up previous
Next: Context-free grammars Up: Grammars and Parsing Previous: The notion of a grammar

Chomsky classification

The grammars of Example 3 have the nice following property: every production has the form A $ \longmapsto$ $ \alpha$ where A is a non-terminal symbol and $ \alpha$ is a string of grammar symbols. These grammars are called context-free grammars and will be studied in the next section. They are one of the classes of the classification of Chomsky that we present now.


TYPE 1. The grammar G = (VT, VN, S, P) is of type 1 if every production has the form

$\displaystyle \alpha_{{1}}^{{}}$A$\displaystyle \alpha_{{2}}^{{}}$  $\displaystyle \longmapsto$ $\displaystyle \alpha_{{1}}^{{}}$$\displaystyle \beta$$\displaystyle \alpha_{{2}}^{{}}$  where  $\displaystyle \left\{\vphantom{ \begin{array}{l} A \in {V}_N \\  {\alpha}_1, {\alpha}_2, {\beta} \in ({V}_T \, {\cup} \, {V}_N)^{\ast} \\  \end{array} }\right.$$\displaystyle \begin{array}{l} A \in {V}_N \\  {\alpha}_1, {\alpha}_2, {\beta} \in ({V}_T \, {\cup} \, {V}_N)^{\ast} \\  \end{array}$ (7)

Such a grammar is also called a context-sensitive grammar.


TYPE 2. The grammar G = (VT, VN, S, P) is of type 2 if every production has the form

A  $\displaystyle \longmapsto$  $\displaystyle \alpha$  where  $\displaystyle \left\{\vphantom{ \begin{array}{l} A \in {V}_N \\  {\alpha} \in ({V}_T \, {\cup} \, {V}_N)^{\ast} \end{array} }\right.$$\displaystyle \begin{array}{l} A \in {V}_N \\  {\alpha} \in ({V}_T \, {\cup} \, {V}_N)^{\ast} \end{array}$ (8)

Such a grammar is also called a context-free grammar.


TYPE 3. The grammar G = (VT, VN, S, P) is said right regular if every production has the form

A  $\displaystyle \longmapsto$  aB  or  A  $\displaystyle \longmapsto$  a    where    $\displaystyle \left\{\vphantom{ \begin{array}{l} A , B\in {V}_N \\  a \in {V}_T \, {\cup} \, \{ {\varepsilon} \} \end{array} }\right.$$\displaystyle \begin{array}{l} A , B\in {V}_N \\  a \in {V}_T \, {\cup} \, \{ {\varepsilon} \} \end{array}$ (9)

The grammar G = (VT, VN, S, P) is said left regular if every production has the form

A  $\displaystyle \longmapsto$  Ba  or  A  $\displaystyle \longmapsto$  a    where    $\displaystyle \left\{\vphantom{ \begin{array}{l} A , B\in {V}_N \\  a \in {V}_T \, {\cup} \, \{ {\varepsilon} \} \end{array} }\right.$$\displaystyle \begin{array}{l} A , B\in {V}_N \\  a \in {V}_T \, {\cup} \, \{ {\varepsilon} \} \end{array}$ (10)

The grammar G is said regular or of type 3 if it is either right regular or left regular.


TYPE 0. The grammar G = (VT, VN, S, P) is of type 0 if it is not of type 1, 2 or 3.


THE TYPE OF A LANGUAGE. Let n be an integer in the range 0, 1, 2, 3. A language L over an alphabet $ \Sigma$ is said of type n if it is generated by a grammar of type n and cannot be generated by a grammar of type n + 1. A language of type 2 is also said context-free (or algebraic) and a language of type 3 is also said regular.


REGULAR LANGUAGES. Let $ \cal {A}$ = ($ \Sigma$, S, s0, F,$ \delta$) be a DFA. We define the grammar G = ($ \Sigma$, S, s0, P) such that

It is not hard to prove that the language recognized by $ \cal {A}$ is the language generated by G. Observe that G is a regular grammar. Conversly, one can build a FA from a regular grammar such that their associated languages match. Hence we can state the following theorem.

Theorem 1   A grammar G is regular iff the language generated by G is recognized by finite automata.

Example 4   Consider $ \Sigma$ = {a, b} and the language L over $ \Sigma$ denoted by the regular expression r = (a + b) * abb. From r one can easily build the NFA shown on Figure 4. Then applying the subset algorithm one obtains the DFA shown on the same picture.
Figure 4: a NFA and a DFA denoted by (a + b) * abb.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.5]{RegularGrammarAndFA.eps}
.
\end{figure}
The leads us to the following grammar:

s0 $\displaystyle \longmapsto$ as01  |  bs0
s01 $\displaystyle \longmapsto$ as01  |  bs02
s02 $\displaystyle \longmapsto$ as01  |  bs03
s03 $\displaystyle \longmapsto$ as01  |  bs0  |  $\displaystyle \varepsilon$
(11)


CONTEXT-FREE LANGUAGES. We start with two examples of context-free languages that we met already. Then we give a pumping-lemma for context-free languages as Theorem 2.

Example 5   Consider $ \Sigma$ = {(,)} and let L be the language of the well parenthesized expressions over $ \Sigma$. So these expressions are only made of parentheses and satisfy Condition 2 of Remark 1. We consider the grammar G = ($ \Sigma$,{S}, S, P) where the set of productions P is given by

S $\displaystyle \longmapsto$ (S)S  |  $\displaystyle \varepsilon$
(12)

Let L' be the language generated by G. Let us show that L is L'. By induction (on the length of a generated word) it is easy to prove that every word in L' is a well parenthesized expression, so L'  $ \subseteq$  L. Conversly by induction on the length of a well parenthesized expression, it is easy to prove that all such expressions can be generated by G, so L  $ \subseteq$  L'. Hence L = L'.

Example 6   Consider $ \Sigma$ = {a, b} and L = {anbn  |  n $ \in$ $ \mbox{${\mathbb N}$}$}. The language L is of type 2. Indeed we already know that L is not recognized by FA and thus it is not of type 3. It is easy to show that L can be generated by the following context-free grammar.
S $ \longmapsto$ aSb | $ \varepsilon$

Theorem 2 (Bar-Hillel, Perles, Shamir)   Let G = (VT, VN, S, P) be a context-free grammar. There exists an integer N such that for every word m generated by G with length | m | $ \geq$ N there exist words u, v, w, x, y over VT such that

m  =  u v w x y    with    $\displaystyle \left\{\vphantom{ \begin{array}{l} v \neq {\varepsilon} \ \ {\rm ...
...\forall k \in {\vert N}) \ u \, v^k \, w \, x^k \, y \in L \end{array} }\right.$$\displaystyle \begin{array}{l} v \neq {\varepsilon} \ \ {\rm or} \ \ x \neq {\v...
...< N \\  (\forall k \in {\vert N}) \ u \, v^k \, w \, x^k \, y \in L \end{array}$ (13)

Example 7   Consider the alphabet $ \Sigma$ = {a, b, c} = VT and the language

L  =   {anbncn  |  n $\displaystyle \in$ $\displaystyle \mbox{${\mathbb N}$}$} (14)

over $ \Sigma$. We define VN = {S, T, U, W, X, Y, Z, B}. The grammar G = (VT, VN, S, P) with P given by

S $\displaystyle \longmapsto$ abc
S $\displaystyle \longmapsto$ aTBc
TB $\displaystyle \longmapsto$ UB
UB $\displaystyle \longmapsto$ UW
UW $\displaystyle \longmapsto$ BW
BW $\displaystyle \longmapsto$ BT
Tc $\displaystyle \longmapsto$ XBcc
BX $\displaystyle \longmapsto$ BY
ZY $\displaystyle \longmapsto$ ZB
ZB $\displaystyle \longmapsto$ XB
aX $\displaystyle \longmapsto$ aaT
aX $\displaystyle \longmapsto$ aa
B $\displaystyle \longmapsto$ b
BY $\displaystyle \longmapsto$ ZY
(15)

generates the language L. Theorem 2 can be used to prove that L is not a context-free language. Therefore L is a language of type 1.


next up previous
Next: Context-free grammars Up: Grammars and Parsing Previous: The notion of a grammar
Marc Moreno Maza
2004-12-02