next up previous
Next: Regular expressions Up: Finite Automata Previous: Finite Automata.

Languages recognized by finite automata

LANGUAGES RECOGNIZED BY FA. Let $ \Sigma$ be an alphabet. A language L over $ \Sigma$ is recognized by FA if there exists a finite automaton $ \cal {A}$ such that L is the language recognized by $ \cal {A}$.


THE PUNPING LEMMA. A natural question is

To answer the first part of this question we start with the following remark.

Proposition 1   Every language over $ \Sigma$ with a finite number of words is recognized by FA.

Indeed let L = {w1, w2,..., wn} be such a language. We can easily make FAs $ \cal {A}$1, $ \cal {A}$1, ... $ \cal {A}$n recognizing the languages L1 = {w1}, L2 = {w1}, ..., Ln = {wn}, respectively. Then by using $ \varepsilon$-transitions we can construct a single FA $ \cal {A}$ recognizing

L  =   L1  $\displaystyle \cup$  L2  $\displaystyle \cup$   ...   $\displaystyle \cup$  Ln. (3)

Therefore, if a language L is not recognized by FA then it must contain an infinite number of words. This leads us to the following remark.

Proposition 2   Let L be a language recognized by FA and with an infinite number of words. Let $ \cal {A}$ = ($ \Sigma$, S, I, T,$ \delta$) be a DFA accepting L. Then $ \cal {A}$ possesses a circuit. In other words, there exist a positive integer n together with n states s1, s2,..., sn and n - 1 letters x1, x2,..., xn-1 (not necessarily pairwise different) such that
(i)
si+1 = $ \delta$(si, xi) for every i = 1 ... n - 1,
(ii)
si $ \neq$ sj for i, j = 1 ... n - 1 and i $ \neq$ j,
(iii)
s1 = sn.

Indeed, let s1 be the initial state, q be the number of states of $ \cal {A}$ and p the number of letters. Since L is infinite, there exists a word w = x1 ... x$\scriptstyle \ell$ in L with lenght $ \ell$ > q. Recognizing w requires at least q + 1 transitions:

s2 = $\displaystyle \delta$(s1, x1), s3 = $\displaystyle \delta$(s3, x3),...s$\scriptstyle \ell$ = $\displaystyle \delta$(s$\scriptstyle \ell$-1, x$\scriptstyle \ell$-1), s$\scriptstyle \ell$+1 = $\displaystyle \delta$(s$\scriptstyle \ell$, x$\scriptstyle \ell$). (4)

At least two of the visited states si must be equal and the proposition is proved. Proposition 3 gives a more precise statement of the above statement and Figure 11 sketches its proof.

Proposition 3 (Pumping Lemma)   Let L be a language recognized by FA and with an infinite number of words. Then there exists a positive integer N such that for every word m $ \in$ L with length $ \ell$ > N there exist three words u, v, w such that
(i)
m = u v w,
(ii)
v $ \neq$ $ \varepsilon$,
(iii)
for every positive integer k we have u vk w $ \in$ L.

Figure 11: Sketch of the pumping Lemma.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.5]{pumpingLemma.eps}
\end{figure}

Example 1   Let us apply Proposition 3 to the language L = {anbn} over $ \Sigma$ = {a, b}. Let us assume that L is recognized by FA. Let N, m, u, v, w be as in the proposition. If v counts more a than b then for k big enough u vk w will also have more a than b and thus cannot belong to L. Similarly if v counts more b than a then u vk w cannot belong to L for k big enough. So v must be of the form anbn. But then for k > 1 the word u vk w does not belong to L. Finally we are led to a contradiction and the language L is recognized by FA.

CONSTRUCTION OF LANGUAGES RECOGNIZED BY FA We would like now to characterize those languages which are recognized by FA. To address this question we give a series of four propositions and one theorem. Each of these four propositions provide a mechanism (or rule) to build languages recognized by FA. Theorem 3 states that these rules allow us to build all possible languages recognized by FAs.

Proposition 4   For every x $ \in$ $ \Sigma$  $ \cup$   the language L = {x} consisting of the single word w = x is recognized by FA.

Figure 12: A DFA accepting a language consisting of a single letter.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.5]{singleLetterLanguage.eps}
\end{figure}
To illustrate Proposition 5, 6 and 7 we will use the concept of a normalized FA.


NORMALIZED FINITE AUTOMATA A finite automaton $ \cal {A}$ = ($ \Sigma$, S, I, F,$ \delta$) is normalized if it satisfies to the following properties

Normalized FA are generally depicted as shown on Figure 13.
Figure 13: A normalized FA.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.5]{normalizedFADiagram.eps}
\end{figure}
Figure 14: Normalized FAs for R0 languages.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.5]{normalizedFAofR0Language.eps}
\end{figure}


THE UNION OF TWO LANGUAGES RECOGNIZED BY FA is a language recognized by FA. Proposition 5 formulates this statement with DFAs and Figure 15 illustrates it with normalized FAs.

Figure 15: A normalized FA accepting the sum of two other normalized FAs.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.4]{sumOfNormalizedFA.eps}
\end{figure}

Proposition 5   Let L1 and L2 be two languages over $ \Sigma$ recognized by the DFAs $ \cal {A}$1 = ($ \Sigma$, S1, s1, F1,$ \delta_{{1}}^{{}}$) and $ \cal {A}$2 = ($ \Sigma$, S2, s2, F2,$ \delta_{{2}}^{{}}$) respectively. Let us assume that S1 $ \cap$ S2 = $ \emptyset$. (If this is not the case then we can rename the states of $ \cal {A}$2.)

Then, the language L1 $ \cup$ L2 is recognized by the NFA

$\displaystyle \cal {A}$1+2 = ($\displaystyle \Sigma$, S1 $\displaystyle \cup$ S2,{s1, s2}, F1 $\displaystyle \cup$ F2,$\displaystyle \delta_{{{1+2}}}^{{}}$) (5)

where the transition function $ \delta_{{{1+2}}}^{{}}$ is defined as follows for every x $ \in$ $ \Sigma$ and for every s $ \in$ S1 $ \cup$ S2

$\displaystyle \delta_{{{1+2}}}^{{}}$(s, x)  =  $\displaystyle \left\{\vphantom{ \begin{array}{rcl} {\delta}_1 (s,x) & {\rm if} & s \in S_1 \\  {\delta}_1 (s,x) & {\rm if} & s \in S_2. \end{array} }\right.$$\displaystyle \begin{array}{rcl} {\delta}_1 (s,x) & {\rm if} & s \in S_1 \\  {\delta}_1 (s,x) & {\rm if} & s \in S_2. \end{array}$ (6)


THE PRODUCT OF TWO LANGUAGES RECOGNIZED BY FA is a language recognized by FA. Proposition 6 formulates this statement with DFAs and Figure 16 illustrates it with normalized FAs.

Figure 16: A normalized FA accepting the sum of two other normalized FAs.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.4]{productOfNormalizedFA.eps}
\end{figure}

Proposition 6   Let L1 and L2 be two languages over $ \Sigma$ recognized by the DFAs $ \cal {A}$1 = ($ \Sigma$, S1, s1, F1,$ \delta_{{1}}^{{}}$) and $ \cal {A}$2 = ($ \Sigma$, S2, s2, F2,$ \delta_{{2}}^{{}}$) respectively. Again, let us assume that S1 $ \cap$ S2 = $ \emptyset$.

We denote by L1L2 (or L1.L2) the language consisting of all words of the form w1w2 (concatenation of w1 and w2) where w1 and w2 belong to L1 and L2 respectively. This language is called the PRODUCT LANGUAGE of L1 by L2

Then, the language L1L2 is recognized by the NFA

$\displaystyle \cal {A}$1.2 = ($\displaystyle \Sigma$, S1 $\displaystyle \cup$ S2,{s1}, F2,$\displaystyle \delta_{{{1.2}}}^{{}}$) (7)

where the transition function $ \delta_{{{1.2}}}^{{}}$ is defined as follows for every x $ \in$ $ \Sigma$ $ \cup$ {$ \varepsilon$} and for every s $ \in$ S1 $ \cup$ S2

$\displaystyle \delta_{{{1.2}}}^{{}}$(s, x)  =  $\displaystyle \left\{\vphantom{ \begin{array}{rcl} {\delta}_1 (s,x) & {\rm if} ...
...\varepsilon}} \\  {\delta}_2 (s,x) & {\rm if} & s \in S_2. \end{array} }\right.$$\displaystyle \begin{array}{rcl} {\delta}_1 (s,x) & {\rm if} & s \in (S_1 \setm...
... \ x = {{\varepsilon}} \\  {\delta}_2 (s,x) & {\rm if} & s \in S_2. \end{array}$ (8)


KLEENE CLOSURE OF A LANGUAGE. Let $ \Sigma$ be an alphabet and let w be a word over $ \Sigma$. Let n be a non-negative integer. First we define the n-th power of w, denoted by wn, as follows:

wn  =  $\displaystyle \left\{\vphantom{ \begin{array}{rcl} {{\varepsilon}} & {\rm if} & n = 0 \\  w.w^{n-1} & {\rm if} & n > 0. \\  \end{array} }\right.$$\displaystyle \begin{array}{rcl} {{\varepsilon}} & {\rm if} & n = 0 \\  w.w^{n-1} & {\rm if} & n > 0. \\  \end{array}$ (9)

Let L be a language over $ \Sigma$. Now we define the n-th power of L, denoted by Ln, as follows:

Ln  =  $\displaystyle \left\{\vphantom{ \begin{array}{rcl} \{ {{\varepsilon}} \} & {\rm if} & n = 0 \\  L.L^{n-1} & {\rm if} & n > 0. \\  \end{array} }\right.$$\displaystyle \begin{array}{rcl} \{ {{\varepsilon}} \} & {\rm if} & n = 0 \\  L.L^{n-1} & {\rm if} & n > 0. \\  \end{array}$ (10)

Then we define the star (or Kleene closure) of L, denoted by L * , as the union of all powers of L, that is

L *   =  $\displaystyle \bigcup_{{{n \geq 0}}}^{{}}$ Ln. (11)

Therefore, a word w over $ \Sigma$ belongs to L * if
Figure 17: A DFA accepting a language L and a NFAIT accepting the star of L.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.4]{starOfAnAutomaton.eps}
\end{figure}
We give now some useful formulas. Let L and M be two languages over $ \Sigma$. Observe that

THE KLEENE CLOSURE OF A LANGUAGE RECOGNIZED BY FA is a language recognized by FA. Proposition 7 formulates this statement with DFAs and Figure 18 illustrates it with normalized FAs.

Figure 18: A normalized FA accepting the star of another normalized FA.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.4]{starOfNormalizedFA.eps}
\end{figure}

Proposition 7   Let L be a language over $ \Sigma$ recognized by the DFA $ \cal {A}$ = ($ \Sigma$, S, s0, F,$ \delta$).

Then L * is recognized by the NFA

$\displaystyle \cal {A}$ * = ($\displaystyle \Sigma$, S, s, F,$\displaystyle \delta^{{{\ast}}}_{{}}$) (12)

where the transition function $ \delta^{{{\ast}}}_{{}}$ is defined as follows for every x $ \in$ $ \Sigma$ $ \cup$ {$ \varepsilon$} and for every s $ \in$ S

$\displaystyle \delta^{{{\ast}}}_{{}}$(s, x)  =  $\displaystyle \left\{\vphantom{ \begin{array}{rcl} {\delta}(s,x) & {\rm if} & x...
...& {\rm if} & s \in F \ \ {\rm and} \ \ x = {{\varepsilon}} \end{array} }\right.$$\displaystyle \begin{array}{rcl} {\delta}(s,x) & {\rm if} & x \neq {{\varepsilo...
...{ s_0 \} & {\rm if} & s \in F \ \ {\rm and} \ \ x = {{\varepsilon}} \end{array}$ (13)


KLEENE THEOREM. We denote by R0($ \Sigma$) the set of all languages of the form {x} for x $ \in$ $ \Sigma$  $ \cup$  {$ \varepsilon$}. Then for a positive integer n we define Rn($ \Sigma$) as the set of languages over $ \Sigma$ consisting of

  1. the languages of Rn-1($ \Sigma$),
  2. the languages of the form L + M for L, M $ \in$ Rn-1($ \Sigma$),
  3. the languages of the form L.M for L, M $ \in$ Rn-1($ \Sigma$),
  4. the languages of the form L * for L $ \in$ Rn-1($ \Sigma$).
Obviously, for every non-negative integer n, every language member of Rn($ \Sigma$) is recognized by FA. The theorem of Kleene studies the opposite direction.

Theorem 3 (Kleene)   Let L be a language recognized by FA. Then there exists a non-negative integer n such that L belongs to Rn($ \Sigma$).

This suggests the introduction of the notion of a regular expression.


next up previous
Next: Regular expressions Up: Finite Automata Previous: Finite Automata.
Marc Moreno Maza
2004-12-02