Languages recognized by finite automata


LANGUAGES RECOGNIZED BY FA. Let $\Sigma$ be an alphabet. A language L over $\Sigma$ is recognized by FA if there exists a finite automaton $\cal {A}$ such that L is the language recognized by $\cal {A}$ .

THE PUMPING LEMMA. A natural question is:

are all languages over $\Sigma$ recognized by FA?
and, if not, can we characterize the languages which are recognized by FA?

To answer the first part of this question we start with the following remark.

Proposition 1 Every language over $\Sigma$ with a finite number of words is recognized by FA.

Indeed, let L = {w₁, w₂, ..., w_n} be such a language. We can easily build FAs $\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_n$ recognizing the languages L₁ = {w₁}, L₂ = {w₂}, ..., L_n = {w_n}, respectively. Then, by adding a new initial state with an $\varepsilon$-transition to the initial state of each $\mathcal{A}_i$, we can construct a single FA $\mathcal{A}$ recognizing

$$L = L_1 \cup L_2 \cup \cdots \cup L_n. \qquad (3)$$
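The construction in the proof of Proposition 1 can be sketched in Python as follows. This is a minimal sketch under our own representation choices, not code from these notes: an $\varepsilon$-NFA is a dictionary mapping (state, label) to a set of states, the empty string "" labels $\varepsilon$-transitions, states are integers, and the names `finite_language_nfa`, `eps_closure` and `accepts` are hypothetical. Each word gets its own chain of states, and a fresh initial state 0 has an $\varepsilon$-transition to the start of every chain.

```python
from collections import defaultdict

EPS = ""  # label of epsilon-transitions (real letters are nonempty strings)


def finite_language_nfa(words):
    """Epsilon-NFA for the finite language `words`, as in Proposition 1:
    one chain of states per word, all reached from the fresh initial state 0."""
    delta = defaultdict(set)  # (state, label) -> set of successor states
    finals = set()
    next_state = 1
    for word in words:
        current = next_state            # first state of the chain for `word`
        next_state += 1
        delta[(0, EPS)].add(current)    # epsilon-transition into this chain
        for letter in word:
            delta[(current, letter)].add(next_state)
            current = next_state
            next_state += 1
        finals.add(current)             # the end of the chain accepts `word`
    return dict(delta), 0, finals


def eps_closure(delta, states):
    """All states reachable from `states` using epsilon-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def accepts(nfa, word):
    """Simulate the epsilon-NFA on `word` by tracking the set of current states."""
    delta, start, finals = nfa
    current = eps_closure(delta, {start})
    for letter in word:
        step = set()
        for s in current:
            step |= delta.get((s, letter), set())
        current = eps_closure(delta, step)
    return bool(current & finals)
```

For instance, `finite_language_nfa(["ab", "b", ""])` accepts exactly the three words ab, b and $\varepsilon$.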

Therefore, if a language L is not recognized by FA then it must contain an infinite number of words. This leads us to the following remark.

Proposition 2 Let L be a language recognized by FA and with an infinite number of words. Let $\mathcal{A}$ = ($\Sigma$, S, I, T, $\delta$) be a DFA accepting L. Then $\mathcal{A}$ possesses a circuit. In other words, there exist an integer n ≥ 2 together with n states $s_1, s_2, \ldots, s_n$ and n - 1 letters $x_1, x_2, \ldots, x_{n-1}$ (not necessarily pairwise different) such that

(i): $s_{i+1} = \delta(s_i, x_i)$ for every i = 1, ..., n - 1,
(ii): $s_i \neq s_j$ for i, j = 1, ..., n - 1 and i ≠ j,
(iii): $s_1 = s_n$.

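Indeed, let s₁ be the initial state, q the number of states of $\mathcal{A}$ and p the number of letters of $\Sigma$. There are only 1 + p + p² + ... + p^q words of length at most q, so, since L is infinite, there exists a word $w = x_1 \cdots x_\ell$ in L with length ℓ > q. Recognizing w requires ℓ ≥ q + 1 transitions:

$$s_2 = \delta(s_1, x_1),\; s_3 = \delta(s_2, x_2),\; \ldots,\; s_\ell = \delta(s_{\ell-1}, x_{\ell-1}),\; s_{\ell+1} = \delta(s_\ell, x_\ell). \qquad (4)$$

These transitions visit ℓ + 1 ≥ q + 2 states $s_i$, so at least two of them must be equal. Let j be the smallest index such that $s_j = s_i$ for some i < j; then $s_i, s_{i+1}, \ldots, s_j$ together with the letters $x_i, \ldots, x_{j-1}$ form a circuit satisfying (i), (ii) and (iii), and the proposition is proved. Proposition 3 turns this observation into a more precise statement, and Figure 11 sketches its proof.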
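The proof above is effectively an algorithm: follow the run (4) and stop at the first state that repeats. Here is a minimal Python sketch, assuming a complete DFA given as a dictionary mapping (state, letter) to a state; the function name `find_circuit` and the two-state example automaton (words over {a, b} with an even number of a's) are illustrative choices of ours.

```python
def find_circuit(delta, start, word):
    """Follow the run s_1 = start, s_2, ... of a complete DFA on `word`, as in (4),
    and stop at the first index j such that s_j equals some earlier s_i.
    Return the circuit [s_i, ..., s_j] and its letters [x_i, ..., x_{j-1}],
    or None if the run never visits a state twice."""
    first_visit = {start: 1}   # state -> 1-based index of its first visit
    states, letters = [start], []
    for letter in word:
        states.append(delta[(states[-1], letter)])
        letters.append(letter)
        j = len(states)
        if states[-1] in first_visit:
            i = first_visit[states[-1]]
            return states[i - 1:j], letters[i - 1:j - 1]
        first_visit[states[-1]] = j
    return None


# Hypothetical example: DFA over {a, b} for the words with an even number of a's.
EVEN_A = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
# find_circuit(EVEN_A, 0, "aab") returns ([0, 1, 0], ["a", "a"])
```

Because j is the first repeated index, the states of the returned circuit other than its last one are pairwise distinct, which is condition (ii).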

Proposition 3 (Pumping Lemma) Let L be a language recognized by FA and with an infinite number of words. Then there exists a positive integer N such that for every word m $\in$ L with length $\ell$ > N there exist three words u, v, w such that

(i): m = u v w,
(ii): v $\neq$ $\varepsilon$ ,
(iii): for every positive integer k we have u v^k w $\in$ L.

**Figure 11:** Sketch of the pumping Lemma.
$\begin{figure}\htmlimage \centering\includegraphics[scale=.5]{pumpingLemma.eps} \end{figure}$
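One way to obtain u, v and w, in the spirit of Figure 11, is to cut the run of a DFA at its first repeated state: u leads from the initial state to the repeated state, v goes around the circuit back to that state, and w finishes the run, so v can be repeated any number of times without changing the final state. Taking N to be the number of states of the DFA is one valid choice, since any run on more than N letters visits some state twice. The Python sketch below assumes a complete DFA stored as a dictionary mapping (state, letter) to a state; the names and the example automaton for (ab)^* are hypothetical.

```python
def pump_decomposition(delta, start, word):
    """Split word = u v w (v nonempty) at the first state repeated by the run
    of a complete DFA; return None if the run visits no state twice."""
    seen = {start: 0}  # state -> number of letters read when first reached
    current = start
    for pos, letter in enumerate(word, start=1):
        current = delta[(current, letter)]
        if current in seen:
            i = seen[current]
            # u = word[:i] reaches the repeated state, v = word[i:pos] loops back to it
            return word[:i], word[i:pos], word[pos:]
        seen[current] = pos
    return None


def accepts(delta, start, finals, word):
    current = start
    for letter in word:
        current = delta[(current, letter)]
    return current in finals


# Complete DFA for (ab)^*: state 0 is initial and final, state 2 is a sink.
AB_STAR = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 0, (2, "a"): 2, (2, "b"): 2}

u, v, w = pump_decomposition(AB_STAR, 0, "abab")   # ("", "ab", "ab")
assert all(accepts(AB_STAR, 0, {0}, u + v * k + w) for k in range(1, 6))
```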

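Example 1 Let us apply Proposition 3 to the language L = {aⁿbⁿ | n ≥ 0} over $\Sigma$ = {a, b}. Assume that L is recognized by FA. Since L is infinite, Proposition 3 applies: let N be as in the proposition, let $m = a^N b^N$ (a word of L of length 2N > N), and let u, v, w be as in the proposition. If v contains more a's than b's, then for every k > 1 the word u v^k w also contains more a's than b's and thus cannot belong to L. Similarly, if v contains more b's than a's, then u v^k w cannot belong to L for k > 1. So v must contain as many a's as b's; since v is a nonempty factor of $a^N b^N$, it is of the form $a^i b^i$ with i ≥ 1. But then, for k > 1, the word u v^k w contains the factor ba and therefore does not belong to L. In every case we are led to a contradiction, so the language L is not recognized by FA.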
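The case analysis of Example 1 can be spot-checked mechanically. The Python sketch below (the function names are our own) enumerates every decomposition m = u v w with v nonempty of $m = a^p b^p$ and confirms that already k = 2 pushes u v^k w out of L. It only checks small values of p; the argument above is what covers every N.

```python
def in_L(word):
    """Membership in L = {a^n b^n : n >= 0}."""
    n = len(word) // 2
    return len(word) == 2 * n and word == "a" * n + "b" * n


def pumping_always_fails(p, k=2):
    """For m = a^p b^p, check that u v^k w is outside L for every split
    m = u v w with v nonempty, as argued in Example 1."""
    m = "a" * p + "b" * p
    for i in range(len(m) + 1):
        for j in range(i + 1, len(m) + 1):      # j > i, so v = m[i:j] is nonempty
            u, v, w = m[:i], m[i:j], m[j:]
            if in_L(u + v * k + w):
                return False
    return True
```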

CONSTRUCTION OF LANGUAGES RECOGNIZED BY FA. We would now like to characterize the languages which are recognized by FA. To address this question we give a series of four propositions and one theorem. Each of these four propositions provides a mechanism (or rule) to build languages recognized by FA. Theorem 3 states that these rules allow us to build all languages recognized by FA.

Proposition 4 expresses the fact that a language consisting of a single word of length at most one (a single letter or the empty word) is recognized by FA. This statement is an immediate consequence of Proposition 1 and is illustrated by Figure 12.
Proposition 5 expresses the fact that the union of two languages recognized by FA is itself recognized by FA. This statement is also straightforward and was illustrated by Figure 4.
Proposition 6 expresses the fact that the concatenation (or product) of two languages recognized by FA is also recognized by FA. This statement is not difficult to prove either, and we have already used it implicitly in most examples.
Proposition 7 expresses the fact that the star of a language recognized by FA is also recognized by FA. The star (or Kleene closure) of a language is defined and illustrated below.

Proposition 4 For every x $\in$ $\Sigma$ $\cup$ {$\varepsilon$}, the language L = {x}, consisting of the single word w = x, is recognized by FA.

**Figure 12:** A DFA accepting a language consisting of a single letter.
$\begin{figure}\htmlimage \centering\includegraphics[scale=.5]{singleLetterLanguage.eps} \end{figure}$

To illustrate Propositions 5, 6 and 7, we will use the concept of a normalized FA.

NORMALIZED FINITE AUTOMATA. A finite automaton $\mathcal{A}$ = ($\Sigma$, S, I, F, $\delta$) is normalized if it satisfies the following properties:

$\cal {A}$ has only one initial state, say s,
$\cal {A}$ has only one final (= accepting) state, say f,
no transition leads to the initial state s, or more formally, for every t $\in$ S and every x $\in$ $\Sigma$ $\cup$ { $\varepsilon$ } we have s $\notin$ $\delta$ (t, x),
no transition leaves from the final state f, or more formally, for every x $\in$ $\Sigma$ $\cup$ { $\varepsilon$ } we have $\delta$ (f, x) = $\emptyset$ .

Normalized FAs are generally depicted as shown in Figure 13.

**Figure 13:** A normalized FA.
$\begin{figure}\htmlimage \centering\includegraphics[scale=.5]{normalizedFADiagram.eps} \end{figure}$
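Any FA can be turned into a normalized FA recognizing the same language by adding a fresh initial state with $\varepsilon$-transitions to the old initial states, and a fresh final state reached by $\varepsilon$-transitions from the old final states. This is a standard construction rather than one spelled out in these notes; the Python sketch below assumes an NFA stored as a dictionary from (state, label) to a set of states, with "" as the $\varepsilon$ label, and the names `normalize` and `is_normalized` are hypothetical.

```python
EPS = ""  # label of epsilon-transitions


def normalize(states, initials, finals, delta):
    """Normalized copy of an NFA given as (states, initial states, final states,
    delta), where delta maps (state, label) to a set of states."""
    s, f = ("new", "s"), ("new", "f")   # fresh states; assumed not already used
    new_delta = {key: set(targets) for key, targets in delta.items()}
    new_delta[(s, EPS)] = set(initials)                 # s -> each old initial state
    for q in finals:
        new_delta.setdefault((q, EPS), set()).add(f)    # each old final state -> f
    return set(states) | {s, f}, s, f, new_delta


def is_normalized(s, f, delta):
    """Check the last two conditions: nothing enters s, nothing leaves f."""
    nothing_enters_s = all(s not in targets for targets in delta.values())
    nothing_leaves_f = all(not targets for (src, _), targets in delta.items() if src == f)
    return nothing_enters_s and nothing_leaves_f
```

Since s is the only initial state and f the only final state by construction, the first two conditions hold automatically; `is_normalized` checks the other two.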

**Figure 14:** Normalized FAs for $R_0$ languages.
$\begin{figure}\htmlimage \centering\includegraphics[scale=.5]{normalizedFAofR0Language.eps} \end{figure}$

THE UNION OF TWO LANGUAGES RECOGNIZED BY FA is a language recognized by FA. Proposition 5 formulates this statement with DFAs and Figure 15 illustrates it with normalized FAs.

**Figure 15:** A normalized FA accepting the sum of two other normalized FAs.
$\begin{figure}\htmlimage \centering\includegraphics[scale=.4]{sumOfNormalizedFA.eps} \end{figure}$

Proposition 5 Let L₁ and L₂ be two languages over $\Sigma$ recognized by the DFAs $\mathcal{A}_1 = (\Sigma, S_1, s_1, F_1, \delta_1)$ and $\mathcal{A}_2 = (\Sigma, S_2, s_2, F_2, \delta_2)$, respectively. Let us assume that $S_1 \cap S_2 = \emptyset$. (If this is not the case, then we can rename the states of $\mathcal{A}_2$.)

Then the language $L_1 \cup L_2$ is recognized by the NFA

$$\mathcal{A}_{1+2} = (\Sigma,\ S_1 \cup S_2,\ \{s_1, s_2\},\ F_1 \cup F_2,\ \delta_{1+2}) \qquad (5)$$

where the transition function $\delta_{1+2}$ is defined as follows for every x $\in$ $\Sigma$ and for every s $\in$ S₁ $\cup$ S₂:

$$\delta_{1+2}(s, x) = \begin{cases} \delta_1(s, x) & \text{if } s \in S_1, \\ \delta_2(s, x) & \text{if } s \in S_2. \end{cases} \qquad (6)$$
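Definition (6) translates almost literally into code. A minimal Python sketch, assuming DFAs are tuples (states, initial state, final states, transition dictionary) with disjoint state sets; the helper names and the two example DFAs (words with an even number of a's, words ending in b) are our own.

```python
def union_nfa(dfa1, dfa2):
    """NFA (5)-(6) for the union of L1 and L2. A DFA is (states, initial state,
    final states, delta) with delta a dict (state, letter) -> state."""
    S1, s1, F1, d1 = dfa1
    S2, s2, F2, d2 = dfa2
    if S1 & S2:
        raise ValueError("rename the states of the second DFA first")
    delta = {}
    for (q, x), t in d1.items():   # first case of (6): q in S1
        delta[(q, x)] = {t}
    for (q, x), t in d2.items():   # second case of (6): q in S2
        delta[(q, x)] = {t}
    return S1 | S2, {s1, s2}, F1 | F2, delta


def nfa_accepts(nfa, word):
    """Simulate an NFA without epsilon-transitions from all its initial states."""
    _, initials, finals, delta = nfa
    current = set(initials)
    for x in word:
        current = set().union(*(delta.get((q, x), set()) for q in current))
    return bool(current & finals)


# Hypothetical examples over {a, b}:
EVEN_A = ({0, 1}, 0, {0}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1})
ENDS_B = ({2, 3}, 2, {3}, {(2, "a"): 2, (2, "b"): 3, (3, "a"): 2, (3, "b"): 3})
```

Running both automata side by side from the two initial states is exactly what the set {s₁, s₂} of initial states in (5) expresses.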

THE PRODUCT OF TWO LANGUAGES RECOGNIZED BY FA is a language recognized by FA. Proposition 6 formulates this statement with DFAs and Figure 16 illustrates it with normalized FAs.

**Figure 16:** A normalized FA accepting the product of two other normalized FAs.
$\begin{figure}\htmlimage \centering\includegraphics[scale=.4]{productOfNormalizedFA.eps} \end{figure}$

Proposition 6 Let L₁ and L₂ be two languages over $\Sigma$ recognized by the DFAs $\mathcal{A}_1 = (\Sigma, S_1, s_1, F_1, \delta_1)$ and $\mathcal{A}_2 = (\Sigma, S_2, s_2, F_2, \delta_2)$, respectively. Again, let us assume that $S_1 \cap S_2 = \emptyset$.

We denote by L₁L₂ (or L₁.L₂) the language consisting of all words of the form w₁w₂ (the concatenation of w₁ and w₂) where w₁ and w₂ belong to L₁ and L₂, respectively. This language is called the PRODUCT LANGUAGE of L₁ by L₂.

Then the language L₁L₂ is recognized by the NFA

$$\mathcal{A}_{1.2} = (\Sigma,\ S_1 \cup S_2,\ \{s_1\},\ F_2,\ \delta_{1.2}) \qquad (7)$$

where the transition function $\delta_{1.2}$ is defined as follows for every x $\in$ $\Sigma$ $\cup$ {$\varepsilon$} and for every s $\in$ S₁ $\cup$ S₂:

$$\delta_{1.2}(s, x) = \begin{cases} \delta_1(s, x) & \text{if } s \in S_1 \setminus F_1, \\ \delta_1(s, x) & \text{if } s \in F_1 \text{ and } x \neq \varepsilon, \\ \{s_2\} & \text{if } s \in F_1 \text{ and } x = \varepsilon, \\ \delta_2(s, x) & \text{if } s \in S_2, \end{cases} \qquad (8)$$

with the convention that $\delta_1(s, \varepsilon) = \delta_2(s, \varepsilon) = \emptyset$, since $\mathcal{A}_1$ and $\mathcal{A}_2$ have no $\varepsilon$-transitions.
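The only new ingredient in (8) is the $\varepsilon$-transition from each final state of $\mathcal{A}_1$ to $s_2$. A minimal Python sketch, assuming (possibly partial) DFAs are tuples (states, initial state, final states, transition dictionary) and using "" as the $\varepsilon$ label; the function names and the example DFAs for a^* and {b} are hypothetical.

```python
EPS = ""  # label of epsilon-transitions


def product_nfa(dfa1, dfa2):
    """NFA (7)-(8) for L1.L2. A DFA is (states, initial state, final states,
    delta) with delta a dict (state, letter) -> state; it may be partial."""
    S1, s1, F1, d1 = dfa1
    S2, s2, F2, d2 = dfa2
    if S1 & S2:
        raise ValueError("rename the states of the second DFA first")
    delta = {(q, x): {t} for (q, x), t in list(d1.items()) + list(d2.items())}
    for q in F1:
        delta[(q, EPS)] = {s2}   # epsilon-jump from a final state of A1 to s2
    return S1 | S2, {s1}, F2, delta


def eps_closure(delta, states):
    closure, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_accepts(nfa, word):
    _, initials, finals, delta = nfa
    current = eps_closure(delta, initials)
    for x in word:
        step = set().union(*(delta.get((q, x), set()) for q in current))
        current = eps_closure(delta, step)
    return bool(current & finals)


# Hypothetical examples: A_STAR recognizes a^*, ONE_B recognizes {b}.
A_STAR = ({0}, 0, {0}, {(0, "a"): 0})
ONE_B = ({1, 2}, 1, {2}, {(1, "b"): 2})
```

With these inputs, `product_nfa(A_STAR, ONE_B)` should recognize a^*b.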

KLEENE CLOSURE OF A LANGUAGE. Let $\Sigma$ be an alphabet and let w be a word over $\Sigma$. Let n be a non-negative integer. First we define the n-th power of w, denoted by wⁿ, as follows:

$$w^n = \begin{cases} \varepsilon & \text{if } n = 0, \\ w.w^{n-1} & \text{if } n > 0. \end{cases} \qquad (9)$$

Let L be a language over $\Sigma$. Now we define the n-th power of L, denoted by Lⁿ, as follows:

$$L^n = \begin{cases} \{\varepsilon\} & \text{if } n = 0, \\ L.L^{n-1} & \text{if } n > 0. \end{cases} \qquad (10)$$

Then we define the star (or Kleene closure) of L, denoted by L^*, as the union of all powers of L, that is

$$L^* = \bigcup_{n \geq 0} L^n. \qquad (11)$$
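Definitions (10) and (11) can be evaluated directly when L is finite, provided we only keep words up to some length bound, since L^* is infinite as soon as L contains a nonempty word. The following Python sketch does exactly that; the length bound `max_len` and the function names are our own choices, not notation from these notes.

```python
def concat(L, M, max_len):
    """L.M, keeping only the words of length <= max_len."""
    return {u + v for u in L for v in M if len(u) + len(v) <= max_len}


def power(L, n, max_len):
    """L^n as in (10), truncated to length <= max_len."""
    result = {""}                              # L^0 = {epsilon}
    for _ in range(n):
        result = concat(L, result, max_len)    # L^n = L.L^(n-1)
    return result


def star(L, max_len):
    """L^* as in (11), truncated to length <= max_len.
    Each round prefixes one more word of L to the words found in the
    previous round; it stops when a round yields no new word."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(L, frontier, max_len) - result
        result |= frontier
    return result
```

For example, with L = {a, bb} and max_len = 3, `star` returns the seven words ε, a, aa, aaa, bb, abb and bba.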

Therefore, a word w over $\Sigma$ belongs to L^* if and only if

either w is the empty word $\varepsilon$
or there exist a positive integer n and words $u_1, u_2, \ldots, u_n$ of L such that $w = u_1 u_2 \cdots u_n$.

**Figure 17:** A DFA accepting a language L and an NFAIT accepting the star of L.
$\begin{figure}\htmlimage \centering\includegraphics[scale=.4]{starOfAnAutomaton.eps} \end{figure}$

We now give some useful formulas. Let L and M be two languages over $\Sigma$, and write L + M for their union L $\cup$ M. Observe that

$\Sigma^*$ is the set of all words over $\Sigma$.
Let L⁺ be the union of all Lⁿ with n > 0. Then L⁺ = L.L^*.
If L $\subseteq$ M then L^* $\subseteq$ M^*.
For every positive integer n we have (L^*)ⁿ = L^*.
(L^*)^* = L^*.
(L^* + M^*)^* = (L^*.M^*)^*.
(L^* + M^*)^* = (L + M)^*.
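These identities can be sanity-checked on small finite languages by comparing both sides on all words up to a length bound. Truncation is exact here, because every factor of a word of length at most `max_len` also has length at most `max_len`. This is a hedged Python sketch of such a check, not a proof; the function names, the bound and the sample languages are our own choices.

```python
def concat(L, M, max_len):
    """L.M restricted to words of length <= max_len."""
    return {u + v for u in L for v in M if len(u) + len(v) <= max_len}


def power(L, n, max_len):
    """L^n restricted to words of length <= max_len (L^0 = {epsilon})."""
    result = {""}
    for _ in range(n):
        result = concat(L, result, max_len)
    return result


def star(L, max_len):
    """L^* restricted to words of length <= max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(L, frontier, max_len) - result
        result |= frontier
    return result


def check_identities(L, M, max_len):
    """Compare both sides of each identity on the words of length <= max_len."""
    LS, MS = star(L, max_len), star(M, max_len)
    plus = set().union(*(power(L, n, max_len) for n in range(1, max_len + 2)))
    return {
        "L+ = L.L*": plus == concat(L, LS, max_len),
        "L* is included in (L+M)*": LS <= star(L | M, max_len),
        "(L*)^n = L* for n = 1, 2, 3": all(power(LS, n, max_len) == LS for n in (1, 2, 3)),
        "(L*)* = L*": star(LS, max_len) == LS,
        "(L*+M*)* = (L*.M*)*": star(LS | MS, max_len) == star(concat(LS, MS, max_len), max_len),
        "(L*+M*)* = (L+M)*": star(LS | MS, max_len) == star(L | M, max_len),
    }
```

Calling `check_identities({"a", "ab"}, {"b"}, 6)` should report every identity as holding. Note that n = 0 must be excluded from the fourth formula: `power(star({"a"}, 4), 0, 4)` is {""}, not L^*.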

THE KLEENE CLOSURE OF A LANGUAGE RECOGNIZED BY FA is a language recognized by FA. Proposition 7 formulates this statement with DFAs and Figure 18 illustrates it with normalized FAs.

**Figure 18:** A normalized FA accepting the star of another normalized FA.
$\begin{figure}\htmlimage \centering\includegraphics[scale=.4]{starOfNormalizedFA.eps} \end{figure}$

Proposition 7 Let L be a language over $\Sigma$ recognized by the DFA $\mathcal{A} = (\Sigma, S, s_0, F, \delta)$, and let s' be a new state, not in S.

Then L^* is recognized by the NFA

$$\mathcal{A}^* = (\Sigma,\ S \cup \{s'\},\ s',\ F \cup \{s'\},\ \delta^*) \qquad (12)$$

where the transition function $\delta^*$ is defined as follows for every x $\in$ $\Sigma$ $\cup$ {$\varepsilon$} and for every s $\in$ S $\cup$ {s'}:

$$\delta^*(s, x) = \begin{cases} \delta(s, x) & \text{if } s \in S \text{ and } x \neq \varepsilon, \\ \{s_0\} & \text{if } s \in F \cup \{s'\} \text{ and } x = \varepsilon, \\ \emptyset & \text{otherwise.} \end{cases} \qquad (13)$$

The $\varepsilon$-transitions from the final states back to s₀ let the automaton read a word of L followed by another word of L, which yields L⁺ = L.L^*; the new state s', which is both initial and final and is entered by no transition, adds the empty word. Simply declaring s₀ final would not work in general, since transitions entering s₀ could then lead to accepting words outside L^*.
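The construction (12)-(13) can be tried out in Python. This minimal sketch assumes a (possibly partial) DFA given as (states, initial state, final states, transition dictionary), uses "" as the $\varepsilon$ label, and represents the fresh state s' by the hypothetical tuple ("new", "start"); the example DFA recognizes a^*b.

```python
EPS = ""  # label of epsilon-transitions


def star_nfa(dfa):
    """NFA (12)-(13) for L^*. A DFA is (states, initial state s0, final states,
    delta) with delta a dict (state, letter) -> state."""
    S, s0, F, d = dfa
    new = ("new", "start")          # the fresh state s'; assumed not to be in S
    delta = {(q, x): {t} for (q, x), t in d.items()}   # first case of (13)
    for q in F | {new}:
        delta[(q, EPS)] = {s0}                          # second case of (13)
    return S | {new}, new, F | {new}, delta


def eps_closure(delta, states):
    closure, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_accepts(nfa, word):
    _, start, finals, delta = nfa
    current = eps_closure(delta, {start})
    for x in word:
        step = set().union(*(delta.get((q, x), set()) for q in current))
        current = eps_closure(delta, step)
    return bool(current & finals)


# Hypothetical example: a DFA for a^*b (state 1 is the only final state).
A_STAR_B = ({0, 1}, 0, {1}, {(0, "a"): 0, (0, "b"): 1})
```

The resulting automaton should accept ε, b, bb and abab, and reject a, ba and aba, as expected for (a^*b)^*.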

KLEENE THEOREM. We denote by R₀($\Sigma$) the set consisting of the empty language $\emptyset$ and of all languages of the form {x} for x $\in$ $\Sigma$ $\cup$ {$\varepsilon$}. Then, for a positive integer n, we define R_n($\Sigma$) as the set of languages over $\Sigma$ consisting of

the languages of $R_{n-1}(\Sigma)$,
the languages of the form L + M for L, M $\in$ $R_{n-1}(\Sigma)$,
the languages of the form L.M for L, M $\in$ $R_{n-1}(\Sigma)$,
the languages of the form L^* for L $\in$ $R_{n-1}(\Sigma)$.

Obviously (by Propositions 4 to 7 and an induction on n), for every non-negative integer n, every language of R_n($\Sigma$) is recognized by FA. Kleene's theorem establishes the converse.
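One way to make the hierarchy R_n($\Sigma$) concrete is to record how a language is built as an expression tree: if the tree has height n, the language it builds lies in R_n($\Sigma$), because each R_n($\Sigma$) contains R_{n-1}($\Sigma$). The Python sketch below uses hypothetical class names (`Sym`, `Sum`, `Prod`, `Star`), reads each leaf as an R₀ language {x}, and enumerates the words of bounded length in the language an expression builds.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sym:            # the R_0 language {x}; x == "" stands for {epsilon}
    x: str


@dataclass(frozen=True)
class Sum:            # L + M
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Prod:           # L.M
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Star:           # L^*
    inner: "Expr"


Expr = Union[Sym, Sum, Prod, Star]


def level(e: Expr) -> int:
    """An n such that the language built by e lies in R_n(Sigma): the height of e.
    (A smaller n may work for the same language via another construction.)"""
    if isinstance(e, Sym):
        return 0
    if isinstance(e, Star):
        return 1 + level(e.inner)
    return 1 + max(level(e.left), level(e.right))


def words(e: Expr, max_len: int) -> set:
    """The words of length <= max_len in the language built by e."""
    if isinstance(e, Sym):
        return {e.x} if len(e.x) <= max_len else set()
    if isinstance(e, Sum):
        return words(e.left, max_len) | words(e.right, max_len)
    if isinstance(e, Prod):
        return {u + v for u in words(e.left, max_len)
                for v in words(e.right, max_len) if len(u) + len(v) <= max_len}
    base, result, frontier = words(e.inner, max_len), {""}, {""}
    while frontier:   # truncated fixpoint computation of the star
        frontier = {u + v for u in base for v in frontier
                    if len(u) + len(v) <= max_len} - result
        result |= frontier
    return result


# (a + b)^*.a: a, b in R_0, a + b in R_1, (a + b)^* in R_2, the product in R_3
EXAMPLE = Prod(Star(Sum(Sym("a"), Sym("b"))), Sym("a"))
```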

Theorem 3 (Kleene) Let L be a language recognized by FA. Then there exists a non-negative integer n such that L belongs to R_n( $\Sigma$ ).

This suggests the introduction of the notion of a regular expression.


Marc Moreno Maza
2004-12-02