Next: Regular expressions Up: Finite Automata Previous: Finite Automata.

## Languages recognized by finite automata

LANGUAGES RECOGNIZED BY FA. Let be an alphabet. A language L over is recognized by FA if there exists a finite automaton such that L is the language recognized by .

THE PUNPING LEMMA. A natural question is

• are all languages over recognized by FA?
• and if no can we characterize those languages which are recognized by FA?

Proposition 1   Every language over with a finite number of words is recognized by FA.

Indeed let L = {w1, w2,..., wn} be such a language. We can easily make FAs 1, 1, ... n recognizing the languages L1 = {w1}, L2 = {w1}, ..., Ln = {wn}, respectively. Then by using -transitions we can construct a single FA recognizing

 L  =   L1   L2    ...    Ln. (3)

Therefore, if a language L is not recognized by FA then it must contain an infinite number of words. This leads us to the following remark.

Proposition 2   Let L be a language recognized by FA and with an infinite number of words. Let = (, S, I, T,) be a DFA accepting L. Then possesses a circuit. In other words, there exist a positive integer n together with n states s1, s2,..., sn and n - 1 letters x1, x2,..., xn-1 (not necessarily pairwise different) such that
(i)
si+1 = (si, xi) for every i = 1 ... n - 1,
(ii)
si sj for i, j = 1 ... n - 1 and i j,
(iii)
s1 = sn.

Indeed, let s1 be the initial state, q be the number of states of and p the number of letters. Since L is infinite, there exists a word w = x1 ... x in L with lenght > q. Recognizing w requires at least q + 1 transitions:

 s2 = (s1, x1), s3 = (s3, x3),...s = (s-1, x-1), s+1 = (s, x). (4)

At least two of the visited states si must be equal and the proposition is proved. Proposition 3 gives a more precise statement of the above statement and Figure 11 sketches its proof.

Proposition 3 (Pumping Lemma)   Let L be a language recognized by FA and with an infinite number of words. Then there exists a positive integer N such that for every word m L with length > N there exist three words u, v, w such that
(i)
m = u v w,
(ii)
v ,
(iii)
for every positive integer k we have u vk w L.

Example 1   Let us apply Proposition 3 to the language L = {anbn} over = {a, b}. Let us assume that L is recognized by FA. Let N, m, u, v, w be as in the proposition. If v counts more a than b then for k big enough u vk w will also have more a than b and thus cannot belong to L. Similarly if v counts more b than a then u vk w cannot belong to L for k big enough. So v must be of the form anbn. But then for k > 1 the word u vk w does not belong to L. Finally we are led to a contradiction and the language L is recognized by FA.

CONSTRUCTION OF LANGUAGES RECOGNIZED BY FA We would like now to characterize those languages which are recognized by FA. To address this question we give a series of four propositions and one theorem. Each of these four propositions provide a mechanism (or rule) to build languages recognized by FA. Theorem 3 states that these rules allow us to build all possible languages recognized by FAs.

• Proposition 4 expresses the fact that languages consisting of a single word consisting itself of a single letter are recognized by FA. This statement is an obvious consequence of the previous Proposition 1 and is illustrated by Figure 12.
• Proposition 5 expresses the fact that the union of two languages recognized by FA is itself recognized by FA. This statement is also trivial and was illustrated by Figure 4.
• Proposition 6 expresses the fact that the concatenation (or product) of two languages recognized by FA is also recognized by FA. This statement is also not difficult to prove and we have already used this result implicitely with most examples.
• Proposition 7 expresses the fact that the star of a language recognized by FA is also recognized by FA. The star (or Kleene closure) of a language is defined and illustrated below.

Proposition 4   For every x     the language L = {x} consisting of the single word w = x is recognized by FA.

To illustrate Proposition 5, 6 and 7 we will use the concept of a normalized FA.

NORMALIZED FINITE AUTOMATA A finite automaton = (, S, I, F,) is normalized if it satisfies to the following properties

• has only one initial state, say s,
• has only one final (= accepting) state, say f,
• no transition leads to the initial state s, or more formally, for every t S and every x    {} we have s (t, x),
• no transition leaves from the final state f, or more formally, for every x    {} we have (f, x) = .
Normalized FA are generally depicted as shown on Figure 13.

THE UNION OF TWO LANGUAGES RECOGNIZED BY FA is a language recognized by FA. Proposition 5 formulates this statement with DFAs and Figure 15 illustrates it with normalized FAs.

Proposition 5   Let L1 and L2 be two languages over recognized by the DFAs 1 = (, S1, s1, F1,) and 2 = (, S2, s2, F2,) respectively. Let us assume that S1 S2 = . (If this is not the case then we can rename the states of 2.)

Then, the language L1 L2 is recognized by the NFA

 1+2 = (, S1 S2,{s1, s2}, F1 F2,) (5)

where the transition function is defined as follows for every x and for every s S1 S2

 (s, x)  = (6)

THE PRODUCT OF TWO LANGUAGES RECOGNIZED BY FA is a language recognized by FA. Proposition 6 formulates this statement with DFAs and Figure 16 illustrates it with normalized FAs.

Proposition 6   Let L1 and L2 be two languages over recognized by the DFAs 1 = (, S1, s1, F1,) and 2 = (, S2, s2, F2,) respectively. Again, let us assume that S1 S2 = .

We denote by L1L2 (or L1.L2) the language consisting of all words of the form w1w2 (concatenation of w1 and w2) where w1 and w2 belong to L1 and L2 respectively. This language is called the PRODUCT LANGUAGE of L1 by L2

Then, the language L1L2 is recognized by the NFA

 1.2 = (, S1 S2,{s1}, F2,) (7)

where the transition function is defined as follows for every x {} and for every s S1 S2

 (s, x)  = (8)

KLEENE CLOSURE OF A LANGUAGE. Let be an alphabet and let w be a word over . Let n be a non-negative integer. First we define the n-th power of w, denoted by wn, as follows:

 wn  = (9)

Let L be a language over . Now we define the n-th power of L, denoted by Ln, as follows:

 Ln  = (10)

Then we define the star (or Kleene closure) of L, denoted by L * , as the union of all powers of L, that is

 L *   =   Ln. (11)

Therefore, a word w over belongs to L * if
• either w is the empty word
• or there exists a positive integer n and a word u such that w = un.
We give now some useful formulas. Let L and M be two languages over . Observe that
• is the set of all words over .
• Let L+ be the union of all Ln with n > 0. Then L+ = L.L * .
• If L M then L * M * .
• For every non-negative integer n we ahve L * n = L * .
• L * * = L * .
• (L * + M * ) * = (L * .M * ) * .
• (L * + M * ) * = (L + M) * .

THE KLEENE CLOSURE OF A LANGUAGE RECOGNIZED BY FA is a language recognized by FA. Proposition 7 formulates this statement with DFAs and Figure 18 illustrates it with normalized FAs.

Proposition 7   Let L be a language over recognized by the DFA = (, S, s0, F,).

Then L * is recognized by the NFA

 * = (, S, s, F,) (12)

where the transition function is defined as follows for every x {} and for every s S

 (s, x)  = (13)

KLEENE THEOREM. We denote by R0() the set of all languages of the form {x} for x    {}. Then for a positive integer n we define Rn() as the set of languages over consisting of

1. the languages of Rn-1(),
2. the languages of the form L + M for L, M Rn-1(),
3. the languages of the form L.M for L, M Rn-1(),
4. the languages of the form L * for L Rn-1().
Obviously, for every non-negative integer n, every language member of Rn() is recognized by FA. The theorem of Kleene studies the opposite direction.

Theorem 3 (Kleene)   Let L be a language recognized by FA. Then there exists a non-negative integer n such that L belongs to Rn().

This suggests the introduction of the notion of a regular expression.

Next: Regular expressions Up: Finite Automata Previous: Finite Automata.
Marc Moreno Maza
2004-12-02