Next: Non-deterministic Finite Automata Up: Finite Automata Previous: Finite Automata

## Deterministic Finite Automata

ALPHABET, STRING, LANGUAGE. We call an alphabet any finite set of symbols. Let be an alphabet. A string or word over is any finite sequence of symbols from . The empty string is denoted by . A language over is any set of strings over . Note that the above definitions do not ascribe any meaning to the strings of the language.

DETERMINISTIC FINITE AUTOMATA. A deterministic finite automaton (DFA) consists of five things:

• an input alphabet ,
• a finite set S whose elements are called states,
• a distinguished state  s0 S, called starting state,
• a set F S of distinguished states, called accepting states (or final states),
• a partial function from S× to S called the transition function (hence for every state-symbol couple either maps it to a symbol or is not defined for this couple).
It is convenient
• to say let = (, S, s0, F,) be a DFA'' for let be a DFA with alphabet , set of states S, starting state s0, set of accepting states F and transition function .
• to represent a DFA by a transition diagram using the rules shown on Figure 1.

LANGUAGE RECOGNIZED BY A DFA. A DFA accepts an input string if when beginning the computation in the starting state, after reading the entire string, the automaton is in an accepting state.

The set of the strings recognized by a DFA is called the language recognized by the DFA and will be denoted by ().

TRANSITION TABLE. A straightforward way to implement a DFA is to represent the transition function as a transition table. This table has a row for each state s and a column for each input symbol a . The intersection of row s and column a contains (s, a). Figure 3 shows the transition diagram and the transition table of an automaton.

A DFA can be easily converted to a program to look for tokens specified by it.

COMPLETE DFA. A DFA = (, S, s0, F,) is said complete if for every s S and for every a the transition (s, a) is defined.

The automaton of Figure 3 is complete whereas none of the automata of Figure 2 are.

As a model for a computer program, it is desirable for a DFA to be complete. To transform a non-complete DFA = (, S, s0, F,) into a complete DFA one needs only

1. to add a new state to S, say E (for error) and then
2. for every (s, a) S× such that was not defined at this couple, define (s, a) = E.

DFA RECOGNIZING THE UNION OF SEVERAL LANGUAGES. Each of the four DFA in Figure 2 recognizes a simple language over the alphabet

Let us denote these languages by L1, L2, L3, L4.
• L1 is the language reduced to the single world if,
• L2 is the language of the identifiers made from letters and digits and starting with a letter,
• L3 is the language of the integer numbers,
• L4 is the language of the floating point numbers.
It is desirable to build a DFA that would recognize the language L1   L2   L3   L4. In other words, one would like to combine the four transition diagrams in Figure 2 specifying the various patterns into a single transition diagram. This is a non-trivial task. Here are some difficulties.
• Consider a word wi which belongs to some Li and a word wj such that wiwj belongs to some Lj. For instance with wi = 123 and wj = .456 should the combined DFA accept wi or wiwj? The rule is to choose the longest initial substring that can match a pattern (think of integers). Hence wiwj is chosen.
• It may happen that wiwj (the longest match) belongs to several Lj. Hence we need to define some priorities. For instance the word if belongs to L1 and L2. However we want the word if to be a reserved word, not an identifier. So we have to replace L2 by L2 L1.
N.B. From now on and for simplicity, we do not consider capital letters any more in our example (Figure 2). In other words we reduce to the union of [a ... z], [0 ... 9] and {.}.

Next: Non-deterministic Finite Automata Up: Finite Automata Previous: Finite Automata
Marc Moreno Maza
2004-12-02