next up previous
Next: Lexical Analysis Up: Finite Automata Previous: Languages recognized by finite automata

Regular expressions

REGULAR EXPRESSION. Let $ \Sigma$ be an alphabet such that none of the symbols (, ), |, * belongs to $ \Sigma$. A regular expression over $ \Sigma$

Remark 2   Unnecessary parentheses can be avoided in regular expressions if we adopt the following conventions:
1.
the unary operator * has the highest precedence and is left associative,
2.
concatenation has the second the highest precedence and is left associative,
3.
the operator | has the lowest precedence and is left associative.
With these conventions the regular expression (a)|((b)*(c)) is a| b*c.


REGULAR LANGUAGES. Let $ \Sigma$ be an alphabet (such that none of the symbols (, ), |, * belongs to it). A language $ \cal {L}$ over $ \Sigma$ is regular if there exists a regular expression r over $ \Sigma$ denoting $ \cal {L}$, that is if $ \cal {L}$ = L(r).

It follows from Kleene's theorem that a language is recognized by FA iff it is regular.


FROM A REGULAR EXPRESSION TO A FINITE AUTOMATON. We present now a method that for any regular expression r constructs a FA $ \cal {A}$(r) recognizing L(r).

The method is quite natural.
1.
We first parse r into its constituent subexpressions.
2.
We construct NFAs for each letter or $ \varepsilon$ in r by applying the above rules (R1) and (R2).
3.
Then guided by the syntaxic structure of the regular expression r, we combine the NFAs using rules (R3) and Figures 15, 16, 18.
This method is called Thompson's algorithm.

Remark 3   The number of states of the FA produced by the Thompson's algorithm is upper bounded by 2| r| where | r| is the number of symbols and operators in the regular expression r.


next up previous
Next: Lexical Analysis Up: Finite Automata Previous: Languages recognized by finite automata
Marc Moreno Maza
2004-12-02