
LEX regular expressions

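LEX REGULAR EXPRESSIONS. A LEX regular expression is a word made of text characters and operators.

Moreover, an operator can be used as a text character if it is preceded by the escape character \.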


A CHARACTER CLASS is a class of characters specified using the operator pair [ ]. The expression

[ab]
matches the string a or b.

Within square brackets most operators are ignored. Only the three special characters \ - ^ keep a special meaning, and they are used as follows:

(a)
the escape character \ as above,
(b)
the minus character -, which is used for ranges, as in
digit       [0-9]
(c)
the hat character ^, which denotes a complemented match when it is the first character after the opening square bracket, as in
NOTabc      [^abc]
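
The three bracket rules above can be checked outside LEX. The following is a minimal sketch that uses Python's re module as a stand-in for LEX, on the assumption that the bracket syntax shown here behaves the same way in both; the helper name matches_whole is hypothetical.

```python
import re


def matches_whole(pattern, text):
    """Return True when the whole of text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None


# [ab] matches exactly one character, either a or b.
print(matches_whole(r"[ab]", "a"))     # True
print(matches_whole(r"[ab]", "ab"))    # False: two characters

# (a) the escape character: [\-0] holds a literal minus and the digit 0.
print(matches_whole(r"[\-0]", "-"))    # True

# (b) a range: [0-9] matches any single digit.
print(matches_whole(r"[0-9]", "7"))    # True

# (c) a complemented class: any single character except a, b or c.
print(matches_whole(r"[^abc]", "d"))   # True
print(matches_whole(r"[^abc]", "b"))   # False
```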


OPTIONAL EXPRESSIONS. The ? operator indicates an optional element of an expression. For instance

ab?c
matches either ac or abc.
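
As a quick check of the ? operator, the sketch below filters a few candidate strings through the same expression. It relies on Python's re module rather than LEX itself, assuming that ? has the same meaning in both; the list of candidates is made up for illustration.

```python
import re

# b? makes the b optional, so exactly two strings are matched in full.
pattern = re.compile(r"ab?c")
candidates = ["ac", "abc", "abbc", "bc"]
accepted = [s for s in candidates if pattern.fullmatch(s)]
print(accepted)  # ['ac', 'abc']
```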


REPEATED EXPRESSIONS. Repetitions of patterns are indicated by the operators * and +: a* matches zero or more consecutive occurrences of a, whereas a+ matches one or more.

Hence we can recognize identifiers in a typical computer language with
[A-Za-z][A-Za-z0-9]*
Bounded repetitions can also be obtained with the operator pair { }. For instance, a{1,5} matches one to five occurrences of a.
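
The repetition operators can be tried on a few inputs. This is a minimal sketch using Python's re module in place of LEX, assuming that *, + and the brace form behave alike in both for these simple patterns; the function name is_identifier and the test words are hypothetical.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(word):
    """Return True when word is a letter followed by letters or digits."""
    return IDENTIFIER.fullmatch(word) is not None


for word in ["x", "count2", "2count", ""]:
    print(repr(word), is_identifier(word))
# 'x' True, 'count2' True, '2count' False, '' False

# * accepts zero occurrences, + requires at least one.
print(bool(re.fullmatch(r"a*", "")))          # True
print(bool(re.fullmatch(r"a+", "")))          # False

# Braces bound the number of repetitions: here between 1 and 3.
print(bool(re.fullmatch(r"a{1,3}", "aaa")))   # True
print(bool(re.fullmatch(r"a{1,3}", "aaaa")))  # False
```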


ALTERNATING. The operator | indicates alternation. For instance

(ab|cd)
matches either of the two words ab and cd.


GROUPING. Parentheses are used for grouping, whenever the precedence of the operators would otherwise give a different or unclear reading. For instance

(ab|cd+)?(ef)*
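denotes the language of the words that are either empty or made of an optional prefix, namely ab or c followed by one or more d, followed by zero or more copies of ef.

Another example: an expression specifying a real number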
-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)
where \. denotes a literal period.
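
Both expressions of this paragraph can be exercised on sample inputs. The sketch below uses Python's re module as a stand-in for LEX, assuming the operators involved have the same meaning and precedence in both; the names GROUPED, REAL and accepts are hypothetical.

```python
import re

GROUPED = r"(ab|cd+)?(ef)*"
REAL = r"-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)"


def accepts(pattern, text):
    """Return True when the whole of text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None


# Optional prefix (ab, or c followed by one or more d), then copies of ef.
for word in ["", "ab", "cdd", "cddefef", "efef", "abcd", "cef"]:
    print(repr(word), accepts(GROUPED, word))
# '' True, 'ab' True, 'cdd' True, 'cddefef' True, 'efef' True,
# 'abcd' False, 'cef' False

# Integers, and decimals with an optional exponent.
for word in ["42", "-3.14", ".5e-3", "6.02e23", "1e5", "3."]:
    print(repr(word), accepts(REAL, word))
# '42' True, '-3.14' True, '.5e-3' True, '6.02e23' True,
# '1e5' False, '3.' False
```

Since | binds less tightly than concatenation, the optional exponent belongs only to the second alternative. This is why 1e5 is rejected while 6.02e23 is accepted.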


CONTEXT SENSITIVITY. LEX provides some support for contextual grammatical rules.



Marc Moreno Maza
2004-12-02