
Introduction

DESCRIPTION. LEX is a tool for generating programs that perform pattern-matching on text. More precisely, LEX reads the given input files (or its standard input if no file names are given) for a description of a scanner to generate. The generated scanner takes the form of a C source file, lex.yy.c, which defines a routine yylex().


FORMAT OF THE INPUT FILE. The flex input file consists of three sections, separated by a line with just %% in it:

definitions
%%
rules
%%
user code

Example 2   The following flex input specifies a scanner which, whenever it encounters the string "username", replaces it with the user's login name.
[moreno@iguanodon lex]$ more username.l
%%
username    printf( "%s", getlogin() );
[moreno@iguanodon lex]$ lex username.l 
[moreno@iguanodon lex]$ gcc lex.yy.c  -ll -o username.exe
[moreno@iguanodon lex]$  ./username.exe 
username
moreno
user name
user name
By default, any text not matched by a flex scanner is copied to the output.

Example 3   This scanner counts the number of characters and the number of lines in its input (it produces no output other than the final report on the counts). The first line declares two globals, num_lines and num_chars, which are accessible both inside yylex() and in the main() routine declared after the second %%. There are two rules: the first matches a newline and increments both the line count and the character count; the second matches any character other than a newline (indicated by the . regular expression) and increments only the character count.
[moreno@iguanodon lex]$ more count.l
        int num_lines = 0, num_chars = 0;

%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;

%%
main()
     {
     yylex();
     printf( "# of lines = %d, # of chars = %d\n",
            num_lines, num_chars );
}
[moreno@iguanodon lex]$ flex count.l 
[moreno@iguanodon lex]$ gcc lex.yy.c -ll -o count.out
[moreno@iguanodon lex]$ ./count.out < count.l
# of lines = 13, # of chars = 222

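Example 4   The following scanner for a trivial language recognizes integers, reals, the keyword if and identifiers. It skips whitespace and reports any other character as unrecognized.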

[moreno@iguanodon lex]$ more trivialLanguage.l 
%{
/* need this for the call to atof() below */
#include <stdlib.h>
%}

DIGIT    [0-9]
ID       [a-zA-Z][a-zA-Z0-9]*
INT      {DIGIT}+
REAL     {DIGIT}*"."{DIGIT}+  


%%

{INT}       {
            printf( "An integer: %s (%d)\n", yytext,
                    atoi( yytext ) );
            }

{REAL}      {
            printf( "A float: %s (%g)\n", yytext,
                    atof( yytext ) );
            }

if          {printf( "The keyword: %s\n", yytext );
            }

{ID}        printf( "An identifier: %s\n", yytext );

[ \t\n]+          /* eat up whitespace */

.           printf( "Unrecognized character: %s\n", yytext );


[moreno@iguanodon lex]$ flex trivialLanguage.l 
[moreno@iguanodon lex]$ gcc lex.yy.c -ll -o trivialLanguage.out
[moreno@iguanodon lex]$ ./trivialLanguage.out 
123.456 if compiler ok then x := -780000000000000000000000000000000000
A float: 123.456 (123.456)
The keyword: if
An identifier: compiler
An identifier: ok
An identifier: then
An identifier: x
Unrecognized character: :
Unrecognized character: =
Unrecognized character: -
An integer: 780000000000000000000000000000000000 (2147483647)


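THE DEFINITION SECTION. Each line in this section has the general form

name expression
where name is an identifier and expression is the regular expression it stands for; a later definition or a rule refers to it as {name} (see DIGIT in Example 4). Moreover anything global to lex.yy.c, such as #include directives and variable declarations, can be placed in this section, either on indented lines (as in Example 3) or between %{ and %} (as in Example 4).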


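TRANSLATION RULES SECTION. Each line in this section has the general form

expression action
where expression is a regular expression, starting at the beginning of the line, and action is the C code executed each time the expression matches the input. An action may be a single statement, a block enclosed in braces, or empty, in which case the matched text is discarded (as for whitespace in Example 4).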


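USER CODE. In this section the user places C code that is copied verbatim to lex.yy.c, typically auxiliary routines which call or are called by the scanner, such as the main() routine of Example 3. This section is optional; if it is omitted, the second %% may be omitted as well (as in Example 2).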


Marc Moreno Maza
2004-12-02