The flex input file consists of three sections, separated by a line with just %% in it:
definitions
%%
rules
%%
user code
The definitions section contains declarations of simple name definitions to simplify the scanner specification, and declarations of start conditions.
Name definitions have the form:
name definition
The "name" is a word beginning with a letter or an underscore ('_') followed by zero or more letters, digits, '_', or '-' (dash). The definition is taken to begin at the first non-white-space character following the name and to continue to the end of the line. The definition can subsequently be referred to using "{name}", which will expand to "(definition)".
For example,
DIGIT [0-9]
INTEGER {DIGIT}+
ID [[:alnum:]._/]*
STRING [[:alnum:][:print:]]*
These definitions are taken from the flex input file written for the Controller, the Generator and the Validator modules. DIGIT is a regular expression that matches a single digit. INTEGER matches one or more digits, that is, an unsigned integer. ID matches any string, possibly empty, that consists of alphanumeric characters, '.', '_', or '/'. STRING matches any string, possibly empty, of printable characters.
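The expansion of "{name}" into "(definition)" can be illustrated with a short C sketch. It is not flex's own code: the table `defs`, the function `expand_names` and its signature are hypothetical names chosen for this illustration, and the sketch expands only one level of names, whereas a definition such as INTEGER itself refers to DIGIT.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical table of name definitions, mirroring the definitions section. */
struct name_def { const char *name; const char *definition; };

static const struct name_def defs[] = {
    { "DIGIT", "[0-9]" },
};

/* Copy `pattern` into `out`, replacing each {name} found in the table with
 * (definition). Braces that do not enclose a known name are copied as-is.
 * Returns 0 on success, -1 if `out` is too small. */
int expand_names(const char *pattern, char *out, size_t out_size)
{
    size_t used = 0;

    while (*pattern != '\0') {
        const char *close = (*pattern == '{') ? strchr(pattern, '}') : NULL;
        const struct name_def *hit = NULL;

        if (close != NULL) {
            size_t name_len = (size_t)(close - pattern - 1);
            for (size_t i = 0; i < sizeof defs / sizeof defs[0]; i++) {
                if (strlen(defs[i].name) == name_len &&
                    strncmp(defs[i].name, pattern + 1, name_len) == 0) {
                    hit = &defs[i];
                    break;
                }
            }
        }

        if (hit != NULL) {
            /* The parentheses keep the definition together as one unit. */
            int n = snprintf(out + used, out_size - used, "(%s)", hit->definition);
            if (n < 0 || (size_t)n >= out_size - used)
                return -1;
            used += (size_t)n;
            pattern = close + 1;
        } else {
            if (used + 1 >= out_size)
                return -1;
            out[used++] = *pattern++;
        }
    }

    if (used >= out_size)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With this table, the pattern {DIGIT}+ becomes ([0-9])+. The added parentheses ensure that an operator following the name, such as '+', applies to the whole definition rather than to its last element.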
The rules section of the flex input contains a series of rules of the form:
pattern action
where the pattern must be unindented and the action must begin on the same line. The patterns in the input are written using an extended set of regular expressions. A pattern can be one of the definitions, or any combination of them. There are also special patterns; they are described in detail in [15,12].
An action is ordinary C code, which is executed whenever the corresponding pattern is matched.
Finally, the user code section is simply copied to lex.yy.c verbatim. It is used for companion routines which call or are called by the scanner. The presence of this section is optional; if it is missing, the second %% in the input file may be skipped, too.
When the generated scanner is run, it analyzes its input looking for strings which match any of its patterns. If it finds more than one match, it takes the one matching the most text (for trailing context rules, this includes the length of the trailing part, even though it will then be returned to the input). If it finds two or more matches of the same length, the rule listed first in the flex input file is chosen.
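The selection rule (longest match first, then the earliest rule on a tie) can be sketched in C as follows. This is only an illustration of the rule, not of how the generated scanner works internally: a flex scanner is table driven and does not try the rules one by one. The two matchers and the function `choose_rule` are hypothetical names introduced for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical rule reports how many leading characters of `s` it matches. */
typedef size_t (*matcher_fn)(const char *s);

/* Rule 0: the literal keyword "if". */
static size_t match_keyword_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Rule 1: an identifier, written [a-z]+ as a pattern. */
static size_t match_identifier(const char *s)
{
    size_t n = 0;
    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in the order in which they would be listed in the input file. */
static const matcher_fn rules[] = { match_keyword_if, match_identifier };

/* Return the index of the winning rule and store the match length in *len,
 * or return -1 if no rule matches at least one character. */
int choose_rule(const char *s, size_t *len)
{
    int best = -1;

    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);
        /* Strictly greater: on a tie the rule listed earlier keeps the win. */
        if (n > *len) {
            *len = n;
            best = (int)i;
        }
    }
    return best;
}
```

For the input "if" both rules match two characters, so rule 0 (the keyword) wins because it is listed first. For "iffy" the identifier rule matches four characters and wins on length.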
Once the match is determined, the text corresponding to the match (called the token) is made available in the global character pointer yytext, and its length in the global integer yyleng. The action corresponding to the matched pattern is then executed, and then the remaining input is scanned for another match.
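The cycle of match, publish the token, run the action, continue can be sketched with a hand-written loop. The variables `tok_text` and `tok_len` are stand-ins for yytext and yyleng and are deliberately named differently; the function `scan`, the single alphanumeric pattern and the counting action are assumptions made for this illustration only.

```c
#include <ctype.h>
#include <stddef.h>

/* Stand-ins for yytext and yyleng. Unlike yytext, tok_text points directly
 * into the input and is not NUL-terminated at the end of the token. */
static const char *tok_text;
static size_t tok_len;

/* State updated by the hypothetical action. */
static size_t tokens_seen;
static size_t chars_seen;

/* Hypothetical action: it reads the token only through the two globals. */
static void action(void)
{
    tokens_seen++;
    chars_seen += tok_len;
}

/* Scan `input`, treating each run of alphanumeric characters as a token and
 * skipping every other character. Returns the number of tokens found. */
size_t scan(const char *input)
{
    const char *p = input;

    tokens_seen = 0;
    chars_seen = 0;
    while (*p != '\0') {
        size_t n = 0;
        while (isalnum((unsigned char)p[n]))
            n++;
        if (n == 0) {       /* no match here: skip one character */
            p++;
            continue;
        }
        tok_text = p;       /* make the token available ... */
        tok_len = n;
        action();           /* ... run the action ... */
        p += n;             /* ... and scan the remaining input */
    }
    return tokens_seen;
}
```

Calling scan("ab 123 c") runs the action three times, once each for "ab", "123" and "c".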
The flex input file used to generate the scanner for the Controller, Generator and Validator is given in Appendix E.