Regular Expressions Syntax

A regular expression definition always starts with regular expression <name> =, followed by the actual regular expression. Regular expressions are constructed from the following primitives and operators (see the reader for their explanations):

  • the empty set, which is denoted by \o;
  • the empty word, which is denoted by \e;
  • a symbol from the alphabet, which is written either as a single upper- or lower-case letter, such as a or Z, or as an arbitrary sequence of characters placed between single or double quotes, e.g., '.' or "'";
  • the union of two or more expressions is written by placing the + symbol between them;
  • the concatenation of two or more expressions is written by placing the . symbol between them;
  • the Kleene closure of a regular expression is written with the * symbol behind it;
  • for ω-regular expressions, the indefinite repetition of a regular expression is written with the ** symbols behind it;
  • parentheses can be used to group expressions, e.g., (a+b)**; without parentheses, the order of priority of operators, from highest to lowest, is: Kleene star, ω repetition, concatenation, union.
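
The effect of the operator priorities can be illustrated with a small parser sketch in Python. This is not the tool's own parser: the names tokenize and parenthesize are hypothetical, the sketch covers only the primitives and operators listed above, and it assumes that ** is read as a single token rather than as two Kleene stars.

```python
import re

# One token: a quoted symbol, \o or \e, '**' (tried before '*'), a letter, or an operator.
TOKEN = re.compile(r"""\s*('[^']*'|"[^"]*"|\\[oe]|\*\*|[A-Za-z]|[+.*()])""")


def tokenize(text):
    """Split an expression into tokens, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parenthesize(text):
    """Return text with the grouping implied by operator priority made explicit."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def union():            # lowest priority
        result = concatenation()
        while peek() == "+":
            take()
            result = f"({result} + {concatenation()})"
        return result

    def concatenation():
        result = postfix()
        while peek() == ".":
            take()
            result = f"({result} . {postfix()})"
        return result

    def postfix():          # Kleene star and ω repetition bind tightest
        result = atom()
        while peek() in ("*", "**"):
            result = f"({result}{take()})"
        return result

    def atom():
        token = take()
        if token == "(":
            inner = union()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if token in ("+", ".", "*", "**", ")"):
            raise SyntaxError(f"unexpected {token!r}")
        return token

    result = union()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


print(parenthesize("a + b . c*"))  # prints: (a + (b . (c*)))
```

Each priority level gets its own function, and a function only calls the level that binds more tightly, so the star attaches to c before the concatenation with b is formed, and the union with a comes last.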

A where clause can be used to introduce shorthand names, which avoids repeating similar subexpressions. It is used as in the following example.

regular expression number = @Digit . @Digit* . '.' . @Digit . @Digit*
where Digit = '0' + '1' + '2' + '3' + '4' + '5' + '6' + '7' + '8' + '9'

The keyword where is added after the expression, followed by a list of definitions of the form <name> = <regular expression>. An @ followed by the name of a definition refers to the regular expression of that definition. The regular expressions in definitions can in turn refer to other definitions, but definitions cannot be cyclic.
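
One way to picture how references resolve, and why cyclic definitions have to be rejected, is textual substitution. The Python sketch below is only an illustration and not necessarily how the tool works: the function expand is hypothetical, and it assumes that definition names consist of letters, digits and underscores, which the text above does not specify.

```python
import re

# Quoted symbols are matched first so that an '@' inside quotes is left alone.
# Assumption: definition names consist of letters, digits and underscores.
REFERENCE_OR_QUOTE = re.compile(r"""'[^']*'|"[^"]*"|@([A-Za-z_][A-Za-z0-9_]*)""")


def expand(expr, definitions, _active=()):
    """Replace every @Name in expr by its parenthesized definition."""

    def substitute(match):
        name = match.group(1)
        if name is None:              # a quoted symbol: keep it unchanged
            return match.group(0)
        if name in _active:           # the name is already being expanded
            chain = " -> ".join(_active + (name,))
            raise ValueError("cyclic definition: " + chain)
        if name not in definitions:
            raise KeyError("undefined name: " + name)
        body = expand(definitions[name], definitions, _active + (name,))
        return "(" + body + ")"

    return REFERENCE_OR_QUOTE.sub(substitute, expr)


# Hypothetical definitions, shortened to two digits to keep the output readable.
definitions = {"Digit": "'0' + '1'", "Pair": "@Digit . @Digit"}
expanded = expand("@Pair* + '@'", definitions)
print(expanded)  # prints: (('0' + '1') . ('0' + '1'))* + '@'
```

The parentheses around each substituted definition matter: without them, @Digit* would expand to '0' + '1'*, where the star applies only to '1'. With a cyclic pair of definitions such as A = @B and B = @A, the substitution would never terminate, which is why the sketch raises an error instead.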