Parser 2

From DaveWiki

Jump to: navigation, search

Before we start actually writing the expression parser we need at least a rough idea what our expression format is going to look like. Some examples of expressions we would like to use are:

Result = 1.2 * 5.61 + 7.109 / 8.0 - 91.0 ^ 0.5
Result = (1 - 7.2) / -2.3 + PI/2
Result = DataChannel1 + DataChannel2
Result = cos(10.0) * Amplitude
Result = atan2(45.0, Angle1)

Also keep in mind we are going to operating on multiple rows of data so the third example above when implemented would really look more like:

for (i = 0; i < Length; ++i)
{
   Result[i] = DataChannel1[i] + DataChannel2[i]
}


Base Format

The base format for our expression will simply be a single line of the form:

  <ResultID> = <Expression>

We won't bother with trying to parse multiple expressions at once although it would be a relatively easy thing to add later if we wanted. All white space will be ignored in the expression(which for now includes newlines and carriage returns).


Data Types

Since all the data we will be working on is floating point we will keep it simple and make all values and identifiers floating point as well.


Operators

We will use the following operators:

  • + -- Addition
  • - -- Subtraction and unary negation
  • ~ -- Unary negation
  • * -- Multiplication
  • / -- Division
  • ^ -- Exponent
  • ( ) -- Brackets

Note that the subtraction and negation operators are mentioned separately because in the parser they are handled differently (0 - 1 is parsed differently than -1 even though the ultimate result is the same).


Functions

Functions will have the format:

  <FunctionName> ( [Parameter]? [, Parameter]* )

Simply a function name with 0 or more parameters inside brackets. Note that we will not actually be defining functions within the expression language itself. All functions will merely be defined and included externally.


Language Grammar

Character Classes

Character classes are used by the tokenizer to turn a character stream into a token stream. See Parser_3 where we actually implement the following character classes.

    CC_DIGIT        = [0..9]
    CC_ALPHA        = [a..z] [A..Z]
    CC_END          = NUL
    CC_WHITESPACE   = space \t \r \n \v
    CC_IDSTART      = CC_ALPHA
    CC_ID           = CC_IDSTART CC_DIGIT
    CC_LEFTBRACKET  = (
    CC_RIGHTBRACKET = )
    CC_EQUAL        = =
    CC_ADD          = + 
    CC_SUBTRACT     = -
    CC_NEGATE       = ~
    CC_MULTIPLY     = *
    CC_DIVIDE       = /
    CC_MODULUS      = %
    CC_EXPONENT     = ^
    CC_COMMA        = ,
    CC_DECIMAL      = .

Tokens

Tokens are created by the tokenizer using the character class definitions and input into the parser. See Parser_3 where we actually implement a tokenizer.

    Unknown      = Default for any unknown characters not matching below
    Number       = [CC_DIGIT]+ [CC_DECIMAL [CC_DIGIT]* ]? [e|E [CC_ADD|CC_SUBTRACT]? [CC_DIGIT]+ ]?
    LeftBracket  = CC_LEFTBRACKET
    RightBracket = CC_RIGHTBRACKET
    Add          = CC_ADD
    Subtract     = CC_SUBTRACT
    Negate       = CC_NEGATE
    Multiply     = CC_MULTIPLY
    Divide       = CC_DIVIDE
    Modulus      = CC_MODULUS
    Exponent     = CC_EXPONENT
    Equal        = CC_EQUAL
    Identifier   = [CC_IDSTART] [CC_ID]*
    Function     = [ Look up identifier in list of predefined functions ]
    Comma        = CC_COMMA
    End          = CC_END

Our tokens have a pretty simple definition only because our language itself is very simple.

Parser Grammar

The following grammar is used by the compiler to turn the stream of tokens into a valid parse tree. Terminals are in bold and non-terminals (token types) are in italics. Creating a formal grammar like this is not required but it makes it very easy to create the compiler by simply following each rule.

    Start            => identifier = Expression  end
    Expression       => SimpleExpression Expression
    Expression       => ( Expression )
    SimpleExpression => Term
    SimpleExpression => sign Term
    SimpleExpression => SimpleExp addop Term
    Term           => Factor
    Term           => Term multop Factor
    Factor         => identifier
    Factor         => float
    Factor         => ( SimpleExpression )
    Factor         => FunctionCall
    FunctionCall   => function ( Parameters )
    Parameters     => Parameter , Parameters
    Parameters     => Parameter
    Parameters     => empty


Next: Parser_3

Personal tools