My Octopress Blog

A blogging framework for hackers.

Operator Precedence in Treetop

I have been using Treetop to implement an small expression langauge, and I recently ran into the problem of operator precedence and associativity. I had to deal with infix operators once before using JavaCC, when I was working on Viento, but I didn’t bother to implement operator precedence correctly, because it wasn’t high priority, and I had no idea how to go about it.

In a recursive descent parser, the most naïve way to implement infix operators is to just put an expression on either side of an operator, and hope for the best.

rule infix_operation
  expression operator expression
end

Associativity

The problem with the is left-recursion, which is discussed on the Treetop website. Basically, since an infix operation is an expression, referencing expression as the first thing in the infix definition results in infinite recursion. The easiest way to fix this is to put something more specific on the left side of the definition.

rule infix_operation
  primary operator expression
end

This basically works. Primary represents any kind of single value, such as a literal, a variable, or possibly a parenthesized expression. Since infix_operation is an expression, they will recursively chain together. However, there are two problems with this approach. First, it doesn’t deal with operator precedence in any way. Second, it ends up being right-associative.

Associativity describes the order that operations of the same precedence level are executed. An operator that posesses the “associative” property is an operator for which associativity does not matter. E.g.

(1 + 2) + 3 = 1 + (2 + 3)

However

(3 - 2) - 1 ≠ 3 - (2 - 1)

Subtraction is an example of an operation which does not have the associative property. Because of this, it matters which direction we evaluate a string of subtraction operations. Subtraction is normally evaluated left to right, so that 3 - 2 - 1 = 0. However, our implementation processes right to left, so that 3 - 2 - 1 = 2. Most operations should be processed left to right. Exponentiation is an example of an operation that is processed right to left (at least in Ruby).

Operator Precedence

On the Treetop website, there are examples of given of implementing operator precedence through a hierarchy of nonterminals (rules). These examples handle precedence, but they are still right-associative, and they require you to have a separate rule for every precedence level. E.g.

rule multitive
  additive [*/] multitive / additive
end

rule additive
  primary [+-] additive / primary
end

The / operator in Treetop denotes a branch. I believe the reason they chain the lower-precedence operation with each higher-precedence operation is so that you don’t have to enumerate them all in precedence order at the top level.

I originally imitated this pattern in my expression language, but I didn’t like the way I had to grow the parser to add more operators, and I couldn’t figure out how to modify it to have correct associativity.

Precedence Tables

I had never used an operator precedence table before, but I had seen them used in tools such as racc and yacc. I decided that it was time to learn how they worked, and implement one using Treetop.

It turns out that there is a linear-time algorithm invented by Edsger Dijkstra called the Shunting Yard Algorithm. It takes a list of operators and operands in the order they appear, and returns a list of the same operators and operands in Reverse Polish Notation based on the precedence and associativity of the operators. RPN is a stack-based notation where the order of operations is explicit, without the need for parenteses. Shunting Yard can also handle prefix operators and parentheses, but my implementation doesn’t include those because I handle them elsewhere in the grammar.

    def shunting_yard(input)
  returning [] do |rpn|
    operator_stack = []
    input.each do |object|
      if object.operator?
        op1 = object
        rpn << operator_stack.pop while (op2 = operator_stack.last) && (op1.left_associative? ? op1.precedence <= op2.precedence : op1.precedence < op2.precedence)
        operator_stack << op1
      else
        rpn << object
      end
    end
    rpn << operator_stack.pop until operator_stack.empty?
  end
end
  

I then process the RPN output.

    def rpn(input)
  results = []
  input.each do |object|
    if object.operator?
      r, l = results.pop, results.pop
      results << object.apply(l, r)
    else
      results << object
    end
  end
  results.first
end
  

I use the following rules in the grammar to construct a list of operands and operators that are appropriate to pass into shunting_yard.

    rule infix_operation
  lhs:infix_operation_chain rhs:primary {
    def list
      lhs.list + [rhs]
    end
  }
end

rule infix_operation_chain
  (primary operator)+ {
    def list
      elements.map {|e| [e.primary, e.operator] }.flatten
    end
  }
end
  

This is a simplified version of the implementation (it doesn’t allow for whitespace, for instance), but it shows the basic solution. You could probably combine the two rules to simplify it further, but I wanted to show the basic shape of my solution.

To build the actual precedence table, I created the following Module. The methods #precedence and #left_associative? you see in the implementation of shunting_yard depend on the lookup method on PrecedenceTable.

    module PrecedenceTable
  Operator = Struct.new(:precedence, :associativity)

  def self.lookup(operator)
    @operators[operator]
  end

  def self.op(associativity, *operators)
    @precedence ||= 0
    @operators ||= {}
    operators.each do |operator|
      @operators[operator] = Operator.new(@precedence, associativity)
    end
    @precedence += 1
  end

  # operator precedence, low to high
  op :left, '||'
  op :left, '&&'
  op :none, '==', '!='
  op :left, '<', '<=', '>', '>='
  op :left, '+', '-'
  op :left, '*', '/'
  op :right, '^'
end
  

Javascript

One of the other interesting things about the expression language I was working on is that it can compile itself to Javascript for execution on the client. At first, I was just writing out infix operations verbatim, relying on the assumption that Javascript has the same precedence rules as our language. This broke down when we wanted to add the ^ operator. In Javascript, exponentiation is accomplished by calling Math.pow(x, y) rather than by an infix operator.

In order to make this work, we had to define the order of operations explicitly in our Javascript output. To do this, we refactored our implementation of the RPN algorithm to take two lambdas, which tell it what to do with values and operators.

    def rpn(input, operator_lambda, evaluate_lambda)
  results = []
  input.each do |object|
    if object.operator?
      r, l = results.pop, results.pop
      results << operator_lambda.call(object, l, r)
    else
      results << evaluate_lambda.call(object)
    end
  end
  results.first
end

def to_js(input)
  rpn(shunting_yard(input),
      lambda { |op,l,r| op == '^' ? "Math.pow(#{l},#{r})" : "(#{l} #{op} #{r})" }
      lambda { |operand| operand.to_js }
  )
end
  

I included this last part even though I don’t expect anyone to ever have the same problem, because I thought it was really interesting that the RPN algorithm can be so easily modified to generate code rather than simply evaluate numbers. In this case the results stack contains strings rather than numbers.