An arithmetic expressions evaluator with Kotlin and ANTLR4

I've recently been working with Kotlin and ANTLR4 in my IC project and one of the tasks I had to do is writing a parser using these two technologies. I didn't found much information online besides the ANTLR book and documentation. So, after reading them, I decided to make a short tutorial to make things easier for me and my colleagues.

2020-07-07

In this post I'll be showing how to write a arithmetic expression evaluator using ANTLR4 as a parser generator and a visitor interface written in Kotlin to walk on the generated tree. I didn't use any build tool for this tutorial because it was supposed to be quite simple, but I intend to improve it later and show how things can be done using gradle.

1 - Install Kotlin

Follow the instructions at the Kotlin installation guide.

$ java -version
openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment (build 11.0.6+10)
OpenJDK 64-Bit Server VM (build 11.0.6+10, mixed mode)

$ kotlin -version
Kotlin version 1.3.72-release-468 (JRE 11.0.6+10)

$ kotlinc -version
info: kotlinc-jvm 1.3.72 (JRE 11.0.6+10)

2 - Install ANTLR4

Follow the instructions at the ANTLR website.

By now you should have the antlr4 alias configured as described.

$ antlr4
ANTLR Parser Generator  Version 4.8
...

3 - Set up a grammar file

After setting up our environment we need to write a grammar for our parser.

/** ./Expr.g4 */

/** Grammars always start with a grammar header (which must have the same name of the file). */
grammar Expr;

/** A rule called `prog` which will be our entry point
 *  This rule means that a program is a sequence of one or more `stat`s 
 */
prog:   stat+ ;

/** A `stat` rule may be each of the four alternatives (represented by `|`)
 *  The `#` in front of each alternative is a label, we need it so ANTLR generates a visitor function for each alternative (else it would generate one for the whole rule)
 */
stat:   ID '=' expr NEWLINE         # assign
    |   CLEAR NEWLINE               # clear
    |   expr NEWLINE                # printExpr
    |   NEWLINE                     # blank
    ;

/** The `expr` rule to recognize arithmetic expressions */
expr:   expr op=('*'|'/') expr      # MulDiv
    |   expr op=('+'|'-') expr      # AddSub
    |   INT                         # int
    |   ID                          # id
    |   '(' expr ')'                # parens
    ;

/** We also need to define some token names for the operator literals.
 *  That way we can reference token names as Java/Kotlin constants in the visitor.
 *  The same applies for the `op=` in the `expr` rule
 */
MUL :   '*' ;            // assigns token name to '*' used above in grammar
DIV :   '/' ;
ADD :   '+' ;
SUB :   '-' ;
CLEAR:  'clear' ;        // match clear keyword
ID  :   [a-zA-Z]+ ;      // match identifiers
INT :   [0-9]+ ;         // match integers
NEWLINE:'\r'? '\n' ;     // return newlines to parser (is end-statement signal)
WS  :   [ \t]+ -> skip ; // toss out whitespace

4 - Tree walker

By using ANTLR we have basically three easy ways of interacting with the parsing tree generated by the parser:

Listeners
- ANTLR generates a Listener interface
- Can create listener functions for each grammar rule
- There's no need explicitly walk down the tree (ANTLR does it automatically)
Visitors:
- ANTLR generates an interface for the well-known Visitor pattern
- Can create visitor functions for each grammar rule or alternative
- You have to explicitly walk down the tree by calling visit() on the children nodes
Actions and Semantic Predicates
- Can embed arbitrary code within expression grammar.
- Can compute values or print things on-the-fly during the parsing (no need of a parsing tree).
- Influence on how the parser recognizes input phrases.
- Based on runtime information decide which rule to apply.

You can find more information about each of these in the ANTLR documentation. Here I'll keep with the visitor pattern.

// ./eval-visitor.kt

/** ExprBaseVisitor is the base class generated by ANTLR */
class EvalVisitor() : ExprBaseVisitor<Int>() {
  var memory: HashMap<String, Int> = HashMap<String, Int>()

/** These are the functions generated by each label as we defined in our grammar.
 *  Whe only need to override the functions we are interested in.
 */
  override fun visitAssign(ctx: ExprParser.AssignContext ): Int {
      val id = ctx.ID().getText()  
      val value = visit(ctx.expr())   
      memory.put(id, value)
      return value
  }

  /** expr NEWLINE */
  override fun visitPrintExpr(ctx: ExprParser.PrintExprContext): Int {
      val value = visit(ctx.expr()) 
      println(value)         
      return 0                          
  }

  /** INT */
  override fun visitInt(ctx: ExprParser.IntContext): Int {
      return ctx.INT().getText().toInt()
  }

  /** ID */
  override fun visitId(ctx: ExprParser.IdContext): Int {
      val id = ctx.ID().getText()
      val value = memory.get(id)

      return if ( value != null ) { value } else { 0 }
  }

  /** expr op=('*'|'/') expr */
  override fun visitMulDiv(ctx: ExprParser.MulDivContext): Int {
      val left = visit(ctx.expr(0))
      val right = visit(ctx.expr(1)) 

      if ( ctx.op.getType() == ExprParser.MUL ) {
        return left * right
      }
      return left / right 
  }

  /** expr op=('+'|'-') expr */
  override fun visitAddSub(ctx: ExprParser.AddSubContext): Int {
      val left = visit(ctx.expr(0))  
      val right = visit(ctx.expr(1)) 

      if ( ctx.op.getType() == ExprParser.ADD ) {
        return left + right
      }
      return left - right 
  }

  /** '(' expr ')' */
  override fun visitParens(ctx: ExprParser.ParensContext): Int {
      return visit(ctx.expr()) 
  }

  /** 'clear' */
  override fun visitClear(ctx: ExprParser.ClearContext): Int {
      memory.clear()
      return 0
  }
}

5 - Testing

ANTLR comes with a testing tool in the runtime library called TestRig. If you've followed the installation guide you have a grun alias to run and play with.

However, we'll write a simple kotlin application to invoke our parser and call our newly created visitor to execute the arithmetic expressions.

The code is pretty straightforward so I didn't put any comments. Basically we're just creating an InputStream from a file, if given, or using stdin. Then, this stream will be used to feed our lexer and, consequently, be consumed by our parser.

// ./calc.kt

package calc

import org.antlr.v4.runtime.*
import org.antlr.v4.runtime.tree.ParseTree

import java.io.FileInputStream
import java.io.InputStream

import ExprLexer
import ExprParser
import EvalVisitor

fun main(args: Array<String>){

  val inputStream = if (args.size > 0) { FileInputStream(args[0]) } else { System.`in` }

  val input = CharStreams.fromStream(inputStream)
  val lexer = ExprLexer(input)
  val tokens = CommonTokenStream(lexer)
  val parser = ExprParser(tokens)
  val tree = parser.prog()   //`prog` is the entry point we defined in our grammar

  val eval = EvalVisitor()
  eval.visit(tree)
}

6 - Running

Once we have everything set up, it's quite easy to compile and run our programs. Here's a sequence of commands to achieve it.

Generate the Lexer, Parser, and the Visitor interface with ANTLR4:

$ antlr4 -no-listener -visitor Expr.g4

Compile the generated java files:

$ javac Expr*.java

Compile our visitor and main program with kotlinc:

$ echo $CLASSPATH
.:/usr/local/lib/antlr-4.8-complete.jar:

$ kotlinc calc.kt eval-visitor.kt -cp $CLASSPATH

For the sake of testing I wrote a small input file (test.expr) so we can pass to our program.

193
a = 5
b = 6
a+b*2
clear
a+b

Run the main program with kotlin:

$ kotlin -cp $CLASSPATH calc.CalcKt test.expr
193
17
0

The first line is just a print expression. The second one is an evaluation of the expression a+b*2 with the previously defined identifiers. And the last is trying to evaluate a+b after cleaning the memory, which obviously is wrong as the identifiers are no more defined. For simplicity's sake we just evaluated it to a default value 0 (this behavior can be changed in our visitor).

7 - Conclusion

So that's it! Now we have a working (and simple) Arithmetic Expressions Evaluator.

Most of the information presented here is from The Definitive ANTLR 4 Reference with some adaptions for the Kotlin language. It's a really good book and you're more than encouraged to read it.

There are plenty of information in the Official Documentation as well, so fell free to use both.

Finally, it's probably worth mentioning that you may find the all of the code used in this tutorial in my github repository.

Hope you've enjoyed xD

flávio.