Writing a mathematical expression evaluator in Java

Introduction Writing an expression evaluator, as simple as it may be, has always been a long standing challenge for me, mostly because it feels so simple, and yet, gets tricky very fast. Writing code to parse and evaluate a simple expression with few operands and/or operators with the same precedence, is simple. However, the generalization to higher complexities can be really tricky due to multiple factors: operator precedence, expression nesting and parenthesis. Inspired by the first chapter of what promises to be an amazing read of the book: "Engineering a Compiler", 2nd edition, and, taking only their simple definitions of a parser and a scanner, I managed to put together a small example that seems good enough to share. Devising a language to talk about mathematical expressions The first thing we need to define when writing something like this, is a language, that translates well into code, that allows us to think about the problem at a higher level of abstraction. We will say that an expression is composed of tokens, or, as I called it, TokenType s.

public enum TokenType  ADD, SUB, MUL, DIV, POW, LPAR, RPAR, VALUE; @Override public String toString()  switch (this.ordinal()) case 0: return "+"; case 1: return "-"; case 2: return "*"; case 3: return "/"; case 4: return "^"; case 5: return "("; case 6: return ")"; case 7: return this.name(); default: return "null"; > > public static TokenType fromString(String s) switch (s) case "+": return TokenType.ADD; case "-": return TokenType.SUB; case "*": return TokenType.MUL; case "/": return TokenType.DIV; case "^": return TokenType.POW; case "(": return TokenType.LPAR; case ")": return TokenType.RPAR; default: return TokenType.VALUE; > > > 

This is a simple Java enum with methods to write from/to Strings into Tokens and vice-versa. We immediately notice that the . for decimal values is encoded in the enum as being a "value" since it's not a special token. Obviously this can be improved further, but, seems good enough for now. So, from now on, when we talk about a mathematical expression, we will talk about it in terms of its TokenType s. Writing and understanding the Scanner The scanner is the piece in a compiler front-end that will read an expression and match its pieces to known tokens, so that the expression can have a real meaning in the context where we will be evaluating it. Starting with a simple expression like 12+5 , after a Scanner pass, we could get output along the lines of: (12, Token.NUM) , (+, Token.ADD), (5, Token.NUM) This will enable us to look at the expression and parse it, evaluate it, reduce it to a simpler form, etc. In the first Scanner pass, some things will need to be corrected, for instance, if we see the expression: (-, Token.SUB), (3, Token.NUM) , what we really want to have is: (-3, Token.NUM) . I chose to do this further down the pipeline, in the Parser stage. The code for the Scanner is as follows:

public class ScannedToken  private String expressionPiece; private TokenType type; public ScannedToken(String exp, TokenType type) this.expressionPiece = exp; this.type = type; > @Override public String toString() return "(Expr:"+ expressionPiece+ ", Token:"+ type+")"; > public TokenType type() return type; > public String expression() return expressionPiece; > > 

A ScannedToken is a token annotated with the tokenType, which we will use when performing an actual expression scan. The code for the Scanner is as follows:

 import java.text.DecimalFormat; import java.util.ArrayList; import java.util.List; import java.util.stream.Collectors; public class Scanner  private final String expression; public Scanner(String expr)  this.expression = expr; > public ListScannedToken> scan()  StringBuilder value = new StringBuilder(); ListScannedToken> scannedExpr = new ArrayList<>(); for (char c : expression.toCharArray())  TokenType type = TokenType.fromString(new String(new char[] c>)); if (!type.equals(TokenType.VALUE))  if (value.length() > 0)  //Add the full value TOKEN ScannedToken st = new ScannedToken(value.toString(), TokenType.VALUE); scannedExpr.add(st); > value = new StringBuilder(new String(new char[] c>)); ScannedToken st = new ScannedToken(value.toString(), type); scannedExpr.add(st); value = new StringBuilder(); > else  value.append(new String(new char[] c>)); > > if (value.length() > 0)  //Add the full value TOKEN ScannedToken st = new ScannedToken(value.toString(), TokenType.VALUE); scannedExpr.add(st); > return scannedExpr; > public double evaluate(ListScannedToken> tokenizedExpression)  if (tokenizedExpression.size() == 1)  return Double.parseDouble(tokenizedExpression.get(0).expression()); > //Eval order is PEMDAS - Parenthesis, exponents, multiply, divide, add, subtract ListScannedToken> simpleExpr = new ArrayList<>(); int idx = tokenizedExpression.stream() .map(ScannedToken::type) .collect(Collectors.toList()) .lastIndexOf(TokenType.LPAR); int matchingRPAR = -1; if (idx >= 0)  for (int i = idx + 1; i  tokenizedExpression.size(); i++)  ScannedToken curr = tokenizedExpression.get(i); if (curr.type() == TokenType.RPAR)  matchingRPAR = i; break; > else  simpleExpr.add(tokenizedExpression.get(i)); > > > else  simpleExpr.addAll(tokenizedExpression); return evaluateSimpleExpression(tokenizedExpression); > double value = evaluateSimpleExpression(simpleExpr); // System.out.println("val is " + value); ListScannedToken> partiallyEvaluatedExpression = new ArrayList<>(); for (int i = 0; i  idx; i++)  partiallyEvaluatedExpression.add(tokenizedExpression.get(i)); > partiallyEvaluatedExpression.add(new ScannedToken(Double.toString(value), TokenType.VALUE)); for (int i = matchingRPAR + 1; i  tokenizedExpression.size(); i++)  partiallyEvaluatedExpression.add(tokenizedExpression.get(i)); > // from idx find first ), extract, evaluate, replace, call recursively // System.out.println("Expr to eval indexes: " + idx + ", " + matchingRPAR); System.out.println(partiallyEvaluatedExpression); return evaluate(partiallyEvaluatedExpression); > //A simple expression won't contain parenthesis public double evaluateSimpleExpression(ListScannedToken> expression)  if (expression.size() == 1)  return Double.parseDouble(expression.get(0).expression()); > else  ListScannedToken> newExpression = new ArrayList<>(); int idx = expression.stream().map(ScannedToken::type).collect(Collectors.toList()).indexOf(TokenType.POW); if (idx != -1)  double base = Double.parseDouble(expression.get(idx - 1).expression()); double exp = Double.parseDouble(expression.get(idx + 1).expression()); DecimalFormat df = new DecimalFormat(".00"); double ans = Math.pow(base, exp); for (int i = 0; i  idx - 1; i++)  newExpression.add(expression.get(i)); > newExpression.add(new ScannedToken(ans + "", TokenType.VALUE)); for (int i = idx + 2; i  expression.size(); i++)  newExpression.add(expression.get(i)); > return evaluateSimpleExpression(newExpression); > else  int mulIdx = expression.stream() .map(ScannedToken::type) .collect(Collectors.toList()) .indexOf(TokenType.MUL); int divIdx = expression.stream() .map(ScannedToken::type) .collect(Collectors.toList()) .indexOf(TokenType.DIV); int computationIdx = (mulIdx >= 0 && divIdx >= 0) ? Math.min(mulIdx, divIdx) : Math.max(mulIdx, divIdx); if (computationIdx != -1)  double left = Double.parseDouble(expression.get(computationIdx - 1).expression()); double right = Double.parseDouble(expression.get(computationIdx + 1).expression()); DecimalFormat df = new DecimalFormat(".00"); double ans = computationIdx == mulIdx ? left * right : left / right * 1.0; for (int i = 0; i  computationIdx - 1; i++)  newExpression.add(expression.get(i)); > newExpression.add(new ScannedToken(ans + "", TokenType.VALUE)); for (int i = computationIdx + 2; i  expression.size(); i++)  newExpression.add(expression.get(i)); > return evaluateSimpleExpression(newExpression); > else  int addIdx = expression.stream() .map(e -> e.type()) .collect(Collectors.toList()) .indexOf(TokenType.ADD); int subIdx = expression.stream() .map(e -> e.type()) .collect(Collectors.toList()) .indexOf(TokenType.SUB); int computationIdx2 = (addIdx >= 0 && subIdx >= 0) ? Math.min(addIdx, subIdx) : Math.max(addIdx, subIdx); if (computationIdx2 != -1)  double left = Double.parseDouble(expression.get(computationIdx2 - 1).expression()); double right = Double.parseDouble(expression.get(computationIdx2 + 1).expression()); DecimalFormat df = new DecimalFormat(".00"); double ans = computationIdx2 == addIdx ? left + right : (left - right) * 1.0; for (int i = 0; i  computationIdx2 - 1; i++)  newExpression.add(expression.get(i)); > newExpression.add(new ScannedToken(ans + "", TokenType.VALUE)); for (int i = computationIdx2 + 2; i  expression.size(); i++)  newExpression.add(expression.get(i)); > return evaluateSimpleExpression(newExpression); > > > > return -1.0; > > 

This is what we do in the parser:

import java.util.ArrayList; import java.util.List; import java.util.stream.Collectors; public class Parser  private final ListScannedToken> expression; public Parser(ListScannedToken> expression)  this.expression = expression; > /* We need a triple in a sliding window fashion where middle is current token so we can contextualize what needs to be parsed into correct tokens */ public ListScannedToken> parse()  TokenType prev = null; TokenType curr = null; TokenType next = null; ListScannedToken> properlyParsedExpression = new ArrayList<>(); ListTokenType> types = expression.stream().map(ScannedToken::type).collect(Collectors.toList()); ListInteger> indexes = new ArrayList<>(); ListScannedToken> negativeValues = new ArrayList<>(); for (int i = 0; i  types.size() - 1; i++)  prev = i == 0 ? null : types.get(i - 1); curr = types.get(i); next = types.get(i + 1); if (prev == null && curr == TokenType.SUB && next == TokenType.VALUE)  ScannedToken negativeValue = new ScannedToken("" + (-1 * Double.parseDouble(expression.get(i + 1).expression())), TokenType.VALUE); System.out.println("new token at index " + i); indexes.add(i); negativeValues.add(negativeValue); > else if (prev == TokenType.LPAR && curr == TokenType.SUB && next == TokenType.VALUE)  ScannedToken negativeValue = new ScannedToken("" + (-1 * Double.parseDouble(expression.get(i + 1).expression())), TokenType.VALUE); System.out.println("new token at index " + i); indexes.add(i); negativeValues.add(negativeValue); > > int maxIterations = expression.size(); int i = 0; int j = 0; while (i  maxIterations)  if (indexes.contains(i) && j  negativeValues.size())  properlyParsedExpression.add(negativeValues.get(j)); j++; i++; > else  properlyParsedExpression.add(expression.get(i)); > i++; > System.out.println(properlyParsedExpression); return properlyParsedExpression; > > 

And, finally the Main function of our code, where we execute what we did:

import java.util.List; import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; class Main  public static void main(String[] args) throws IOException System.out.println("Enter a mathematical expression"); //Enter data using BufferReader while(true) BufferedReader reader = new BufferedReader(new InputStreamReader(System.in)); // Reading data using readLine String name = reader.readLine(); Scanner sc = new Scanner(name); //(12*5)+1*(8.6-3*2^2)-34 //(12*5) //12*5 //12 //-5+12*(-3+2) //-12 //12 //15/(3+2) ListScannedToken> scanExp = sc.scan(); Parser parser = new Parser(scanExp); ListScannedToken> parsed = parser.parse(); scanExp.forEach(e->System.out.println(e)); System.out.println(sc.evaluate(parsed)); > > > 

So, we evaluate the parsed expression, with the corrected tokens, and we get the value we want.

Additional design considerations

There are many additional design considerations to take that can improve this:

Conclusion

This was a great exercise and I think it's a nice challenge that everyone can take on! It shows how something which is deceptively simple can actually be quite tricky and it's a humbling exercise!
As a recommended read from where I got the inspiration to write this, I recommend the 2nd edition of the book "Engineering a Compiler"