Monday, January 24, 2011

Parsing very simple expressions in Java

You might say that parsing a simple expression such as

function(param1, param2)
(and the like) is easy, even if you want escaping of special characters and quoted parts. At least that's what I said.

Obvious approaches, starting with standard Java classes, include:
  1. StringTokenizer
  2. Scanner
  3. Regular expressions (sometimes disguised as String methods such as String.split)
  4. A lexer generator like http://www.cs.princeton.edu/~appel/modern/java/JLex/ or http://jflex.de/
  5. A full parser generator like http://www.antlr.org/
  6. Doing it by hand with low-level String methods, e.g. indexOf and substring
  7. Remembering your C days and still doing it by hand, but by iterating through the input character by character
What did I do?
  • Options (1) and (2) do not work because they are not powerful enough, especially when it comes to escaping. Besides, they are slow.
  • Option (3) might actually work, but besides being slow, dealing with complex regular expressions and the Java API for them is something I cannot recommend. Even after reading and grokking this tutorial, would you want to write and maintain regular expressions? Try it. I don't.
  • Options (4) and (5): carrying the full burden of a generated lexer plus runtime libraries, or even a fully generated parser, is hardly worth the effort given how simple the expressions are, even though both approaches would solve the problem.
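To make the escaping problem concrete, here is a small sketch (the class name NaiveSplitDemo and the sample input are made up for illustration). Both String.split and StringTokenizer cut at every comma, including the one inside the quoted part:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of any library.
class NaiveSplitDemo {

    // Splits on every comma; String.split has no notion of quotes or escapes.
    static List<String> splitNaively(String input) {
        return Arrays.asList(input.split(","));
    }

    // StringTokenizer behaves the same way for a plain comma delimiter.
    static int countTokens(String input) {
        return new StringTokenizer(input, ",").countTokens();
    }

    public static void main(String[] args) {
        // Two intended parameters: param1 and the quoted "a, b".
        String params = "param1, \"a, b\"";
        System.out.println(splitNaively(params).size()); // prints 3, not 2
        System.out.println(countTokens(params));         // prints 3, not 2
    }
}
```

A regular expression can be taught about quotes, but that is exactly the kind of pattern I would rather not maintain.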
That leaves doing it by hand, which is what I wound up doing. First I pursued option (6), which did the job but resulted in code that was very hard to read and very hard to maintain.
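For a feel of what option (6) looks like, here is a minimal sketch. It is not my original code; the class IndexOfParser and its parse method are invented for this illustration, and it assumes no quoting, escaping, or nesting at all:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the indexOf/substring approach.
class IndexOfParser {

    // Returns [functionName, param1, param2, ...] for input like "f(a, b)".
    static List<String> parse(String input) {
        int open = input.indexOf('(');
        int close = input.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Not a call expression: " + input);
        }
        List<String> result = new ArrayList<String>();
        result.add(input.substring(0, open).trim());

        String params = input.substring(open + 1, close);
        if (params.trim().isEmpty()) {
            return result; // "f()" has no parameters
        }
        int start = 0;
        while (start <= params.length()) {
            int comma = params.indexOf(',', start);
            int end = comma < 0 ? params.length() : comma;
            result.add(params.substring(start, end).trim());
            start = end + 1; // continue after the comma
        }
        return result;
    }
}
```

Even this stripped-down version is mostly index bookkeeping. Once quotes and escapes enter the picture, every indexOf call needs to know whether the match it found is "real", and that is where the readability goes.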

I then turned to option (7), and things became both easy and fast. I even came up with a very simple utility class that handles not only the above case but also other similar inputs. This is a test case for that utility class:

import static org.junit.Assert.assertEquals;

import java.util.List;
import org.junit.Test;

@Test
public void scanAndSplit() throws Exception {
    String input = "function(param1, param2)";

    // Split the input into the part before "(" and the part between "(" and ")"
    List<String> segments = MiniParser.defaultInstance().scan(input, "(", ")");
    assertEquals(2, segments.size());

    String functionName = segments.get(0);
    String parameterString = segments.get(1);
    assertEquals("function", functionName);
    assertEquals("param1, param2", parameterString);

    // Split the parameter list at commas, trimming whitespace around each part
    List<String> parameters = MiniParser.trimmedInstance().split(parameterString, ',');
    assertEquals(2, parameters.size());
    assertEquals("param1", parameters.get(0));
    assertEquals("param2", parameters.get(1));
}
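The MiniParser source itself is not shown here, so the following is only a rough sketch of the character-by-character idea behind option (7), not the actual implementation. The class name CharSplitter, the backslash as escape character, and the double quote as quoting character are all assumptions made for this example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the MiniParser class used in the test above.
class CharSplitter {

    // Splits input at separator, ignoring separators that are inside
    // double quotes or preceded by a backslash. Parts are trimmed.
    static List<String> split(String input, char separator) {
        List<String> parts = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean escaped = false;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (escaped) {
                current.append(c);      // take the escaped character literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;         // drop the backslash, remember the state
            } else if (c == '"') {
                inQuotes = !inQuotes;   // drop the quote, toggle the state
            } else if (c == separator && !inQuotes) {
                parts.add(current.toString().trim());
                current.setLength(0);   // start the next part
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }
}
```

With two boolean flags, the whole state machine fits in one loop. For example, splitting `param1, "a, b", c\,d` at ',' yields the three parts `param1`, `a, b`, and `c,d`. Note that this sketch trims whitespace even at the edges of quoted parts and silently drops a trailing backslash; a real utility would have to decide what to do in both cases.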
