Showing posts with label java parsing simple library jmte. Show all posts
Showing posts with label java parsing simple library jmte. Show all posts

Monday, January 24, 2011

Parsing very simple expressions in Java

You might say parsing such a simple expression

function(param1, param2)
(and the like) is easy even if you want escaping of special characters and quoted parts. At least that's what I said.

Obvious approaches include using standard Java classes
  1. StringTokenizer
  2. Scanner
  3. Regular Expressions (sometimes coming in disguise as direct String methods like String.split)
  4. Use a lexer generator like http://www.cs.princeton.edu/~appel/modern/java/JLex/ or http://jflex.de/
  5. Using a full parer generator like http://www.antlr.org/
  6. doing this by hand using low level String functions, e.g. indexOf and substring
  7. remember your C days and still do this by hand, but rather by iterating through the input character by character
What did I do?
  • Options (1) and (2) do not work as they are not powerful enough, especially when it comes to escaping. Besides they are slow
  • Option (3) might acutally work, but besides being slow, dealing with complex regular expressions and the Java API for it is something I can not recommend. Even after reading and groking this tutorial would you want to write and maintain regular expressions? Try it. I don't.
  • (4) and (5): Having the full burden of a generated lexer plus runtime libraries or even a full generated parser is hardly worth the effort considering the simplicity of the expressions, even though both approaches would solve the problem.
Which leaves me doing it by hand - which I wound up doing. First I persued option (6) which did the job, but resulted in code very hard to read and very hard to maintain.

I then turned to option (7) and things became both easy and fast. I even came up with a very simple utility class that not only handles the above case, but other similar inputs. This is a test case for that utility class

@Test
public void scanAndSplit() throws Exception {
String input = "function(param1, param2)";
List segments = MiniParser.defaultInstance().scan(input, "(", ")");
assertEquals(2, segments.size());

String functionName = segments.get(0);
String parameterString = segments.get(1);
assertEquals("function", functionName);
assertEquals("param1, param2", parameterString);
List parameters = MiniParser.trimmedInstance().split(parameterString, ',');
assertEquals(2, parameters.size());
assertEquals("param1", parameters.get(0));
assertEquals("param2", parameters.get(1));
}