Tuesday, February 1, 2011

Test all your states

I never even thought I would write a post about unit tests.

Why? Because it is such a dogmatic approach, invented by people far from practical experience. The dogmatic application of Test Driven Development would for sure have ruined my company and any single project that I worked on.

That was the disclaimer :)

But...

when you have a library that is actually easy to test because it has a relatively small and well defined interface, it might be a good idea to add unit tests. Even when you are in a hurry and just hack that shit away, a test driven approach might be useful (just notice my use of the word "might").

If you actually do so I'd advice to write tests for every state (do not care for 100% code coverage - it does not say anything and might even be misleading) that comes to your mind even if you think it is ridiculous. I know, depending on the input, this might be impractical, but you can at least try.

Let me give you an example from my current open source project the "Java Minimal Template Engine".

I remember I was thinking about an else inside a foreach while I already had a test for an if without the else part inside a foreach. Obviously two different states. First: simple if inside foreach, second if with else block inside foreach.

I thought, I know the code that evaluates that stuff and I know that it really does not make a difference if the else is inside a foreach or not. And I had a test for that else outside a foreach. And I was right. I skipped the test. Code worked.

Time passed.

I refactored the complete core of the template engine. I felt save to do so as I had a good bunch of unit tests.

After the refactoring, it did make a difference if that else was inside that foreach and - of course - I sill had no test for it. All tests passed after refactoring. But I made a really silly mistake in the refactored code and else inside foreach was broken. And it wasn't me who discovered that ;)

See the point? Just because you know the code and you know it does not make a difference now, does not mean this will remain true when time passes and the code changes. It might not even be you who changes that code.

Those tests are there to protect you, so either take them seriously or skip them completely. So: Test all your states.

Against any dogmatic approach

I am just against any dogmatic approach to anything.

Because dogmatic approaches do not work. If they did even AI might eventually succeed (I no longer believe in it).

Because it is arrogant and maybe ignorant to even suggest or advice a dogmatic approach to others.

This is a dogmatic approach? Then this is the only viable exception :)

Monday, January 24, 2011

Parsing very simple expressions in Java

You might say parsing such a simple expression

function(param1, param2)
(and the like) is easy even if you want escaping of special characters and quoted parts. At least that's what I said.

Obvious approaches include using standard Java classes
  1. StringTokenizer
  2. Scanner
  3. Regular Expressions (sometimes coming in disguise as direct String methods like String.split)
  4. Use a lexer generator like http://www.cs.princeton.edu/~appel/modern/java/JLex/ or http://jflex.de/
  5. Using a full parer generator like http://www.antlr.org/
  6. doing this by hand using low level String functions, e.g. indexOf and substring
  7. remember your C days and still do this by hand, but rather by iterating through the input character by character
What did I do?
  • Options (1) and (2) do not work as they are not powerful enough, especially when it comes to escaping. Besides they are slow
  • Option (3) might acutally work, but besides being slow, dealing with complex regular expressions and the Java API for it is something I can not recommend. Even after reading and groking this tutorial would you want to write and maintain regular expressions? Try it. I don't.
  • (4) and (5): Having the full burden of a generated lexer plus runtime libraries or even a full generated parser is hardly worth the effort considering the simplicity of the expressions, even though both approaches would solve the problem.
Which leaves me doing it by hand - which I wound up doing. First I persued option (6) which did the job, but resulted in code very hard to read and very hard to maintain.

I then turned to option (7) and things became both easy and fast. I even came up with a very simple utility class that not only handles the above case, but other similar inputs. This is a test case for that utility class

@Test
public void scanAndSplit() throws Exception {
String input = "function(param1, param2)";
List segments = MiniParser.defaultInstance().scan(input, "(", ")");
assertEquals(2, segments.size());

String functionName = segments.get(0);
String parameterString = segments.get(1);
assertEquals("function", functionName);
assertEquals("param1, param2", parameterString);
List parameters = MiniParser.trimmedInstance().split(parameterString, ',');
assertEquals(2, parameters.size());
assertEquals("param1", parameters.get(0));
assertEquals("param2", parameters.get(1));
}

Java byte code generation

To speed up template processing for our template engine I recently decided to optionally compile each template into byte code.

Looking for a library to help me I chose ASM, because it
  1. is small with no dependencies
  2. has been released lately
  3. has decent documentation
  4. has a tool (Eclipse plugin) that generates ASM API calls from existing byte code
Especially no 4 inspired me to do the following:
  1. Create a fistful of hand crafted Java classes that do the same as some selected templates
  2. Create ASM code that generates the same byte code as those Java classes by the tool described in no 4
  3. Create a recursive descent compiler that calls the generalized ASM code derived from step 2 that generates the desired byte code
  4. Load and cache the classes at runtime
It turned out to be a pretty viable approach and allowed me to finish my task without too much knowledge of Java byte code. And what I had to learn anyway was quite enlightening :)

Even though a lot of code remains to be dynamic, because I do not know what will be in the model at run time, the caliper micro benchmarks comparing uncompiled reference code against the compiled version shows a 2.5x - 10x speedup:


benchmark ns logarithmic runtime
SimpleExpressionReference 3367 ================
SimpleExpressionCompiled 963 ==
ComplexExpressionReference 6721 =======================
ComplexExpressionCompiled 1331 ======
If 7067 =======================
IfCompiled 805 =
Foreach 12796 =============================
ForeachCompiled 5275 ====================