Wednesday, October 07, 2009

Dangerous Data Structures

We have been working on a particular component at work recently. This code dates back to the earliest days of the product, and it was written assuming a set of facts that no longer holds true. My team has been patching it, trying to make it work with our new realities. We have come up with some pretty impressive hacks, but each fix has exposed new limitations of the code, which we then need to try to patch.

What went wrong? Why was it so hard to fix the old code?

The problem was that the original code was using data-structures that didn't align with the new goals for the code. The core data-structure was of a rather neat design, but was also rather complicated. Both aspects deterred people from replacing it with something different. If the existing code was so complicated, the replacement (which must support more scenarios) must be more complicated, right? This may not actually be true.

I've seen this happen many times. Someone writes a neat bit of code, and later developers are nervous about replacing it. Even though retrofitting new ideas onto the old code is obviously painful, they would rather layer hack upon hack than rethink the original.

The worst cruft I have seen accumulate in 'legacy' code happens when the original implementation used an inappropriate data-structure. Layer after layer of hacks tries to pretend that the data is structured differently than it really is. This is one of the things I think of when people talk about 'code smells', and one of the few times that a (partial) rewrite is in order.

It is critical that the data structures used are appropriate for the task at hand. Pick the wrong representation and now your code has to jump through hoops to do simple tasks. Pick the right data-structures and the code is clearer, and thus less buggy. With the right data-structure choice it is also easier to evolve the code, to add new features.
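As a small illustration of the point (not the code from the component described above), here is a minimal Python sketch. Everything in it is hypothetical: the user records, the names `users_list` and `users_by_id`, and both lookup functions are invented for the example. It shows the same lookup written against a representation that fights the task and one that fits it.

```python
# Hypothetical data: user records that the program looks up by id.

# Representation 1: a flat list of (id, name) tuples.
users_list = [(1, "ann"), (2, "bob"), (3, "cy")]

def name_from_list(users, user_id):
    # Every lookup scans the whole list, and "not found" needs
    # its own explicit handling at the end of the loop.
    for uid, name in users:
        if uid == user_id:
            return name
    return None

# Representation 2: a dict keyed by id, built from the same data.
users_by_id = {uid: name for uid, name in users_list}

def name_from_dict(users, user_id):
    # The structure already matches the question being asked,
    # so the lookup is a single call; a missing id yields None.
    return users.get(user_id)
```

Both functions return the same answers, but the second has almost no code in which a bug could hide. If the program later needs to ask a different question of the data, that is the signal to revisit the representation rather than to wrap it in more helper code.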

Just watch out for the day when those new features indicate that maybe you need to rethink your data-structures.
