Thursday, April 06, 2006

The Trinity of DOM

Back when I was first working on Microsoft's DOM implementation in MSXML, the spell checker used to try to 'correct' DOM to Doom. Our W3C DOM representative at the time (Rich Rollman) and I used to get a hearty chuckle out of that. We invested a great deal of effort into trying to build a usable implementation of the DOM that conformed to the public spec but included extensions for things like namespaces and usability improvements (like the text property, or the selectNodes() method). Fast forward 7+ years... XML-DEV currently has a lively thread about the DOM, and below is my recent contribution. I thought some of you who may not monitor that mailing list might find it interesting, so I'm also publishing it here.

When thinking about the DOM, I think it is worth breaking the users into 3 groups:

  1. Browser (Javascript) HTML DOM
  2. Browser (Javascript) XML DOM
  3. Java/C++/C# XML DOM


The issue with the DOM for group (1) isn't the DOM API, it is incompatible browsers. If the browsers ever 'fixed' that (they won't; it is always in someone's best interest to have features that the others don't have), then the DOM API might benefit from some tweaking, but the API is pretty heavily tailored to their usage and isn't really all that bad.

Group (2) users would definitely benefit from E4X, when it becomes standardized. No future Google/Flickr/MySpace is going to succeed by requiring people to upgrade their browser, so while E4X might be a productive toy for a few lucky ones, it isn't going to be anything better than that until IE ships it and a few years have passed.

Group (3) users are the ones who really suffer unnecessarily. The DOM is a freaking pain to use and has been a hindrance to XML adoption. Worse, the DOM's awkwardness has pushed developers to roll their own APIs. I see this as a negative because the majority of the ones I have seen (and I got to see a lot while on the XML team at Microsoft) got basic XML-isms wrong. XML looks so effortlessly easy, but is full of hidden gotchas that take careful (sometimes repeated) reads of the spec to understand. (On a side note, I find it amusing that 8 years after the XML spec was released, I'm still relying on my SGML history to explain _why_ bits of the XML spec are the way they are.)

XLinq, XOM, etc. are a good thing. It really is too bad that dot-Net and Java both standardized on the DOM as their standard API. Microsoft, being Microsoft, extended the DOM with methods like selectNodes and selectSingleNode, and you know what? That is a good thing. The W3C DOM committee should look at some code written using selectNodes vs. their official APIs. selectNodes is easy to use and actually encourages more robust code. I can't count how many times I've looked at customer code that would break if there was a Comment, PI or CDATA where it expected just PCDATA.
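
To make that last failure concrete, here is a minimal sketch in Python using the standard library's xml.dom.minidom, which follows the W3C DOM fairly closely. It is not customer code, and minidom has no selectNodes; the sample documents and the helper names naive_text and robust_text are made up for illustration. The robust helper is only a rough stand-in for the kind of convenience the MSXML text property gives you.

```python
from xml.dom import minidom, Node

# Two documents that mean the same thing to a human reader.
PLAIN = "<price>42</price>"
MIXED = "<price><!-- list price -->4<![CDATA[2]]></price>"


def naive_text(xml):
    """Typical fragile code: assumes the first child is the only text node."""
    root = minidom.parseString(xml).documentElement
    return root.firstChild.nodeValue


def robust_text(node):
    """Concatenate Text and CDATA descendants; skip comments and PIs."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(robust_text(child))
    return "".join(parts)


if __name__ == "__main__":
    for doc in (PLAIN, MIXED):
        root = minidom.parseString(doc).documentElement
        print(repr(naive_text(doc)), repr(robust_text(root)))
```

On the plain document both helpers agree. On the mixed one the naive version hands back the comment's contents instead of the price, which is exactly the sort of bug that never shows up until someone else's tool writes the XML.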

The ultimate test of an API is whether it enables non-specialized developers to get their task done, such that their code is reasonably robust. XML makes that damn hard. The XML DOM, per spec, makes it worse. I don't think that many of the alternatives really resolve that. E4X may be the best attempt yet. XLinq has some excellent qualities, but it is too closely tied to C#, and tries harder to be everything to everyone. XML is used in many different ways that place competing demands on the APIs. The best APIs pick a target set of scenarios and heavily prioritize those. That is hard to do in a committee and the DOM demonstrates that.

My rant done, I want to say that the DOM was a critical keystone in getting XML adopted as widely as it has been. For all its flaws, the DOM set a common bar for XML in-memory stores. SAX did the same for XML parsing. I dislike the phrasing 'worse is better', but it definitely is true that 'shipping is better than perfection'. Nothing real is perfect. The fact that a number of people put the effort into creating the XML DOM spec has been a factor in XML's success. For all its flaws, more than likely, the world is a better place for having it here.
