Wednesday, November 22, 2006

XQuery... worse late than never

I just noticed that XSLT 2.0, XML Query and XPath 2.0 are Proposed Recommendations.  Since leaving MS, I've stopped tracking these.  It baffles me that XQuery is only just now 'shipping'.  This + the-mess-that-is-XSD spell the doom of XML.  This isn't just doom, like the Outlook spell-checker would try to correct "DOM" to; this is real doom.  These standards are too complicated and too late.

I don't deny that big companies (and some very determined individuals) will implement them and people will use them.  Lots of people will likely be putting XQuery on their resume in 5 years.  What that will actually mean is that they used the 20% of the spec that SQL-Server/Oracle/DB2 implement.  It is like SQL but so very much worse.  SQL is complicated and isn't really standardized across RDBMS engines, but you can still learn the basics in an hour and be writing basic queries right away.  I worry that XQuery is so complicated that it takes days to comprehend.  That means that you may be able to write a basic query in an hour, but there will be all sorts of edge cases you aren't aware of that could actually impact your query.

I honestly think we would be better without XQuery.  Let the vendors think for themselves and see what customers actually use.  XQuery is a standard looking for a use, which is backward and guaranteed to produce a problematic result.

XSLT/XPath 2.0 is a harder one... There are a couple of things that XSLT 2.0 adds that were desperately needed vs XSLT 1.0.  But I've managed a team implementing a commercial-quality XSLT 1.0 implementation and that was a huge amount of work.  XSLT 2.0 is at least 4x as much work.  That is terrifying.  Why not just 'fix' XSLT 1.0?  It would be dramatically less work, and provide 80% of the gains at 10% of the cost.

The type support that XPath 2.0 and XQuery provide carries a huge cost.  JavaScript/Perl/Python/PHP/Ruby have demonstrated that you don't need types.  Maybe XPath 1.0 had issues with type support, but XML and type-lessness are such a natural fit.  I've played around a bit with E4X and find it very natural.  Microsoft's X-Linq is also about leaving XML type-less.

Types in XML are really about mapping XML to traditional relational models.  This should be solved by either improving the primitives provided by the database, or just having simple defaults for the obvious cases, and making everything else require explicit conversion.
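
To make the "explicit conversion" idea concrete, here is a minimal sketch in Python using the standard library's ElementTree. The order document, its element names and the `order_total` helper are all invented for illustration. Every value comes out of the XML as plain text, and the caller decides what type it should be at the point of use.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order document, invented for this example.
ORDER_XML = """
<order id="1042">
  <item sku="A1"><qty>2</qty><price>9.50</price></item>
  <item sku="B7"><qty>1</qty><price>20.00</price></item>
</order>
"""

def order_total(xml_text):
    """Sum qty * price over all items, converting text to numbers explicitly."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for item in root.findall("item"):
        # The XML itself carries no types: findtext() returns plain strings.
        qty = int(item.findtext("qty"))
        price = Decimal(item.findtext("price"))
        total += qty * price
    return total

if __name__ == "__main__":
    print(order_total(ORDER_XML))
```

No schema is involved: a value that is not a valid number fails at the `int()` or `Decimal()` call, which is exactly where the program cares about its type.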

I've found that I am using XSLT less and less.  Depending on my needs, I either use DOM + xml-writer, or I'm using E4X.  XSLT is definitely the choice for document-y transforms (the priority rules and apply-templates are perfect for this), but I find it annoying to author (almost as bad as XSD), and anything more complicated than a page or so is write-once code.
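
For readers who haven't seen the DOM + xml-writer style, a rough sketch of it in Python follows. It is not the stack I actually use: `minidom` stands in for the DOM and `XMLGenerator` for the writer, and the `people` document and `adults_report` function are made up for the example. The "transform" is ordinary imperative code: walk the input tree, filter, and stream out the result.

```python
import io
from xml.dom import minidom
from xml.sax.saxutils import XMLGenerator

# Hypothetical input document, invented for this example.
SOURCE = "<people><person name='Ann' age='31'/><person name='Bob' age='17'/></people>"

def adults_report(xml_text, min_age=18):
    """Read the input with the DOM and stream a filtered result with a writer."""
    doc = minidom.parseString(xml_text)
    out = io.StringIO()
    writer = XMLGenerator(out, encoding="utf-8")
    writer.startDocument()
    writer.startElement("adults", {})
    for person in doc.getElementsByTagName("person"):
        # Attribute values are strings; convert where a number is needed.
        if int(person.getAttribute("age")) >= min_age:
            writer.startElement("adult", {})
            writer.characters(person.getAttribute("name"))
            writer.endElement("adult")
    writer.endElement("adults")
    writer.endDocument()
    return out.getvalue()

if __name__ == "__main__":
    print(adults_report(SOURCE))
```

The equivalent stylesheet would be shorter for a case this small, but this version can be stepped through in a debugger and extended like any other code.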

Back to the XQuery pummeling... At this point, I think XQuery is actually a bad thing.  It is complicated enough that most open source implementations will be bad or slow, which means that Saxon will remain the only real implementation.  This means that people will continue to struggle with awkward APIs (the DOM) and in the end find their experience with XML less than appealing.  I think that a simpler query language that maps better to existing common languages (Java/Ruby/etc.) and is simpler to implement would enable a much larger class of users to benefit from XML.


RichB said...

If XSLT2 is such a huge, phenomenal amount of work, then how come Saxon is a quality XSLT2 processor, written more or less by 1 person?

Microsoft is not afraid to buy in 3rd party XML processors and ship them in the .NET Framework (see the new XmlTextReader in .NET 2). They could easily buy another XSLT2 processor and integrate that.

1:15 AM  
