Thursday, October 26, 2006

My long road there: Binary XML

First, an admission: I've been 'doing XML' for a long time, and I was not an easy convert to 'Binary-XML'. XML has many uses, and a curious history that is often at odds with its actual usage. XML serves many masters. I've been using XML for markup, configuration, and quick'n'dirty data-stores for longer than I like to admit. For most of those purposes, Binary XML really just isn't necessary. In fact, I still don't think Binary XML is particularly advantageous for them. The ability to open the data in your text editor of choice far outweighs any benefit of a Binary XML encoding.

Yet... I have come to understand that Binary XML is a necessity. While I was still at Microsoft, I was tangentially involved with SQL Server's XML support. One thing I remember clearly was the motivation for a Binary-XML serialization: float-to-string conversion. Converting an IEEE float/double to an accurate text representation is horrendously expensive. While some queries do require the server to perform this conversion, there is a huge advantage to offloading that work to the client.
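As a rough illustration (a Python sketch, not how SQL Server actually does it), the two paths differ like this: writing a double as text means searching for a decimal string that converts back to exactly the same value, while a binary encoding simply copies the 8 IEEE-754 bytes. The function names below are hypothetical, and the sketch shows sizes rather than timing.

```python
import struct

def to_text(x: float) -> str:
    # repr() returns the shortest decimal string that converts back to
    # exactly the same double; finding that string is the comparatively
    # expensive part of text serialization.
    return repr(x)

def to_binary(x: float) -> bytes:
    # A binary encoding just copies the 8 IEEE-754 bytes (big-endian here).
    return struct.pack(">d", x)

def from_binary(b: bytes) -> float:
    return struct.unpack(">d", b)[0]

if __name__ == "__main__":
    for x in [0.1, 1 / 3, 6.02214076e23, 2.0 ** -1074]:
        text, raw = to_text(x), to_binary(x)
        # Both paths must reproduce the original value bit-for-bit.
        assert float(text) == x and from_binary(raw) == x
        print(f"{text:>22}  text: {len(text):2d} bytes  binary: {len(raw)} bytes")
```

The binary path has no formatting or parsing step at all, which is why pushing the text conversion out to the client (or skipping it) is attractive.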

Since coming to work at AgileDelta, I've been encountering another use case that never really came up when I was at Microsoft: constrained bandwidth. At the extreme, when all you have is a 300 baud modem, XML's verbosity becomes a real problem. The issue arises in more common situations too, such as very chatty web services. A few quick web searches show that a number of people are struggling with it.
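To make the verbosity concrete, here is a minimal Python sketch comparing one small message as XML text and as a hand-designed binary layout, the kind of custom format Binary XML aims to make unnecessary. The message, its field names, and the 8N1 framing assumption (10 bits on the wire per byte) are all hypothetical; this is not Efficient XML or any EXI encoding.

```python
import struct

# Hypothetical position report; field names and values are illustrative only.
report = {"id": 42, "lat": 47.6062, "lon": -122.3321, "time": 1161878400}

def as_xml(r: dict) -> bytes:
    # Plain XML text: every value is wrapped in a start and end tag.
    return (
        "<report>"
        f"<id>{r['id']}</id>"
        f"<lat>{r['lat']!r}</lat>"
        f"<lon>{r['lon']!r}</lon>"
        f"<time>{r['time']}</time>"
        "</report>"
    ).encode("utf-8")

def as_custom_binary(r: dict) -> bytes:
    # Hand-designed fixed layout: uint16 id, float32 lat, float32 lon,
    # uint32 time. Note that float32 keeps only about 7 significant digits.
    return struct.pack(">HffI", r["id"], r["lat"], r["lon"], r["time"])

def seconds_at_300_baud(payload: bytes) -> float:
    # Assumes 8N1 framing: 10 bits on the wire per byte at 300 bits/s.
    return len(payload) * 10 / 300

if __name__ == "__main__":
    for name, payload in [("XML", as_xml(report)),
                          ("custom binary", as_custom_binary(report))]:
        print(f"{name:>13}: {len(payload):3d} bytes, "
              f"{seconds_at_300_baud(payload):.2f} s at 300 baud")
```

The custom layout is several times smaller, but it is rigid: adding a field or changing a type means changing every reader and writer by hand, which is exactly the cost XML normally saves you.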

It is worth stopping to review why people want to use XML. It isn't just because it's the cool thing to do. XML has an array of tools and frameworks that make it easier to use than building your own custom language or format. One of the big motivations to use XML is that you can depend on all these pre-built, pre-tested components and keep your focus on your primary task. Just as Java/C# makes development faster and cheaper because developers don't need to worry about memory-allocation woes, XML makes custom data formats faster and easier to build.

What happens when you want all that faster/easier-ness but some part of your pipeline is bandwidth-constrained? Or maybe you are struggling with a bottleneck like the SQL Server one I mentioned, where a significant amount of your precious CPU is lost to XML serialization? This is where Binary-XML can save your bacon. I've watched some of our current customers use our Efficient XML to integrate limited-bandwidth systems into an existing system with a minimum of work, where previously the very idea was unthinkable.

Binary-XML isn't for everyone.  Ant should never use Binary-XML.  Nor would it really help XHTML.  But when bandwidth is a limiting factor (say... a smart-phone on a metered data-link?) it may just be the thing to save you a few bytes without the huge implementation/test cost of a custom binary protocol.

I've heard the argument that we should just wait a year or two and the bandwidth will come. Rubbish. Bandwidth does not obey Moore's law. Take cell phones as an example: we in the US are only now getting 3G data services! It gets worse. Cell bandwidth is like a cable modem's: everyone on the block shares a fixed connection, except that with cell phones it is shared per tower. As more people start using data services on their phones, the effective bandwidth per user will drop, because the bandwidth available to each tower is limited. Limited bandwidth is already a problem for some, and nothing I've read indicates that this is likely to change anytime soon.

The W3C's Efficient XML Interchange group (of which my employer is a member) is evaluating a number of proposals, including one from my employer. I have personally seen our format produce messages that compete in size with custom-designed formats for a fraction of the effort, while preserving XML's flexibility and extensibility. It will take a while, but an efficient Binary-XML will remove the need for many custom formats. Having a standard with implementations integrated into the major platforms will open many new doors.

1 Comment:

Anonymous said...

I have been struggling with this too lately, specifically "float to string conversion".

In addition to the performance, I'm concerned about accuracy/precision. As far as I know, IEEE-754 numbers can't be represented with 100% accuracy as strings using XSD's xs:double or xs:float data types.

Having the internal binary representation of IEEE-754 numbers would of course be great. For regular XML, I'm considering storing a hexified string version of the IEEE-754 binary value.
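A minimal Python sketch of the hexified idea, with hypothetical function names: the 8 raw bytes of the double are written as 16 hex digits, which is lossless by construction. For comparison, it also shows two other exact text forms: Python's C99-style hex float, and a decimal string with 17 significant digits, which is always enough for any finite double to round-trip.

```python
import struct

def double_to_hex(x: float) -> str:
    # 16 hex digits holding the raw big-endian IEEE-754 bit pattern;
    # this is lossless, including -0.0 and the infinities.
    return struct.pack(">d", x).hex()

def hex_to_double(s: str) -> float:
    return struct.unpack(">d", bytes.fromhex(s))[0]

if __name__ == "__main__":
    x = 0.1
    print(double_to_hex(x))        # raw bit pattern
    print(x.hex())                 # C99-style hex float, also exact
    print(format(x, ".17g"))       # 17 significant digits always round-trip
    assert hex_to_double(double_to_hex(x)) == x
    assert float.fromhex(x.hex()) == x
    assert float(format(x, ".17g")) == x
```

The raw-bits form is the cheapest to produce and parse, but it is opaque to anyone reading the XML and to XSD-aware tools, which would see it as a plain string rather than an xs:double.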

Is accuracy a concern? I was playing around with this:

http://babbage.cs.qc.edu/IEEE-754/Decimal.html

and there sure seems to be some discrepancy between the rounded and not rounded binary representations.

12:00 PM  
