Monday, May 28, 2007

Sasquatch 2007

Back from Sasquatch. I only attended the first day, and my main motivation in being there was to see Bjork. Quick summary:
  • The Gorge is stunning. I'd never even imagined listening to such great music with a backdrop so fantastic. To my poor New Englander brain, it felt unreal.
  • Sarah Silverman was completely useless. IMHO, she shouldn't even get paid for her presence there Saturday.
  • Michael Showalter was entertaining and seemed much more comfortable with the MC role.
  • Electrelane have potential... the live show left me wanting, but I could definitely see a good producer helping form their sound into a damn good pop album.
  • Viva Voce were intimate and beautiful on the side stage. Damn impressive to see the male half out there drumming while also playing bass to such lovely female vocals.
  • Citizen Cope and The Long Winters completely failed to impress. It wasn't that they were bad. I just like live shows that have edge and energy. They had neither.
  • MIA was well ... not there... apparently my friendly government found her form of music to be some sort of security risk.
  • Manu Chao rocked the house. This is what I want in a live show. Energy! Dance white boy, dance!
  • The Arcade Fire kept up the energy from Manu Chao, with their insane drummer, amazing vocals, and crazy musicians who seem to randomly swap instruments every other song.
  • Bjork was amazing. She performed some reworks of older songs that really blew me away. The laser show was simple and clean, in the best of ways.


Thursday, May 24, 2007

Binary XML not quite so evil?

I've recently been spending more time writing code that uses my company's Efficient XML (the basis for the forthcoming W3C Efficient XML Interchange format). I've never been one to claim that Binary XML will replace XML; rather, the sweet spot where we target Efficient XML usage is places where the alternative is a custom binary protocol. I was reimplementing a tool that used Efficient XML to serialize some potentially large data structures. I was sure that I could do better and that using Efficient XML was overkill. (Yes... all programmers think they can do it better than the automagic tools.) So I sat down and implemented a custom binary format for the same information. I had the advantage of a working implementation that used Efficient XML, so the core implementation was pretty quick: a few hours. One of the goals for this was compactness of the result, and the data structures being written out had circular references and other things that ruled out any normal serialization tools I knew of. I did some quick tests on some really simple data, and it all looked good. Time to go home for the day.

The next morning, I picked it up and gave it some real data. Crash and burn. I spent the rest of the day tracking down bit alignment errors, and all sorts of small compatibility bugs between the reader and the writer. By the end of the day I had it working on most of our data. It still failed on a few cases, but they generated multi-megabyte output. I had no idea how to debug this. I added tracing, but the trace files were too large to load into memory! By this point I had already spent more time on my custom format than it had taken to implement and debug the Efficient XML based code, and that code didn't have the benefit of existing, working code when I wrote it!

At this point it was at least good enough to evaluate the compactness of my custom format. My format was hand optimized, down to the bits. I expected to beat the Efficient XML encoded data hands-down. So I ran some tests. My custom encoding did beat Efficient XML for most samples... but not all of them! In fact, Efficient XML beat my hand coded format by 20% in one case! What was going on?!?

Well, I knew right away why Efficient XML was beating my code. I had skimped in one case to simplify the code. To achieve equivalent encoding, I was going to have to encode it the same way Efficient XML encodes such situations. The scenario is when you have a set of individually optional values. This is a pain to handle manually, and I have never seen a manual encoding that handles this optimally.
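To make the "individually optional values" problem concrete, here is a minimal sketch of two common hand-rolled approaches. Everything in it is an assumption for illustration: the four field names, the 16-bit field width, and both encodings. Neither is claimed to be the shortcut taken in the custom format above, nor the way Efficient XML actually encodes optional values; the point is only that the cheap-to-write version pays for presence information on every field.

```python
import struct

# Hypothetical record layout: four individually optional unsigned 16-bit fields.
FIELDS = ("a", "b", "c", "d")


def encode_naive(record):
    """Simple to write: one full presence byte per field, then the value."""
    out = bytearray()
    for name in FIELDS:
        value = record.get(name)
        if value is None:
            out.append(0)
        else:
            out.append(1)
            out += struct.pack(">H", value)
    return bytes(out)


def encode_bitmask(record):
    """More compact: one shared presence byte (one bit per field), then values."""
    mask = 0
    body = bytearray()
    for bit, name in enumerate(FIELDS):
        value = record.get(name)
        if value is not None:
            mask |= 1 << bit
            body += struct.pack(">H", value)
    return bytes([mask]) + bytes(body)


def decode_bitmask(data):
    """Inverse of encode_bitmask: read the mask, then only the present fields."""
    mask = data[0]
    offset = 1
    record = {}
    for bit, name in enumerate(FIELDS):
        if mask & (1 << bit):
            (record[name],) = struct.unpack_from(">H", data, offset)
            offset += 2
    return record
```

Even in this toy case the reader and writer have to agree on field order and bit positions, which is exactly the kind of bookkeeping that is easy to get subtly wrong by hand.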

So what does this tell us? That Efficient XML can truly be as compact as a custom binary format! Since the format is specified by using XSD, there are a number of tools out there to help define and document the format. You can prototype the format in Text XML, and then switch to Efficient XML once the bugs are mostly ironed out. Alternatively, you can manually decode the Efficient XML stream to Text XML for debugging purposes. I've found this invaluable.

When using Efficient XML (rather than a custom binary format) you are programming against standard XML APIs. There is a cleaner separation of the bit-encoding from the rest of the encoding/decoding logic. I have long argued that XML can be a good fit for configuration files, simply because it means there is less parser logic, and it is easier to use standard tools to process your config files. Many of the same benefits apply to Efficient XML, with the caveat that you need to either use APIs that understand Efficient XML, or translate from Efficient XML to Text XML. The important point is that you have all the options available with very little effort, all while getting many of the benefits of a custom binary format.
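As a rough illustration of that separation, here is a sketch using Python's standard `xml.sax` module. The `<config>`/`<setting>` vocabulary and both loader functions are made up for the example. The handler only ever sees SAX events; one loader gets them from a Text XML parser, the other replays the same events by hand as a stand-in for a binary decoder. No real Efficient XML API is used or implied here.

```python
import xml.sax
from xml.sax.xmlreader import AttributesImpl


class SettingsHandler(xml.sax.ContentHandler):
    """Application logic: collect <setting name=".." value=".."/> into a dict."""

    def __init__(self):
        super().__init__()
        self.settings = {}

    def startElement(self, name, attrs):
        if name == "setting":
            self.settings[attrs["name"]] = attrs["value"]


def load_settings_from_text(xml_bytes):
    """Feed the handler from ordinary Text XML."""
    handler = SettingsHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.settings


def load_settings_from_events():
    """Stand-in for a (hypothetical) binary decoder: emit the same events directly."""
    handler = SettingsHandler()
    handler.startDocument()
    handler.startElement("config", AttributesImpl({}))
    handler.startElement("setting", AttributesImpl({"name": "port", "value": "8080"}))
    handler.endElement("setting")
    handler.endElement("config")
    handler.endDocument()
    return handler.settings
```

The handler never learns which source produced the events, which is the property that lets the bit-level encoding change without touching the rest of the code.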

Efficient XML is no panacea. It is not a replacement for all binary formats, just as XML is not the be-all/end-all. Efficient XML is an excellent choice when XML would be a viable choice, except for its verbosity. (Efficient XML is also faster than Text XML to generate and parse.) I have also played with auto-generating custom parsers for Efficient XML with a specific grammar. These can be blazingly fast and yet still work with conformant Efficient XML.

Lots of people like to talk about why they think Binary XML is a bad idea: (a) (b) (c). Most arguments against Binary XML focus on 2 points:
  1. Text is good, Binary is bad
  2. XML is defined as a textual format. Anything else isn't XML.
(1) There are some good reasons to recommend Text formats. Any text editor can be used to edit the data. It is easier to debug. In packet traces and other debug logs, it is easier to extract and investigate. I definitely agree that Text is easier than Binary to debug and apply generic tools to. But! to compare Binary XML to a custom binary format is unfair. All you need is one conversion tool to convert any Binary XML to Text XML, and thus get all the benefits of text. In comparison, you would need a custom tool for every custom binary format. The extra effort means that the custom tool would likely never be written. With Binary XML, the tool is just a given. I have used this many times to great effect, for both Text XML and Efficient XML.

Summary: Text is better than Custom Binary, but Binary XML is more like Text than it is like Custom Binary.

(2) The XML spec does define XML as a Unicode stream of characters; no one can argue with that. But why then is it OK to talk about XML APIs? Or the XML Infoset? When people talk about 'XML ' (or ' XML') they are talking about leveraging XML. In order to really do anything with XML, you need a parser, so unless you are writing an XML parser, you never deal with straight XML anyway. Binary XML just extends the existing XML domain to include a more compact encoding. You give up some of the benefits of a Text format, while gaining many of the benefits of a custom binary format.

Summary: Most software that 'uses' XML isn't interacting with the raw text stream, so why does this matter so much? Binary XML isn't XML, it is Binary XML.

Ultimately, Binary XML is not about replacing XML with some new binary encoding. It is about leveraging the many benefits of XML in situations that cannot use Text XML. Binary XML just extends the reach of all those existing XML tools, both for the application developer and the application user.

Thursday, May 17, 2007

plugging things in...

This last weekend, I wandered down to the local Fry's to get me some gadgety goodness. Specifically, I wanted to pick up a Buffalo TeraStation and some wireless gear so that I could hide the TeraStation away. I've been trying to figure out how to run ethernet through the walls for as long as I've lived in the house, and have decided to give up. Wireless speeds are good enough now to make it not worth the bother.

I spent a long while in the networking aisle, trying to figure out the best options. My ideal setup was an 802.11n router and then something 802.11n to plug the TeraStation into. Thus my first dilemma. Every major manufacturer seems to be selling some kind of 802.11n router, and many of them sell USB/PCCard/PCI adaptors. My problem is that my ideal location for my new TeraStation is neither near the DSL modem, nor near a computer I want to leave running 24/7. Thus I need either an 802.11n 'extender' or a 'game' adaptor; at least that seems to be what manufacturers are calling any kind of 802.11n client that has an ethernet port. Thus began my first problem with plugging things in... namely that I had nothing into which to plug my TeraStation!

It came down to 3 options:

(1) I already have 802.11g in the house, and there were a few 'extender's on the shelf. Annoyingly, an 802.11g 'extender' costs more than a cheap 802.11n router! I also was worried that a backup would saturate the network and leave me no bandwidth for checking the latest on Reddit! I could buy a router and extender, but I'd just want to replace them in a year with 802.11n.

(2) The old AirPorts could act as extenders, but I had been unclear from reading Apple materials whether the new ones could as well. They are also expensive!

(3) I've heard negative things about Powerline ethernet before, but the new versions promised up to 200Mb/s. It is comparatively cheap, and avoids saturating the house with more radio noise. I thought I'd give D-Link's Powerline HD Ethernet a try. It's all about the plugs baby!

I get home and unpack everything. I plug in one Powerline HD adaptor in my office, and another downstairs. I run their little setup tool and sha-za! They see each other! I was hoping the tool would also provide something that told how good the connection was, or some indication of what kind of throughput I might expect. No such luck though. The easiest thing to do seemed to be to plug in the TeraStation, and time a file-copy. That required actually plugging in the TeraStation.

The outlet where I wanted to plug in the TeraStation was full, so I pulled a power-strip out of the closet and plugged the Powerline adaptor and the TeraStation into the power-strip. Everything boots up... but now the Powerline adaptor can't find its brethren upstairs. I figure that the power-spike protection circuit in the power-strip is at fault, so I plug the Powerline adaptor directly into the wall... only doing so means unplugging something. Yea... I unplugged the power-strip with the freshly booted TeraStation attached. Thus began my woes.

I immediately realized what I'd done, and rearranged things so that I could plug the Powerline adaptor into the wall directly, and still power the TeraStation. The TeraStation booted up and began blinking at me with wild abandon. I figured it was just doing a filesystem check, so I went back to making sure that the Powerline adaptor was working. 15 minutes later, I came down to check on the TeraStation... it was still blinking and flashing like mad. The normal way to administer the device is via a browser/HTTP interface, but the device didn't seem to be responding. The manual was less than helpful. Supposedly if you count the number of times the 'diag' LED blinks over a 4 second period, you can figure out what kind of error it is attempting to report. I'm not sure if you have ever tried counting LED flashes while also counting the seconds... but let me tell you, it ain't easy! I decided to leave it be for a bit more. In an hour, I came back and it was still going! I know that it was trying to check 1TB of disk, but these disks were empty. I had yet to actually even use the device when I inadvertently unplugged it. At least the Powerline adaptor seemed to be working.

By the next morning, the TeraStation had settled down and I was able to connect to it and verify that all appeared to be working fine. Time for some bandwidth tests. First I verified that a trivial file copy worked. I was using Robocopy, which reports copy speed in bytes/sec. All appeared good. Now copy over a ~1MB file. Hmm... I was only getting <1Mb/s: really slow. So I powered down the TeraStation and moved it up to where I could plug it directly into the router... no Powerline ethernet involved. Now my laptop was getting ~12Mb/s! I went down to the basement, to get as far away from the wireless router as possible, and was still getting ~11Mb/s! Time to pack up the Powerline adaptors; they are going home. No plugging for them!
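For anyone repeating this kind of test without Robocopy, here is a rough sketch of the same measurement in Python. The function names are made up for the example, and it assumes decimal megabits (1 Mb = 1,000,000 bits). It times a single file copy, so caching and small files can easily skew the number.

```python
import os
import shutil
import time


def to_megabits_per_sec(bytes_per_sec):
    """Convert a bytes/sec figure (as a copy tool might report) to Mb/s."""
    return bytes_per_sec * 8 / 1_000_000


def timed_copy(src, dst):
    """Copy src to dst and return the observed throughput in Mb/s."""
    size = os.path.getsize(src)
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        # Clock resolution too coarse for this copy; no meaningful rate.
        return float("inf")
    return to_megabits_per_sec(size / elapsed)
```

As a sanity check on the units, 1,500,000 bytes/sec works out to 12 Mb/s.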

The moral of the story? Powerline ethernet didn't work for me. oh yea.. and don't unplug a running computer.

** edit ** The product is called TeraStation... not TerraStation. Tera = 1,000,000,000,000. Terra = earth.