Wednesday, April 27, 2005

What is XML?

I wandered into our admin's office today and we got talking about what my team actually builds; what is this 'XML' thing? (Note: Microsoft teams are organized with a single 'admin' babysitting a few hundred people, dealing with equipment, purchasing, and helping manage the schedules for management. I'm probably missing a half dozen crucial things, but basically, they are the hidden duct-tape and bubble-gum that holds everything else together. In their own unique way they are probably among the least respected, yet most subtly powerful, individuals in the company... but I digress.) She commented that she doesn't really understand what my team (a.k.a. WebData - XML) actually does. In trying to answer her question, I think I gave the best non-technical answer I have ever found for it, so I thought I'd post a version of it here, for posterity, as they say. The ultimate litmus test will be to run this past my parents, who have been trying to get me to explain what I do for quite a few years now. So the question is, in non-technical terms, what is XML and what is it used for?

Imagine that every time you talked to another person about a specific topic, each topic had its own unique language. The language was specifically tailored to that topic, so it was extremely efficient at expressing the concepts that one would need to express with regard to that topic. When a topic evolved to include new concepts, you had to figure out how to add them to that specific language. If you had interests in multiple topics, you would need to be fluent in multiple distinct languages. At your job you would probably have different languages for lunch social conversation, work specific talk, accounting issues, management discussions, etc.

Now imagine someone suggested a core common language infrastructure: a common set of idioms and structures that all the specific languages conformed to. This would vastly simplify learning new languages, and would allow you to reuse concepts from one language in another language.

XML is analogous to that common language infrastructure. XML itself is just an empty framework with no concepts: tags have no predefined meaning. XML+Namespaces makes it easy to reuse tag vocabularies from one domain in another domain. All the basic work of parsing into a common structure is handled by the XML parser, so the application doesn't need to worry about the low-level details.
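As a rough illustration of that division of labor, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. The document and both namespace URIs (urn:example:orders and urn:example:addresses) are made up for this example. The parser knows nothing about orders or addresses; it simply hands back a tree in which every tag is labeled with the vocabulary it came from.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an "orders" vocabulary that borrows a <city>
# tag from a separate, equally hypothetical "addresses" vocabulary.
DOC = """<order xmlns="urn:example:orders"
       xmlns:addr="urn:example:addresses">
  <item>Widget</item>
  <addr:city>Redmond</addr:city>
</order>"""

# The parser does all the low-level work and returns a generic tree.
root = ET.fromstring(DOC)

# ElementTree reports each tag as "{namespace-uri}local-name", so the
# two vocabularies stay distinct even though they share one document.
child_tags = [child.tag for child in root]

print(root.tag)
for tag in child_tags:
    print(tag)
```

The application only ever sees the tree; it never has to deal with angle brackets, quoting, or prefix bookkeeping itself.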

An interesting fall-out of all that is that it is possible to write generic tools that can now interact with data without needing to know the intimate details of how to parse that data. These generic tools don't know the semantic meaning of the data, just as at the grammatical level 'in' and 'on' are basically the same in English. These generic tools can provide some facilities to enable common operations to be expressed simply. Thus we have XPath, XSLT, and so on. XPath doesn't care what your element tag names mean, but that doesn't stop it from being useful. Compare that to HTML. In an HTML document, you need an HTML parser which knows the semantic meaning of the various tags, and knows which tags require end tags and all sorts of other HTML specific parsing rules. XML forces a much simpler structure, but then provides the benefit that tools can be written that do not need intimate familiarity with your data.
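To see what "generic" means here, consider a small sketch, again in Python with xml.etree.ElementTree, which supports only a limited subset of XPath. The helper name texts_at and both sample documents are hypothetical. The same function is pointed at two unrelated vocabularies and works on both, because all it relies on is the shared tree structure.

```python
import xml.etree.ElementTree as ET

def texts_at(xml_text, path):
    """Return the text of every element matched by an XPath-style path.

    Hypothetical helper: it has no idea what the tags mean.
    """
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]

# Two made-up documents from completely different domains.
RECIPES = (
    "<recipes>"
    "<recipe><name>Soup</name></recipe>"
    "<recipe><name>Bread</name></recipe>"
    "</recipes>"
)
INVOICES = "<invoices><invoice><total>12.50</total></invoice></invoices>"

# The same generic tool answers questions about both.
recipe_names = texts_at(RECIPES, "./recipe/name")
invoice_totals = texts_at(INVOICES, "./invoice/total")

print(recipe_names)
print(invoice_totals)
```

A full XPath or XSLT engine does far more than this, but the principle is the same: the tool navigates structure and leaves the meaning to you.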

This helps explain why XML is so popular. Most of why people use computers has something to do with entering data and having at least one program do something with that data. If the data is stored in a format that other programs have some hope of partially understanding, then there is some hope that those other programs might also be able to use some of that data. The classic example of something like this was 'mail-merge'. Take a form letter and a database of addresses/names and create a copy of that letter for each address/name. With something as flexible as XML, the possibilities are much larger than just mail-merge though. Store your Word document in SQL Server (as an XML datatype) and query by document title (or author or footnote or ...). Store your application configuration as XML and now you can write other tools to render your configuration as a pretty HTML page, or help compare two different configurations in a meaningful way. The possibilities are virtually endless.
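The configuration comparison idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function names flatten and diff_configs, the config layout, and the choice to identify each setting by its path through the tree. The sketch also assumes no two sibling elements share a tag name, since they would collide under this simple path scheme.

```python
import xml.etree.ElementTree as ET

def flatten(xml_text):
    """Map each setting's path (e.g. /config/server/@port) to its value."""
    settings = {}

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        # Attributes become entries such as /config/server/@port.
        for name, value in element.attrib.items():
            settings[f"{path}/@{name}"] = value
        # Non-blank element text becomes an entry at the element's path.
        text = (element.text or "").strip()
        if text:
            settings[path] = text
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return settings

def diff_configs(old_xml, new_xml):
    """Return {path: (old_value, new_value)} for every setting that differs."""
    old, new = flatten(old_xml), flatten(new_xml)
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(old.keys() | new.keys())
        if old.get(path) != new.get(path)
    }

# Hypothetical configurations.
OLD = "<config><server port='80'><host>a.example</host></server></config>"
NEW = ("<config><server port='8080'><host>a.example</host>"
       "<timeout>30</timeout></server></config>")

changes = diff_configs(OLD, NEW)
for path, (before, after) in changes.items():
    print(path, before, "->", after)
```

Unlike a line-by-line text diff, this reports changes in terms of settings, and it is unaffected by whitespace or indentation differences between the two files.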