Sound advice - blog

Tales from the homeworld

Tue, 2006-Apr-11

Namespaces and Community-driven Protocol Development

We have heard an anti-namespace buzz on the internet for years, especially regarding namespaces in XML. Namespaces make processing of documents more complicated. If you are working in a modular language you will inevitably find yourself trapped between the long names and the QNames, having to preserve both. If you use something like XSLT you have to be extra careful to select elements from the right namespace, especially in XPath expressions. XPath 1.0 does not apply the default namespace of the XSLT document to unprefixed names, so there is no way to refer to an element in that namespace by its bare name. It must be given an explicit, prefixed QName.
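The prefix requirement is easy to see in a few lines of Python. This is only a sketch: the namespace URI and the document are invented for the illustration, and ElementTree implements a small subset of XPath rather than XSLT itself, but its name matching follows the same rule that an unprefixed step means "no namespace".

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and document, invented for this illustration.
NS = "http://example.org/feed"
DOC = f'<feed xmlns="{NS}"><entry>hello</entry></feed>'

root = ET.fromstring(DOC)

# An unprefixed path step means "no namespace", so it misses <entry>
# even though the document itself never writes a prefix on it.
unprefixed = root.find("entry")

# Binding an explicit prefix (any prefix will do) to the namespace URI
# makes the same step match.
prefixed = root.find("f:entry", {"f": NS})

print(unprefixed)     # None
print(prefixed.text)  # hello
```

The prefix "f" exists only in the query. The document author and the query author each pick their own prefixes, and only the URI has to agree.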

Another hiccup comes about when working with RDF. It would be easy to produce compact RDF documents if one could conveniently use XML attributes to convey simple literals. What makes this difficult is that while unprefixed XML elements automatically inherit the default namespace, unprefixed attributes get the null namespace. RDF uses namespaces extensively, so you will always find yourself writing out duplicate prefixes for attributes in what would otherwise be quite straightforward documents. This makes it difficult both to define a sensible XML format and to make it "RDF-compatible".
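The asymmetry between elements and attributes can be checked directly. The sketch below uses a made-up vocabulary URI and a made-up element; the same namespace is declared twice, once as the default and once with the prefix "v", so that the attribute has some way to reach it.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary URI, invented for this illustration.
NS = "http://example.org/vocab"

# The namespace is bound both as the default and to the prefix "v".
DOC = (
    f'<person xmlns="{NS}" xmlns:v="{NS}" '
    'name="Alice" v:nick="al"/>'
)

root = ET.fromstring(DOC)

# The unprefixed element picks up the default namespace...
print(root.tag)             # {http://example.org/vocab}person

# ...but the unprefixed attribute stays in no namespace at all.
# Only the explicitly prefixed attribute lands in the vocabulary.
print(sorted(root.attrib))  # ['name', '{http://example.org/vocab}nick']
```

An RDF-friendly format therefore has to prefix every attribute it wants treated as a property, which is the duplication complained about above.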

A new argument for me against the use of namespaces in some circumstances comes from Mark Nottingham's recent article on protocol extensibility. He argues that the use of namespaces in protocols has a social effect, and that the effect leads to incompatibility in the long term. He combines this discussion with what he paints as the inevitable futility of "must understand" semantics.

Protocol development is fundamentally a social rather than a technical problem. In a protocol situation all parties must agree on a basic message structure, and on the meaning of a large enough subset of the terms and features to get useful things done. A server and client must broadly agree on what HTTP's GET method means, and intermediaries must also have a good idea. In HTML we need to agree that <p> is a paragraph marker rather than a punctuation mark. These decisions can be made top-down, but without the user community's support such decisions will be ignored. Decisions can be made from the bottom up, but at some stage coordinated agreement will be required. Namespaces provide a technical solution to a social problem by allowing multiple definitions of the same term to be differentiated and thus to interoperate. Mark writes:

What I found interesting about HTML extensibility was that namespaces weren't necessary; Netscape added blink, MSFT added marquee, and so forth.

I'd put forth that having namespaces in HTML from the start would have had the effect of legitimising and institutionalising the differences between different browsers, instead of (eventually) converging on the same solution, as we (mostly) see today, at least at the element/attribute level.

HTML does have a scarce resource, in that the space of possible element and attribute names is flat; that requires some level of coordination within the community, if only to avoid conflicts.

Dan Connolly is writing, obliquely, on the same subject. He is also concerned about a universe without namespaces, but his main concern is that protocol development decisions get adequate oversight before deployment. Dan writes:

We particularly encourage [uri-based namespaces] for XML vocabularies... But while making up a URI is pretty straightforward, it's more trouble than not bothering at all. And people usually don't do any more work than they have to.

There is a time and a place for just using short strings, but since short strings are scarce resources shared by the global community, fair and open processes should be used to manage them. Witness TCP/IP ports, HTML element names, Unicode characters, and domain names and trademarks -- different processes, with different escalation and enforcement mechanisms, but all accepted as fair by the global community, more or less, I think.

Both Dan and Mark end up covering the IETF convention of snubbing namespaces but using an "x-" prefix to indicate that a particular protocol term is experimental rather than standard. Dan comes down hardest on this approach, citing the "application/x-www-form-urlencoded" MIME type as a term that became entrenched in working code before it stopped being experimental. It can't be fixed without breaking backwards compatibility, and there doesn't seem to be a good reason to go about fixing it.

Both Mark and Dan have good credentials and are backed up by good sources, so who is right? I think they both are, but at different stages in the protocol development cycle.

So let's say that the centralised committee-based protocol development model is a historical dinosaur. We no longer try to make top-down decisions and produce thousands of pages of unused technical documentation. How, then, do new terms and new features get adopted into protocols and into document types? It seems that the right way is to follow this process:

Mark suggests that using namespaces within a protocol may unhelpfully encourage communities to avoid that third step. The constraints of a short-string world force them to interoperate and to engage one another on one level or another, and avoid a result with "microsoft-this" and "netscape-that" littered throughout the final HTML document. Using short strings produced a cleaner protocol definition in the end for both HTTP and HTML, and forced compromises onto everyone in the interests of interoperability. If opposing camps are given infinite namespaces to work with they may tend towards divergent competing protocols (e.g. RSS and Atom) rather than coming back to the fold and working for a wider common good (HTML).

Dan criticises Google's rel-nofollow in his article, saying:

Google is sufficiently influential that they form a critical mass for deploying these things all by themselves. While Google enjoys a good reputation these days, and the community isn't complaining much, I don't think what they're doing is fair. Other companies with similarly influential positions used to play this game with HTML element names, and I think the community is decided that it's not fair or even much fun.

I think that Google is probably taking a less community-minded approach than it might have done. Technorati is also criticised for rel-tag. Both relationship types started with a single company wanting to have a new feature, and there is foundation for criticism on both fronts. Both appear to have been developed in a dictatorial fashion rather than by engaging a community of existing expertise. Technorati's penance was to blossom into the microformats community, a consensus-based approach with reasonable process for ensuring work is not wasted.

HTML classes are a limited community resource, just as HTML tags are. This resource has traditionally been defined within a single web site without wider consideration. Context disambiguated the class names, as only the CSS and JavaScript files associated with a single site would use the definitions from that site. Microformats and the wider semantic HTML world have recently taken up this slack in the HTML specification and are busy defining meanings that can be used across sites. The list of HTML elements is not expanding, because it is primarily about document structure. HTML classes are treated differently: they are given semantic importance. Communities like microformats will spend the next five years or so coming up with standard HTML class names, and will do the same with link types. These will be based on existing implementation and implied schema, and will attempt not to splinter themselves into namespaces. Other communities will develop, and may collide with the microformats world. At those times there will be a need for compromise.
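To make the "flat short string" point concrete, here is a rough sketch that pulls class and rel tokens out of two pages and compares them. The two snippets of markup and the helper names (TokenCollector, collect) are invented for the illustration; "tag" and "nofollow" are the link types discussed above, and "vcard" is used as an example of a microformats class name.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collect the space-separated tokens of class and rel attributes."""

    def __init__(self):
        super().__init__()
        self.tokens = {"class": set(), "rel": set()}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.tokens and value:
                self.tokens[name].update(value.split())


def collect(html):
    """Return the class and rel tokens used anywhere in the markup."""
    parser = TokenCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens


# Markup from two hypothetical, unrelated sites.
SITE_A = '<a rel="tag" href="/tags/xml">xml</a> <span class="vcard">A</span>'
SITE_B = '<a rel="tag nofollow" href="/t/xml">xml</a> <div class="vcard sidebar">B</div>'

a, b = collect(SITE_A), collect(SITE_B)

# Nothing but the bare string says whether the two sites mean the same thing.
print(sorted(a["rel"] & b["rel"]))      # ['tag']
print(sorted(a["class"] & b["class"]))  # ['vcard']
```

Whether "tag" on one site means what it means on the other is settled by community agreement, not by anything in the markup, which is the social coordination the post is describing.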

We are headed into a world of increasingly rich semantics on the web, and the right way to do it seems to be without namespaces. Individuals, groups and organisations will continue to develop their own semantics where appropriate. Collisions and mergers will happen in natural ways. The role of standards bodies will be to oversee and shape emerging spheres of influence in as organic a way as possible, and to document the results of pushing them through their paces.

Benjamin