Sound advice - blog

Tales from the homeworld

Fri, 2008-Feb-22

The Future of Media Types

MIME is the cornerstone of document type identification on the Web today. XHTML, for example, carries the application/xhtml+xml media type. On the other hand, it also carries an XML namespace. Media types are controlled centrally by the IANA. XML namespaces are URLs, and therefore open to anyone to create. What are the trade-offs, and what is the future for document type identification?


Mark Baker comments on recent attempts to decentralise media type assignment, saying:

Been there, tried that. I used to think that was a good idea, but no longer do.

I'm a centrist on most technical and political issues. I see both sides of this debate. On the one hand, tight central control ensures that document types on the Web are well defined. This is a good thing. It means that different components of the Web's architecture can exchange information through these data types. On the other hand, there are always going to be document types that make sense outside of the context of the Web. These might be used within an isolated railway control system, or in a Business-to-Business pairing or grouping.

As the importance of the REST architectural style sinks in outside the Web, it becomes less likely that the set of Web content types will be sufficient to convey all useful semantics. If we assume that all document types ultimately need to be identified, we might ask questions like: "What is the content type of the configuration file format for my SMTP server?" Spiralling outwards from this kind of very specific case, we might ask: "What is the content type of a machine-readable description of a railway?" Neither of these cases appears on the Web, and the IETF is unlikely to be an appropriate forum for standardising the identification of these documents. Web protocols and standards extend past the single World-Wide Web, and to some extent I think it is important to recognise this in the development of these standards.


Even if we do open up document types to a very wide base, MIME types and URLs do not contain the same information. Importantly, URLs are generally opaque. In contrast, MIME types can be interpreted. An Atom document carrying content in some future XML format knows from the type attribute application/future+xml that the content is embedded as an unescaped XML sub-document. It can interpret a type of text/future as meaning that the sub-document is an XML text node. It can interpret the absence of both conditions as meaning that the content is binary, and is base64 encoded. Likewise, parameters on text document types can indicate information such as character encodings.
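
As a rough illustration of what "can be interpreted" means here, the following Python sketch applies the three rules described above to a media type string and also pulls out its parameters. The function names are my own invention, and the parameter parsing is deliberately simplified (it assumes no quoted semicolons), so treat it as a sketch rather than a conforming parser.

```python
def parse_media_type(value):
    """Split a media type string into (type/subtype, parameters).

    Simplified on purpose: assumes parameter values contain no
    quoted semicolons or escaped characters.
    """
    head, *rest = value.split(";")
    params = {}
    for item in rest:
        name, _, val = item.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return head.strip().lower(), params


def content_handling(value):
    """Decide how embedded content is carried, using only the type name.

    Hypothetical helper following the rules in the paragraph above:
    an XML type means an unescaped XML sub-document, a text type
    means an XML text node, anything else means base64-encoded binary.
    """
    base, _ = parse_media_type(value)
    if base.endswith("+xml") or base.endswith("/xml"):
        return "xml"
    if base.startswith("text/"):
        return "text"
    return "base64"


if __name__ == "__main__":
    for example in ("application/future+xml",
                    "text/future; charset=UTF-8",
                    "image/png"):
        print(example, "->", content_handling(example),
              parse_media_type(example)[1])
```

Nothing in this sketch needs to know what "future" actually is. The decision is made entirely from the structure of the type name, which is exactly the information an opaque URL would not offer.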

A danger in heading into the URL approach for document identification is that we lose this additional metadata. We would ideally be able to extract from a document both its type, and any parent types it may have that we could understand.
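
To make the idea of "parent types" concrete, here is a minimal Python sketch that derives a fallback chain from a media type. The mapping is an assumption of mine, not something any registry defines in this form: a "+suffix" subtype falls back to application/suffix, and any other text type falls back to text/plain. The function name type_chain is hypothetical.

```python
def type_chain(media_type):
    """Return the media type followed by generic types a consumer
    might fall back to.

    Assumed rules, for illustration only:
      - a "+suffix" subtype falls back to application/<suffix>
      - any other text/* type falls back to text/plain
    """
    base = media_type.split(";", 1)[0].strip().lower()
    chain = [base]
    major, _, subtype = base.partition("/")
    if "+" in subtype:
        suffix = subtype.rsplit("+", 1)[1]
        chain.append(f"application/{suffix}")
    elif major == "text" and subtype != "plain":
        chain.append("text/plain")
    return chain


if __name__ == "__main__":
    print(type_chain("application/xhtml+xml"))
    print(type_chain("text/future; charset=utf-8"))
    print(type_chain("image/png"))
```

A consumer that does not understand the first entry in the chain could still try the later ones. A namespace URL gives a consumer no equivalent hint unless it dereferences the URL or already recognises it.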

Where I stand

My solution for the moment is to just make up media types for special-purpose applications. These types are not registered with the IANA, and are not exchanged in general parlance. My theory is that a time will come when one of these types comes into contact either with another type of the same name, or with another type that has the same essential data schema. There will be some kind of conflict when either of these things happens, and the conflict will have to be resolved through social (rather than technical) means.

Once these types are well-enough developed for this kind of conflict to have occurred, it is likely that they will be ready for inclusion in some form of register. That might eventually be the IANA, but I suspect that satellite bodies will need to participate in the control of the document type space.

The question with this approach is where it leaves us with XML namespaces, and with URLs in general for document type identification. At present I don't recommend the use of XML namespaces at all. I think that MIME types are king, and will remain king for the immediately foreseeable future. XML namespaces should therefore, in general, be ignored by consumers as redundant information. On the other hand, I might just be swimming against the tide on that one. I guess that Atom could get by perfectly well with an XML namespace and no type attribute for XML sub-documents. Perhaps there is no practical benefit from being able to parse a document type for additional metadata.