Sound advice - blog

Tales from the homeworld

Tue, 2009-May-05

The text/plain Semantic Web

Perhaps the most important media type in an enterprise-scale or world-scale architecture is text/plain. The text/plain type is essentially schema-free, and allows a representation to be retrieved or PUT with little to no jargon or domain-specific knowledge required of server or client. It is applicable to a wide range of problems and contexts, and is easily consumed by tools and humans alike.

Uses of text/plain

In essence, this type conveys a string. However, we can also think about embedding numbers or other simple data types. The modern dynamic language approach to looking at strings is to allow implicit conversion between the information inserted by the sender and the type expected by the consumer. These values can easily be incorporated into programming language data types, inserted into databases, spreadsheets, reports, or other structures.
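As a rough illustration of that consumer-driven conversion, here is a minimal Python sketch. The function name read_plain and the set of supported types are my own assumptions for the example, not part of any standard.

```python
from decimal import Decimal


def read_plain(body: str, expected: type):
    """Convert a text/plain body into the type the consumer expects.

    Hypothetical helper: the sender just writes characters, and the
    consumer decides what type those characters should become.
    """
    if expected is str:
        # A plain string is passed through untouched.
        return body
    text = body.strip()  # tolerate a trailing newline around simple values
    if expected is int:
        return int(text)
    if expected is float:
        return float(text)
    if expected is Decimal:
        return Decimal(text)
    raise TypeError(f"no conversion defined for {expected.__name__}")
```

A consumer that expects a temperature would call read_plain(body, float), while one that only wants to display the value would ask for str and receive the body unchanged.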

To outline a few potential uses of text/plain, consider the following interactions

Standards and compatibility

While formatting of numbers and other types may seem natural enough, it is important that this be done consistently if the information is to remain legible when it is processed. To my mind the best resource for formatting and processing simple text-compatible data types is the specification for XML Schema (XSD). Part 2 contains a section on built-in datatypes that covers a range of string, numeric, URI, date and time, and other simple types. Any data that can be formatted according to the rules in this section absolutely should be.
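To make the sender's side concrete, the following Python sketch formats a few common values using the lexical forms of the XSD built-in datatypes (xsd:boolean, xsd:integer, xsd:decimal, xsd:dateTime). The function name to_xsd_text is hypothetical, the choice to require a timezone and normalise to UTC is my own assumption, and only a handful of types are covered.

```python
from datetime import datetime, timezone
from decimal import Decimal


def to_xsd_text(value) -> str:
    """Format a simple value for a text/plain body using XSD lexical forms."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, Decimal):
        # The "f" format avoids exponent notation, which xsd:decimal
        # does not allow.
        return format(value, "f")
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption of this sketch: refuse ambiguous local times.
            raise ValueError("datetime must carry a timezone")
        # Normalise to UTC; fractional seconds are dropped here.
        utc = value.astimezone(timezone.utc)
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, str):
        return value
    raise TypeError(f"no XSD formatting defined for {type(value).__name__}")
```

The point of routing every value through one function like this is consistency: a boolean is always "true" or "false", never "True", "yes" or "on".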

However, this leads to a dilemma. What do we do with types that are not found in this set? Should a geo-location become a structured XML document, or should it too be coded as text/plain? RFC 2426 defines a semicolon-separated standard format for geo-location, which could certainly be coded as text/plain. However, it is not clear at this stage that this is or will be the canonical way of encoding this information as a text/plain document. Without reference to applicable and universal standards we bear a significant risk that the partially-formatted content we transfer will in fact not be understood.
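For illustration, a consumer that has agreed out of band to use the RFC 2426 GEO value format (latitude, then longitude, separated by a semicolon) might parse such a body as follows. This is only a sketch: the names GeoPosition and parse_geo are hypothetical, and the range checks are my own addition.

```python
from typing import NamedTuple


class GeoPosition(NamedTuple):
    latitude: float
    longitude: float


def parse_geo(body: str) -> GeoPosition:
    """Parse a "latitude;longitude" body in decimal degrees."""
    parts = body.strip().split(";")
    if len(parts) != 2:
        raise ValueError(f"expected 'latitude;longitude', got {body!r}")
    latitude, longitude = (float(part) for part in parts)
    # Sanity checks added for this sketch.
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    return GeoPosition(latitude, longitude)
```

Note that nothing in the body itself tells the consumer to apply this parser. A client that guessed "longitude;latitude" or a comma separator would silently get it wrong, which is exactly the risk described above.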

Applicability of text/plain MIME type

Part of the problem that emerges is that text/plain is not specific enough. It doesn't have sub-types that are clearly tied to a specification document or standards body. This makes interoperability a potential nightmare of heuristic detection.

Unfortunately, while XSD provides an excellent catalogue of basic types, it is neither comprehensive nor sufficiently connected to MIME usage. Another problem with using text/plain in its bare form is its default assumption of a US-ASCII character set. This can lead to obvious problems in a modern internationalised world.

Without being backed by some kind of standards body, the advice I give in this regard is merely that. Standards may emerge later that contradict what I have to say here. That said, my advice is this:

  1. Treat text/plain content as being formatted according to XSD conventions when you receive it. Take care to process character encoding directives correctly and support at least a UTF-8 encoding.
  2. Consider using a text/xsd+plain document type when transmitting XSD-formatted simple content. This will hopefully indicate that the document can be understood as text/plain, but provide additional context if more complex processing is applied to the document.
  3. Make use of other specialised types that indicate the standard being applied when types outside of the XSD set are employed. For example, the geo coordinates above might be described as text/vcard+plain.
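A receiver following this advice might look something like the Python sketch below. It reads the charset parameter from the Content-Type header, falls back to US-ASCII when none is given, and accepts both text/plain and the "+plain" variants suggested above. Bear in mind that text/xsd+plain and text/vcard+plain are proposals from this post rather than registered media types, and that decode_plain is a hypothetical name.

```python
from email.message import Message


def decode_plain(content_type: str, payload: bytes) -> str:
    """Decode a plain-text payload according to its Content-Type header."""
    # Reuse the standard library's MIME header parsing.
    header = Message()
    header["Content-Type"] = content_type
    media_type = header.get_content_type()  # lower-cased "type/subtype"

    # Accept text/plain itself plus the proposed "+plain" variants.
    is_plain = media_type == "text/plain" or (
        media_type.startswith("text/") and media_type.endswith("+plain")
    )
    if not is_plain:
        raise ValueError(f"not a plain-text media type: {media_type}")

    # Bare text/plain defaults to US-ASCII when no charset is declared.
    charset = header.get_content_charset() or "us-ascii"
    return payload.decode(charset)
```

With no charset parameter, any byte outside US-ASCII raises UnicodeDecodeError instead of being guessed at, so a sender of internationalised content has to declare something like charset=utf-8 explicitly.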

Again, ideally we would be making use of a well-defined standards body to own and maintain the media types used to communicate very basic information. Making up your own can only take the state of the art so far. However, standards sometimes emerge out of common best practice... so it is not a complete waste of time to be heading down this particular path.

When not to use text/plain

It should be clear that text/plain is not a tool for every occasion. It is often important to sample or send an atomic set of data, which would require additional schema. Plain text, when overused, can lead to performance problems as individual values are sampled one by one instead of as a consistent and coherent document.

Perhaps the clearest indication that you are overusing text/plain is that you are experiencing an explosion in hyperlinks. When you start to need a document to provide links for consumers to find these text/plain-centric resources, you should probably consider incorporating the information directly into these documents themselves.

Used appropriately to transfer information to and from well-known and stable resources, text/plain or its variants can be an efficient way to communicate simple data without introducing unnecessary jargon. The URI of the resource and the implementation of client and server will provide sufficient context to format and process these simple data types.

The low barrier to entry to these types makes them universally applicable and easy to work with. However, the lack of standardisation around matching encodings to media types is an inhibitor to their potential uptake. Used well, especially in combination with link headers and/or text/uri-list, these types can provide an effective way to make your protocols get out of the way of communication and let clients and servers interoperate with minimal complexity for simple use cases.
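The text/uri-list type is about as simple as a hypermedia format gets: one URI per line, with lines starting with "#" treated as comments. A minimal Python parser, assuming the body has already been decoded to a string, could look like this. The function name parse_uri_list is hypothetical.

```python
from typing import List


def parse_uri_list(body: str) -> List[str]:
    """Return the URIs listed in a text/uri-list body, in order."""
    uris = []
    for line in body.splitlines():  # handles both CRLF and bare LF
        line = line.strip()
        if not line or line.startswith("#"):
            # Skip blank lines and comment lines.
            continue
        uris.append(line)
    return uris
```

A client could fetch such a list from a well-known resource and then GET each listed URI as text/plain, without either side needing a richer document format.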

Benjamin