Sound advice - blog

Tales from the homeworld

My current feeds

Fri, 2005-Jul-15

REST Content Types

So we have our REST triangle of nouns, verbs, and content types. REST is tipping us towards placing site to site and object to object variation in our nouns. Verbs and content types should be "standard", which means that they shouldn't vary needlessly but that we can support some reasonable levels of variation.


If it were only the client and server involved in any exchange, REST verbs could be whittled down to a single "DoIt" operation. Differences between GET, PUT, POST, DELETE, COPY, LOCK or any of the verbs which HTTP in its various forms supports today could be managed in the noun-space instead of the verb space. After all, it's just as easy to create a resource as it is to create with a GET verb on it. The server implementation is not going to be overly complicated by either implementation. Likewise, it should be just as easy to supply two hyperlinks to the client as it is to provide a single hyperlink with two verbs. Current HTTP "A" tags are unable to specify which verb to use in a transaction with the href resource. That has lead to tool providers misusing the GET verb to perform user actions. Instead of creating a whole html form, they supply a simple hyperlink. This of course breaks the web, but why is not as straightforward as you may think.

Verbs vs Delegates

Delegates in C# and functions in python give away how useful a single "doIt" verb approach is. In a typical O-O observer pattern you need the observer to inherit from or otherwise match the specification available for a baseclass. When the subject of the pattern changes it looks through its list of observer objects and calls the same function on each one. It quickly becomes clear when we use this pattern that the one function may have to deal with several different scenareos. One observer may be watching several subjects, and it may be important to disambiguate between them. It may be important to name the function in a more observer-centric rather than subject-centric way. Rather than just "changed", the observer might want to call the method "openPopupWindow". Java tries to support this flexibility by making it easy to create inner classes which themselves inherit from Observer and call back your "real" object with the most appropriate function. C# and python don't bother with any of the baseclass nonsense (and the number of keystrokes required to implement them) and supply delegates and callable objects instead. Although Java's way allows for multiple verbs to be associated with each inner object, delegates are more "fun" to work with. Delegates are effectively hyperlinks provided by the observer to the subject that should be followed on change, issuing a "doIt" call on the observer object. Because we're now hyperlinking rather than trying to conceptualise a type hierarchy things turn out to be both simpler and more flexible.

The purpose of verbs

So if not for the server's benefit, and not for the client's benefit, why do we have all of these verbs? The answer for the web of today is caching, but the reasoning can be applied to any intermediatary. When a user does a GET, the cache saves its result away. Other verbs either mark that cache entry dirty or may update the entry in some ways. The cache is a third party to the conversation and should not be required to understand it in too much detail, so we expose the facets of the conversation that are important to the cache as verbs. This principle could apply any time we have a third party involved who's role is to manage the communication efficiently rather than to become involved in it directly.

Server Naivety and Client Omniscience

In a client/server relationship the server can be as naive as it likes. So long as it maintains the basic service contstraints it is designed for, it doesn't care whether operations succeed or fail. It isn't responsible for making the system work. Clients are the ones who do that. Clients follow hyperlinks to their servers, and they do so for a reason. Whenever a client makes a request it already knows the effect its operation should have and what it plans to do with the returned content. To the extent necessary to do its job, the client already knows what kind of document will be returned to it.

A web browser doesn't know which content type it will receive. It may be HTML, or some form of XML, or a JPEG image. It could be anything within reason, and within reason is a precisely definable term in this context. The web browser expects a document that can be presented to its user in a human-readable form, and one that corresponds to one of the standard content types it supports for this purpose. If we take this view of how data is handled and transfer it into a financial setting where only machines are involved, it might read like this: "An account reconciler doesn't know which content type it will receive. It may be ebXML, or some form from of OFX, or an XBRL report. It could be anything with reason, and within reason is a precisely definable term in this content. The reconciler expects a document that can be used to compare its own records to that of a supplier or customer and highlight any discrepencies. The document's content type must correspond to one of the standard content types it supports for this purpose."

REST allows for variations in content type, so long as the client understands how to extract the data out of the returned document and transform it into its own internal representation. Each form must carry sufficient information to construct this representation, or it is not useful for the task and the client must report an error. Different clients may have different internal representations, and the content types must reflect those differences. HTTP supports negotiation of content types to allow for clients with differing supported sets, but when new content types are required to handle different internal data models it is typically time to introduce a new noun as well.


So how does the client become this all knowing entity it must be in every transaction it participates? Firstly, it must be configured with a set of starting points or allow them to be entered at runtime. In some applications this may be completely sufficient, and the configuration of the client could refer to all URIs it will ever have to deal with. If that is not the case, it must use its configured and entered URIs to learn more about the world.

The HTML case is simple because its use cases are simple. It has two basic forms of hyperlink: the "A" and the "IMG" tags. When it comes across an "A" it knows that whenever that hyperlink is activated it should look for a whole document to present to its user in a human-readable form. It should replace any current document on the screen. When it comes across "IMG" it knows to go looking for something humean-readable (probably an actual image) and embed it into the content of the document it is currently rendering. It doesn't have to be any more intelligent than that, because that is all the web browser needs to know to get its job done.

More sophisicated processes require more sophisticated hyperlinks. If they're not configured into the program, it must learn about them. You could look at this from one of two perspectives. Either you are extending the configuration of your client by telling it where to look to find further information, or the configuration itself is just another link document. Hyperlinks may be picked up indirectly as well, as the result of POST operations which return "303 See Other". As the omniscient client they must already know what to do when they see this response, just as a web browser knows to chase down that Location: URI and present its content to the user.

There is a danger in all things of introducing needless complexity. We can create new content types until we're blue in the face, but when it comes down to it we need to understand the clients requirements and internal data models. We must convey as much information as our clients require, and have some faith that they know enough to handle their end of the request processing. It's important not to over-explain things, or include a lot of redundant information. The same goes for types of hyperlinks. It may be possible to reduce the complexity of documents that describe relationships between resources by assuming that clients already know what kind of relationship they're looking for and what they can infer from a relationships existence. I think we'll continue to find as we have found in recent times that untyped lists are most of what you want from a linking document, and that using RDF's ability to create new and arbitrary predicates is often overkill. My guide for deciding how much information to include is to think about those men in the middle who are neither client nor server. Think about which ones you'll support and how much they need to know. Don't dumb it down the for the sake of anyone else. Server doesn't care, and Client already knows.