Sound advice - blog

Tales from the homeworld

Sat, 2005-Nov-12

The Makings of a Good HTTP API

I've had the opportunity over the last few weeks to develop my ideas about how to build APIs for interfacing over HTTP. Coming from the REST world view, I don't see a WSDL- or IDL-derived header file or base class definition as a fundamentally useful level of abstraction. I want to get at the contents of my HTTP messages in the way the HTTP protocol demands, but I may also want to do some interfacing to other protocols.

The first component of a good internet-facing API is decent URI parsing. Most URI parsing APIs of today use the old RFC 2396 model of a URI. This model was complex, allowing only a very basic level of URI parsing without knowledge of the URI scheme. For example, an http URI reference such as http://example.com:8080/some/path?query#fragment could be broken into:

scheme: http
authority: example.com:8080
path: /some/path
query: query
fragment: fragment

while an unknown URI could only be deconstructed into "scheme" and "scheme-specific-part". A URI parser that understood HTTP and another that did not would produce different results!

January 2005's RFC 3986 maps out a solution to URI parsing that doesn't depend on whether you understand the URI scheme or not. All URIs must now conform to the generic syntax of (scheme, authority, path, query, fragment), but all elements of the URI except the path are strictly optional. This is great for API authors who want to provide a consistent interface. However, most APIs for URI handling were developed before 2005 and feel clunky in light of the newer definitions. A good API is necessarily a post-January 2005 API.
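As a rough illustration of scheme-independent parsing, here is a minimal Python sketch built on the regular expression given in Appendix B of RFC 3986. The names UriParts and split_uri are made up for this example, and the sketch only splits a reference into its five generic components; it does not validate or normalise anything.

```python
import re
from typing import NamedTuple, Optional

# Regular expression taken from RFC 3986, Appendix B.
_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


class UriParts(NamedTuple):
    """Hypothetical container for the five generic URI components."""
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri(reference: str) -> UriParts:
    """Split a URI reference without any knowledge of its scheme."""
    match = _URI_RE.match(reference)
    # Every group in the pattern is optional, so it matches any string.
    assert match is not None
    return UriParts(
        scheme=match.group(2),
        authority=match.group(4),
        path=match.group(5),  # the path is always present, possibly empty
        query=match.group(7),
        fragment=match.group(9),
    )
```

With this, split_uri("http://example.com:8080/some/path?query#fragment") and split_uri("urn:example:animal:ferret:nose") go through exactly the same code path. The second simply comes back with no authority, query or fragment.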

Once you have your URI handling API in place, the next thing to consider is how your client and server APIs work. Java makes a cardinal error on both sides of this equation by defining a set of HTTP verbs it knows how to use, and effectively prohibiting the transport of other verbs. In fact, the set of HTTP verbs has changed over time and may continue to change. Extensions like WebDAV and those required to support subscription are important considerations in designing a general purpose interface of this kind. RFC 2616 is clear that extension methods are part of the HTTP protocol, and that there is a natural expectation that methods defined outside the core standard will be seen in the wild. A client API should behave like a proxy that passes through requests it does not understand. It should invalidate any cache entries it may have associated with the named resource, but otherwise trust that the client code knows what it is doing.
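The client behaviour described above might look something like the following Python sketch. Everything here is hypothetical: PassThroughClient, the Transport callable and the deliberately naive cache are stand-ins for illustration, not part of any real library. The point is only that the method is an opaque string, and that anything the client doesn't recognise as safe drops the cached entry and is forwarded untouched.

```python
from typing import Callable, Dict, Tuple

# Hypothetical transport: (method, uri, body) -> (status, response body).
Transport = Callable[[str, str, bytes], Tuple[int, bytes]]

# Methods this toy client assumes cannot change the resource.
SAFE_METHODS = ("GET", "HEAD", "OPTIONS", "TRACE")


class PassThroughClient:
    """Toy client that caches GET and forwards every other verb as-is."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self._cache: Dict[str, Tuple[int, bytes]] = {}

    def request(self, method: str, uri: str, body: bytes = b"") -> Tuple[int, bytes]:
        if method == "GET":
            if uri in self._cache:
                return self._cache[uri]
            response = self._transport(method, uri, body)
            if response[0] == 200:
                self._cache[uri] = response
            return response
        if method not in SAFE_METHODS:
            # Unknown or unsafe verb: assume it may have changed the
            # resource, so drop whatever we had cached for it.
            self._cache.pop(uri, None)
        # No whitelist of verbs: the caller knows what it is doing.
        return self._transport(method, uri, body)
```

A call such as client.request("SUBSCRIBE", "/feed") is passed straight to the transport even though the client has never heard of SUBSCRIBE.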

On the server side the option to handle requests that your API never dreamed of is just as important. Java embeds the operations "GET", "HEAD", "OPTIONS", "POST", "PUT", "DELETE", and "TRACE" into its HttpServlet class, but this is a mistake. If anything this is a REST resource, rather than a simple HTTP resource. The problem is that your view of REST and mine may differ. REST only says that a standard set of methods be used. It doesn't say what those methods are. GET, HEAD, OPTIONS, POST, PUT, DELETE, and TRACE have emerged from many years of standardisation activity and from use in the wild... however other methods have been tried along the way and more will be tried in the future. HttpServlet should be what it says it is and let me get at any method tried on me. I should be able to define my own "RestServlet" class with my own concept of the set of standard verbs if I like. Using this Java interface I have to override the service method and do an ugly call up to the parent class to finish the job if one of my own methods isn't hit. Python (and various other languages, such as Smalltalk) actually allows the neatest solution to this problem: just call the method and get an exception thrown if one doesn't exist. No need to override anything but the methods you understand.
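The Python approach can be sketched in a few lines. The do_ prefix mirrors the naming convention used by the standard library's BaseHTTPRequestHandler, but the dispatch function and the Feed resource below are hypothetical names invented for this example, and returning 501 for an unknown method is one reasonable choice rather than the only one.

```python
def dispatch(resource, method, *args):
    """Call resource.do_<method>(*args); answer 501 if no such handler exists."""
    try:
        # Only the lookup is guarded, so an AttributeError raised inside
        # the handler itself is not mistaken for a missing method.
        handler = getattr(resource, "do_" + method)
    except AttributeError:
        return 501, "Not Implemented"
    return handler(*args)


class Feed:
    """Hypothetical resource that understands one standard and one extension verb."""

    def do_GET(self):
        return 200, "feed contents"

    def do_SUBSCRIBE(self):
        return 202, "subscription accepted"
```

Here dispatch(Feed(), "SUBSCRIBE") reaches the extension method without any change to the dispatcher, while dispatch(Feed(), "DELETE") falls through to the 501 answer. The resource only defines the methods it understands.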

Another thing I've found useful is to separate the set of end-to-end headers from those that are hop-by-hop. When developing an HTTP proxy it is important that some headers be stripped from any request before passing it on. I've found that putting those headers into a separate map from the end-to-end headers makes life simpler, and since these headers usually carry a level of detail that regular clients don't need to be involved with, they can be handed into the request formatting and transmission process separately. That way API-added headers and client-added headers don't have to be combined.
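One way to picture the two maps is the following Python sketch. The hop-by-hop names come from the list in RFC 2616 section 13.5.1, plus any header named in the Connection header; the function name split_headers and the use of plain dictionaries are assumptions made for the example.

```python
from typing import Dict, Tuple

# Hop-by-hop headers listed in RFC 2616, section 13.5.1 (lower-cased).
# The RFC's list says "Trailers"; the header field itself is "Trailer",
# so both spellings are included here.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "trailers", "transfer-encoding", "upgrade",
})


def split_headers(headers: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (end_to_end, hop_by_hop) maps for one message's headers."""
    # Headers named in Connection are hop-by-hop for this connection too.
    connection = next(
        (value for name, value in headers.items() if name.lower() == "connection"),
        "",
    )
    named = {token.strip().lower() for token in connection.split(",") if token.strip()}
    hop_names = HOP_BY_HOP | named

    end_to_end: Dict[str, str] = {}
    hop_by_hop: Dict[str, str] = {}
    for name, value in headers.items():
        target = hop_by_hop if name.lower() in hop_names else end_to_end
        target[name] = value
    return end_to_end, hop_by_hop
```

A proxy built this way would forward only the first map and build a fresh second map for the next hop.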

I guess this brings me to my final few criticisms of the J2SE and J2EE HTTP APIs. I think it's worthwhile having a shared concept of what an HTTP message looks like between client and server. Currently the servlet model requires HttpServletRequest and HttpServletResponse objects, but the client API has a HttpURLConnection class that has no relationship to either object. Also, the HttpURLConnection class itself looks nothing like a servlet. If we had started from a RESTful perspective, I would suggest that the definition of a servlet (a resource) and the definitions of the messages that pass between resources would be the first items on the list. It would certainly make writing HTTP proxies in Java easier, and should be more consistent overall. In fact there is very little difference between HTTP request and response messages, so they could share a common base class. There is very little difference between HTTP and SMTP messages, once you boil away the hop-by-hop headers. There are even some good synergies with FTP, and any other protocol that uses a URI for location. Transferring data between these different protocols shouldn't be difficult with a basic model of resources in place internal to your program.
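To make the shared base class idea concrete, here is a small Python sketch. The class names Message, Request and Response and the helper copy_payload are all hypothetical; they are not the servlet API or any existing library, just one possible shape for a message model that client and server code could both use.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Message:
    """What requests and responses have in common: headers and a body."""
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


@dataclass
class Request(Message):
    method: str = "GET"  # any verb, not a fixed enumeration
    uri: str = "/"


@dataclass
class Response(Message):
    status: int = 200


def copy_payload(source: Message, target: Message) -> Message:
    """Carry headers and body from one message into another."""
    target.headers.update(source.headers)
    target.body = source.body
    return target
```

Because copy_payload only relies on the base class, the body of a Response fetched from one resource can be dropped into a PUT Request for another, which is most of what a simple relay between resources needs.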

I think that ultimately the most successful APIs will attempt to model the web and the internet within your program rather than simply provide onramps for access to different protocols. The web does not have a tightly-controlled data model, even at the protocol level. It's important to keep things light and easy rather than tying them down in an overly strict and strongly-typed Object-Oriented way. The web isn't like that, and to some extent I believe that our programming styles should be shifting away from it as well. There's always going to be a need to express something that two objects in an overall system will understand completely, but that the objects in between, which have to handle the requests and responses, have only a sketchy picture of.

Benjamin