Sound advice - blog

Tales from the homeworld

Sun, 2007-Dec-09

Revising HTTP

Mark Nottingham is talking about an effort to revise HTTP. I have been thinking about and working with HTTP for machine-to-machine communications for a while now. The group's charter includes the phrase "must not introduce a new version of HTTP and should not add new functionality to HTTP". If changes to HTTP were possible, my wishlist would include improvements to reliable messaging and to working in a high availability environment.

Reliable Messaging

Reliable messaging is about getting your message through intact over an unreliable network. WS-* takes the approach of implementing TCP over TCP. Sequence numbers are added to messages such that they can arrive out of order or multiple times without being processed out of order or multiple times.
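To make the sequence-number idea concrete, here is a minimal sketch of a receiver that tolerates duplicates and reordering. It is not taken from any WS-* specification; the class name SequencedReceiver and its fields are hypothetical, chosen only for illustration.

```python
class SequencedReceiver:
    """Delivers messages to the application in sequence order, exactly once."""

    def __init__(self):
        self.next_seq = 0     # next sequence number we expect to deliver
        self.pending = {}     # out-of-order messages held back, keyed by sequence number
        self.delivered = []   # what the application has seen, in order

    def receive(self, seq, payload):
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        # Deliver as many consecutive messages as we now hold.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


receiver = SequencedReceiver()
# Message 1 arrives early, then message 0, then a duplicate of 1, then 2.
for seq, payload in [(1, "b"), (0, "a"), (1, "b"), (2, "c")]:
    receiver.receive(seq, payload)
print(receiver.delivered)
```

Note that next_seq and pending are per-stream state. Whichever server handles the next message needs to know them, which is the cost discussed in the next paragraph.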

HTTP takes a better approach. It uses idempotent messages, so any that arrive multiple times have the same effect as having arrived only once. This is cheaper and more scalable than the WS-* approach. WS-* requires different servers to communicate their place in the sequence to each other as the stream moves from server to server. Any server-to-server communication at this level is an impediment to scalability, especially in a high-availability environment.
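A small sketch of why idempotence makes retransmission safe. The in-memory resources dictionary and the put and append helpers are hypothetical stand-ins for a server, not a real HTTP implementation; append plays the part of a non-idempotent operation for contrast.

```python
resources = {}  # hypothetical in-memory stand-in for server state


def put(url, body):
    """Idempotent: sets the resource to the body, whatever it held before."""
    resources[url] = body


def append(url, body):
    """Not idempotent: every delivery changes the state again."""
    resources[url] = resources.get(url, "") + body


# The same PUT delivered twice (say, a retry after a lost response).
put("http://example.com/", "1")
put("http://example.com/", "1")
after_put = resources["http://example.com/"]

# The same append delivered twice leaves a different result from one delivery.
append("http://example.com/log", "1")
append("http://example.com/log", "1")
after_append = resources["http://example.com/log"]

print(after_put, after_append)
```

The server needs no record of which messages it has already seen, so nothing has to be shared between servers to make a retry safe.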

The thing HTTP misses out on is reliable ordering of messages. If I make requests down separate TCP/IP connections there is no way to achieve a reliable ordering. Unfortunately, if I choose to make requests down a persistent connection my guarantees are no better. In an ideal world I would be able to send a PUT http://example.com/ 1, then a fraction of a second later decide to send a PUT http://example.com/ 2. Without an ordering guarantee I have to wait for the first request to come back with a response.

The first problem comes with proxies. They are free to take pipelined requests from a single TCP connection and forward them on across different connections. The second problem comes with a multi-threaded server that under extreme conditions could acquire locks for the second request before it acquires locks for the first. If so, I could be left with a value of "1" at my URL, rather than the "2" I intended.
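The reordering can be illustrated with a toy timing model. This is a simulation under assumed numbers, not a measurement: the function origin_state and the send times and latencies are all made up for the example.

```python
def origin_state(requests):
    """Apply PUTs in the order they arrive at the origin server.

    Each request is (send_time, latency, body); it arrives at send_time + latency.
    Returns the value left at the URL after all requests are applied.
    """
    arrivals = sorted((send + latency, body) for send, latency, body in requests)
    value = None
    for _, body in arrivals:
        value = body
    return value


# Pipelined: "2" is sent 10ms after "1", but a proxy forwards the two
# requests over different connections with different (assumed) delays.
pipelined = origin_state([(0.00, 0.30, "1"), (0.01, 0.05, "2")])

# Work-around: "2" is only sent after the response to "1" has come back,
# assumed here to be 0.6s after the first request was sent.
serialised = origin_state([(0.00, 0.30, "1"), (0.60, 0.05, "2")])

print(pipelined, serialised)
```

In the pipelined case the URL ends up holding "1". Waiting for the first response restores the intended "2", at the cost of a full round trip between requests.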

The whole thing gets complicated when caching is included, and PUT or DELETE requests are mixed with GET requests. A GET request might not even make it past any given proxy, so talking about reliable ordering of that request and others to the origin server is not a sensible conversation. If we talk about GET requests which we know will make it to the origin server (e.g. no-cache requests) then things are probably OK.

High availability clients

I wrote a little while ago about using TCP keepalives to allow a client to check for a failed server and fail over to a non-failed server in a bounded time. This hack can be fairly effective, but is only required because HTTP must return responses in the order requests were made. A special keep-alive message that can be sent and responded to while other requests are outstanding would allow a fast client failover.
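For reference, a minimal sketch of turning on TCP keepalives from a client. The helper name enable_keepalive and the timing values are assumptions for illustration, not the settings from my earlier post. The three tuning options are platform-specific, so the sketch skips any that the local socket module does not define.

```python
import socket


def enable_keepalive(sock, idle=5, interval=1, count=3):
    """Ask the TCP stack to probe an idle connection and drop it if the peer is gone."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tuning knobs: seconds idle before the first probe, seconds between
    # probes, and unanswered probes before the connection is declared dead.
    # These constants are not available on every platform.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print(keepalive_on)
```

Where all three options are honoured, a dead server is noticed after roughly idle + interval * count seconds of silence, which is the bound the client can use when deciding to fail over.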

This could be achieved in a number of ways: the TCP-keepalive solution could be recognised officially, a special keepalive for HTTP with special response characteristics could be introduced, or a general mechanism for out-of-order responses could be used.

Conclusion

I have only covered a couple of pain points for me at the moment. Both have work-arounds. The issue with reliable messaging means that responses have to come back before sending another request. It probably isn't the worst thing in the world that this work-around needs to be invoked. Likewise, the TCP keepalive solution to high availability clients is an effective solution in controlled environments.

I haven't covered areas like security and caching, as there are a lot of areas that I haven't thought through sufficiently. Publish/subscribe is also a pain point of HTTP, but is a separate protocol development in its own right.

There is probably little that this group can effectively achieve. HTTP itself is difficult to do anything with, as we don't want to break backwards-compatibility with existing implementations. Perhaps the best that can be done is to put problems like these and their work-arounds in as clear a print as possible.

Benjamin