Sound advice - blog

Tales from the homeworld


Fri, 2005-Sep-09

More HTTP Subscription

I've been actively tinkering away at the RestWiki HttpSubscription pages since my last blog entry and have started to knock some of the rough edges off my early specifications. I've also begun coding up prototype implementations of the basic subscription concepts, and have written up a little of that experience. I still have to resolve the issue of subscribing to multiple resources efficiently, and am currently proposing the idea of an aggregate resource to do that for me.

I'm hoping that what I'm proposing will eventually be RESTful, that is to say that at least parts of what I'm writing up will hit standardisation one day or become common enough that de facto standardisation occurs. I've been coming across more write-ups of what people have done in these areas in the past and have added them to the wiki as well.

There are various names for the basic technique I'm using, which is to return an indefinite or long-lived HTTP response to a client: server push, dynamic documents, pushlets, or simply pubsub. The Mozilla family of browsers actually implements some of what I'm doing, at least for simple subscription. If you return an HTTP response with content-type multipart/x-mixed-replace, each MIME part replaces the previous one as it is received. This is a very basic form of subscription, and it could be used for just about any kind of subscription. It's the technique Bugzilla uses to display a "wait" page to Mozilla clients before returning the results of a long query.
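To make that concrete, here is a rough sketch of what such a response can look like on the wire. The boundary string and the part contents are made up for illustration; in a browser that understands the format, each new part replaces the one before it, and the stream carries on for as long as the subscription lasts:

    HTTP/1.1 200 OK
    Content-Type: multipart/x-mixed-replace; boundary=sub

    --sub
    Content-Type: text/plain

    resource state at 10:00
    --sub
    Content-Type: text/plain

    resource state at 10:05
    --sub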

The key problems are these. At the moment it seems we need to savagely hit the cache-control headers in order to prompt proxies to stream our responses rather than buffer them. If caches did understand what was going on, though, they could offer the subscription on behalf of the origin server rather than acting as glorified routers. A proxy could offer a subscription to a cached data representation, and separately update that representation using subscription techniques. This would give us the same kind of multicasting for subscriptions that we currently get for everyday web documents.
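As a concrete illustration of what an origin server has to do today, here is a minimal sketch in Python (not my prototype code; the port, update interval and part bodies are made up). It sets blunt cache-control headers and flushes each part as soon as it is written, so that nothing along the way is tempted to buffer:

    import time
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    BOUNDARY = "sub"  # arbitrary boundary string for the multipart stream

    class SubscriptionHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type",
                             "multipart/x-mixed-replace; boundary=" + BOUNDARY)
            # Blunt instrument: tell intermediaries not to cache or buffer.
            self.send_header("Cache-Control", "no-cache, no-store")
            self.end_headers()
            try:
                while True:
                    body = ("resource state at %s\r\n" % time.ctime()).encode("ascii")
                    self.wfile.write(("--%s\r\n" % BOUNDARY).encode("ascii"))
                    self.wfile.write(b"Content-Type: text/plain\r\n")
                    self.wfile.write(("Content-Length: %d\r\n\r\n" % len(body)).encode("ascii"))
                    self.wfile.write(body)
                    self.wfile.flush()   # push the part out immediately
                    time.sleep(5)        # stand-in for "wait until the resource changes"
            except (BrokenPipeError, ConnectionResetError):
                pass                     # the client dropped the connection; subscription over

    if __name__ == "__main__":
        ThreadingHTTPServer(("", 8000), SubscriptionHandler).serve_forever()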

Scaling up remains a problem. Using this technique, one subscription equals one TCP/IP connection. When that connection drops the subscription ends, and if you need more than one subscription you need more than one connection. If you need a thousand subscriptions you need a thousand connections. It isn't hard to see how this might break some architectures.
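To make the connection cost visible, here is an equally rough client-side sketch in Python, assuming a server like the one above on localhost port 8000 and invented resource paths. Each subscription is its own long-lived connection, held open until the subscription ends:

    from http.client import HTTPConnection

    def subscribe(host, path):
        # One subscription, one connection: it stays open as long as we listen.
        conn = HTTPConnection(host, 8000)
        conn.request("GET", path)
        return conn, conn.getresponse()  # headers arrive now; parts stream afterwards

    # A thousand subscriptions would mean a thousand of these, each tying up a socket.
    subscriptions = [subscribe("localhost", path)
                     for path in ("/temperature", "/pressure")]
    for conn, response in subscriptions:
        print(response.status, response.getheader("Content-Type"))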

My proposal to create aggregate resources is still a thorny one for me. I'm sure it would help in these architectures, but there are issues to consider about what wrapping many HTTP responses into a single response means. If aggregates can be created effectively, though, you could get back your one-connection-per-client model for communications.
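Just to show the shape of the idea rather than anything settled, an aggregate resource's response might be a single multipart stream in which each part says which subscribed resource it is updating, perhaps by carrying a Content-Location header. The paths, values and choice of header here are purely illustrative, and the replacement semantics are exactly the sort of issue that still needs thinking through:

    HTTP/1.1 200 OK
    Content-Type: multipart/x-mixed-replace; boundary=agg

    --agg
    Content-Location: /temperature
    Content-Type: text/plain

    22.5 degrees
    --agg
    Content-Location: /pressure
    Content-Type: text/plain

    1013 hPa
    --agg
    Content-Location: /temperature
    Content-Type: text/plain

    22.7 degrees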

I'm eager to get more feedback on this topic, especially if you are developing software to my specification or are using similar techniques yourself. I have a vague notion that in the longer term it will be possible to write clients to an HTTP protocol that explicitly permits subscription and prepares them for the data stream they will get back. As I said earlier, there are implementations already out there, but the data on the wire comes in a myriad of forms, and I see the lack of consistency as an opportunity to get it right.

Benjamin