Sound advice - blog

Tales from the homeworld

Sat, 2005-Sep-10

HTTP Subscription without SUBSCRIBE

I've been very seriously considering dropping the only new HTTP verb I've introduced for HttpSubscription: SUBSCRIBE.

I'm coming around to the idea of saying this is simply a GET. I've heard opinions that it should be essentially a GET to a "subscribable" version of a resource. I haven't quite swung that far around. In particular, I think that separating the subscribable and the gettable versions of the resource is going to make things harder on clients. In fact, I think it doesn't make a lot of sense from the server side either. If this is a resource that is likely to change while the user is looking at it, it should offer either subscribe support or a refresh header indicating how often to poll the resource. To offer subscription but only a single-shot GET doesn't really make sense. Therefore, instead of using a special SUBSCRIBE request I'm thinking along the lines of content negotiation on a GET request. SUBSCRIBE made sense when I thought we would be "creating" a subscription. In that world SUBSCRIBE was a good alternative to POST. Now that it's looking like an alternative to GET, I think it is less appropriate.
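
To make that concrete, here is a minimal sketch of what content negotiation on an ordinary GET might look like from the client side. The host, path, and exact Accept value are illustrative assumptions rather than anything from a specification:

    import http.client

    # Hypothetical resource; the host and path are assumptions for illustration.
    conn = http.client.HTTPConnection("example.com")
    conn.request("GET", "/some/changing/resource", headers={
        # Advertise that a streamed, subscribable representation is acceptable
        # alongside an ordinary single-shot document.
        "Accept": "multipart/x-mixed-replace, application/xml;q=0.9",
    })
    response = conn.getresponse()

    content_type = response.getheader("Content-Type", "")
    if content_type.startswith("multipart/x-mixed-replace"):
        # The server chose the subscribable form: the connection stays open
        # and new parts arrive as the resource changes.
        stream = response
    else:
        # The server chose a plain, single-shot representation instead.
        document = response.read()

A server that doesn't offer subscription can simply ignore the multipart type and answer with the ordinary single-shot representation, which is part of what makes negotiation more attractive to me than a new verb.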

As I mentioned in my previous entry, bugzilla uses mozilla's experimental multipart/x-mixed-replace mime type to put up a "wait" page before displaying the results of a large query. Mozilla browsers see this document, render it, then render the results section of the multipart replace, so the user ends up seeing the results page they are after. Bugzilla doesn't use this technique for browsers that don't support it. How does bugzilla know? It looks at the user-agent header!

I think this is clearly unacceptable as a general model. New clients can be written, and unless they identify themselves as mozilla or get the bugzilla authors to add them to the "in" list, bugzilla won't make effective use of their capabilities. I see this as a content negotiation issue, so it seems reasonable to start looking at the Accept header. It is interesting to note that mozilla doesn't explicitly include multipart/x-mixed-replace in its accept header. Instead, it contains something like this:

Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, 
        text/plain;q=0.8, image/png,*/*;q=0.5

Only the '*/*' permits the return of x-mixed-replace at all. There may even be some doubt about exactly how to apply the accept header to multipart mime data. If I accept a multipart mime type, does that mean I don't have any control over the content below that level? The RFC is strangely quiet on the issue. I suspect it has just never come up.

In practice, I think a reasonable interpretation of accept for multipart content types is that if the accept contains that type and also other types, then the accept continues to apply recursively down the multipart tree. If you say you accept both multipart/x-mixed-replace and text/html, then chances are the document will come back either as plain text/html or as text/html wrapped up in multipart/x-mixed-replace. An application/xml version of the same is probably not acceptable. Also in practice, I think it is necessary at the current time to treat mozilla browsers as if they included multipart/x-mixed-replace in their accept headers.
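
As a sketch of that interpretation, here is how a server might decide what to return. The helper below is my own crude illustration, which ignores q-values and type/* wildcards, and is not anything prescribed by the RFC:

    def accepts(accept_header, media_type):
        # Crude check: does the Accept header admit this media type?
        # A real implementation would honour q-values and type/* wildcards.
        ranges = [item.split(";")[0].strip() for item in accept_header.split(",")]
        return media_type in ranges or "*/*" in ranges

    def choose_representation(accept_header):
        # Apply Accept recursively: both the multipart wrapper and the part
        # inside it must be acceptable before a subscription is offered.
        wrapper_ok = accepts(accept_header, "multipart/x-mixed-replace")
        inner_ok = accepts(accept_header, "text/html")
        if wrapper_ok and inner_ok:
            return "multipart/x-mixed-replace wrapping text/html parts"
        if inner_ok:
            return "a single-shot text/html response"
        return "406 Not Acceptable"

Under this crude reading the mozilla header quoted above qualifies for the multipart wrapper only through its '*/*' entry; a stricter reading that doesn't trust '*/*' for multipart types would need to special-case those browsers, as suggested above.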

Benjamin

Sat, 2005-Sep-10

More HTTP Subscription

I've been actively tinkering away at the RestWiki HttpSubscription pages since my last blog entry and have started to knock some of the rough edges off my early specifications. I've also begun coding up prototype implementations of the basic subscription concepts, and have written up a little of my experience from that activity. I still have to resolve the issue of subscribing to multiple resources efficiently, and am currently proposing the idea of an aggregate resource to do that for me.

I'm hoping that what I'm proposing will eventually be RESTful, that is to say that at least parts of what I'm writing up will hit standardisation one day or become common enough that de facto standardisation occurs. I've been coming across more writeups of what people have done in these areas in the past and have added them to the wiki as well.

There are various names for the basic technique I'm using, which is to return an indefinite or long-lived HTTP response to a client: server push, dynamic documents, pushlets, or simply pubsub. The mozilla family of browsers actually implements some of what I'm doing, at least for simple subscription. If you return an HTTP response with content-type multipart/x-mixed-replace, then each mime part replaces the previous one as it is received (a minimal sketch of the technique follows the list of problems below). This is a very basic form of subscription, and could really be used for any kind of subscription. That's the technique used by bugzilla to display a "wait" page to mozilla clients before returning the results of a long query. The key problems are these:

At the moment it seems we need to savagely hit the cache-control headers in order to prompt proxies to stream rather than buffer our responses to requests. If caches did understand what was going on, though, they could offer the subscription on behalf of the origin server rather than acting as glorified routers. A proxy could offer a subscription to a cached data representation, and separately update that representation using subscription techniques. This would give us the same kind of multicasting for subscriptions as we currently get for everyday web documents.

Scaling up remains a problem. Using this technique, one subscription equals one TCP/IP connection. When that connection drops, the subscription ends, and if you need more than one subscription you need more than one connection. If you need a thousand subscriptions you need a thousand connections. It isn't hard to see how this might break some architectures.

My proposal to create aggregate resources is still a thorny one for me. I'm sure it would help in these architectures, but there are issues to consider about what it means to wrap many HTTP responses into a single response. If aggregates can effectively be created, though, you could get back to a one-connection-per-client model for communications.
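
To ground the discussion, here is the minimal push sketch promised above: a toy server that streams multipart/x-mixed-replace parts and hits the cache-control headers as described. The boundary string, port, and update content are arbitrary choices for illustration, not part of any specification:

    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    BOUNDARY = "sub-boundary"  # arbitrary boundary string

    class PushHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            # Discourage proxies and caches from buffering the stream.
            self.send_header("Cache-Control", "no-cache, no-store")
            self.send_header("Content-Type",
                             "multipart/x-mixed-replace; boundary=" + BOUNDARY)
            self.end_headers()
            # Each part replaces the previous one in the client's view.
            for n in range(3):
                body = ("<p>update %d</p>" % n).encode("ascii")
                part = ("--%s\r\nContent-Type: text/html\r\n"
                        "Content-Length: %d\r\n\r\n" % (BOUNDARY, len(body)))
                self.wfile.write(part.encode("ascii") + body + b"\r\n")
                self.wfile.flush()
                time.sleep(1)
            self.wfile.write(("--%s--\r\n" % BOUNDARY).encode("ascii"))

    if __name__ == "__main__":
        HTTPServer(("", 8080), PushHandler).serve_forever()

Note that one connection still serves exactly one subscription here, which is the scaling problem just described; an aggregate resource would presumably multiplex several logical subscriptions into one such stream.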

I'm eager to get more feedback on this topic, especially if you are developing software to my specification or are using similar techniques yourself. I have a vague notion that in the longer term it will be possible for a client to be written to an HTTP protocol that explicitly permits subscription and prepares clients for the data stream they will get back. As I said earlier there are implementations already out there, but the data on the wire comes in a myriad of forms and I see the lack of consistency as an opportunity to get it right.
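
For concreteness, this is roughly what a client prepared for such a data stream might look like today. It pairs with the toy server sketch above; the host, port, and path are assumptions and the boundary parsing is deliberately simplistic:

    import http.client

    # Pairs with the toy server sketch above; host, port and path are assumptions.
    conn = http.client.HTTPConnection("localhost", 8080)
    conn.request("GET", "/", headers={
        "Accept": "multipart/x-mixed-replace, text/html;q=0.9"})
    response = conn.getresponse()

    ctype = response.getheader("Content-Type", "")
    if "multipart/x-mixed-replace" in ctype:
        boundary = "--" + ctype.split("boundary=")[1].strip()
        current = []
        # Deliberately simplistic parsing: everything between boundary
        # markers is treated as the latest replacement representation.
        for raw in response:
            line = raw.decode("ascii", "replace").rstrip("\r\n")
            if line.startswith(boundary):
                if current:
                    print("replacement received:", " ".join(current))
                current = []
            elif line and not line.lower().startswith(("content-type:", "content-length:")):
                current.append(line)
    else:
        # An ordinary single-shot response came back instead.
        print(response.read().decode("ascii", "replace"))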

Benjamin