Sound advice - blog

Tales from the homeworld

Sat, 2005-Jul-23

Generic Event Notification Architecture

I was recently asked my opinion of the Generic Event Notification Architecture (GENA). It is a subscription protocol that uses HTTP as its transport. A client makes a subscribe request to a URI, and the server is responsible for returning notifications via separate HTTP requests back to the client. The protocol was submitted by Microsoft to the IETF as a draft in September 2000, and it is unclear whether it has seen any committed adoption. It may be that it has since been superseded in the minds of Microsoft employees by SOAP-based protocols.

GENA uses an HTTP SUBSCRIBE verb to make requests of the server. The request is submitted to a specific URI which represents the subscribe access point. The subscription must be periodically confirmed with an additional subscription request. One of the HTTP headers in the original SUBSCRIBE response carries what is known as a Subscription ID, or SID. The same SID header must be included in the additional SUBSCRIBE requests. Each subscription can specify the kinds of event notifications the client is interested in receiving for the resource it subscribed to. SUBSCRIBE requests include the URI that the server should NOTIFY when an event occurs.
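
To make the exchange concrete, here is a rough sketch of what an initial request and a later renewal might look like on the wire. The SUBSCRIBE verb and the SID and NT headers are as described above; the "Callback" header name, the example.com URIs, the event type and the build_subscribe helper are placeholders invented for illustration and are not quoted from the draft.

```python
def build_subscribe(resource_path, host, callback_uri=None, event_type=None, sid=None):
    """Format a GENA-style SUBSCRIBE request as raw HTTP/1.1 text.

    An initial subscription names the callback URI and event type;
    a renewal instead echoes the SID handed out by the server.
    Header names other than SID and NT are assumptions.
    """
    lines = [f"SUBSCRIBE {resource_path} HTTP/1.1", f"Host: {host}"]
    if sid is not None:
        # Renewal: identify the existing subscription by its SID.
        lines.append(f"SID: {sid}")
    else:
        # Initial request: tell the server where to send NOTIFY requests.
        lines.append(f"Callback: <{callback_uri}>")  # hypothetical header name
        if event_type is not None:
            lines.append(f"NT: {event_type}")
    return "\r\n".join(lines) + "\r\n\r\n"


initial = build_subscribe("/printer/status", "server.example.com",
                          callback_uri="http://client.example.com/notify",
                          event_type="status-change")
renewal = build_subscribe("/printer/status", "server.example.com",
                          sid="uuid:hypothetical-subscription-id")
```

Note that both requests go to the same URI; only the headers distinguish a new subscription from a refresh of an existing one.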

I have qualms generally about subscription models that require the server to connect back to the client. This confuses matters significantly when firewalls are involved, but on a purely philosophical level it also turns what is fundamentally a client-server relationship into one between two peers. I'll get back to that concern, but I think there are other aspects of the protocol that could do with some fine tuning as well.

The protocol is almost RESTful. It allows different things to be subscribed to by specifying different resources. It allows n-layered arbitration between the origin server and clients, just as HTTP's caching does. It gets confused, though, and I think the SID is a prime example of this. The SID identifies a subscription, but instead of being a URI it is an opaque string that must be returned to the original SUBSCRIBE URI. If I were writing the protocol I would turn this around and clearly separate these two resources. You have a resource that acts as a factory for subscriptions and is the thing you want to subscribe to, and you have a subscription resource. I would suggest that the subscription resource be a complete URI that is returned in a Location header, matching the way POST reports a newly created resource. It might even be reasonable to use the POST verb rather than a SUBSCRIBE verb for the purpose.

Once the subscription resource is created, it should be possible to query it to determine its remaining lifetime. A 404 could be returned should the lifetime have been exceeded, and a PUT could be used to refresh the lifetime or even alter the set of events to be returned. From the protocol's perspective, though, it is probably simplest just to define the effect of a SUBSCRIBE operation on the subscription as refreshing the timeout, and leave the rest to best practice or a later draft.
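
As a minimal sketch of this alternative (not anything GENA defines), the factory and subscription resources could be modelled in memory as below. The SubscriptionStore class, the "/subscriptions/N" URI layout, the "remaining-seconds" field and the choice of 201 and 204 status codes are all assumptions made for illustration. Time is passed in explicitly so the behaviour is easy to follow.

```python
import itertools


class SubscriptionStore:
    """In-memory model of a subscription factory and its subscription resources."""

    def __init__(self, lifetime=300.0):
        self.lifetime = lifetime
        self._expiry = {}              # subscription URI -> expiry time
        self._ids = itertools.count(1)

    def post(self, factory_uri, now):
        """POST to the factory: create a subscription, return its URI in Location."""
        uri = f"{factory_uri}/subscriptions/{next(self._ids)}"  # hypothetical layout
        self._expiry[uri] = now + self.lifetime
        return 201, {"Location": uri}

    def get(self, uri, now):
        """GET the subscription: report remaining lifetime, or 404 once expired."""
        expiry = self._expiry.get(uri)
        if expiry is None or expiry <= now:
            self._expiry.pop(uri, None)
            return 404, {}
        return 200, {"remaining-seconds": expiry - now}

    def put(self, uri, now):
        """PUT to the subscription: refresh its lifetime if it still exists."""
        status, _ = self.get(uri, now)
        if status == 404:
            return 404, {}
        self._expiry[uri] = now + self.lifetime
        return 204, {}


store = SubscriptionStore(lifetime=300.0)
status, headers = store.post("/printer/status", now=0.0)
sub = headers["Location"]
```

Here the client never handles an opaque SID. The subscription is simply another URI it can GET to check on or PUT to refresh, and a 404 tells it unambiguously that the subscription is gone.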

Returning to the issue of how updates are propagated back to clients, I've harped on before about how I believe this needs to be a change to the HTTP protocol rather than just an overlay. I believe that a single request needs to be able to have multiple responses associated with it, arriving in the order they were sent, down the same TCP/IP connection on which the request was made. Dropping the connection drops all associated subscriptions, just as it aborts responses to any outstanding requests. I agree that this approach may not suit loosely-coupled subscribe scenarios that don't want the overhead of one TCP/IP connection for each client/server relationship, but the GENA authors appear to have been thinking along these lines as well. The draft includes the following:

We need to add a "connect and flood" mechanism such that if you connect to a certain TCP port you will get events. There is no subscribe/unsubscribe. We also need to discuss this feature for multicasting. If you cut the connection then you won't get any more events.

To turn specific focus back on GENA, I think that the HTTP callback mechanism is still underspecified. In particular it isn't clear what the responsibilities of the server are in delivering notifications. The server could use HTTP pipelining to deliver a sequence of notifications down the same TCP/IP connection, but what should it do when the connection blocks? The server could try to make concurrent connections when multiple notifications need to be sent, but which will arrive first? Will out-of-order notifications cause the client to perform incorrect processing? Can the client assume that the latest notification represents the current state of the resource? Infinite buffering of events is certainly not an option, so what do you do when you exceed your buffer size? Do you make full use of your bandwidth via pipelining, or do you limit your notification rate to the network latency by waiting for the last response before sending another? I don't see any mention in the protocol of an "Updates-Missed" header that might indicate to the client that buffering capabilities had been exceeded.
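
One possible answer to the buffering question, sketched under assumptions of my own choosing: the server keeps a bounded queue per subscriber, discards the oldest event when the queue is full, and reports the number discarded in the "Updates-Missed" header suggested above. That header is hypothetical and does not appear in the draft, and NotificationBuffer and the event values are likewise invented for illustration. The sketch assumes the server sends one NOTIFY at a time and waits for its response, which sidesteps the ordering problem at the cost of being limited by latency.

```python
from collections import deque


class NotificationBuffer:
    """Bounded per-subscriber queue that counts what it had to discard."""

    def __init__(self, capacity):
        self._queue = deque()
        self._capacity = capacity
        self._missed = 0

    def push(self, event):
        """Queue an event, discarding the oldest one when the buffer is full."""
        if len(self._queue) >= self._capacity:
            self._queue.popleft()
            self._missed += 1
        self._queue.append(event)

    def next_notification(self):
        """Return (headers, body) for the next NOTIFY, or None if nothing is queued."""
        if not self._queue:
            return None
        headers = {}
        if self._missed:
            headers["Updates-Missed"] = str(self._missed)  # hypothetical header
            self._missed = 0
        return headers, self._queue.popleft()


buf = NotificationBuffer(capacity=2)
for state in ["idle", "printing", "jammed", "idle"]:
    buf.push(state)
first = buf.next_notification()
second = buf.next_notification()
```

With a capacity of two, the first two states are discarded before anything is sent, so the first notification carries the count of missed updates and the client knows it cannot trust any history it has reconstructed from earlier events.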

The specification also allows the server to silently drop subscriptions, something clients may not discover until it comes time to refresh the subscription. For this to work in practice, the circumstances under which subscriptions could be dropped without notification would have to be well understood.

The actual content being delivered by GENA is unspecified, but GENA does include mechanisms for specifying event types. Personally, I think that the set of things being subscribed to should be expressed in the subscribe URI rather than in a special "NT" or "NTS" header. I think it's more RESTful to create separate resources for the separate things you might want to subscribe to than to alias the SUBSCRIBE for a single resource to mean different things depending on header metadata. If we were to take a RESTful view, we would probably want to assume that each update notification's body was a statement of the current representation of the resource. In some cases a difference from the previous representation might also be appropriate. If caching is to be supported in this model, the meaning of that content would have to be made as clear as possible, and may have to be explicitly specified in a header, just as HTTP's chunked encoding is.

In conclusion, GENA is a good start but could do with some tweaking. I don't know whether the draft is going anywhere, but if it ever does I think it would be interesting to view and refine it through REST goggles.