I am currently writing an Internet Draft for an internet-scale subscription mechanism. The protocol is HTTP-based and REST-like. I am calling it the Scalable Event Notification Architecture (SENA), a play on the old General Event Notification Architecture (GENA). It is intended for time-sensitive applications such as SCADA and the relaying of stock market quotes, rather than for general use across the Internet. The draft can be found here and remains a work in progress for the present.
Feedback Welcome
I have been soliciting targeted feedback on the document for the last few weeks, and have had some good input from a number of sources. Please consider providing feedback yourself by emailing me at benjamincarlyle at optusnet.com.au with the subject line "HTTP Subscription".
Architecture
The architecture consists of an originating resource, a set of clients, and a set of intermediaries. Clients use a new SUBSCRIBE HTTP verb to request a subscription. This may be passed through without inspection by intermediaries, in which case the origin server will answer the request directly. The origin server creates a subscription resource, which may be DELETEd to terminate the subscription. A notify client associated with the subscription resource sends HTTP requests back to a specified client call-back URL whenever the originating resource changes.
Instead of passing the requests through directly, intermediaries can participate in the subscription. They can intercept the same subscription made by several clients and subscribe only once to the originating resource themselves. The intermediary can specify its own notify resource, which will in turn notify its clients. This has a similar scalability effect to caching proxies on the Web of today.
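The fan-out described above can be sketched as a small in-memory model. This is only an illustration under assumptions: the class and method names (Origin, Intermediary, subscribe, change) are hypothetical and not taken from the draft, and plain Python callables stand in for the SUBSCRIBE request, the call-back URL and the notification requests.

```python
class Origin:
    """Hypothetical origin server holding one resource and its subscriptions."""

    def __init__(self, state):
        self.state = state
        self.callbacks = []  # one entry per subscription resource

    def subscribe(self, callback):
        # Stands in for an HTTP SUBSCRIBE request carrying a call-back URL.
        self.callbacks.append(callback)
        # The returned index stands in for the subscription resource's URL.
        return len(self.callbacks) - 1

    def change(self, new_state):
        self.state = new_state
        for callback in self.callbacks:
            callback()  # stands in for a notification request


class Intermediary:
    """Hypothetical subscription-aware proxy: many clients, one upstream subscription."""

    def __init__(self, origin):
        self.origin = origin
        self.downstream = []
        self.upstream_subscription = None

    def subscribe(self, callback):
        # Subscribe upstream only for the first client; later clients share it.
        if self.upstream_subscription is None:
            self.upstream_subscription = self.origin.subscribe(self._notified)
        self.downstream.append(callback)
        return len(self.downstream) - 1

    def _notified(self):
        # The intermediary's own notify resource relays to each of its clients.
        for callback in self.downstream:
            callback()


origin = Origin("a")
proxy = Intermediary(origin)
hits = []
for i in range(3):
    proxy.subscribe(lambda i=i: hits.append(i))
origin.change("b")
```

After the change, the origin has notified a single subscriber (the intermediary) while all three clients have been notified, which is the same load-shedding effect a caching proxy gives for GET.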
Notification Verbs
I currently have a number of notification request verbs listed in the draft. The simplest one that could possibly work is EXPIRE. Under EXPIRE semantics, a notify resource is told that it needs to revalidate its cache entry for the originating resource. If it is a client, the likely response will be to issue an immediate GET request. This request will carry a max-age=0 cache-control directive to ensure revalidation occurs through non-subscription-aware intermediaries.
If the notify resource is operating on behalf of an intermediary, it may choose to fetch the data immediately, given that clients are likely to ask for it again very soon. Alternatively, it may wait for the first GET request from its clients to come in. Because it has a subscription to the originating resource, the intermediary can safely ignore the max-age value sent by its clients. This allows the intermediary to perform one revalidation for each received EXPIRE request, regardless of the number of clients it has.
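A minimal sketch of the lazy variant, where the intermediary waits for the first client GET after an EXPIRE before refetching. The names here (ExpireAwareCache, on_expire, fetch) are hypothetical, and the fetch callable stands in for a GET to the originating resource. Validators, headers and error handling are left out.

```python
class ExpireAwareCache:
    """Hypothetical cache entry held by a subscribed intermediary."""

    def __init__(self, fetch):
        self.fetch = fetch      # assumption: callable standing in for GET upstream
        self.body = None
        self.fresh = False
        self.fetch_count = 0    # counts upstream fetches, for illustration only

    def on_expire(self):
        # EXPIRE carries no state; it only marks the cached entry as stale.
        self.fresh = False

    def get(self):
        # Client GETs are answered from the cache while the entry is fresh,
        # whatever max-age value the client sent.
        if not self.fresh:
            self.body = self.fetch()
            self.fetch_count += 1
            self.fresh = True
        return self.body


resource = {"value": 1}
cache = ExpireAwareCache(lambda: resource["value"])

first = [cache.get() for _ in range(3)]    # one upstream fetch, three answers
resource["value"] = 2                      # the originating resource changes...
cache.on_expire()                          # ...and the intermediary is told
second = [cache.get() for _ in range(5)]   # one more upstream fetch
```

The eager variant would simply call fetch inside on_expire. Either way the number of upstream fetches follows the number of EXPIRE requests rather than the number of clients.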
The good thing about EXPIRE is that its semantics are so weak that it is almost completely unnecessary to validate that it is a genuine request from the real notify source. The worst an attacker could do is chew up a little extra bandwidth, and that could be detected when the originating resource consistently validated the old content rather than having new content available. EXPIRE also allows all normal GET semantics to apply, including content negotiation. The main drawbacks of EXPIRE are that it takes an extra two network traversals to get the subscribed data after receiving the request (GET request, GET response), and that you really have to GET the whole resource. There is no means of observing just the changes to a list of 100,000 alarms.
The alternative system is a combination of NOTIFY and PATCHNOTIFY requests. These requests carry the state of the originating resource and changes to that state respectively. The big problems with these requests lie in their increased semantic importance. You must be able to trust the sender of the data, which means you need digital signatures guaranteed by a shared certificate authority. This introduces a significantly higher processing cost to communications. Useful semantics of GET such as content negotiation also disappear. I am almost resigning myself to these methods not being useful. They aren't the simplest thing that could possibly work.
Summarisation
One of the explicit features of the specification is summarisation of data. Most subscription models don't seem to have a way of dealing with notifications that pass through intermediaries with fixed buffer sizes to a set of clients with different connection characteristics. If an intermediary has a buffer size of one and receives an update that can only be delivered to one of two clients, then receives another update... what does it do?
The intermediary usually either has to block waiting for the slowest client or kick the slowest client off so that it can deliver messages to the faster clients. The third option is to discard old messages that have not been transmitted to the slow client. In SCADA-like situations this is an obvious winner. The newer message will have newer state, so the slow client is always getting fresh data despite not getting all the data. The fast client is not stopped from receiving messages, so it gets any advantages its faster connectivity is designed to bring. Most message delivery mechanisms don't know and can't take the semantics of the message into account. SENA is explicitly a state-transfer protocol, so these semantics of "old, unimportant state" and "new, important state" can be taken into consideration. PATCHNOTIFY requests can even be merged into each other to form a single coherent update.
The EXPIRE request can also be summarised. New EXPIRE requests trump old requests. There is no point in delivering multiple queued EXPIREs. Likewise, the data fetches triggered by EXPIRE requests implicitly summarise the actual sequence of state changes by virtue of multiple changes occurring between GET requests.
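The third option can be sketched as a one-slot buffer per client. This is an illustration only: OneSlotBuffer, newest_wins and merge_patches are hypothetical names, and representing a PATCHNOTIFY body as a dictionary of changed keys is an assumption made for the example, not something the draft specifies.

```python
class OneSlotBuffer:
    """Hypothetical per-client buffer of size one that summarises instead of blocking."""

    def __init__(self, merge):
        self._merge = merge        # how to combine an undelivered update with a newer one
        self._pending = None
        self._has_pending = False

    def offer(self, update):
        if self._has_pending:
            # The client has not taken the previous update yet: summarise.
            self._pending = self._merge(self._pending, update)
        else:
            self._pending = update
            self._has_pending = True

    def take(self):
        # Called when the client's connection can accept another message.
        update = self._pending
        self._pending = None
        self._has_pending = False
        return update


def newest_wins(old, new):
    # Full-state updates: the newer state makes the older one irrelevant.
    return new


def merge_patches(old, new):
    # Assumption: a patch is a dict of changed keys, with later values winning.
    return {**old, **new}


fast, slow = OneSlotBuffer(newest_wins), OneSlotBuffer(newest_wins)
fast_seen = []
for state in (1, 2, 3):
    fast.offer(state)
    slow.offer(state)
    fast_seen.append(fast.take())   # the fast client drains after every update
slow_seen = slow.take()             # the slow client drains once, at the end

patches = OneSlotBuffer(merge_patches)
patches.offer({"alarm1": "raised"})
patches.offer({"alarm2": "raised"})
patches.offer({"alarm1": "cleared"})
merged = patches.take()
```

The fast client sees every state, the slow client sees only the latest one, and neither holds the other up. The three queued patches collapse into one update that leaves alarm1 cleared and alarm2 raised.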
Keep-alive
Most subscription mechanisms include a keep-alive function. This typically exists for a number of reasons:
- To ensure that server-side resources are not consumed long after a client has forgotten about the subscription
- To allow a client to determine that the server has forgotten about the subscription and needs a reminder
- To allow a client to detect the death of its server
SENA addresses the first point with a long server-driven keep-alive. I have deliberately pointed to a default period consistent with the default required for TCP/IP keep-alive: Two hours. It should be long enough not to significantly increase the base-load of the Internet associated with ping data, while still allowing servers the opportunity to clean up eventually.
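One way a server might use such a period to clean up is sketched below. The mechanism is an assumption: the post only says the keep-alive is server-driven and long, so the probe callable (standing in for whatever request the server sends to the call-back URL), the SubscriptionTable class and the example URLs are all hypothetical.

```python
KEEPALIVE_SECONDS = 2 * 60 * 60  # two-hour default, as discussed above


class SubscriptionTable:
    """Hypothetical server-side table of subscriptions keyed by call-back URL."""

    def __init__(self, probe, period=KEEPALIVE_SECONDS):
        self.probe = probe          # assumption: returns True if the client still wants it
        self.period = period
        self.last_confirmed = {}    # call-back URL -> time of last confirmation

    def add(self, callback_url, now):
        self.last_confirmed[callback_url] = now

    def sweep(self, now):
        # Probe only subscriptions that have gone a full period unconfirmed.
        for url, last in list(self.last_confirmed.items()):
            if now - last < self.period:
                continue
            if self.probe(url):
                self.last_confirmed[url] = now
            else:
                del self.last_confirmed[url]  # client has gone: free the resources


wanted = {"http://a.example/callback": True, "http://b.example/callback": False}
table = SubscriptionTable(lambda url: wanted[url])
for url in wanted:
    table.add(url, now=0)

table.sweep(now=3600)                 # one hour in: nothing is probed yet
after_one_hour = set(table.last_confirmed)
table.sweep(now=KEEPALIVE_SECONDS)    # two hours in: both are probed
after_two_hours = set(table.last_confirmed)
```

With a two-hour period, each subscription costs at most one probe exchange every two hours, and a forgotten subscription is cleaned up at the first sweep after that point.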
The second point is dealt with in SENA by a prohibition against the server side losing subscriptions. Subscriptions should be persistent across server failures and failovers. Client-side query of subscription state is permitted via a GET to the subscription resource, but this should not be used as a regular keep-alive, especially over a short period. Essentially, a server should never lose a subscription, and user intervention will be required whenever it does happen.
Death detection is not dealt with in SENA. It is an assumption of SENA that the cost of generic death detection outweighs the benefits. The cost of death detection is at least one message pair exchange per detection period. Over the scale of the Internet that sort of base load just doesn't compute. Subscriptions should become active again after the server comes online, so server downtime is just another time when the basic condition of the subscription holds: that the client has the freshest possible data. Monitoring of the state of the server is left as a specialised service capability wherever it is required.
Benjamin