Sound advice - blog

Tales from the homeworld

Sat, 2005-Mar-05

A RESTful subscription specification

Further to my previous entry on the subject, this blog entry documents a first cut at how I would update RFC 2616 to support RESTful subscription. This is a quick hack of an update, not a thorough work-through of the original document.

The summary:

The detail:

  1. Add SUBSCRIBE and UNSUBSCRIBE methods to section 5.1.1
  2. Add SUBSCRIBE and UNSUBSCRIBE methods to the list of "Safe Methods" in 9.1.1
  3. Add section "9.10 SUBSCRIBE" with the following text:

    The SUBSCRIBE method means retrieve and subscribe to whatever information (in the form of an entity) is identified by the Request-URI. If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process, unless that text happens to be the output of the process.

    A response to SUBSCRIBE SHOULD match the semantics of GET. In addition to the GET semantics, a successful SUBSCRIBE request MUST establish a valid subscription. The subscription remains valid until an UNSUBSCRIBE request matching the successful SUBSCRIBE URL is successfully made, until the server returns a 103 (SUBSCRIBE cancelled) response, or until the connection is terminated. A server with a valid subscription SHOULD return changes to URL content immediately using a 102 (SUBSCRIBE update) response, but MAY delay responses according to flow control or server-side decisions about the priority of subscription updates as compared to regular response messages. Whenever a 102 (SUBSCRIBE update) response is returned it SHOULD represent the most recent URL data state. Data MAY be returned as a difference between the current and previously-returned URL state if client and server can agree to do this out of band. An Updates-Missed header MAY be returned to indicate the number of undelivered subscription updates.

    A SUBSCRIBE request made to a URL for which a subscription is already valid SHOULD match the semantics of GET, but MUST NOT establish a new valid subscription.

    The response to a SUBSCRIBE request is cacheable if and only if the subscription is still valid. Updates to the subscription MUST either update the cache entry or cause the client to treat the cache entry as stale.

  4. Add section "9.11 UNSUBSCRIBE" with the following text:

    The UNSUBSCRIBE method means cancel a valid subscription. A server MUST set the state of the selected subscription to invalid. A client MUST either continue to process 102 (SUBSCRIBE update) responses for the URL as per a valid subscription, or ignore 102 (SUBSCRIBE update) responses. A successful unsubscription (one that marks a valid subscription invalid) SHOULD return a 200 (OK) response.

  5. Add section "10.1.3 102 SUBSCRIBE update" with the following text:

    A valid subscription existed at the time this response was generated on the server side, and the resource identified by the subscription URL may have a new value. The new value is returned as part of this response.

    This response should not be assumed to be associated with an in-sequence request, and may be returned when no request is outstanding.

  6. Add section "10.1.4 103 SUBSCRIBE cancelled" with the following text:

    A valid subscription existed at the time this response was generated on the server side, but the server is no longer able or willing to maintain the subscription. The subscription MUST be marked invalid on the client side.

    This response should not be assumed to be associated with an in-sequence request, and may be returned when no request is outstanding.

  7. Add section "14.48 Updates-Missed" with the following text:

    The Updates-Missed header MAY be included in 102 (SUBSCRIBE update) response messages. If included, it MUST contain a numeric count of the missed updates.

           Updates-Missed = "Updates-Missed" ":" 1*DIGIT

    An example is

           Updates-Missed: 34

    This response should not be assumed to be associated with an in-sequence request, and may be returned when no request is outstanding.
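To make the lifecycle in the sections above a little more concrete, here is a minimal Python sketch of the per-connection subscription state a server might keep, plus a parser for the Updates-Missed grammar. The names `ConnectionSubscriptions` and `parse_updates_missed`, the tuple-shaped responses, and the 404 returned for an UNSUBSCRIBE with no matching subscription are all my own assumptions for illustration; only the status codes, reason phrases and header grammar come from the text above.

```python
import re

# Updates-Missed = "Updates-Missed" ":" 1*DIGIT  (grammar from section 14.48 above).
# Header names are case-insensitive in HTTP, and optional whitespace may follow the colon.
_UPDATES_MISSED = re.compile(r"^Updates-Missed:[ \t]*([0-9]+)[ \t]*$", re.IGNORECASE)


def parse_updates_missed(header_line):
    """Return the missed-update count, or None if the line is not a valid header."""
    match = _UPDATES_MISSED.match(header_line)
    return int(match.group(1)) if match else None


class ConnectionSubscriptions:
    """Hypothetical record of which URLs have a valid subscription on one connection."""

    def __init__(self):
        self._valid = set()

    def subscribe(self, url, current_entity):
        # SUBSCRIBE answers like GET. Using a set means a repeated SUBSCRIBE
        # for the same URL cannot establish a second subscription.
        self._valid.add(url)
        return (200, "OK", {}, current_entity)

    def unsubscribe(self, url):
        # Success means a valid subscription was actually marked invalid.
        if url in self._valid:
            self._valid.remove(url)
            return (200, "OK", {}, b"")
        # Assumption: the draft text does not say what to return here.
        return (404, "Not Found", {}, b"")

    def update(self, url, new_entity, missed=0):
        # Unsolicited 102 carrying the latest state; None if nothing is subscribed.
        if url not in self._valid:
            return None
        headers = {"Updates-Missed": str(missed)} if missed else {}
        return (102, "SUBSCRIBE update", headers, new_entity)

    def cancel(self, url):
        # Server-initiated cancellation: unsolicited 103, subscription becomes invalid.
        if url not in self._valid:
            return None
        self._valid.discard(url)
        return (103, "SUBSCRIBE cancelled", {}, b"")

    def close(self):
        # Terminating the connection ends every subscription made on it.
        self._valid.clear()
```

Keeping the state per connection is the point of the design: when the connection goes away, both sides know every subscription on it is gone, with no separate lease or heartbeat to reconcile.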

I guess the question to ask is whether or not subscription is a compatible concept with REST. I say it is. We're still using URLs and resources. We still have a limited number of request commands that can be used for a wide variety of media types. Essentially, all I'm doing is making a more efficient meta-refresh tag part of the protocol. It's part of what makes some of the proprietary protocols I've used in the past efficient and scalable. It's particularly something that helps servers deal gracefully with killer load conditions. You have a fixed number of clients making requests down a fixed number of TCP/IP connections. When the rate of requests goes up and the server starts to hurt, it simply increases the response time. No extra work is put on the system in times of stress. The server works as fast as it can, and any missed updates simply get recorded as such.
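The load behaviour described above, where a busy server sends only the latest state and records what it skipped, can be sketched as a one-slot mailbox per subscription. This is a minimal Python sketch under my own assumptions: the `CoalescingSlot` name and its `publish` and `take` methods are hypothetical, and a real server would add locking and tie `take` to its flow control.

```python
class CoalescingSlot:
    """Hypothetical one-entry buffer holding the newest undelivered state."""

    def __init__(self):
        self._pending = None        # newest state not yet sent, if any
        self._has_pending = False   # separate flag so None is a legal state value
        self._missed = 0            # states overwritten before they could be sent

    def publish(self, state):
        # Overwriting is constant work no matter how far behind the client is,
        # so a slow connection never builds up a backlog on the server.
        if self._has_pending:
            self._missed += 1
        self._pending = state
        self._has_pending = True

    def take(self):
        # Called when the connection can accept another 102 response.
        # Returns (latest_state, missed_count), or None if nothing has changed.
        if not self._has_pending:
            return None
        result = (self._pending, self._missed)
        self._pending, self._has_pending, self._missed = None, False, 0
        return result
```

If three states are published before the connection is ready, `take()` returns the third state with a missed count of 2, which is the number that would go in the Updates-Missed header.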

In a polling-based system things tend to degrade. You don't know whether the new TCP/IP connection is an old client or a new one, so you can't decide how to allocate resources between your clients. Even if they are using persistent connections, they keep asking you for changes when they haven't happened yet! If we're to see a highly-dynamic web I'm of the opinion that subscription needs to become part of it at some point.

Why not head for the WS-* stack? Well, I think the question answers itself... but in particular, the separation of a client request from the connection it's using to maintain that subscription makes the whole question of whether the subscription is still valid hard to assess. When it's not clear whether a subscription is up or down, time is wasted on both sides. My approach is simple and the broad conceptual framework has been proven in-house (although it hasn't been applied to HTTP in-house just yet).

On another note, I was surprised to see the lack of high-performance HTTP client interfaces available in Java. I was hoping to be able to make use of request pipelining to improve throughput where I'm gathering little pieces of data from a wide variety of URL sources on a display. There's just very little around that doesn't require a separate TCP/IP connection for each request, and usually a separate thread also. When you're talking about a thousand requests on a display, and possibly dozens of HMIs that want to do this simultaneously... well the server starts to look sick pretty quickly...