Sound advice

Tales from the homeworld

Thu, 2008-Jan-03

Realtime Messaging over Unreliable Networks

Embedded systems have to work under adverse conditions without human supervision. Embedded components that participate in distributed systems need to work despite arbitrary network delays and periods during which available bandwidth is extremely low. In these environments it is important to reduce communication to the basics, and these align closely with REST messaging on the Web: "I have recently sampled the state of resource X" (GET and/or SUBSCRIBE), and "A valid client wants the state of resource Y to be Z" (PUT or DELETE).

All networks are to some extent unreliable. Packets get lost, and this translates to delays and constrained bandwidth over the TCP layer. Each message could take an arbitrary amount of time to get through. An embedded component should assume that any individual message will eventually get through, but that effective bandwidth might not be sufficient for every message to get through.

One of the keys of REST is that messaging is based around the transfer of resource state. My intent for the value of a resource may change over time. As my intent for the resource changes I will PUT a new value, replacing the last. This means that I can easily drop intermediate states. Consider the case of an online jukebox with a play/pause control. I can PUT a "true" to <http://jukebox.example.com/playing>, then "false", then "true" again as my intent changes. All that really matters is my last intent for the resource. I can drop the earlier "false", and even the earlier "true". I can still get my eventual intent through, even if my intent changes more rapidly than available bandwidth allows. I can also repeat my latest intent in case my first attempt failed to garner a positive response.
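The jukebox case can be sketched in a few lines of Python. This is a minimal illustration rather than a real client: the class name `PendingIntents` and its methods are made up for the example, and the actual HTTP transfer is left out. It shows only the bookkeeping: one slot per URL, overwritten as intent changes, and cleared only when the server acknowledges the value that is still current.

```python
class PendingIntents:
    """Keeps at most one unsent PUT per URL: the latest intended value."""

    def __init__(self):
        self._pending = {}  # url -> latest intended value (insertion-ordered)

    def put(self, url, value):
        # A newer intent overwrites any older, unacknowledged one for this URL.
        self._pending[url] = value

    def next_request(self):
        """Return one (url, value) pair to send, or None if nothing is pending."""
        return next(iter(self._pending.items()), None)

    def acknowledge(self, url, value):
        # Forget the intent only if it has not changed since it was sent;
        # otherwise the newer value still needs to go out.
        if url in self._pending and self._pending[url] == value:
            del self._pending[url]

    def __len__(self):
        return len(self._pending)


intents = PendingIntents()
playing = "http://jukebox.example.com/playing"
for value in ("true", "false", "true"):
    intents.put(playing, value)

request = intents.next_request()   # only the final "true" remains
intents.acknowledge(*request)      # the server responded positively
```

Because an entry stays in place until it is acknowledged, repeating the latest intent after a failed attempt is just a matter of calling `next_request()` again.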

A subscription mechanism based on the transfer of resource state can exploit the same feature: the state of the resource being monitored can change more rapidly than the available effective bandwidth. While a client may benefit from seeing intermediate states, the only state it must see is usually the most recent one.
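On the notifying side, the same idea might look like the following sketch. The `LatestState` name and its `skipped` counter are assumptions for illustration, and no particular SUBSCRIBE protocol is implied. Each subscriber gets a single slot, and a state that has not yet been delivered is simply overwritten by the next one.

```python
class LatestState:
    """One slot per subscriber: a new state overwrites an undelivered one."""

    def __init__(self):
        self._state = None
        self._fresh = False
        self.skipped = 0  # intermediate states that were never delivered

    def publish(self, state):
        if self._fresh:
            self.skipped += 1  # the previous state is dropped unseen
        self._state = state
        self._fresh = True

    def take(self):
        """Return (True, state) if there is undelivered state, else (False, None)."""
        if not self._fresh:
            return (False, None)
        self._fresh = False
        return (True, self._state)


slot = LatestState()
for position in range(1, 6):   # the resource changes five times...
    slot.publish(position)
delivered = slot.take()        # ...before the link can carry one notification
```

Here the subscriber sees only the fifth state, and four intermediate states are skipped without the slot ever growing.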

Both changes of intent and changes of state can become more complex when resource state overlaps. For example, it may not be obvious that a PUT to <http://jukebox.example.com/currentSelection> should extinguish an outstanding request to pause the jukebox. Say the selection of a new track automatically starts the jukebox playing from a pause. This is a new intent that should extinguish the old, including any PUT request that the client has so far failed to send.

The simple solution to overlapping resources is to split these resources so that they don't overlap: split the user-level change of selection into separate change-selection and play requests. Note that it is only intent we are trying to separate. The resources which capture client intent will usually still affect other resources. For example, the submission of a purchase order will undoubtedly affect accounting records. Every request is likely to have an effect on server logs, and these too may be accessible as resources.

The lesson is that URLs accepting PUT or DELETE requests should demarcate non-overlapping state. This state may overlap with URLs that don't accept PUT or DELETE requests. If that isn't possible then clients that send requests with overlapping states need to take additional care.
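When overlap can't be avoided, the additional care could take the form of an explicit table on the client. The sketch below assumes a hypothetical `EXTINGUISHES` mapping and a plain dict of unsent PUTs; the track name is invented for the example. A new intent for one URL removes any unsent intent for the URLs it overlaps.

```python
PLAYING = "http://jukebox.example.com/playing"
SELECTION = "http://jukebox.example.com/currentSelection"

# Hypothetical overlap table: a PUT to the key URL also supersedes
# any unsent PUT to the listed URLs.
EXTINGUISHES = {
    SELECTION: [PLAYING],
}


def record_intent(pending, url, value, extinguishes=EXTINGUISHES):
    """Store the latest intent for url, dropping unsent intents it overlaps."""
    for other in extinguishes.get(url, ()):
        pending.pop(other, None)
    pending[url] = value
    return pending


pending = {}
record_intent(pending, PLAYING, "false")       # pause, not yet sent
record_intent(pending, SELECTION, "track-7")   # new selection starts playback
```

After the second call only the selection remains pending, so the stale pause request is never sent. With non-overlapping URLs the table is empty and the function reduces to a plain per-URL overwrite.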

The server can act on the client's intent in any way appropriate once the client has communicated it by depositing state at an appropriate URL. Likewise, a client can behave in any way it sees fit once it successfully samples the recent state of a particular URL. REST is not about implementation, but interfaces between components.

Distributed systems that work even under adverse network conditions can't just buffer infinitely and hope for the best. They must have a strategy for utilising limited bandwidth effectively and avoiding infinite buffering scenarios. Service Level Agreements can provide a legal framework for adequate network characteristics, but reliable systems still need to plan for the worst.
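A small simulation makes the difference concrete. It is an illustrative model only: the function name, the rate of three changes per tick, and the link that carries one message per tick are all assumptions. A first-in-first-out queue of every message grows without bound, while a table of the latest state per URL can never hold more entries than there are URLs.

```python
from collections import deque


def simulate(changes_per_tick, ticks, urls):
    """Compare backlog sizes when changes arrive faster than the link sends."""
    fifo = deque()   # buffers every message
    latest = {}      # buffers only the latest value per URL
    counter = 0
    for _ in range(ticks):
        for _ in range(changes_per_tick):
            url = urls[counter % len(urls)]
            fifo.append((url, counter))
            latest[url] = counter
            counter += 1
        # The link carries exactly one message per tick in each scheme.
        if fifo:
            fifo.popleft()
        if latest:
            del latest[next(iter(latest))]
    return len(fifo), len(latest)


urls = [
    "http://jukebox.example.com/playing",
    "http://jukebox.example.com/currentSelection",
]
fifo_backlog, latest_backlog = simulate(changes_per_tick=3, ticks=100, urls=urls)
```

Under these assumptions the FIFO backlog reaches 200 messages after 100 ticks and keeps growing, while the latest-state table ends with a single pending entry.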

Benjamin