Sound Advice - patterns

Tales from the homeworld

My current feeds

Published: Tue May 3 20:35:23 EST 2011

Updated: Wed May 4 08:01:20 EST 2011

Delta Encoding

Intent

Keep one or more clients up to date with the state of large resource.

Motivation

Use case 1: Software updates

A server has a program that it wants to distribute to its clients. After an initial installation it hopes to be able to keep its clients up to date for minimal cost. It wants to send each client only the differences that client needs in order for the client to complete its update.

Different clients may be up to different revisions of the software, meaning that different updates may be needed for different clients. The server does not wish to keep track of individual clients and where they are up to, and does want to make effective use of caching infrastructure.

Some clients will be too far out of date, and it won't be worth supplying the patch updates to them. For these clients the server wants to redistribute the whole of the new program version as an update.

Use case 2: Publishing a log

A server is keeping a log of access to a large software system. This server is one of a collection of many thousand that are all generating logs that must be analysed centrally. At various points in the architecture there are servers that aggregate logs for servers that are nearby, and transmit the aggregated logs to several secure locations where the logs can be viewed and interrogated.

Applicability

Delta Encoding is useful wherever it is likely to save time, save processing, save bandwidth, or better inform consumers about changes to a large resource to send updates to that resource instead of sending the whole representation each time it changes.

It depends on clients that know what they want to fetch, and when, and who can follow a simple algorithm to bring themselves up to date and stay up to date. It is friendly to caching, and supports the normal features available to GET requests such as content negotiation.

Structure

Delta Encoding pattern structure

Participants

Client
  • Performs an initial FetchMain
  • Records the time reference returned as part of the normal fetch response
  • Whenever it wants to be brought up to date:
    • Performs a FetchDelta, specifying its current time reference
    • Records the time reference of the response
    • Repeats this process until a NoDeltaYet response is returned
  • If the client receives a DeltaGone response to a delta fetch invokes an appropriate recovery mechanism, such as invoking FetchMain()
Server
  • Tracks changes that occur to the main resource as they occur, or is otherwise able to generate a delta for the resource from a known point in time
  • Provides a time reference in the FetchMain() response that can be used to bring clients up to date from this point in time
  • Provides a time reference in the FetchDelta() response that can be used to bring clients up to date from this point in time
  • Returns NoDeltaYet for FetchDelta() calls for the next time reference until new data is produced
  • For each FetchDelta() call old returns the difference from the specified point of time to the current point in time for the main URL
  • If the server is unable to provide deltas indefinitely far back in time, then return DeltaGone for any FetchDelta calls that nominate a time reference for which the server does not wish to produce deltas

Collaboration

Delta Encoding sequence
  • Client issues a series of requests to Server to bring itself up to date, until it sees a NoDeltaYet or a DeltaGone response.
  • Server generally brings client up to date in one request, providing a full delta from the point in time indicated in the delta URL to the current time
  • If the client fetches a delta that has no data yet, Server responds with NoDeltaYet
  • If the client fetches a delta that can no longer be generated, Server responds with DeltaGone
  • On seeing a DeltaGone response for a delta the client takes corrective action, such as re-fetching the main URL

Consequences

Although a delta encoding protocol exists for use in conjunction with HTTP it is geared towards keeping a cache up to date with a particular representation of a resource. This approach for delta encoding specified in the Implementation section below allows a more semantic differences model to be applied. Origin resources can be of any media type, and content negotiation is supported. Deltas too can be of any media type and support content negotiation. Deltas do not need to be taken by all clients as literal instructions on how to modify the current resource representation to bring it up to date. Instead, they can be taken as a list of changes to the main resource that can be processed independently of the whole main resource state for so long as fetches to delta resources do not result in a DeltaGone response.

Caching proxies may exist between client and server. This mechanism works well with caching proxies, including those that are not aware of the delta encoding pattern in use. The main resource and each of the delta resources can be queried and cached independently by multiple layers, and deltas can be distributed through a cache hierarchy for so long as clients are reasonably likely to be querying from the same point in time or a small collection of different points in time.

If the server provides similar cache control metadata for delta URLs as it does for the main URL then the client will generally be brought up to date within one request within the staleness allowed by those cache directives. However, if the server nominates a long max-age for deltas then clients will not be brought up to date just by issuing one request. For this reason the client continues fetching until it gets confirmation from the server that no more updates are available.

This pattern is stateless. It does not keep track of any individual clients, and instead tracks a series of legal time references and hands these out to clients. Each client is responsible for storing the time reference that it is up to. The amount of storage space required on the server is proportional to the size of its delta buffer, plus the number of concurrent requests being made at any given time. The total number of consumers does not directly affect the amount of storage space required on the server.

Time references generally need to be shared across all servers across a cluster in order to allow clients to access deltas on each server.

Implementation

Delta Encoding can be implemented with HTTP using the following mappings:

FetchMain
GET main URL HTTP/1.1
MainSuccess(representation, time reference)
HTTP/1.1 200 OK
Content-Type: representation media type
Link: <next delta URL, including time reference>; rel="Delta"

representation bytes
GetDelta
GET delta URL, including time reference HTTP/1.1
DeltaSuccess(representation, time reference)
HTTP/1.1 200 OK
Content-Type: representation media type
Link: <next delta URL, including time reference>; rel="Next"

representation bytes
NoDeltaYet()
HTTP/1.1 204 No Content
DeltaGone
HTTP/1.1 410 Gone

Sample Code

if (nextDelta == null)
{
fetch_main:
	Request mainRequest;
	mainRequest.url="http://example.com/log"
	Response mainResponse = fetch(mainRequest);
	nextDelta = mainResponse.links["Delta"];
	processMain(mainResponse);
}
else
{
	Request deltaRequest(nextDelta);
fetch_another:
	Response deltaResponse = fetch(request);
	if (response.code == NoContent)
	{
		// Break out of loop
	}
	else if (response.code == Gone)
	{
		jump fetch_main;
	}
	else
	{
		processDelta(deltaResponse);
		jump fetch_another;
	}
}

Related Patterns