Sound advice - patterns

Tales from the homeworld

Thu, 2008-Jul-10

Published: Wed Jul 2 20:38:52 EST 2008

Updated: Thu Jul 10 23:20:42 EST 2008

GET

Intent

Transfer a defined set of information from its owner to an anonymous client in a form the client understands.

Motivation

Servers often wish to expose information for general client consumption. In doing so, they do not wish to introduce unnecessary coupling with the client. GET permits servers to expose information without exposing the method by which that information is produced, and without needing to keep track of individual clients.

The server wants to be able to support clients of different ages, some of whom will share the latest and greatest understanding of how to encode and parse particular kinds of data. Others will be left with legacy implementations. Likewise, the server may have been deployed for some time and may be running a legacy implementation while upgraded clients are present in the architecture.

Clients seek to minimise the load on the server in processing their requests, on the network in transferring messages, and on themselves in processing responses. Clients also need to be able to deal with possible error conditions, including communication failures.

The GET pattern provides a clean client/server separation that is able to survive independent upgrades of each over time, exercises control over traffic and processing waste, and deals with possible errors.

Applicability

GET is appropriate whenever a client wants to acquire the whole of the information behind a known URL, and can decide when it wants to issue the request (subject to a cache miss). Here are some common means by which a URL is discovered:

  1. Direct entry allows a user to enter the URL through an input device
  2. Configuration allows a document to be prepared ahead of time with links that have particular meaning to the client
  3. Hyperlinking is a generalisation of configuration. The document that contains meaningful links may be acquired from anywhere, including an earlier completed GET request
  4. Construction is the assembly of information available to the client into a URL format agreed with the server. This may be achieved by populating a form supplied by the server in an earlier GET request.
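
As an illustration of Construction, here is a minimal Python sketch that fills in a server-supplied form to produce a URL. The form action URL and the field names are hypothetical; a real client would take them from the form returned by an earlier GET request.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def construct_url(form_action, fields):
    """Build a GET URL by encoding form fields into the query string."""
    scheme, netloc, path, _query, _fragment = urlsplit(form_action)
    # urlencode percent-escapes values, so free text is safe to include.
    return urlunsplit((scheme, netloc, path, urlencode(fields), ""))

# Hypothetical form action and fields, for illustration only.
url = construct_url("http://example.com/search",
                    {"q": "publication dates", "page": 2})
print(url)  # http://example.com/search?q=publication+dates&page=2
```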

Methods of determining when to issue a GET request include:

  1. One-shot, the issuing of a request at a predefined time, when the URL first becomes known, or when the client needs access to the information
  2. Cyclic, the issuing of a request at a predefined rate while the client is active
  3. On cache expiry, the issuing of a request whenever a cache entry expires. Note that this requires the server to set a maximum age on cache entries, something that is not always provided.
  4. Otherwise-triggered, the issuing of a request based on some form of back-channel that indicates information at a given URL may have changed
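
The difference between the cyclic and cache-expiry approaches can be sketched as below. This is only an illustration: the function names and the fallback period are assumptions, and the parsing handles just the max-age directive of a Cache-Control header rather than the full caching rules.

```python
import re

def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None

def next_request_time(fetched_at, cache_control, fallback_period):
    """Pick when to issue the next GET, in seconds on the caller's clock.

    Uses the server's max-age when one is supplied (on cache expiry);
    otherwise falls back to a fixed client-chosen period (cyclic).
    """
    max_age = max_age_seconds(cache_control)
    period = max_age if max_age is not None else fallback_period
    return fetched_at + period

print(next_request_time(1000, "public, max-age=60", 300))  # 1060
print(next_request_time(1000, "no-transform", 300))        # 1300
```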

Structure

[Figure: GET pattern structure]

Participants

Client
  • Keeps a URL that lets it access the Server
  • Issues the GET request
  • Is capable of parsing every form the data might be encoded in that is semantically rich enough for its purposes
  • Selects the right parser implementation to use based on the returned document type
  • (optional) Retains a cache of past successful GET responses and their related cache control information
  • Is responsible for overall successful execution of the operation, including modifications to the request and resubmissions of the request
  • Treats a lost response as equivalent to a Resubmit response with no required changes
  • Aborts the operation on a failure response, on a resubmission response that cannot or will not be satisfied, or on a lost response after too many retries.
Server
  • Evaluates any condition supplied in the GET request before performing significant processing
  • Selects the information to return based on the supplied URL
  • (optional) Is configured with a mechanism to require the client to resubmit its request with or without modifications
  • Guarantees that a GET request is a read-only operation that is never interpreted by the server as a request to "buy an airline ticket". The server may choose to update log files and other information, but is not free to behave as if the client has requested or authorised the change.
  • Can return the requested information in various formats. Any format which the client might reasonably request with its acceptable types list should be supported.
  • Selects the most appropriate encoding based on the supplied weighted acceptable types list and any preference it may have itself, and returns the document in that format
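
The server's selection step might look something like the following Python sketch. It is a simplification under stated assumptions: the function names are hypothetical, only the q parameter of each Accept entry is considered, and the server's own preference is expressed simply by the order of its list of types.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    return entries

def weight_for(candidate, accepted):
    """Return the client's weight for candidate, using the most specific match."""
    best_rank, weight = -1, 0.0
    for media_type, q in accepted:
        if media_type == candidate:
            rank = 2
        elif media_type.endswith("/*") and candidate.startswith(media_type[:-1]):
            rank = 1
        elif media_type == "*/*":
            rank = 0
        else:
            continue
        if rank > best_rank:
            best_rank, weight = rank, q
    return weight

def select_type(accept_header, server_types):
    """Choose a type to return; server_types is in the server's preference order.

    The client's weight wins and the server's order breaks ties.
    Returns None when no supported type is acceptable.
    """
    accepted = parse_accept(accept_header)
    best, best_q = None, 0.0
    for candidate in server_types:
        q = weight_for(candidate, accepted)
        if q > best_q:
            best, best_q = candidate, q
    return best

supported = ["application/atom+xml", "application/rss+xml"]
print(select_type("application/rss+xml, application/atom+xml;q=0.5", supported))
```

A legacy client that lists only application/rss+xml would still be served RSS by this sketch, while a newer client that weights Atom higher would receive Atom from the same URL.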

Collaboration

  • Client issues requests to Server via the Request Interface, modifying and resubmitting its request as needed until:
    1. A success response is elicited
    2. The request condition is not met, meaning that the cached response is still valid
    3. A failure response is elicited
    4. The client is unable to make changes required by a Resubmit response
    5. Client policy prevents either changes required by a Resubmit response, or further resubmissions in general

Consequences

The GET pattern introduces a Uniform Interface for transferring identified sets of information from server to client. Clients and servers of different ages can communicate without impediment, and communication failures can be overcome.

The use of an acceptable types list in a GET request means that clients built during different phases of the architecture will generally be able to communicate. Document-based communication has a degree of flexibility built in with must-ignore parameters. The acceptable types list fills a gap when incompatible changes occur to the set of document types, for example when a new type deprecates an old one, such as Atom deprecating RSS for news feed syndication.

An explicit failure response allows problems in the architecture to be reported and repaired as required. The resubmit feature allows temporary or permanent changes to the architecture to be accommodated by components without explicit reconfiguration, simplifying management. Note, however, the potential security implications of allowing one component to reconfigure others. A predefined policy for which modifications are permitted and which are to be treated as failure cases can be useful in security-sensitive environments.

With common transports such as HTTP, requests sent over parallel TCP connections or pipelined on a single connection may be processed by the server in a different order from that in which their responses reach the client. This could cause the client to become confused by "seeing" an older state after a more recent state. A simple solution is to hold off sending a GET request to a given URL when the previous related GET has not yet returned.

A client that is holding off sending the next GET request should queue the first such request for the identified URL. After this point it should not queue another GET request to the URL until the previous has been transmitted. There is no point queuing up multiple GET requests for the same URL. If the first request has not been issued by the time motivation to issue a second request comes around, a single request will fulfil the motivation behind both.
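
A minimal Python sketch of this hold-off and queueing behaviour follows. The class and method names are hypothetical, and the `send` callable stands in for whatever actually transmits the request; the sketch only tracks which URLs have a request outstanding or queued.

```python
class GetScheduler:
    """Allow one in-flight GET and at most one queued GET per URL."""

    def __init__(self, send):
        self._send = send        # hypothetical callable that transmits a GET
        self._in_flight = set()  # URLs with a request awaiting its response
        self._pending = set()    # URLs with one queued follow-up request

    def request(self, url):
        """Called whenever the client has a reason to GET url."""
        if url in self._in_flight:
            # Hold off. A single queued request serves every motivation
            # that arrives before it is transmitted.
            self._pending.add(url)
        else:
            self._in_flight.add(url)
            self._send(url)

    def response_received(self, url):
        """Called when the outstanding GET for url has returned or been abandoned."""
        self._in_flight.discard(url)
        if url in self._pending:
            self._pending.discard(url)
            self.request(url)

sent = []
scheduler = GetScheduler(sent.append)
url = "http://example.com/publication-dates"
scheduler.request(url)            # transmitted immediately
scheduler.request(url)            # queued behind the outstanding request
scheduler.request(url)            # coalesced with the queued request
scheduler.response_received(url)  # the queued request is now transmitted
scheduler.response_received(url)  # nothing left to send
print(len(sent))  # 2
```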

Twin consequences of the GET pattern are that interim states at URLs may be missed, and that the architecture as a whole does not become overloaded as the architecture is put under stress. Each GET retrieves the current state of the resource, so rapid changes may see the next GET arrive several changes after an earlier GET. These states will be lost unless an additional buffering mechanism is employed. The client will read back the current state rather than the old transitional states.

The flip-side of this behaviour is that clients are never stuck reading old data. They come completely up to date quickly and process the latest information available. Many algorithms for real-time processing will behave better under this scenario than if they are fed through old changes. The GET pattern can be adapted to a buffering model for algorithms that suffer from losses of interim states.

Implementation

GET can be implemented with HTTP using the following mappings:

GET(url, condition, weighted acceptable types list)

    GET url HTTP/1.1
    Accept: weighted acceptable types list
    If-condition

Success(document, type, cache)

    HTTP/1.1 200 OK
    Content-Type: type
    Cache-Control: cache

    document

All 2xx series response codes can be treated as Success responses for GET.

Condition Not Met()

    HTTP/1.1 304 Not Modified

Fail(reason)

    HTTP/1.1 400 Bad Request

    reason

Unknown 1xx series response codes can be treated as a Fail for GET. 300 Multiple Choices is a non-implementable Resubmit response for automated clients, so should also be treated as Fail alongside other 3xx series codes that are not understood. 4xx series response codes are Fail, except for 401 Unauthorized and 407 Proxy Authentication Required. These are Resubmit responses and should only be treated as failures if they are not understood. 5xx series responses should be treated as Fail, except for 503 Service Unavailable and 504 Gateway Timeout. These are Resubmit and Response Lost responses, respectively.

Resubmit(required changes)

Any of: 301 Moved Permanently, 302 Found, 303 See Other, 305 Use Proxy, 307 Temporary Redirect, 401 Unauthorized, or 407 Proxy Authentication Required.

Response Lost()

Any loss of communication before a response is received. This may include application or TCP/IP level timeouts, or an explicitly terminated connection. The 504 Gateway Timeout response is also equivalent to Response Lost, and indicates that a loss occurred somewhere past the TCP connection made directly by the client.
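
The mappings above can be collected into a single classification function. The following Python sketch is one possible reading: the `Outcome` names and the convention of passing `None` for a lost connection are assumptions, 503 is treated as Resubmit as the Fail notes describe, and interim 1xx responses are assumed to have been dealt with by the HTTP library before this function is called.

```python
from enum import Enum

class Outcome(Enum):
    SUCCESS = "Success"
    CONDITION_NOT_MET = "Condition Not Met"
    FAIL = "Fail"
    RESUBMIT = "Resubmit"
    RESPONSE_LOST = "Response Lost"

# Codes that ask the client to try again, possibly with changes.
RESUBMIT_CODES = {301, 302, 303, 305, 307, 401, 407, 503}

def classify(status):
    """Map an HTTP status code, or None for no response, to a pattern outcome."""
    if status is None or status == 504:
        return Outcome.RESPONSE_LOST
    if 200 <= status <= 299:
        return Outcome.SUCCESS
    if status == 304:
        return Outcome.CONDITION_NOT_MET
    if status in RESUBMIT_CODES:
        return Outcome.RESUBMIT
    # Everything else, including 300 and codes that are not understood.
    return Outcome.FAIL

for code in (200, 304, 307, 404, 504, None):
    print(code, classify(code).value)
```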

Sample Code

Request request;
request.url="http://example.com/publication-dates"
if cache_manager.fresh(request.url)
{
	// Do nothing. Our cache entry is still fresh.
}
else if (blocked())
{
	// Only queue one request for the URL
	request_pending(request.url) = true
}
else
{
try_again:
	request.accept=parser.accept
	request.condition=cache_manager.condition(request.url)

	switch (request())
	{
	Success(document, type, cache):
		cache_manager.update(document, type, cache)
		process(parser(document, type))

	Condition Not Met():
		// Do nothing.
		// We have already processed the
		// latest data with our last request.

	Fail(reason):
		log(reason)

	Resubmit(required_changes):
		if policy(request, required_changes)
			request.modify(required_changes)
			jump try_again
		else
			log("Policy forbids request modification")

	Response Lost():
		if policy(request, no required changes)
			jump try_again
		else
			log("Too many retries")
	}
}

Known Uses

GET is widely used on the Web, both under direct human control and under automation. Various aspects of GET are not always used well.

Common errors in applying the GET pattern include:

  • "Unsafe" GET handling by servers, where GET is treated as an update request.
  • Not including the acceptable types list, meaning that the deployed client component will not be readily handled by an upgraded server component. The wrong document type may be returned.
  • Returning the wrong type with a document, or using heuristics based on the URL to determine which parser to invoke for a returned document.
  • Using type identifiers that are too generic. Type specifications such as application/xml or application/rdf+xml could match multiple formats for the return of data, whereas more specific types such as application/atom+xml and application/atom+rdf+xml allow the server to choose a document that is more likely to be understood when the client parses it.
  • Returning a different document from the same URL based on session state. GET requests should not create information on the server that has to be tracked and available when the client's next GET request arrives. The client should be able to issue its next request at any time without further coordination with the server. Sessions may be used to short-cut expensive processing over a series of requests. However, an expired or lost session should not cause a given request to fail or fail to be understood.

Related Patterns