Sound advice - blog

Tales from the homeworld

Fri, 2008-Mar-07

Idempotency of PUT, and common mistakes

Reliable messaging based on idempotency allows clients to retry requests safely when they time out. Another property of PUT and DELETE requests is that intermediate requests to a given URL can be discarded. Only the last request needs to make it through. This model allows application-level guarantees to take the place of transport layer guarantees to achieve high performance and scalable reliability. To a network purist this may seem like a step backwards, but in practice it is an example of better training of and agreement between developers leading to a simpler and more sustainable architecture.
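
As a rough illustration of both properties, here is a minimal Python sketch of a client-side queue that keeps only the latest pending PUT per URL and retries it on timeout. The class name `PendingPuts`, the `send` callback and the retry count are assumptions made for this example, not part of any real library.

```python
from typing import Callable, Dict


class PendingPuts:
    """Holds at most one pending PUT body per URL (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}

    def submit(self, url: str, body: str) -> None:
        # An intermediate request to the same URL is simply overwritten:
        # only the last request needs to make it through.
        self._pending[url] = body

    def flush(self, send: Callable[[str, str], None],
              max_attempts: int = 3) -> Dict[str, bool]:
        """Try to deliver each pending PUT; returns url -> delivered."""
        results: Dict[str, bool] = {}
        for url, body in list(self._pending.items()):
            delivered = False
            for _ in range(max_attempts):
                try:
                    send(url, body)
                    delivered = True
                    break
                except TimeoutError:
                    # Retrying is acceptable only because PUT is idempotent.
                    continue
            if delivered:
                del self._pending[url]
            results[url] = delivered
        return results
```

If the first attempt times out after the server has already applied it, the retry just reapplies the same state, so the client does not need to know which case occurred.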

The guarantees of PUT and DELETE must be applied consistently in order for APIs to make use of these properties to provide reliable messaging, and that depends a lot on the URL to which they are applied. I have a couple of examples from the SCADA world to share in this article that may be instructive elsewhere. The first relates to URLs that demarcate a changing scope of state, and the second is based on the problems of a mixed REST/non-REST architecture.

URLs with a changing scope

SCADA systems often have a centralised list of alarms that operators are expected to respond to quickly. The design of these alarm lists varies, and sometimes, for particular reasons, it is useful to ask the system to acknowledge or erase all alarms in the list.

A naive approach would be to model the requests thusly:

Acknowledge all alarms
PUT https://example.com/all-alarms/acknowledged (text/plain, "1")
Erase all alarms
DELETE https://example.com/all-alarms

The problem with this approach comes about when we consider the effect of repeating a PUT. Is it safe to repeat?

The answer in this case is "no". It is not safe to repeat, because a subsequent PUT or DELETE will affect a different set of alarms to the first request. The first request could succeed, and new alarms could then be raised between the first request and the second. When the second request arrives, it is not "safe": it acknowledges or erases alarms that the first request never touched, so the repeat has a real effect of its own rather than being as harmless as a repeated GET.
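
The failure can be reproduced with a toy in-memory model. This is only a sketch: the `AlarmList` class and the alarm names are invented for the example and stand in for whatever the real server does when it handles the PUT.

```python
class AlarmList:
    """Toy server-side alarm list (hypothetical, for illustration only)."""

    def __init__(self):
        self.alarms = {}  # alarm id -> acknowledged flag

    def raise_alarm(self, alarm_id):
        self.alarms[alarm_id] = False

    def put_all_acknowledged(self):
        # Naive handler for PUT /all-alarms/acknowledged with body "1":
        # acknowledges whatever happens to be in the list right now.
        for alarm_id in self.alarms:
            self.alarms[alarm_id] = True


server = AlarmList()
server.raise_alarm("pump-1")
server.put_all_acknowledged()   # first request succeeds, but the reply is lost
server.raise_alarm("pump-2")    # new alarm the operator has never seen
server.put_all_acknowledged()   # automatic retry of the "same" request
# pump-2 is now acknowledged even though the operator never intended that.
```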

This can be a fuzzy line to draw. Two clients making requests to the same URL will naturally interleave. It will always be the last valid client to successfully make or repeat its request that ends up winning out in the race condition. Client A wins out when A makes a request, then B, then A repeats. In this case, A's second PUT is not safe exactly because its request will negate the effect of B's request. However, this is a genuine race of intent. Clients A and B are in a race to see which of their intents will take effect.

This kind of race can be avoided by narrowing the scope of the URLs in use. A better approach would be as follows:

Acknowledge all alarms
PUT https://example.com/all-alarms/acknowledged?before=2008-03-07T12:32Z (text/plain, "1")
Erase all alarms
DELETE https://example.com/all-alarms?before=2008-03-07T12:32Z

These URLs demarcate the same scope each time the request is repeated. Alarms raised after the specified time are not affected by repeated requests, making the repeats safe.
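
A sketch of the scoped version, under the same caveats as before: `ScopedAlarmList` and `parse_before` are hypothetical names, and the timestamp format is assumed to match the minute-resolution UTC form used in the URLs above.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def parse_before(url):
    """Extract the 'before' cut-off from the query string as a UTC datetime."""
    stamp = parse_qs(urlsplit(url).query)["before"][0]
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%MZ")
    return parsed.replace(tzinfo=timezone.utc)


class ScopedAlarmList:
    """Toy alarm list whose bulk acknowledge is bounded by a cut-off time."""

    def __init__(self):
        self.alarms = {}  # alarm id -> [raised_at, acknowledged]

    def raise_alarm(self, alarm_id, raised_at):
        self.alarms[alarm_id] = [raised_at, False]

    def put_acknowledged_before(self, before):
        # Only alarms raised before the cut-off are in scope, so a repeat
        # of the request touches exactly the same set of alarms.
        for entry in self.alarms.values():
            if entry[0] < before:
                entry[1] = True

    def acknowledged(self, alarm_id):
        return self.alarms[alarm_id][1]
```

Repeating `put_acknowledged_before` with the same cut-off after a later alarm has been raised leaves that later alarm unacknowledged, which is the behaviour the narrowed URL is meant to guarantee.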

Mixed Idempotency Environments

Consider the case of a circuit breaker in a power distribution system. These devices can be massive, and some have to be replaced after only a small number of trips. Think of a mechanical lever. Each position (tripped or closed) has a separate digital read-back. (Tripped=true, Closed=false) means that the lever is in the tripped position. (Tripped=false, Closed=true) means that the lever is in the closed position. (Tripped=false, Closed=false) means that the lever is in motion, or sometimes that the lever is stuck. (Tripped=true, Closed=true) should be an invalid state.
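
The four read-back combinations can be written down as a small decoding function. The enum and function names here are made up for the example; only the truth table comes from the description above.

```python
from enum import Enum


class BreakerPosition(Enum):
    TRIPPED = "tripped"
    CLOSED = "closed"
    IN_MOTION_OR_STUCK = "in motion or stuck"
    INVALID = "invalid"


def decode_position(tripped: bool, closed: bool) -> BreakerPosition:
    """Map the two digital read-backs onto a single lever position."""
    if tripped and closed:
        # Both contacts made at once should never happen.
        return BreakerPosition.INVALID
    if tripped:
        return BreakerPosition.TRIPPED
    if closed:
        return BreakerPosition.CLOSED
    # Neither contact made: the lever is travelling, or it is stuck.
    return BreakerPosition.IN_MOTION_OR_STUCK
```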

Sometimes the read-backs can be faulty. Even a breaker that reads as being in the closed state might not actually be conducting electricity. An operator can typically tell this by reading the voltage back from various meters around the network. In this case, the user knows more about what is going on than the machine does. So, how do we get an apparently-closed breaker really into the closed position? We give the gyros another kick. We send another request to close.

Is this idempotent? Can it be modelled as a PUT?

The answer is "no", or at least... not cleanly. A PUT of (text/plain, "1") to indicate a close would not be appropriate if we expected a repeat of the same PUT to kick the gyros again. Likewise, it is not appropriate for the system to automatically retry such a request. The cost in wear and tear may be too high. What is needed is a different kind of request: a non-idempotent one. This is best modelled as either a POST or a domain-specific method.

There are various ways to ensure reliability in this situation. The simplest is to involve a human in the loop. We specifically don't want to repeat our request due to a desire to keep a human involved with any costly decisions. Avoiding automatic repeat of non-idempotent requests allows us to simply return an error to the user. The user can reassess the situation, and determine for themselves whether to give things another kick along.
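
One way to picture the human-in-the-loop policy is a client wrapper that retries only idempotent methods and hands every other failure back to the operator. This is a sketch under assumptions: `send_with_policy`, `OperatorDecisionRequired` and the `send` callback are invented for the example.

```python
from typing import Callable

# Methods whose requests may be repeated automatically.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE"}


class OperatorDecisionRequired(Exception):
    """A non-idempotent request failed and must not be retried automatically."""


def send_with_policy(method: str, url: str,
                     send: Callable[[str, str], str],
                     max_attempts: int = 3) -> str:
    idempotent = method.upper() in IDEMPOTENT_METHODS
    # Non-idempotent requests get exactly one attempt.
    attempts = max_attempts if idempotent else 1
    for _ in range(attempts):
        try:
            return send(method, url)
        except TimeoutError:
            continue
    if idempotent:
        raise TimeoutError(f"{method} {url} failed after {attempts} attempts")
    # Surface the failure so a person can decide whether to kick again.
    raise OperatorDecisionRequired(
        f"{method} {url} timed out; ask the operator before sending it again")
```

The point of the design is that the costly decision stays with the user: the wrapper never turns one operator action into several physical operations.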

Other reliability approaches are possible, but not as strong. It is possible to arm an operation, then execute it as two separate requests. Either the arming or execution can be repeated individually without reissuing the same request. Another alternative is to move to a complete WS-style reliability model.
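
The arm-then-execute idea could look something like the sketch below. This is one possible reading rather than a description of any particular system: the per-operation token, the `ArmedBreaker` class and its return strings are all assumptions made for the example.

```python
class ArmedBreaker:
    """Toy breaker that requires an arm step before each execute step."""

    def __init__(self):
        self.armed_token = None
        self.executed_tokens = set()
        self.kicks = 0  # number of physical close operations performed

    def arm(self, token):
        # Repeating the arm request with the same token changes nothing,
        # and a token that has already been executed cannot be re-armed.
        if token not in self.executed_tokens:
            self.armed_token = token

    def execute(self, token):
        if token in self.executed_tokens:
            return "already executed"  # a repeat causes no second kick
        if token != self.armed_token:
            return "not armed"
        self.kicks += 1
        self.executed_tokens.add(token)
        self.armed_token = None
        return "executed"
```

With this shape, a client that times out on either step can resend that step with the same token, and a deliberate second kick requires a fresh token and therefore a fresh decision.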

These sorts of cases don't really come up as part of interactions between information systems. It isn't until you get into the physical world that you really run into this problem. My suggestion is that idempotency is certainly king. Putting a human into the loop for the other requests, which they must be involved in anyway, seems reasonable for the cases where reliability is a problem, i.e. during failover of a typically reliable control system.

Conclusion

PUT and DELETE requests are all about specifying the expected outcome of an operation, and allowing the server to decide how to achieve the outcome. When deciding whether a PUT or DELETE request on a given URL is appropriate, consider the effect of an automated system either discarding intermediate requests or repeating requests. If it doesn't make sense, you are probably not specifying the URL or method correctly.

Benjamin