Sound advice - blog

Tales from the homeworld

Sat, 2006-Sep-30

Publish/Subscribe and XMPP

I have a long-standing interest in protocols and technologies. In the proprietary system I work with professionally, publish/subscribe is the cornerstone of realtime data collection. Client machines are capable of displaying updates from monitored field equipment with latencies measured by the speed of light plus processing delays.

My implementation is proprietary, so I have long been keeping an eye out for promising standards and research that may emerge into something positive. The solution must be architecturally sound. In particular, it should be scalable to the size of the Internet. I have some thoughts about this which mainly stem back to the GENA protocol, Rohit Khare's dissertation "Extending the REpresentational State Transfer Architectural Style for Decentralized Systems", and my responses to it: Consensus on the Internet Scale, The Estimated Web, Routed REST, REST Trust Relationships, Infinite Buffering, and Use of HTTP verbs in ARREST architectural style.

I like the direct client-to-server nature of HTTP. You figure out who to connect to using DNS, then make a direct TCP/IP connection. Or an indirect one: for scalability purposes you can introduce intermediaries. These intermediaries are not confused about their role. It is to direct traffic on to the origin server. Sometimes this involves additional intermediaries, but these proxies are not expected to explicitly route data. That is a job for the network.

XMPP takes an instant-messenger approach to communications. JEP-0060 specifies a publish/subscribe mechanism for the XMPP protocol that apparently is seeing use as a transport for Atom, notifying interested parties when news feeds are updated. I don't mind saying that the fundamental architecture irks me. Instead of talking directly to an end server or being transparently pushed through layers that improve network performance, we start out with the assumption that we are talking to an XMPP server. This server could be anywhere. Chances are that, unlike your web proxy, it is not being hosted by your ISP. Instead of measuring the request in terms of the speed of light between source and destination plus processing delays, we need to consider the speed of light and processing delays across a disorganised mishmash of servers from here to Antarctica. XMPP itself also appears to be a poor match to the REST architectural style. On the face of it, XMPP appears to have confusing identifier schemes, nouns, content types, and a mishmash of associated standards and extensions that remind me more of the WS-* stack than of specifications or software stacks that are still used by the generation that follows their specifiers.

Nevertheless, GENA is dead outside of UPnP. The internet drafts submitted by Microsoft to the IETF don't match up with the specification that forms part of UPnP. Neither specification matches up to the GENA implementations I have seen in the wild. I think that the fundamental reason for this is not that HTTP forms a poor transport for subscription at a base technological level, but that firewalls are generally set up to make requests back from HTTP servers impossible as part of a subscription mechanism. As such, a protocol that already supports bidirectional communication and is acceptable to firewalls has a better chance of ongoing success. For the moment, XMPP is a technology that works on the small scale and in the wild Internet today. Perhaps from that seed the organisational issue between servers will simply work itself out as the technology and associated traffic volume become more substantial and more important. After all, the web itself did not start out as the well-oiled, reliable and high-performance machine it is today.

So, it seems reasonable that when it comes to rolling out a standards-based subscription mechanism today, JEP-0060 should be the preferred option ahead of trying to define and promote an HTTP-based specification. That said, there are a number of principles that must be transferable to this XMPP-based solution:

In good RESTful style, subscriptions transfer a summarised sequence of the states of a resource. The first such state is the resource's state at the time the subscription request was received. This allows the state of the resource to be mirrored within a client and for the client to respond to changes in the resource's state. However, it is reasonable to also consider subscription to transient data that is never retained as application state in any resource. This data has a null initial state, no matter when it is subscribed to.
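The subscription semantics described above can be sketched in a few lines. This is only an illustration under my own assumptions: the `Resource` class, its `retained` flag and the listener callback are hypothetical names, not part of JEP-0060 or GENA.

```python
from typing import Callable, List, Optional

Listener = Callable[[Optional[str]], None]


class Resource:
    """Hypothetical resource that notifies subscribers of its states."""

    def __init__(self, state: Optional[str] = None, retained: bool = True):
        self._state = state
        self._retained = retained  # False models transient data
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)
        # The first notification is the state at subscription time for
        # retained resources, and None (null initial state) for transient data.
        listener(self._state if self._retained else None)

    def publish(self, state: str) -> None:
        if self._retained:
            self._state = state
        for listener in self._listeners:
            listener(state)
```

A client that subscribes to a retained resource can mirror it from the first notification onwards. A client that subscribes to transient data only ever sees what is published after it subscribed.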

Working through the XMPP protocol adds a great deal of complexity to the subscription relationship. Intermediaries handle the subscription, so they must also handle authorisation and other issues normally left out of the protocol to be handled within the origin server. In XMPP, the subscription effectively becomes a channel that certain users have a voice in and that other users can receive messages from. My expertise in XMPP is very thin, but on the face of things it appears that subscription data is routed through a server that manages the particular channel, the pubsub service. Perhaps this service could be replaced with an origin server if that was desired.

In terms of matching up with my expectations of a subscription service, well... localised resynchronisation and patch updates can both be supported, but not at the same time. The pubsub service can forward the last message to a new subscriber. If that message contains the entire state of the resource, the client is synchronised. If it is a patch update, the client cannot synchronise. There does not appear to be a way to negotiate or inform the client of the nature of the update. "Message" appears to be the only recognised semantic. This is understandable, I suppose, and fits at least a niche of what a pubsub system can be expected to do.
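To make the synchronisation problem concrete, here is a small sketch of what a client could do if each message declared whether it carries a full state or a patch. The `kind` tag and the `Mirror` class are my own hypothetical additions. As far as I can tell, nothing like this tag exists in JEP-0060.

```python
class Mirror:
    """Hypothetical client-side mirror of a resource's state."""

    def __init__(self):
        self.state = None
        self.synchronised = False

    def on_message(self, kind, payload):
        """Apply a message; return False if it could not be applied."""
        if kind == "state":
            # A full state always synchronises the client.
            self.state = dict(payload)
            self.synchronised = True
            return True
        if kind == "patch":
            if not self.synchronised:
                # A patch is useless without a base state to apply it to.
                return False
            self.state.update(payload)
            return True
        raise ValueError("unknown message kind: %r" % (kind,))
```

Without such a tag, a new subscriber that is forwarded the last message has no way of knowing whether it is now synchronised.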

Summarisation seems to be on the cards only at the edge of the network (i.e. the origin server). This is probably the best place for summarisation, but the lack of differential flow control is a concern. The server appears to simply send messages to the pubsub service at the rate that service can accept them. What happens from there is not clear to me. Either the rate is slowed to match the slowest client, messages are buffered infinitely (until the pubsub service crashes), or messages are buffered to a set limit and messages or clients are dropped past that point. There doesn't seem to be any way of reporting flow control back to the origin server in order to shape the summarisation activity at that point. If message dropping is occurring in the pubsub service then this should be more explicit. Other forms of summarisation may be preferable to the wholesale discard of arbitrary messages.
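As an illustration of a summarisation policy that is gentler than discarding arbitrary messages, the sketch below keeps only the newest state per resource for a slow subscriber and records overflow explicitly. It assumes every message carries a full state rather than a patch. `SummarisingBuffer` and its method names are hypothetical.

```python
class SummarisingBuffer:
    """Hypothetical per-subscriber buffer holding the newest state per resource."""

    def __init__(self, limit):
        self.limit = limit          # maximum number of distinct resources held
        self._pending = {}          # resource name -> latest full state
        self.overflowed = False     # set when a message had to be refused

    def offer(self, resource, state):
        if resource in self._pending:
            # Summarise: replace the stale state instead of queueing both.
            self._pending[resource] = state
        elif len(self._pending) < self.limit:
            self._pending[resource] = state
        else:
            # Out of room: make the loss explicit so the subscriber
            # can be told to resynchronise.
            self.overflowed = True

    def drain(self):
        """Return and clear the pending (resource, state) pairs."""
        items = list(self._pending.items())
        self._pending.clear()
        return items
```

The buffer size is bounded by the number of resources rather than the number of changes, so a burst of updates to one resource costs a slow client nothing but intermediate states.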

JEP-0060 is long (really long) and full of inane examples. It is difficult to get a feel for what problems it does and does not solve. It doesn't contain text like "flow control", "loss", "missed", "sequence", "drop"... anything recognisable as how the subscription model relates to the underlying transport's guarantees. Every time I look through it I feel like crying. Perhaps I am just missing the point, but when it comes to internet-scale subscription I don't think this document puts a standards-based solution in play.

I need to be able to synchronise the state of a resource. I need the subscription mechanism to handle exceptional load or high latency situations effectively. I need it to be able to deal with thousands of changes per second across a disparate client base even in my small example. On the Internet I expect it to deal with millions or billions of changes per second. Will a Jabber-style network handle that kind of load without breaking client service guarantees? How are overflow conditions handled? Can messages be lost, reordered, or summarised? Are messages self-descriptive enough to allow summarisation by the pubsub server?

Perhaps I should go and pen an internet draft after all. GENA isn't that far off the mark, and really does work effectively when no firewalls are in the way. Perhaps it would be a useful mechanism to reliably and safely transfer data between Jabber pubsub islands.


Sat, 2006-Sep-30

Using TCP Keepalive for Client Failover

I recently covered my foray into using mechanisms that are as standard as possible between client and server to facilitate a fixed-period failover time. A client may have a request outstanding and may be waiting for a response. A client may have subscriptions outstanding to the server. Even a server that transfers its IP or MAC address to its backup during failover does not completely isolate its clients from the failover process. Failover and server restart both cause a loss of the state of the server's TCP/IP stack. When that happens, clients must detect it in order to successfully move their processing to the new server instance.

I had originally pooh-poohed TCP/IP keepalive as a limited option. Most (all?) operating systems that support keepalive use system-wide timeout settings, so values can't be tuned based on who you are talking to. I think this might be overcome with Solaris zones, however. Also, the failover characteristics of a particular host with respect to the services it talks to are often similar enough that this is not a problem.

I want to keep end-to-end pinging to a minimum, so I only want keepalive to be turned on while a client has requests outstanding. An idle connection should not generate traffic. Interestingly, this seems to be possible by using the SO_KEEPALIVE socket option. It should be possible to turn the keepalive on when a request is sent, and turn it back off again when the last outstanding response is received. In the meantime the active TCP/IP connection will often be sending data, so keepalives will most often be sent during network lull times while the server is taking time processing.
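A minimal sketch of that toggling, assuming the socket option in question is SO_KEEPALIVE and using Python's standard `socket` module. The `KeepaliveToggle` wrapper and its method names are hypothetical. The timeouts themselves are still whatever the system is configured with.

```python
import socket


class KeepaliveToggle:
    """Hypothetical helper: keepalive is on only while requests are outstanding."""

    def __init__(self, sock):
        self._sock = sock
        self._outstanding = 0

    def request_sent(self):
        self._outstanding += 1
        if self._outstanding == 1:
            # First outstanding request: start probing the peer.
            self._sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    def response_received(self):
        if self._outstanding == 0:
            raise RuntimeError("response received with no request outstanding")
        self._outstanding -= 1
        if self._outstanding == 0:
            # Connection is idle again: stop generating keepalive traffic.
            self._sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 0)

    def enabled(self):
        value = self._sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
        return value != 0
```

The client would call `request_sent()` just before writing a request and `response_received()` once each response has been read in full. Outstanding subscriptions could be counted the same way as outstanding requests.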

If I want my four-second failover, it should just be a matter of setting the appropriate kernel variables to send keepalive probes every second or so and give up after a corresponding number of failures. Combined with IP-level server failover, and subscriptions that are persistent across the failover, this provides a consistent failover experience with a minimum of network load.
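As a rough worked example, assuming Linux-style settings (`net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_keepalive_intvl` and `net.ipv4.tcp_keepalive_probes`), the worst-case detection time is approximately the idle time before the first probe plus one interval per unanswered probe. The function name and the particular values below are my own choices for illustration.

```python
def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time to declare a silent peer dead.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    probes:   number of unanswered probes before giving up
    """
    return idle + interval * probes


# One second idle, then three probes one second apart.
four_second_failover = keepalive_detection_seconds(idle=1, interval=1, probes=3)
```

Other operating systems name and count these settings differently, so the sum should be checked against the platform's own documentation.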


Sat, 2006-Sep-30

Common REST Questions

I just came across a blog entry that includes a number of common misconceptions and questions about REST.

I posted a response in comments, but I thought I might repeat it here also:

RESTwiki contains some useful information on how REST models things differently to Object-Orientation. See:

and others. Also, see the REST Wikipedia article, which sums some aspects of REST up nicely:

The core of prevailing REST philosophy is the REST triangle, where the naming of resources is separated from the set of operations that can be performed on resources, and again from the kinds of information representations at those resources. Verbs and content types must be standard if messages are to be self-descriptive and the requirements of the REST style are to be met. Also, there should be no crossover between the corners of the REST triangle. Names should not be found in verbs or content types, except as hyperlinks. Content should not be found in names or verbs. Verbs should not be found in names or content.

REST can be seen as a document-oriented subset of Object-Orientation. It deliberately reduces the expressiveness of Objects down to the capabilities of resources to ensure compatibility and interoperability between components of the architecture. Object-Orientation allows too great a scope of variation for internet-scale software systems such as the world-wide-web to develop, and doesn't evolve well as demands on the feature set change. REST is Object-Orientation that works between agencies, between opposing interests. For that you need to make compromises rather than doing things your own way.

Now, to address your example:
Verbs should not be part of the noun-space, so your URLs

should not be things you POST to. They should demarcate the "void" state and the "reverse" state of your journal entry. When you GET the void URL it should return the text/plain "true" if the transaction is void and "false" if the transaction is not void. A PUT of the text/plain "true" will void the transaction, possibly impacting the state demarcated by other resources. Reverse is similar. The URL should be "reversal" rather than "reverse". It should return the URL of the reversing transaction, or indicate 404 Not Found to show no reversal. A PUT to the reversal URL would return 201 Created and further GETs would show the reversal transaction.
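Here is a small in-memory sketch of how those two resources might behave. The `JournalEntry` class, the `handle` function and the way the reversal URL is minted are all hypothetical. A real service would sit behind an HTTP server and choose its own URLs.

```python
class JournalEntry:
    def __init__(self, url):
        self.url = url
        self.void = False
        self.reversal_url = None  # URL of the reversing transaction, if any


def handle(entry, method, resource, body=None):
    """Return (status, body) for a request on the 'void' or 'reversal' resource."""
    if resource == "void":
        if method == "GET":
            return 200, "true" if entry.void else "false"
        if method == "PUT":
            if body not in ("true", "false"):
                return 400, ""
            entry.void = body == "true"
            return 200, body
        return 405, ""  # no POSTing to a state resource
    if resource == "reversal":
        if method == "GET":
            if entry.reversal_url is None:
                return 404, ""
            return 200, entry.reversal_url
        if method == "PUT":
            if entry.reversal_url is not None:
                return 200, entry.reversal_url
            # Hypothetical naming scheme for the reversing transaction.
            entry.reversal_url = entry.url + "-reversal"
            return 201, entry.reversal_url
        return 405, ""
    return 404, ""
```

Note that both resources only answer GET and PUT. The verb never appears in the name.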

Creation in REST is simple. Either the client knows the URL of the resource they want to create and PUTs the resource's state to that URL, or the client POSTs to a factory resource, asking it to add the state provided to itself. POST is designed to either append the state provided or create a new resource to demarcate the new state. POST is more common. The PUT approach requires clients to know something about the namespace that they often shouldn't know outside of some kind of test environment.
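The two creation styles can be sketched side by side. `Store`, its factory URL and the numbering scheme for new resources are hypothetical, and the status codes are the ones I would expect rather than a statement about any particular server.

```python
class Store:
    """Hypothetical in-memory server with one factory resource."""

    def __init__(self, factory_url):
        self.factory_url = factory_url
        self.resources = {}
        self._next_id = 1

    def put(self, url, state):
        # The client chose the URL, so it must already know the namespace.
        created = url not in self.resources
        self.resources[url] = state
        return (201 if created else 200), url

    def post(self, url, state):
        # The server chooses the URL; the client only knows the factory.
        if url != self.factory_url:
            return 405, None
        new_url = "%s/%d" % (self.factory_url, self._next_id)
        while new_url in self.resources:
            self._next_id += 1
            new_url = "%s/%d" % (self.factory_url, self._next_id)
        self._next_id += 1
        self.resources[new_url] = state
        return 201, new_url
```

With POST the client learns the new URL from the response, which is why it needs no prior knowledge of how the server lays out its namespace.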

On swapping: This is something of an edge case, and this sort of thing comes up less often than you think when you are designing RESTfully from the start. The canonical approach would be to include the position of the resource as part of its content. PUTting over the top of that position would move it. This is messy because it crosses between noun and content spaces. Introducing a SWAP operation is also a problem. HTTP operates on a single resource, so there is no unmunged way to issue a SWAP request. Any such SWAP request would have to assume both of the resources of the unordered list are held by the same server, or that the server of one of these resources was able to operate on the ordered list.

On transactions: The CRUD verb analogy is something of a bane for REST. I prefer cut-and-paste. Interestingly, cut-and-paste on the desktop is quite RESTful. A small number of verbs are able to transfer information in a small number of widely-understood formats from one application to another. The cursor identifies and demarcates the information that will be COPIED (GET) or CUT (GET + DELETE) and the position where the information or state will be PASTED to (PUT to paste over, POST to paste after). The CRUD analogy leaves us wondering how to do transactions, but with the cut-and-paste analogy the answer is obvious: Don't.

In REST, updates are almost universally atomic. You do everything you need to do atomically in a single request, rather than trying to spread it out over several requests and having to add transaction semantics. If you can't see how to do without transactions you are probably applying REST at a lower level than it is typically applied. In this example, whenever you post a new journal entry you do so as a single operation: POST a complete representation of the journal entry to a factory resource.

That is not to say that REST can't do transactions. Just POST to a transaction factory resource, perform several POSTs to the transaction that was created, then DELETE (roll back) or POST a commit marker to the transaction.
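A sketch of that transaction pattern follows, with the usual caveats: `TransactionFactory`, the `/transactions` URL and the `COMMIT_MARKER` value are hypothetical stand-ins, and a real service would define the commit marker as a proper content type.

```python
COMMIT_MARKER = "COMMIT"  # hypothetical representation of the commit marker


class TransactionFactory:
    def __init__(self):
        self.committed = []   # operations visible to everyone
        self._open = {}       # transaction URL -> pending operations
        self._next_id = 1

    def post(self, url, body=None):
        if url == "/transactions":
            # POST to the factory creates a new transaction resource.
            txn_url = "/transactions/%d" % self._next_id
            self._next_id += 1
            self._open[txn_url] = []
            return 201, txn_url
        if url not in self._open:
            return 404, None
        if body == COMMIT_MARKER:
            # Commit: apply all pending operations in one step.
            self.committed.extend(self._open.pop(url))
            return 200, None
        self._open[url].append(body)
        return 200, None

    def delete(self, url):
        # Roll back: discard the transaction and everything posted to it.
        if self._open.pop(url, None) is None:
            return 404, None
        return 204, None
```

Nothing posted to a transaction is visible in `committed` until the commit marker arrives, and a DELETE leaves no trace.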

How REST maps to objects is up to the implementation. You can evolve your objects independently of the namespace, which is expected to remain stable forever once clients start to use it. The URI space is not a map of your objects, it is a virtual view of the state of your application. Resources are not required or even expected to map directly onto objects. One method of a resource may operate on one object but another may operate on a different object. This is especially the case when state is being created or destroyed.

REST is about modelling the state of your application as resources, then operating on that virtualised state using state transfer methods rather than arbitrary methods with arbitrary parameter lists. REST advocates such as myself will claim this has significant benefits, but I'll refer you to the literature (especially the Wikipedia page) rather than list them here.