Sound advice - blog

Tales from the homeworld


Fri, 2006-Nov-03

Introducing SENA

I am currently in the process of authoring an internet draft relating to an internet-scale subscription mechanism. The protocol is HTTP-based and REST-like. I am currently calling it the Scalable Event Notification Architecture (SENA), a play on the old General Event Notification Architecture (GENA). It is intended for time-sensitive applications like SCADA and the relaying of stock market quotes, rather than for general use across the Internet. The draft can be found here and remains, for the present time, a work in progress.

Feedback Welcome

I have been soliciting targeted feedback on the document for the last few weeks, with some good input from a number of sources. Please consider providing feedback yourself by emailing me at benjamincarlyle at optusnet.com.au with the subject line "HTTP Subscription".

Architecture

The architecture consists of an originating resource, a set of clients, and a set of intermediaries. Clients use a new SUBSCRIBE HTTP verb to request a subscription. Intermediaries may pass this request through without inspection, in which case the origin server answers it directly. The origin server creates a subscription resource, which may be DELETEd to terminate the subscription. A notify client associated with the subscription resource sends HTTP requests back to a client-specified Call-Back URL whenever the originating resource changes.
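To make the shape of the exchange concrete, here is a minimal sketch in Python of what a client-side SUBSCRIBE request might look like. The host, path, and header names (Call-Back, Subscription-Lease) are illustrative assumptions on my part rather than text from the draft.

    # Sketch of a SENA-style subscription request. Header names are assumptions.
    import http.client

    conn = http.client.HTTPConnection("sensors.example.com", 80)
    conn.request(
        "SUBSCRIBE",                       # new verb proposed by the draft
        "/pumps/7/pressure",               # the originating resource
        headers={
            "Call-Back": "http://client.example.com/notify/pressure-7",
            "Subscription-Lease": "7200",  # seconds; see the keep-alive discussion below
        },
    )
    response = conn.getresponse()
    # A successful response would identify the subscription resource, which the
    # client can later DELETE to unsubscribe.
    subscription_url = response.getheader("Location")
    print(response.status, subscription_url)
    conn.close()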

Instead of passing the requests through untouched, intermediaries can participate in the subscription. An intermediary can intercept the same subscription made by several clients and subscribe only once to the originating resource. It can specify its own notify resource, which in turn notifies its clients. This has a scalability effect similar to that of caching proxies on the Web of today.

Notification Verbs

I currently have a number of notification request verbs listed in the draft. The simplest one that could possibly work is EXPIRE. Under EXPIRE semantics, a notify resource is told that it needs to revalidate its cache entry for the originating resource. If it is a client, the likely response is to issue an immediate GET request. This request carries a Cache-Control: max-age=0 header to ensure revalidation occurs through non-subscription-aware intermediaries.
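As a sketch of how little a plain client needs to do, an EXPIRE handler might simply revalidate with an explicit max-age=0. The host and path are placeholders; nothing here is normative.

    # Sketch of a client's reaction to an EXPIRE notification: revalidate the
    # originating resource with max-age=0 so that any ordinary,
    # subscription-unaware caches along the way also revalidate.
    import http.client

    def on_expire(origin_host: str, origin_path: str) -> bytes:
        conn = http.client.HTTPConnection(origin_host)
        conn.request("GET", origin_path, headers={"Cache-Control": "max-age=0"})
        response = conn.getresponse()
        body = response.read()
        conn.close()
        return body

    # e.g. on_expire("sensors.example.com", "/pumps/7/pressure")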

If the notify resource is operating on behalf of an intermediary, it may choose to fetch the data immediately, given that clients are likely to ask for it again very soon. Alternatively, it may wait for the first GET request from its clients to come in. Because it holds a subscription to the originating resource, the intermediary can safely ignore the max-age header. This allows the intermediary to perform one revalidation for each received EXPIRE request, regardless of the number of clients it has.

The good thing about EXPIRE is that its semantics are so weak it is almost completely unnecessary to validate that it is a genuine request from the real notification source. The worst an attacker could do is chew up a little extra bandwidth, and that could be detected when the originating resource consistently validates the old content rather than offering new content. EXPIRE also allows all normal GET semantics to apply, including content negotiation. The main drawbacks of EXPIRE are that it takes an extra two network traversals to get the subscribed data after receiving the request (GET request, GET response), and that you really have to GET the whole resource. There is no means of observing just the changes to a list of 100,000 alarms.

The alternative system is a combination of NOTIFY and PATCHNOTIFY requests. These requests carry the state of the originating resource and changes to that state, respectively. The big problem with these requests is their increased semantic importance. You must be able to trust the sender of the data, which means you need digital signatures guaranteed by a shared certificate authority. This introduces a significantly higher processing cost to communications. Useful semantics of GET such as content negotiation also disappear. I am almost resigned to these methods not being useful. They aren't the simplest thing that could possibly work.

Summarisation

One of the explicit features of the specification is summarisation of data. Most subscription models don't seem to have a way of dealing with notifications through intermediaries that have fixed buffer sizes, delivering to a set of clients with different connection characteristics. If an intermediary has a buffer size of one and receives an update that can only be delivered to one of two clients, then receives another update... what does it do?

The intermediary usually either has to block waiting for the slowest client or kick the slowest client off so that it can deliver messages to the faster clients. The third option is to discard old messages that have not yet been transmitted to the slow client. In SCADA-like situations this is an obvious winner. The newer message carries newer state, so the slow client is always getting fresh data despite not getting all the data. The fast client is not stopped from receiving messages, so it gets whatever advantages its faster connectivity is designed to bring. Most message delivery mechanisms don't know and can't take the semantics of the message into account. SENA is explicitly a state-transfer protocol, so these semantics of "old, unimportant state" and "new, important state" can be taken into consideration. PATCHNOTIFY requests can even be merged into each other to form a single coherent update.
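A minimal sketch of that third option, assuming a one-slot buffer per slow client. None of this is from the draft; it only illustrates why a state-transfer protocol makes the discard decision safe.

    # One-slot outbox per slow client: always holds only the freshest state.
    # Old, undelivered state is simply overwritten.
    import threading

    class SlowClientOutbox:
        def __init__(self):
            self._lock = threading.Lock()
            self._latest = None           # the single buffered state, or None
            self._ready = threading.Event()

        def offer(self, state):
            """Called by the intermediary whenever fresh state arrives."""
            with self._lock:
                self._latest = state      # older state is discarded here
                self._ready.set()

        def take(self):
            """Called by the delivery thread when the slow client can accept data."""
            self._ready.wait()
            with self._lock:
                state, self._latest = self._latest, None
                self._ready.clear()
                return state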

The EXPIRE request can also be summarised. New EXPIRE requests trump old requests; there is no point in delivering multiple queued EXPIREs. Likewise, the data fetches triggered by EXPIRE requests implicitly summarise the actual sequence of state changes, by virtue of multiple changes occurring between GET requests.

Keep-alive

Most subscription mechanisms include a keep-alive function. This typically exists for a number of reasons:

  1. To ensure that server-side resources are not consumed long after a client has forgotten about the subscription
  2. To allow a client to determine that the server has forgotten about the subscription and needs a reminder
  3. To allow a client to detect the death of its server

SENA addresses the first point with a long server-driven keep-alive. I have deliberately pointed to a default period consistent with the default required for TCP/IP keep-alive: two hours. It should be long enough not to significantly increase the base load of the Internet associated with ping data, while still allowing servers the opportunity to clean up eventually.

The second point is dealt with in SENA by a prohibition against the server side losing subscriptions. Subscriptions should be persistent across server failures and failovers. Client-side query of subscription state is permitted via a GET to the subscription resource; however, this should not be used as a regular keep-alive, especially over a short period. Essentially, a server should never lose a subscription, and user intervention will be required whenever it does happen.

Death detection is not dealt with in SENA. It is an assumption of SENA that the cost of generic death detection outweighs the benefits. The cost of death detection is at least one message pair exchange per detection period, and over the scale of the internet that sort of base load just doesn't compute. Subscriptions should become active again after the server comes back online, so server downtime is just another time when the basic condition of the subscription holds: that the client has the freshest data possible. Monitoring of the state of the server is left as a specialised service capability wherever it is required.

Benjamin

Sat, 2006-Sep-30

Publish/Subscribe and XMPP

I have a long-standing interest in publish/subscribe protocols and technologies. In the proprietary system I work with professionally, publish/subscribe is the cornerstone of realtime data collection. Client machines are capable of displaying updates from monitored field equipment at latencies measured by the speed of light plus processing delays.

My implementation is proprietary, so I have long been keeping an eye out for promising standards and research that may emerge into something positive. The solution must be architecturally sound. In particular, it should be scalable to the size of the Internet. I have some thoughts about this which mainly stem back to the GENA protocol, Rohit Khare's dissertation Extending the REpresentational State Transfer Architectural Style for Decentralized Systems, and my responses to it: Consensus on the Internet Scale, The Estimated Web, Routed REST, REST Trust Relationships, Infinite Buffering, and Use of HTTP verbs in ARREST architectural style.

I like the direct client-to-server nature of HTTP. You figure out who to connect to using DNS, then make a direct TCP/IP connection. Or an indirect one: for scalability purposes you can introduce intermediaries. These intermediaries are not confused about their role, which is to direct traffic on to the origin server. Sometimes this involves additional intermediaries, however these proxies are not expected to explicitly route data. That is a job for the network.

XMPP takes an instant-messenger approach to communications. JEP-0060 specifies a publish/subscribe mechanism for the XMPP protocol that apparently is seeing use as a transport for Atom, to notify interested parties when news feeds are updated. I don't mind saying that the fundamental architecture irks me. Instead of talking directly to an end server or being transparently pushed through layers that improve network performance, we start out with the assumption that we are talking to an XMPP server. This server could be anywhere. Chances are that unlike your web proxy, it is not being hosted by your ISP. Instead of measuring the request in terms of the speed of light between source and destination plus processing delays, we need to consider the speed of light and processing delays across a disorganised mishmash of servers from here to Antarctica. XMPP itself also appears to be a poor match to the REST architectural style. On the face of it, XMPP appears to have confusing identifier schemes, nouns, content types, and a mish-mash of associated standards and extensions that remind me more of the WS-* stack than of specifications or software stacks that are still used by the generation that follows their specifiers.

Nevertheless, GENA is dead outside of UPnP. The internet drafts submitted by Microsoft to the IETF don't match up with the specification that forms part of UPnP. Neither specification matches up to GENA implementations I have seen in the wild. I think the fundamental reason for this is not that HTTP forms a poor transport for subscription at a base technological level, but that firewalls are generally set up to make requests back from HTTP servers impossible as part of a subscription mechanism. As such, a protocol that already supports bidirectional communication and is acceptable to firewalls has a better chance of ongoing success. For the moment, it is a technology that works on the small scale and in the wild Internet today. Perhaps from that seed the organisational issues between servers will simply work themselves out as the technology and associated traffic volume become more substantial and more important. After all, the web itself did not start out as the well-oiled, reliable, high-performance machine it is today.

So it seems reasonable that when it comes to rolling out a standards-based subscription mechanism today, JEP-0060 should be the preferred option ahead of trying to define and promote a HTTP-based specification. That said, there are a number of principles that must be transferable to this XMPP-based solution:

In good RESTful style, subscriptions transfer a summarised sequence of the states of a resource. The first such state is the resource's state at the time the subscription request was received. This allows the state of the resource to be mirrored within a client, and allows the client to respond to changes in the resource's state. However, it is also reasonable to consider subscription to transient data that is never retained as application state in any resource. This data has a null initial state, no matter when it is subscribed to.

Working through the XMPP protocol adds a great deal of complexity to the subscription relationship. Intermediaries handle the subscription, so they must also handle authorisation and other issues normally left out of the protocol to be handled within the origin server. In XMPP, the subscription effectively becomes a channel that certain users have a voice in and that other users can receive messages from. My expertise is very thin on XMPP, but on the face of things it appears that subscription data is routed through a server that manages the particular channel, the pubsub service. Perhaps this service could be replaced with an origin server if that was desired.

In terms of matching up with my expectations of a subscription service, well... localised resynchronisation and patch updates can both be supported, but not at the same time. The pubsub service can forward the last message to a new subscriber. If that message contains the entire state of the resource, the client is synchronised. If it is a patch update, the client cannot synchronise. There does not appear to be a way to negotiate or inform the client of the nature of the update. "Message" appears to be the only recognised semantic. This is understandable, I suppose, and fits at least a niche of what a pubsub system can be expected to do.

Summarisation seems to be on the cards only at the edge of the network (i.e. the origin server). This is probably the best place for summarisation, however the lack of differential flow control is a concern. The server appears to simply send messages to the pubsub service at the rate that service can accept them. What happens from there is not clearly cemented in my mind. Either the rate is slowed to match the slowest client, messages are buffered infinitely (until the pubsub service crashes), or messages are buffered to a set limit and messages or clients are dropped past that point. There doesn't seem to be any way of reporting flow control back to the origin server in order to shape the summarisation activity at that point. If message dropping is occurring in the pubsub service then this should be more explicit. Other forms of summarisation may be preferable to the wholesale discard of arbitrary messages.

JEP-0060 is long (really long) and full of inane examples. It is difficult to get a feel for what problems it does and does not solve. It doesn't contain text like "flow control", "loss", "missed", "sequence", or "drop"... anything recognisable as describing how the subscription model relates to the underlying transport's guarantees. Every time I look through it I feel like crying. Perhaps I am just missing the point, but when it comes to internet-scale subscription I don't think this document puts a standards-based solution in play.

I need to be able to synchronise the state of a resource. I need the subscription mechanism to handle exceptional load or high latency situations effectively. I need it to be able to deal with thousands of changes per second across a disparate client base, even in my small example. On the Internet I expect it to deal with millions or billions of changes per second. Will a Jabber-style network handle that kind of load without breaking client service guarantees? How are overflow conditions handled? Can messages be lost, reordered, or summarised? Are messages self-descriptive enough to allow summarisation by the pubsub server?

Perhaps I should go and pen an internet draft after all. GENA isn't that far off the mark, and really does work effectively when no firewalls are in the way. Perhaps it would be a useful mechanism to reliably and safely transfer data between jabber pubsub islands.

Benjamin

Sat, 2006-Jan-28

Internet-scale Client Failover

Failover is the process of electing a new piece of hardware to take over the role of a failed piece of hardware (or sometimes software), and the process of bringing everyone on board with the new management structure. Detecting failure and electing a new master are not hard problems. Telling everyone about it is hard. You can attack the problem at various levels. You can have the new master take over the IP of the old and broadcast a new ARP reply to take over from the MAC address. You can even have the new master take over the IP and MAC address of the old. If new and old are not on the same subnet, you can try to solve the problem through DNS. The trouble with all of these approaches is that while they solve the problem for new clients that may come along, they don't solve the problem for clients with existing cached DNS entries or existing TCP/IP connections.

Imagine you are a client app, and you have sent a HTTP request to the server. The server fails over, and a new piece of hardware is now serving that IP address. You can still ping it. You can still see it in DNS. The problem is, it doesn't know about your TCP/IP connection to it, or the connection's "waiting for HTTP response" state. Until a new TCP/IP packet associated with the connection hits the new server it won't know you are there. Only when that happens and it returns a packet to that effect will the client learn its connection state is not reflected by the server side. Such a packet won't usually be generated until new request data is sent by the client, and often that just won't ever happen.

Under high load conditions clients should wait patiently to avoid putting extra strain on the server. If a client knows that a response will eventually be forthcoming, it should be willing to wait for as long as it takes to generate the response. With the possibility of failover, the problem is that a client cannot know whether the server state reflects its own, and so cannot know whether a response really will be forthcoming. How often it must sample the remote state is determined by the desired failover time. In industrial applications this time may be as low as two or four seconds, and sampling must take place several times faster than that to allow for lost packets. If sampling is not possible, the desired failover time represents the maximum time a server has to respond to its clients, plus network latency; another means must be used to return the results of processing if any single request takes longer. Clients must use the desired failover time as their request timeout.
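As a rough illustration of that timing relationship (the numbers below are examples only, not recommendations):

    # Back-of-envelope sketch of failover timing.
    DESIRED_FAILOVER_TIME = 4.0   # seconds within which a dead server must be detected
    TOLERATED_LOST_SAMPLES = 2    # probes allowed to go missing before declaring death

    # Probe several times per failover period so that a couple of lost packets
    # do not blow the deadline.
    sample_interval = DESIRED_FAILOVER_TIME / (TOLERATED_LOST_SAMPLES + 1)

    # And the timeout a client should apply to any single request:
    request_timeout = DESIRED_FAILOVER_TIME

    print(sample_interval, request_timeout)   # 1.33..., 4.0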

If you take the short request route, HTTP permits you to return 202 Accepted to indicate that a request has been accepted for processing, without indicating its success or failure. If this were used as a matter of course, conventions could be set up to return the HTTP response via a request back to a call-back URL. Alternatively, the response could be modelled as a resource on the server which is periodically polled by the client until it exhibits a success or failure status. Neither of these approaches is directly supported by today's browser software, however the latter could be performed using a little meta-refresh magic.
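Here is a minimal sketch of the second option: polling a response resource until it stops reporting "pending". The use of 202 on the polled resource and the Location header pointing at the pending result are conventions I am assuming for illustration, not anything mandated by HTTP.

    # Submit a job, receive 202 Accepted plus a Location for the pending result,
    # then poll that resource until it reports success or failure.
    import time
    import http.client

    def submit_and_poll(host, path, body, poll_interval=2.0):
        conn = http.client.HTTPConnection(host)
        conn.request("POST", path, body=body)
        first = conn.getresponse()
        data = first.read()
        if first.status != 202:
            conn.close()
            return first.status, data        # processed immediately
        result_path = first.getheader("Location")

        while True:
            time.sleep(poll_interval)
            conn.request("GET", result_path)
            poll = conn.getresponse()
            data = poll.read()
            if poll.status != 202:           # assumed convention: 202 means "still pending"
                conn.close()
                return poll.status, data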

You may not have sufficient information at the application level to support sampling at the TCP/IP level. You would need to know the current sequence numbers of the stack in order to generate a packet that would be rejected by the server in an appropriate way. In practice what you need is a closer vantage point. Someone who is close in terms of network topology to both the old and the new master can easily tell when a failover occurs and publish that information for clients to monitor. On the face of it this is just moving the problem around, however a specialised service can more easily ensure that it never spends a long time responding to requests. This allows us to employ the techniques which rely on quick responses.

Like the state of HTTP subscriptions, the state of HTTP requests must be sampled if a client is to wait indefinitely for a response. How long it should wait depends on the client's service guarantees, and has little to do with what the server considers an appropriate timeframe. Nevertheless, the client's demands put hard limits on the profile of behaviour acceptable on the server side. In subscription, the server can simply renew whenever a renew is requested of it, and time a subscription out after a long period. It seems that the handling of a simple request/response couples clients and servers together more closely than even a subscription does, because of the hard limits client timeout puts onto the server side.

Benjamin

Sat, 2005-Nov-19

HTTP in Control Systems

HTTP may not be the first protocol that comes to mind when you think SCADA, or when you think of other kinds of control systems. Even the Internet Protocol is not a traditional SCADA component. SCADA traditionally works off good old serial or radio communications with field devices, and uses specialised protocols that keep bandwidth usage to an absolute minimum. SCADA has two sides, though, and I don't just mean the "Supervisory Control" and the "Data Acquisition" sides. A SCADA system is an information concentration system for operational control of your plant. Having already gotten your information into a concentrated form and place, it makes sense to feed summaries of that data into other systems. In the old parlance of the corporation I happen to work for this was called "Sensor to Boardroom".

One of my drivers in trying to understand some of the characteristics of the web as a distributed architecture has been in trying to expose the data of a SCADA system to other ad hoc systems that may need to utilise SCADA data. SCADA has also come a long way over the years, and now stands more for integration of operational data from various sources than simple plant control. It makes sense to me to think about whether the ways SCADA might expose its data to other systems may also work within a SCADA system composed of different parts. We're in the land of ethernet here, and fast processors. Using a more heavy-weight protocol such as HTTP shouldn't be a concern from the performance perspective, but what else might we have to consider?

Let's draw out a very simple model of a SCADA system. In it we have two server machines running redundantly, plus one client machine seeking information from the servers. This model is effectively replicated over and over for different services and extra clients. I'll quickly summarise some possible issues and work through them one by one:

  1. Timely arrival of data
  2. Deciding who to ask
  3. Quick failover between server machines
  4. Dealing with redundant networks

Timely Data

When I use the word timely, I mean that our client would not get data that is any fresher by polling rapidly. The simplest implementation of this requirement would be... well... to poll rapidly. However, this loads the network and all CPUs unnecessarily, and should be avoided in order to maintain adequate system performance. Timely arrival of data in the SCADA world is all about subscription, either ad hoc or preconfigured. I have worked fairly extensively on the appropriate models for this. A client requests a subscription from a server. The subscription is periodically renewed and may eventually be deleted. While the subscription is active it delivers state updates to a client URL over some appropriate protocol. Easy. The complications start to appear in the next few points.
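A sketch of the client side of that lifecycle might look like the following. The RENEW and DELETE usage mirrors the GENA-style model discussed elsewhere on this blog; the exact verbs and lease handling are assumptions, not settled protocol.

    # Maintain a subscription: renew well before the lease expires, delete on shutdown.
    import threading
    import http.client

    def maintain_subscription(host, subscription_path, lease_seconds, stop):
        """`stop` is a threading.Event used to signal shutdown."""
        def issue(method):
            conn = http.client.HTTPConnection(host)
            try:
                conn.request(method, subscription_path)
                return conn.getresponse().status
            finally:
                conn.close()

        try:
            # Renew at half the lease period so that one lost renewal is survivable.
            while not stop.wait(lease_seconds / 2):
                issue("RENEW")
        finally:
            issue("DELETE")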

Who is the Master?

Deciding who to ask for subscriptions and other services is not as simple as you might think. You could use DNS (or a DNS-like service) in one of two ways: you could use static records, or you could change your records as the availability of servers changes. Dynamic updates would work through some DNS updater application running on one or more machines. It would detect the failure of one host, and nominate the other as the IP address to connect to for your service. Doing it dynamically has the problem that you're working from pretty much a single point of view: what you as the dynamic DNS modifier see may not be the same as what all clients see. In addition you have the basic problem of static DNS: where do you host it? In SCADA everything has to be redundant and robust against failure. No downtime is acceptable. The static approach also pushes the failure detection problem onto clients, which may be a problem they aren't capable of solving due to their inherently "dumb" generic functionality.

Rather than solving the problem at the application level you could rely on IP-level failover, however this works best when machines are situated on the same subnet. It becomes more complex to design when main and backup servers are situated in separate control centres for disaster recovery.

Whichever way you turn there are issues. My current direction is to use static DNS (or equivalent) that specifies all IP addresses that are or may be relevant for the name. Each server should forward requests on to the master if it is not currently master itself, meaning that it doesn't matter which one is chosen when both servers are up (apart from a slight additional lag should the wrong server be chosen). Clients should connect to all IP addresses simultaneously if they want to get their request through quickly when one or more servers are down. They should submit their request to the first connected IP, and be prepared to retry on failure to get their message through. TCP/IP has timeouts tuned for operating over the Internet, but these kinds of interactions between clients and servers in the same network are typically much faster. It may be important to ping hosts you have connections to in order to ensure they are still responsive.
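As a sketch of the "connect to everything behind the name and use whichever answers first" approach (retry of the request itself on later failure is left out):

    # Resolve every address for the service name, attempt all connections in
    # parallel, and hand back the first one that succeeds.
    import socket
    import concurrent.futures

    def connect_first(hostname, port, timeout=2.0):
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
        addresses = {info[4][0] for info in infos}

        def attempt(addr):
            return socket.create_connection((addr, port), timeout=timeout)

        pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(addresses))
        try:
            futures = [pool.submit(attempt, addr) for addr in addresses]
            for future in concurrent.futures.as_completed(futures):
                if future.exception() is None:
                    # A real client would also close any sockets that finish
                    # connecting after this winner has been chosen.
                    return future.result()
            raise OSError("no address for %s was reachable" % hostname)
        finally:
            pool.shutdown(wait=False)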

It would be nice if TCP/IP timeouts could be tuned more finely. Most operating systems allow tuning of the entire system's connections; few support tuning on a per-connection basis. If I know the connection I'm making is going to a host that is very close to me in terms of network topology, it may be better to declare failures earlier using the standard TCP/IP mechanisms rather than supplementing them with ICMP. Also, the ICMP method of supplementing TCP/IP in this way relies on not using IP-level failover techniques between servers.
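Where a platform does expose per-connection knobs, the sort of tuning I mean looks roughly like this. The socket options used are Linux-specific (assumed available on the target system) and the values are illustrative only.

    # Per-connection keep-alive and write-timeout tuning on Linux.
    import socket

    def tuned_connection(addr, port):
        sock = socket.create_connection((addr, port))
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        # Probe an idle connection after 2 seconds, every 2 seconds, and give
        # up after 3 missed probes: failure declared in well under 10 seconds.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 2)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 2)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)
        # Abort writes that remain unacknowledged for more than 8 seconds.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 8000)
        return sock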

Client Failover

Quick failover follows on from discovering who to talk to. The same kinds of failure detection mechanisms are required. Fundamentally, clients must be able to quickly detect any premature loss of their subscription resource and recreate it. This is made more complicated by the different server-side implementations that may make subscription loss more or less likely, and thus the necessary corrective actions that clients may need to take. If a subscription is lost when a single server host fails, it is important that clients check their subscriptions often and also monitor the state of the host that is maintaining their subscription resource. If the host goes down then the subscription must be reestablished as soon as this is discovered. As such, the subscription must be periodically tested for existence, preferably through a RENEW request. Regular RENEW requests over an ICMP-supported TCP/IP connection as described above should be sufficient for even a slowly-responding server application to adequately inform clients that their subscriptions remain active and that they should not reattempt creation.

Redundant Networks

SCADA systems typically utilise redundant networks as well as redundant servers. Not only can clients access the servers on two different physical media, the servers can do the same to clients. Like server failover, this could be dealt with at the IP level... however your IP stack would need to work in a very well-defined way with respect to the packets you send. I would suggest that each packet be sent to both networks, with duplicates discarded on the receiving end. This would very neatly deal with temporary outages in either network without any delays or network hiccups. Ultimately the whole system must be able to run over a single network, so trying to load balance while both are up may be hiding inherent problems in the network topology. Using them both should provide the best network architecture overall.
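A minimal sketch of the duplicate-and-discard idea over UDP, with a sequence number prefix used for de-duplication. The local addresses are placeholders, and a real implementation would also need to handle sequence wrap-around and restarts.

    # Send every datagram on both networks; the receiver delivers each sequence
    # number once, whichever copy arrives first.
    import socket
    import struct

    def make_senders(local_addresses):
        # e.g. make_senders(["192.0.2.10", "198.51.100.10"]), one address per LAN
        senders = []
        for addr in local_addresses:
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sock.bind((addr, 0))            # force traffic out via each network
            senders.append(sock)
        return senders

    def send_everywhere(senders, destination, sequence, payload):
        datagram = struct.pack("!I", sequence) + payload
        for sock in senders:
            sock.sendto(datagram, destination)

    class Deduplicator:
        """Receiver side: drop the second copy (and any stale stragglers)."""
        def __init__(self):
            self._last_delivered = -1

        def accept(self, datagram):
            sequence, = struct.unpack("!I", datagram[:4])
            if sequence <= self._last_delivered:
                return None                 # duplicate or out-of-date copy
            self._last_delivered = sequence
            return datagram[4:]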

Unfortunately, I'm not aware of any network stacks that do what I would like. Hey, if you happen to know how to set it up feel free to drop me a line. In the mean time this is usually dealt with at the application level with two IP addresses per machine. I tell you what: this complicates matters more than you'd think. You end up needing a DNS name for the whole server pair with four IP addresses. You then need an additional DNS name for each of the servers, each with two IP addresses. When you subscribe to a resource you specify the whole server pair DNS name on connection, but the subscription resource may only exist on one service. It would be returned with only that service's DNS name, but that's still two IP addresses to deal with and ping. All the way through your code you have to deal with this multiple address problem. In the end it doesn't cause a huge theoretical problem to deal with this at the application level, but it does make development and testing a pain in the arse all around.

Conclusion

Because this is all SIL2 software you end up having to write most of it yourself. I've been developing HTTP client and server software in spurts over the last six months or so, but concertedly over the last few weeks. The beauty is that once you have the bits that need to be SIL2 in place, you can access them with off-the-shelf implementations of both interfaces. Mozilla and curl both get a big workout on my desktop. I expect Apache, and maybe Tomcat or Websphere, will start getting a workout soon. Rearchitecting around existing web standards should make it easier for me to produce non-SIL2 implementations of the same basic principles. Parts of the SCADA system that are not safety-related could be built out of commodity components, while the ones that are can still work through carefully-crafted proprietary implementations. It's also possible that off-the-shelf implementations will eventually become so accepted in the industry that they can be used where safety is an issue. We may one day think of Apache like we do the operating systems we use: providing a commodity service that we understand and have validated very well in our own industry and environment, helping us to write only the software that really adds value to our customers.

On that note, we do have a few jobs going at Westinghouse Rail Systems Australia's Brisbane office to support a few projects that are coming up. Hmm... I don't seem to be able to find them on seek. Email me if you're interested and I'll pass your details on to my manager. You'd be best to use my ben.carlyle at invensys.com address for this purpose.

Benjamin

Sat, 2005-Oct-15

Internet-scale Subscription Lease Durations

Depending on your standpoint you may have different ideas about how long a subscription lease should be. From an independent standpoint we might say that infinite subscription leases are the best way forward: they produce the lowest overall network and processing overhead, and thus the best result overall. There are, however, competing interests that push this number downwards. The lease should most likely be of finite duration, and that duration will depend more on the reliability of the server and the demands of the client than on anything else.

As a server I want to free up resources as soon as I can after clients that are uncontactable go away. This is especially the case when the same clients may have reregistered and are effectively consuming my resources twice. The new live registration takes up legitimate resources, but the stale ghost registration takes additional illegitimate resources. I want to balance the cost of holding onto resources against the cost of subscription renewals to decide my desired lease period. I'll probably choose something in the order of the time it takes for a TCP/IP connection to expire, but may choose a smaller number if I expect this to be happening regularly. I don't have an imperative to clean up except for resource consumption. In fact, whenever I'm delivering messages to clients that are up but have forgotten about their subscriptions, I should get feedback from them indicating they think I'm sending them spam. It's only subscriptions that are both stale and inactive that chew my resources unnecessarily, and it doesn't cost a lot to manage a subscription in that mode.

As a client, if I lease a subscription I expect the subscription to be honoured. That is to say, I expect to be given timely updates of the information I requested. By timely I mean that I couldn't get the information any sooner by polling; waiting for the notification should get me the data first. The risk to a client is that the subscription will not be honoured. I may get notifications too late. More importantly, my subscription might be lost entirely. REST says that the state of any client and server interaction should be held within the last message that passed between them. Subscription puts a spanner in these works and places an expectation of synchronised interaction state between a fallible client and server.

Depending on the server implementation it may be possible to see a server fail and come back up without any saved subscriptions. It might fail over to a backup instance that isn't aware of some or all of the subscriptions. This introduces a risk to the client that its data isn't timely. The client might get its data more quickly by polling, or by checking or renewing the subscription at the rate it would otherwise poll. This period for sending renewal messages is defined by need rather than simple resource utilisation. The client must have the data in a timely manner or it may fail to meet its own service obligations. Seconds may count. It must check the subscription over a shorter period than the limit it places on how out of date its data may be under these circumstances. If it is responsible for getting data from the field to an operator console within five (5) seconds, it must check its subscription more frequently than that, or someone must do so on its behalf.

Non-failure subscription loss conditions may also exist. It may be more convenient for a server to drop subscriptions and allow clients to resubscribe than to maintain them over certain internal reconfiguration activities. These cases are potentially easier to resolve than server death. They don't result in system failure, so the owner of the subscriptions can notify clients as appropriate. It must in fact do so, and once clients have received timely notice of the end of their subscriptions they should be free to attempt resubscription. It is server death which is tricky. Whichever way you paint things there is always the chance your server and its cluster will terminate in such a way that your subscriptions are lost. Clients must be able to measure this risk and poll at a rate that provides adequate certainty that timely updates are still being sent.

Benjamin

Sat, 2005-Oct-08

Application Consolidation

Sean McGrath talks about consolidating the data from different applications for the purpose of business reporting. He says that the wrong way to do it is usually to redevelop the functions of both applications into one. The right way to do it is usually to grab reports from both systems and combine them. There are two issues that must be solved during this process. The first is one of simultaneous agreement. The second is a common language of discourse. I'll address the second point first.

Without a common terminology the information of the two systems can't be combined. If one system thinks of your business as the monitoring of network equipment by asset id while another thinks of your business as the monitoring of network equipment by IP address the results aren't going to mesh together well. You need a reasonably sophisticated tool to combine the data, especially when there isn't a 1:1 mapping between asset id and IP address.

Without simultaneous agreement you may be combining reports that are about different things. If a new customer signs on and has been entered into only one system, the combined report may be nonsense. Even if they are entered at the same time, there is necessarily some difference between the times queries are issued to each system. The greater the latency between the report consolidation application and the original systems, the less likely it is that the data you get back will still be correct when it is received. The greater the difference in latencies between the systems you are consolidating, the greater the likelihood that those reports will be referring to different data. This problem is discussed in some detail in Rohit Khare's 2003 dissertation from the point of view of a single datum. I suspect the results regarding simultaneous agreement for different data will be even more complicated.

If the report is historical in nature or for some other reason isn't sensitive to instantaneous change and if the systems you are consolidating do speak a common language, I suggest that Sean is right. Writing a report consolidator application is probably going to be easier than redeveloping the applications themselves. If you lie in the middle somewhere you'll have some thinking to do.

Benjamin

Sat, 2005-Oct-01

Use of HTTP verbs in ARREST architectural style

This should be my last post on Rohit Khare's Decentralizing REST thesis. I apologise to my readers for somewhat of a blogging glut around this paper, but there have been a number of topics I wanted to touch upon. This post concerns the use of HTTP verbs for subscription and notification activities.

In section 5.1 Rohit describes the A+REST architectural style. It uses a WATCH request and a NOTIFY response. By the time he reaches the R+REST and ARREST styles of sections 5.2 and 5.3 he is using SUBSCRIBE requests and POST responses. I feel that the jump to use POST (a standard HTTP request) is unfortunate.

I think Rohit sees POST as the obvious choice here. The server wants to return something to the client, therefore mutating the state of the client, therefore POST is appropriate. rfc2616 has this to say about POST:

The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions:

  • Annotation of existing resources;
  • Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
  • Providing a block of data, such as the result of submitting a form, to a data-handling process;
  • Extending a database through an append operation.

POST is often used outside of these kinds of contexts, especially as a means of tunnelling alternate protocols or architectural styles over HTTP. In this case, though, I think its use is particularly egregious. Consider this text from section 9.1.1 of rfc2616:

the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.

When the server issues a POST it should be acting on behalf of its user. Who its user is is a little unclear at this point. There is the client owned by some agency, the server owned by another, and finally the POST destination possibly owned by an additional agency. If the server is acting on behalf of its owner it should do so with extreme care, and be absolutely sure it is willing to take responsibility for the actions taken in response to the POST. It should provide its own credentials and act in a responsible manner, operating in accordance with the policy its owner sets forward for it.

I see the use of POST in this way as a great security risk. If the server generating POST requests is trusted by anybody, then by using POST as a notification it is transferring that trust to its client. Unless client and server are owned by the same organisation or individual, an interagency conflict exists and an unjustified trust relationship is created. Instead, the server must provide the client's credentials only, or notify the destination server in a non-accountable way. It is important that the server not be seen to be requesting any of the side-effects the client may generate in response to the notification, but instead that those side-effects are part of the intrinsic behaviour of the destination when provided with trustworthy updates.

Ultimately I don't think it is possible or reasonable for a server to present its client's credentials as its own. There is too much risk that domain name or IP address records will be taken into account when processing trust relationships. I think therefore that a new method is required, just as is provided in the GENA protocol which introduces NOTIFY for the purpose.

It's not a dumb idea to use POST as a notification mechanism. I certainly thought that way when I first came to this area. Other examples also exist. Rohit himself talks about the difficulty of introducing new methods to the web and having to work around this problem in mod_pubsub using HTTP headers. In the end though, I think that the introduction of subscription to the web is something worthy of at least one new verb.

I'm still not sure whether an explicit SUBSCRIBE verb is required. Something tells me that a subscribable resource will always be subscribed to anyway, and that GET is ultimately enough to set up a subscription if appropriate headers are supplied. I'm still hoping I'll be able to do something in this area to reach a standard approach.

The established use of GENA in UPnP may tip the cards. The fact that it exists and is widely deployed may outweigh its lack of formal specification and standardisation. Its defects mostly appear aesthetic rather than fundamental, and it still may be possible to extend it in a backwards-compatible way.

Benjamin

Sat, 2005-Oct-01

Infinite Buffering

Rohit Khare's 2003 paper on Decentralizing REST makes an important point in section 7.2.4 during his discussion of REST+E (REST with Estimates):

It is impossible to sustain an information transfer rate in excess of a channel's capacity. Unchecked, guaranteed message delivery can inexorably increase latency. To ensure this does not occur - ensuring that buffers, even if necessary, remain finite - information buffered for transmission may need to be summarized, updated, or even dropped while still queued.

It is a classic mistake in message-oriented middleware to combine guaranteed delivery of messages with guaranteed acceptance of messages. Just when the system is overloaded or stressed, everything starts to unravel. Latency increases, and can even reach a point where feedback loops are created in the middleware: Messages designed to keep the middleware operating normally are delayed so much that they cause more messages to be generated that can eventually consume all available bandwidth and resources.

Rohit cites summarisation as a solution, and it is one. On the other hand, it is important to look at where this summarisation should take place. Generic middleware can't summarise data effectively; it has few choices but to drop the data. I favour an end-to-end solution for summarisation: the messaging middleware must not accept a message unless it is ready to deliver it into finite buffers. The source of the message must wait to send, and when sending becomes possible it should be able to alter its choice of message. It should be able to summarise in its own way for its own purposes. Summarisation should only take place at this one location. From this location it is possible to summarise, not based on a set of messages that have not been sent, but on any other currently-accessible data. For instance, a summariser can be written that always sends the current state of a resource rather than the state it had at the first change after the last successful delivery. This doesn't require any extra state information (except for changed/not-changed) if the summariser has access to a common representation of the origin resource. The summariser's program can be as simple as the following (a sketch in code appears below the list):

  1. Wait for change to occur
  2. Wait until a message can be sent, ignoring further changes
  3. Send a copy of the current resource representation
  4. Repeat
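A minimal sketch of that loop, assuming the summariser shares a thread-safe view of the current resource representation:

    # Single-state summariser: repeated changes collapse into one pending flag,
    # and whatever is sent is always the current representation.
    import threading

    class CurrentStateSummariser:
        def __init__(self, read_current_representation, wait_until_sendable, transmit):
            self._read = read_current_representation    # returns the resource state
            self._wait_until_sendable = wait_until_sendable
            self._transmit = transmit
            self._changed = threading.Event()

        def on_change(self):
            """Called on every change to the origin resource; repeat calls collapse."""
            self._changed.set()

        def run(self):
            while True:
                self._changed.wait()            # 1. wait for a change to occur
                self._wait_until_sendable()     # 2. wait until a message can be sent,
                self._changed.clear()           #    ignoring any further changes
                self._transmit(self._read())    # 3. send a copy of the current representation
                                                # 4. repeat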

More complicated summarisation is likely to be useful on complex resources such as lists. Avoiding sending the whole list over and over again can make significant inroads into reducing bandwidth and processing costs. This kind of summariser requires more sophisticated state, including the difference between the current and previously-transmitted list contents.

Relying on end-to-end summarisers on finite buffers allows systems to operate efficiently under extreme load conditions.

Benjamin

Sat, 2005-Oct-01

REST Trust Relationships

Rohit Khare's 2003 paper Decentralizing REST introduces the ARREST architectural style for routed event notifications when agency conflicts exist. The theory is that it can be used according to R+REST principles to establish communication channels between network nodes that don't trust each other, but which are both trusted by a common client.

Rohit has this to say about that three-way trust relationship in chapter 5.3.2:

Note that while a subscription must be owned by the same agency that owns S, the event source, it can be created by anyone that S's owner trusts. Formally, creating a subscription does not even require the consent of D's owner [the owner of the resource being notified by S], because any resource must be prepared for the possibility of unwanted notifications ("spam").

If Rohit was only talking about R+REST's single routed notifications I would agree. One notification of the result of some calculation should be dropped by D as spam. Certainly no unauthorised alterations to D should be permitted by D, and this is the basis of Rohit's claim that D need not authorise notifications. Importantly, however, this section is not referring to a one-off notification but to a notification stream. It is essential in this case to have the authority of D before sending more than a finite message sequence to D. Selecting a number of high-volume event sources and directing them to send notifications to an unsuspecting victim is a classic denial of service attack technique. It is therefore imperative to establish D's complicity in the subscription before pummeling the resource with an arbitrary number of notifications.

The classic email technique associated with mailing lists is to send a single email first, requesting authorisation to send further messages. If a positive confirmation is received to that email (either as a return email, or a web site submission) then further data can flow. Yahoo has the concept of a set of email addresses which a user has demonstrated are their own, and any new mailing list subscriptions can be requested by the authorised user to be sent to any of those addresses. New addresses require individual confirmation.

I believe that a similar technique is required for HTTP or XMPP notifications before a flood arrives. The receiving resource must acknowledge successful receipt of a single notification message before the subsequent flood is permitted. This avoids the notifying server becoming complicit in the nefarious activities of its authorised users. In the end it may come down to what those users are authorised to do and who they are authorised to do it with. Since many sites on the internet are effectively open to any user, authorised or not, the importance of handling how much trust your site has in its users may be important in the extreme.
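A sketch of what that handshake could look like for HTTP notifications. The Confirmation-Token header and the use of NOTIFY here are my own inventions for illustration, not part of any specification.

    # Confirm-before-flood: deliver one challenge notification and only activate
    # the subscription if the destination echoes the acceptance token.
    import http.client
    import secrets

    def confirm_destination(host, callback_path):
        token = secrets.token_urlsafe(16)
        conn = http.client.HTTPConnection(host)
        conn.request(
            "NOTIFY",
            callback_path,
            body=b"subscription-confirmation",
            headers={"Confirmation-Token": token},
        )
        response = conn.getresponse()
        accepted = (
            response.status == 200
            and response.getheader("Confirmation-Token") == token
        )
        conn.close()
        return accepted   # only start the notification stream if this is True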

Benjamin

Sat, 2005-Oct-01

Routed REST

I think that Rohit Khare's contribution with his 2003 paper on Decentralizing REST holds many valuable insights. In some areas, though, I think he has it wrong. One such area is his chapter on Routed REST (R+REST), chapter 5.2.

The purported reason for deriving this architectural style is to allow agencies that don't trust each other to collaborate in implementing useful functions. Trust is not uniform across the Internet. I may trust my bank and my Internet provider, but they may not trust each other. Proxies on the web may or may not be trusted, so authentication must be done in an end-to-end way between network nodes. Rohit wants to build a collaboration mechanism between servers that don't trust each other implicitly, but are instructed to trust each other in specific ways by their client, which trusts each service along the chain.

Rohit gives the example of a client, a printer, a notary watermark, and an accounting system. The client trusts its notary watermark to sign and authenticate the printed document, and trusts the printer. The printer trusts the accounting system and uses it to bill the client. Rohit tries to include the notary watermark and the accounting system not as communication endpoints in their own right, but as proxies that transform data that passes through them. To this end he places the notary watermark between the client and printer to transform the request, but places the accounting system between printer and client on an alternate return route. He seems to get very excited about this kind of composition and starts talking about MustUnderstand headers and about SOAP- and WS-routing as existing implementations. The summary of communications is as follows:

  1. Send print job from client to notary
  2. Forward notarised job from notary to printer
  3. Send response not back to the notary, but to the printer's accounting system
  4. The accounting system passes the response back to the client

I think that in this chapter he's off the rails. His example can be cleanly implemented in a REST style by maintaining state at relevant points in the pipeline. Instead of transforming the client's request in transit, the client should ask the notary to sign the document and have it returned to the client. The client should only submit a notarised document to the printer. The printer in turn should negotiate with the accounting system to ensure proper billing for the client before returning any kind of success code to the client. The summary of communications is as follows:

  1. Send print job from client to notary
  2. Return notarised job from notary to client
  3. Send notarised job from client to printer
  4. Send request from printer to accounting system
  5. Receive response from accounting system back to printer
  6. Send response from printer back to client

Each interaction is independent of the others and only moves across justified trust relationships.

Rohit cites improved performance out of a routing-based system. He says that only four message transmissions need to occur, instead of six using alternate approaches. His alternate approach is different to mine, but both his alternate and my alternate have the same number of messages needing transmission: six. But let's consider that TCP/IP startup requires at least three traversals of the network and also begins transmission slowly. Establishing a TCP/IP connection is quite expensive. If we consider the number of connections involved in his solution (four) compared to the number in my solution and his non-ideal alternative (three), it starts to look more like the routing approach is the less efficient one. This effect should be multiplied over high-latency links. When responses are able to make use of the same TCP/IP connection the request was made on, the system as a whole should actually be more efficient and responsive. Even when it is not more efficient, I would argue that it is significantly simpler.

Rohit uses this style to build the ARREST style, however using this style as a basis weakens ARREST. He uses routing as a basis for subscription, however in practice whether subscription results come back over the same TCP/IP connection or are routed to a web server using a different TCP/IP connection is a matter of tradeoff of server resources and load.

Benjamin