Sound advice - blog

Tales from the homeworld

My current feeds

Sun, 2005-Oct-30

Was Windows at fault in the customs computer crash?

Disclaimer: I do not speak for my employer, and their views may differ from my own.

Well, obviously no it wasn't. Pascal Klien worries about the inability to sue Microsoft over the failure of Botany Bay customs computers here in Australia. As a software engineer who works in industries where the kinds of money he mentions are thrown around, I can tell you that the operating system is usually not the hardest working component of the system. We still do a fair bit for our money. Whether we were delivering on Windows, Linux, or Solaris, the hardest working pieces of the operational system would be the hardware and our software rather than any commodity operating system. The OS provides a few essential commodity services, but it doesn't have any great influence over system performance in even moderately complicated systems. It doesn't greatly affect scalability. Most scaling and performance capability comes down to how you structure a cluster of several machines rather than how any one piece behaves.

Not knowing all the finer details of this case I find the chance that the operating system played any direct role in these problems remote. I also find the suggestion that not being able to sue Microsoft for failures is somehow the problem a little absurd coming from a free software advocate. Should Linus allow himself to be sued when Linux is used in mission-critical applications? Surely not. It is the responsibility of the contractor who sold Customs their new computer system to make sure there are no problems, and you can be sure they are both suable and that they have sufficient insurance or collateral to pay out any claim. That's assuming that Customs gave them accurate and appropriate specs. If not, then Customs is to blame here and the contractor should be paid enough money to come in and save the day.

If a contractor chooses to reduce their costs by using commodity software such as operating systems in mission-critical environments they must also bear responsibility for the adequacy of that software in their own operational profile. Writing their own operating system is almost certainly counter-productive for the kind of system being examined here, and as far as commodity operating systems go Windows has a long and sufficiently good track record in many industries.

Benjamin

Sun, 2005-Oct-23

Selling developer hours (generalising the AJ market)

Anthony Towns updates us on the status of the AJ market. He's a free software developer who is heavily involved in the Debian project.

Putting money in the market

I believe Anthony is hoping to do a number of things with this market. Benefits that he might be able to achieve include:

Putting money and an economy into the system should be a good means of achieving all of these benefits. You want a new feature but can't do it yourself? Put your money where your mouth is and pay the man who can. A market that uses a low-inflation commodity to send signals between buyer and seller should provide accurate information about what most needs achieving. In the free software world, money and developer time are both low-inflation commodities, so both probably have a role to play.

Possible problems with AJ's market

Such an economy does need to be of sufficient scale to keep noise low. At the dollar figures AJ has reeled in so far it is an even bet whether the information he is receiving about customer demands is representative of his userbase or not. He may also not be sending effective signals back to buyers about what is possible given the time available. His market information bulletin doesn't indicate how much he expects to be paid for any of the items in his todo list. He doesn't quote hours estimates, and the bids are not for hours but for features of possibly irregular size. Without a mechanism for AJ to say "I'm not willing to do this work for less than..." the market is only a one-way information system. These problems may be shaken out with time or prove irrelevant, as this market is obviously very new indeed.

I wish, I wish

I've occasionally dreamed about a market like this myself. A nobody in the free software movement like myself would probably not get very far marketing in this way, at least initially. That has put me off investing any significant time in such a model. I've tended to dislike the service-orientated redhat or mysql business models. I think these approaches encourage paid developers to think in terms of the services they are going to provide rather than in terms of the quality of the software. This could actively work against software quality by encouraging the exclusion of turnkey "just works" features because they would reduce the value of the services that might otherwise be provided. It may be better to use business models that directly encourage end user solutions of the highest quality necessary to do the job.

Problems with the proprietary approach

Proprietary software development is based around the production of a feature set that can be sold over and over again. In the case of companies like microsoft this can lead to huge profit margins, which equate to inefficiency in the economy as a whole. If features have already been developed then making businesses pay for them over and over again is counter-productive. The amount that a business can spend on its own needs is reduced by the paying of licence fees while the software company itself is not solving new problems for the business. An efficient model would see the software company make a fair profit on the cost of producing features for other businesses while retaining the incentive to produce new features and further improve the productivity of their customers. Software companies that aren't producing more and better efficiency improvements for their customers should wither and die.

Problems with the free approach

In any free model you either need to get the money to produce features up front or you need to make the money back on services. As I mentioned earlier I'm not a fan of the services avenue, although I admit this can be a handy revenue stream. AJ's market attempts to make the money up front because once the work is done and is freely distributed there isn't going to be any more money coming in for that feature or efficiency improvement. Proprietary companies can afford to take a loss during development and make the money up later in sales. Free developers must be paid in advance.

As a free developer you really have two choices:

  1. You team up with an entity with deep pockets that wants your software
  2. You provide a means for collective entities with deep pockets to form

We see the first choice at work in services companies. They are willing to pay developers so that they can earn dollars providing services and addons to the software developed. The problem with dealing with an entity of this type is that greed will eventually change your relationship. The company will want to see something from the development that its competitors don't have. This desire may conflict with a free software sensibility. It may only be "we want to have this for six months before everyone else". That's how copyright and patents work, of course, and from experience with those two vehicles we know that companies will inevitably want longer and longer periods attached. It may come in the form of "we want this proprietary addon". The only counter to these forms of greed is outrage from a customer community that really doesn't want the lock-in.

The second approach is that of AJ's market. Try to attract ad hoc collectives who want a feature to supply small contributions to the goal. To get as many people together as a proprietary company does with its license fee model may be a difficult feat, but you know that you're working towards real customer needs when a large number are willing to supply even small amounts.

The role of eXtreme Programming

Earlier in this article I criticised AJ for not supplying estimates for work and thus not providing adequate feedback to buyers in his economy. I've actually always seen this kind of market through the XP model, and viewed it alongside the bounty system sometimes seen in free software. The basic framework would consist of the following stages:

  1. Define a set of concrete, testable goals
  2. Allow bidding for a concrete non-inflationary commodity towards those goals
  3. Work for the highest bidder according to the amount of commodity bought
  4. Allow goals to be changed and rebid

This approach would accommodate two-way feedback between buyer and seller, including allowance for goal correction should the costs begin to outweigh the benefits of a particular unit of work. I propose the following implementations of each stage:

  1. Definition of testable goals
    Buyers should be able to submit goals to the seller. The seller should make appropriate adjustments to keep scope within a basic regular time period, such as splitting the work up into smaller units. Sellers should ignore or negotiate changes to goals that fall outside the basic project charter. Sellers may also propose and lobby for their own changes and may submit free hours towards goals they are passionate about.
  2. Bidding

    I think AJ has the basic bidding structure right. A bid is an actual payment of dollars rather than a promise of dollars. An alternative model would be an eBay-style separation of bidding and invoicing, where only the work with the highest return would be invoiced. Due to the nature of this work, where weeks may pass between the first pledge of money for a particular bug and the work on that bug being actioned, an invoicing system is probably too complex, with the additional complication of having to price in non-payment.

    I think that bidding should be in the form of dollars for hours. Both are non-inflatable commodities and a fair average price should be able to be determined over the longer term. It should be possible to compare the hourly rate over a long period of time for trends, and it should be possible for buyers to see what other people think an hour of the seller's time is worth. Buyers may also submit code they hope will reduce or eliminate the effort required to fulfil their chosen goals.

    In my view, the best way to achieve a dollars for hours bidding model is to associate an estimate with each bug. Buyers make payments indicating their preferred bug. The payment total is divided by the estimate to produce an hourly rate. The seller works on the bug with the highest available hourly rate (a small sketch of this selection rule follows the list below).

    The problem with bidding dollars for hours is how to handle inaccurate estimation. You could spend the estimated hours, pocket the money, and update the estimate for the bug. That might strain relations with your userbase and probably requires refunds for bugs that take less than the allotted time to resolve. Another approach would be for the seller to wear the risk of inaccurate estimates by always completing the work regardless of budget overrun. This would encourage the seller to underpromise, which is usually a good thing for a seller to do.

  3. Working
    There should be clear targets for time worked and the tests for completion should be defined by buyers as part of the goal definition. These should be prioritised so that the most important tests are met early. Bidders should be able to be tracked and contacted during the purchased time for clarification to ensure that the most applicable work is produced.
  4. Rebidding
    Goal changes should be made as necessary with the winning bidders for your time. Once the won hours have been expended buyers may choose to bid for more hours towards their goal or may abandon the goal as uneconomic.
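
Below is a minimal sketch of the "dollars for hours" selection rule described in the bidding stage above. The bug identifiers, dollar figures and function names are all hypothetical; the point is only that payments divided by an estimate produce an hourly rate, and the seller works on whichever bug currently offers the highest rate.

    # Minimal sketch of "dollars for hours" bidding: each bug carries an hours
    # estimate and a running total of payments; the seller works on whichever
    # bug currently buys the highest hourly rate. All names here are invented.

    bugs = {
        "bug-101": {"estimate_hours": 8.0, "paid_dollars": 240.0},
        "bug-102": {"estimate_hours": 2.0, "paid_dollars": 90.0},
        "bug-103": {"estimate_hours": 20.0, "paid_dollars": 400.0},
    }

    def hourly_rate(bug):
        """Payments divided by the estimate gives the rate an hour of work earns."""
        return bug["paid_dollars"] / bug["estimate_hours"]

    def record_payment(bug_id, dollars):
        """A buyer bids by paying money towards their preferred bug."""
        bugs[bug_id]["paid_dollars"] += dollars

    def next_bug_to_work_on():
        """The seller picks the bug with the highest available hourly rate."""
        return max(bugs, key=lambda bug_id: hourly_rate(bugs[bug_id]))

    record_payment("bug-102", 30.0)          # a new bid arrives
    print(next_bug_to_work_on())             # -> "bug-102" (now $60/hour)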

Conclusion

Maybe someone can make use of these ideas. Maybe one day I'll be able to make use of them myself :) In the end software freedom should act as a contract to ensure successful bidders don't get a leg up on each other and that all market forces are openly played out. I hope that great free software may come out of a collective monetary process in the future. At present we have a system based on bidders paying with their own hours, and that's great... however it would be nice to see people with money be able to play the game as well, and most importantly it would be nice to see free software being created professionally without the need for paid services to fund the professionals.

Benjamin

Sun, 2005-Oct-16

Efficiency of Internet-scale Subscription Leases

Subscription requires synchronisation of subscription state between client and server. Fallible clients and servers may forget about subscriptions. To avoid these stale subscriptions consuming unnecessary resources we lease the subscription: if it is forgotten about and not renewed it will eventually be purged. Lease renewal also allows clients to know whether their own subscription is stale. For clients the need is more pressing, as they are not just concerned about resource consumption. They have service obligations that may not be met by a stale subscription.

When client and server correspond using a single subscription, leasing is a simple and effective means of ensuring resources aren't tied up and clients can meet their obligations. When many subscriptions are in play between the same client and server the approach is less efficient. For n subscriptions between two points, at least n messages are sent per lease duration for long-lived subscriptions. If the client renews its subscriptions more frequently than the lease duration demands, the n requests occur over a correspondingly shorter period.

This is less efficient than could be achieved. Instead of making n requests per period the client could make only one. This should be enough to answer the two questions at issue: "Has my client forgotten about me?" and "Has my server forgotten about me?". This approach would change the subscription mechanism into a more general heartbeating model.

Heartbeating complicates the subscription model. A simple "I'm here" message isn't enough. It is important to tie the set of subscriptions to the identifier of exactly which client is "here". If the client fails to make contact in a designated time all subscriptions would be scrubbed. If the client successfully makes contact its subscriptions would continue to be honoured. This leads to one more thing that must be synchronised between client and server: The set of subscriptions associated with a particular client identifier.

If the client always carries the same identification (say, an IP address) it may find that it successfully renews its subscriptions, but that the set of subscriptions it renewed is much smaller than it anticipated. If several subscriptions are established before a server briefly goes down and several more after it is restored to service, the client could renew "all of its subscriptions" without either the client or the server ever being the wiser that half of them were lost over the restart. There must be some concept of a session that is unique across both client and server failure events.

Here's how it might work:

  1. Client requests session creation
  2. Server creates session resource and returns URI
  3. Client makes any number of subscriptions, each quoting the session URI
  4. Client periodically renews the session

This approach could still be supported alongside normal lease-based subscription. If no session ID is quoted by a client the server could safely follow standard leasing semantics. If no session creation resource is known by the client it could assume that the server doesn't understand session-based leasing. Server code could run the normal lease expiry process for each subscription, except that instead of simply testing "Did I get a renewal request this period?", it would ask "Did I get a renewal request this period? If not, does my session still exist?"
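
As a rough sketch of the client side of this flow, here is what it might look like using Python's requests library. The URIs, the "Subscription-Session" header and the renewal resource are all invented for illustration; a real protocol would need to define its own resource names and headers.

    # Rough sketch of session-scoped subscription leasing over HTTP.
    # The URIs, the "Subscription-Session" header and the renewal resource are
    # hypothetical; they illustrate the flow rather than any real protocol.
    import time
    import requests

    BASE = "http://example.com"

    # 1. Client requests session creation; server returns the session URI.
    resp = requests.post(BASE + "/sessions")
    session_uri = resp.headers["Location"]   # assumed to be an absolute URI

    # 2. Client makes any number of subscriptions, each quoting the session URI.
    for topic in ("/equipment/pump1/state", "/equipment/pump2/state"):
        requests.post(BASE + topic + "/subscriptions",
                      headers={"Subscription-Session": session_uri})

    # 3. Client periodically renews the one session instead of n subscriptions.
    while True:
        time.sleep(30)                       # renewal period chosen by the client
        renewal = requests.post(session_uri + "/renewals")
        if renewal.status_code == 404:
            # Server forgot the session (e.g. a restart): all subscriptions are
            # gone, so the client must recreate the session and resubscribe.
            break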

A simpler way to achieve session handling that I've used in the past has been to treat a TCP/IP connection as the session. Heartbeats are periodically sent down the connection by both the client and server ends to establish liveness. If either end passes data across the connection then there is no need to send heartbeat messages in that period. Closure of the connection terminates all related subscriptions. Unfortunately this model does not work with HTTP, especially once proxies get into the mix. In an HTTP world you can't rely on TCP/IP connection stability.

Admittedly, there is another way to go. You could set up resource groups rather than client<->server sessions. OPC more or less does this in its subscription model. A group can either be predefined by the server for subscription, or can be set up by the client in a way equivalent to using a session. The client then just subscribes to the group, so a straightforward leasing model applies: clients create group subscriptions and renew group subscriptions. This approach addresses the efficiency problem in much the same way as session-based leasing.

Benjamin

Sat, 2005-Oct-15

Internet-scale Subscription Lease Durations

Depending on your standpoint you may have different ideas about how long a subscription lease should be. From an independent standpoint we might say that infinite subscription leases are the best way forward. That produces the lowest overall network and processing overhead and thus the best result overall. There are, however, competing interests that push this number downwards. The lease should probably be of finite duration, and that duration depends more on the reliability of the server and the demands of the client than on anything else.

As a server I want to free up resources as soon as I can after clients that are uncontactable go away. This is especially the case when the same clients may have reregistered and are effectively consuming my resources twice. The new live registration takes up legitimate resources, but the stale ghost registration takes additional illegitimate resources. I want to balance the cost of holding onto resources against the cost of subscription renewals to decide my desired lease period. I'll probably choose something in the order of the time it takes for a TCP/IP connection to expire, but may choose a smaller number if I expect this to be happening regularly. I don't have an imperative to clean up except for resource consumption. In fact, whenever I'm delivering messages to clients that are up but have forgotten about their subscriptions I should get feedback from them indicating they think I'm sending them spam. It's only subscriptions that are both stale and inactive that chew my resources unnecessarily, and it doesn't cost a lot to manage a subscription in that mode.

As a client, if I lease a subscription I expect the subscription to be honoured. That is to say that I expect to be given timely updates of the information I requested. By timely I mean that I couldn't get the information any sooner by polling. Waiting for the notification should get me the data first. The risk to a client is that the subscription will not be honoured. I may get notifications too late. More importantly, my subscription might be lost entirely. REST says that the state of any client and server interaction should be held within the last message that passed between them. Subscription puts a spanner in those works and places an expectation of synchronised interaction state between a fallible client and server.

Depending on the server implementation it may be possible for a server to fail and come back up without any saved subscriptions. It might fail over to a backup instance that isn't aware of some or all of the subscriptions. This introduces a risk to the client that its data isn't timely. The client might get its data more quickly by polling, or by checking or renewing the subscription at the rate it would otherwise poll. This period for sending renewal messages is defined by need rather than simple resource utilisation. The client must have the data in a timely manner or it may fail to meet its own service obligations. Seconds may count. It must check the subscription over a shorter period than its own limit on how out of date its data may be under these circumstances. If it is responsible for getting data from the field to an operator console within five (5) seconds it must check its subscription more frequently than at that rate, or someone must do it on its behalf.

Non-failure subscription loss conditions may also exist. It may be more convenient for a server to drop subscriptions and allow clients to resubscribe than to maintain them over certain internal reconfiguration activities. These cases are potentially easier to resolve than server death. They don't result in system failure, so the owner of the subscriptions can notify clients as appropriate. It must in fact do so, and once clients have received timely notice of the end of their subscriptions they should be free to attempt resubscription. It is server death which is tricky. Whichever way you paint things there is always the chance your server and its cluster will terminate in such a way that your subscriptions are lost. Clients must be able to measure this risk and poll at a rate that provides adequate certainty that timely updates are still being sent.

Benjamin

Sat, 2005-Oct-08

Application Consolidation

Sean McGrath talks about consolidating the data from different applications for the purpose of business reporting. He says that the wrong way to do it is usually to redevelop the functions of both applications into one. He says the right way to do it is usually to grab reports from both systems and combine them. There are two issues that must be solved during this process. The first is one of simultaneous agreement. The second is a common language of discourse. I'll address the second point first.

Without a common terminology the information of the two systems can't be combined. If one system thinks of your business as the monitoring of network equipment by asset id while another thinks of your business as the monitoring of network equipment by IP address the results aren't going to mesh together well. You need a reasonably sophisticated tool to combine the data, especially when there isn't a 1:1 mapping between asset id and IP address.
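
As a tiny illustration of the terminology problem, here is a sketch that joins two invented reports, one keyed by asset id and one by IP address, through a mapping table. Every identifier and field here is made up; the interesting part is that a non-1:1 mapping forces the consolidator to make policy decisions of its own.

    # Sketch of consolidating two reports that use different identifiers.
    # The reports and the asset-id/IP mapping are invented for illustration.

    report_by_asset = {"A-001": {"alarms": 3}, "A-002": {"alarms": 0}}
    report_by_ip = {"10.0.0.5": {"uptime_days": 12}, "10.0.0.6": {"uptime_days": 40}}

    # The mapping is the "common language": without it the reports can't mesh.
    asset_to_ips = {"A-001": ["10.0.0.5"], "A-002": ["10.0.0.6", "10.0.0.7"]}

    combined = {}
    for asset, ips in asset_to_ips.items():
        row = dict(report_by_asset.get(asset, {}))
        for ip in ips:
            # A non-1:1 mapping forces a policy decision: here we simply keep
            # the worst (smallest) uptime across all addresses owned by the asset.
            if ip in report_by_ip:
                uptime = report_by_ip[ip]["uptime_days"]
                row["uptime_days"] = min(row.get("uptime_days", uptime), uptime)
        combined[asset] = row

    print(combined)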

Without simultaneous agreement you may be combining reports that are about different things. If a new customer signs on and has been entered into only one system the combined report may be nonsense. Even if they are entered at the same time there is necessarily some difference between the times queries are issued to each system. The greater the latency between the report consolidation application and the original systems, the less likely it is that the data you get back will still be correct when it is received. The greater the difference in latencies between the systems you are consolidating, the greater the likelihood that those reports will be referring to different data. This problem is discussed in some detail in Rohit Khare's 2003 dissertation from the point of view of a single datum. I suspect the results regarding simultaneous agreement for different data will be even more complicated.

If the report is historical in nature or for some other reason isn't sensitive to instantaneous change and if the systems you are consolidating do speak a common language, I suggest that Sean is right. Writing a report consolidator application is probably going to be easier than redeveloping the applications themselves. If you lie in the middle somewhere you'll have some thinking to do.

Benjamin

Sat, 2005-Oct-01

Bootchart under Debian

My wife has been complaining for a while that my computer is a little slow to boot. It's an aging PC with an 800MHz celeron processor and 384 meg of RAM. Actually, the motherboard on this machine was changed at one point and cat /proc/cpuinfo claims the chip runs at 565MHz so I'm suspicious something was diddled at the time.

I pulled down the bootchart package to have a look at how long things were really taking. Lo and behold, she was not exaggerating. My machine was taking 154 seconds to boot. Further investigation showed that hotplug seemed to account for much of the time. hotplug ran from just before the 24 second mark to just before 122 seconds. Following some advice found on the internet I decided to mess with hotplug a bit. The first step was just backgrounding hotplug during boot so that other operations could run in parallel, by modifying /etc/init.d/hotplug. This yielded a good speedup, with boot time down to 102 seconds, a 52 second improvement. The only ill effect was that my net didn't get initialised. I suspect that the separate hotplug net script didn't like running before hotplug had finished its startup, however I haven't fully explored this problem yet. It's easy enough to do an "ifup eth1" once the boot is finished. It also probably wouldn't have worked except that I still have a reasonably complete /etc/modules file that loads my mouse drivers and the like.

The new bootchart still showed a problem. Hotplug started just before the 30 second mark and finished (possibly was truncated) at the 102 second mark. For that entire time the CPU was maxed out, just as it had been during the hotplug startup when it was not executing in parallel with other activities. I decided to solve the hotplug startup issue conclusively by adding a sleep before it executed in /etc/init.d/hotplug. /etc/init.d/hotplug now has the effect of returning immediately, sleeping for 60 seconds in parallel, and starting hotplug at the expiry of those 60 seconds. This yielded the best results so far. My machine now boots in 66 seconds, an 88 second or 57% reduction. I wonder if my wife will notice...

Clearly, hotplug's startup is a problem on low-end (cpu-bound) hardware. I don't recommend running it (or at least I don't recommend starting it up as part of your normal boot process) on a slow CPU. This may improve in the future with alternate implementations to the current bash shell scripting approach starting to emerge. A couple of sample bootcharts are available comparing the bash version to a rewrite in perl. I've even seen a compiled version written in C proposed.

A couple of notes on bootchart itself: it works well. I use grub to boot and manually modified the kernel command line each time I wanted bootchart to run. This was just a matter of pressing "e" before the boot process began, positioning the cursor over the kernel command-line, pressing "e" again, appending "init=/sbin/bootchartd" and using "b" to get the boot process started. My only frustration with the bootchart program that generates the pretty bootchart diagrams is that if you stuffed up step one and don't have any bootchart data from which to produce the diagram, it returns silently without explaining what went wrong or where it is looking for the information. Once the information was actually there it worked without any hassles. Well done to its contributors.

Update:
Ziga Mahkovec wrote to me via email on Monday October 3, 2005 regarding the silent return when no data was available:

This was actually a bug that I fixed in CVS now -- the renderer will report a "/var/log/bootchart.tgz not found" error.

Great job!

Update:
I've upgraded to the latest Debian unstable udev as of October 15, 2005. This package replaces the hotplug scripts with its own event generation system. My bootchart now weighs in at 95s. This is a significant improvement over the older hotplug time of 154s, and only adds about 50% over the theoretical minimum of not doing any hotplug work. The bootchart still shows my CPU being maxed out for this time with no I/O. This might be the ideal time to start preloading files into memory to improve the performance of the subsequent boot load process. I'm having trouble with correct initialisation of a few devices in this setup. My sound card device isn't being created despite the relevant modules being loaded correctly, so I had to tweak /etc/udev/links.conf. My usb-connected motorola sb4200 cable modem also gets its modules loaded but doesn't automatically ifup at the moment. I have just been executing ifup eth1 after boot, but I may go searching for the right place to hack this into the standard boot process.

With this performance work well underway and clear improvements showing, I'm confident that the linux boot process will continue to be refined and streamlined. Good work to the udev developers, and also to those in the gnome camp who seem to be getting very busy around the topic of gnome startup for the upcoming 2.14 release.

Benjamin

Sat, 2005-Oct-01

Use of HTTP verbs in ARREST architectural style

This should be my last post on Rohit Khare's Decentralizing REST thesis. I apologise to my readers for somewhat of a blogging glut around this paper, but there have been a number of topics I wanted to touch upon. This post concerns the use of HTTP verbs for subscription and notification activities.

In section 5.1 Rohit describes the A+REST architectural style. It uses a WATCH request and a NOTIFY response. By the time he reaches the R+REST and ARREST styles of sections 5.2 and 5.3 he is using SUBSCRIBE requests and POST responses. I feel that the jump to use POST (a standard HTTP request) is unfortunate.

I think Rohit sees POST as the obvious choice here. The server wants to return something to the client, thereby mutating the state of the client, so POST is appropriate. rfc2616 has this to say about POST:

The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions:

  • Annotation of existing resources;
  • Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
  • Providing a block of data, such as the result of submitting a form, to a data-handling process;
  • Extending a database through an append operation.

POST is often used outside of these kinds of context, especially as a means of tunnelling alternate protocols or architectural styles over HTTP. In this case, though, I think its use is particularly egregious. Consider this text from section 9.1.1 of rfc2616:

the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.

When the server issues a POST it should be acting on behalf of its user. Who its user is is a little unclear at this point. There is the client owned by some agency, the server owned by another, and finally the POST destination possibly owned by an additional agency. If the server is acting on behalf of its owner it should do so with extreme care and be absolutely sure it is willing to take responsibility for the actions taken in response to the POST. It should provide its own credentials and act in a responsible manner, operating in accordance with the policy its owner sets forward for it.

I see the use of POST in this way as a great security risk. If the server generating POST requests is trusted by anybody, then by using POST as a notification it is transferring that trust to its client. Unless client and server are owned by the same organisation or individual, an interagency conflict exists and an unjustified trust relationship is created. Instead, the server must provide the client's credentials only, or notify the destination server in a non-accountable way. It is important that the server not be seen to be requesting any of the side-effects the client may generate in response to the notification, but instead that those side-effects are part of the intrinsic behaviour of the destination when provided with trustworthy updates.

Ultimately I don't think it is possible or reasonable for a server to present its client's credentials as its own. There is too much risk that domain name or IP address records will be taken into account when processing trust relationships. I think therefore that a new method is required, just as is provided in the GENA protocol which introduces NOTIFY for the purpose.

It's not a dumb idea to use POST as a notification mechanism. I certainly thought that way when I first came to this area. Other examples also exist. Rohit himself talks about the difficulty of introducing new methods to the web and having to work around this problem in mod_pubsub using HTTP headers. In the end though, I think that the introduction of subscription to the web is something worthy of at least one new verb.

I'm still not sure whether an explicit SUBSCRIBE verb is required. Something tells me that a subscribable resource will always be subscribed to anyway, and that GET is ultimately enough to set up a subscription if appropriate headers are supplied. I'm still hoping I'll be able to do something in this area to reach a standard approach.

The established use of GENA in UPnP may tip the scales. The fact that it exists and is widely deployed may outweigh its lack of formal specification and standardisation. Its defects mostly appear aesthetic rather than fundamental, and it still may be possible to extend it in a backwards-compatible way.

Benjamin

Sat, 2005-Oct-01

Infinite Buffering

Rohit Khare's 2003 paper on Decentralizing REST makes an important point in section 7.2.4 during his discussion of REST+E (REST with Estimates):

It is impossible to sustain an information transfer rate in excess of a channel's capacity. Unchecked, guaranteed message delivery can inexorably increase latency. To ensure this does not occur - ensuring that buffers, even if necessary, remain finite - information buffered for transmission may need to be summarized, updated, or even dropped while still queued.

It is a classic mistake in message-oriented middleware to combine guaranteed delivery of messages with guaranteed acceptance of messages. Just when the system is overloaded or stressed, everything starts to unravel. Latency increases, and can even reach a point where feedback loops are created in the middleware: Messages designed to keep the middleware operating normally are delayed so much that they cause more messages to be generated that can eventually consume all available bandwidth and resources.

Rohit cites summarisation as a solution, and it is. On the other hand it is important to look at where this summarisation should take place. Generic middleware can't summarise data effectively. It has few choices but to drop the data. I favour an end-to-end solution for summarisation: the messaging middleware must not accept a message unless it is ready to deliver it into finite buffers. The source of the message must wait to send, and when it becomes possible to send should be able to alter its choice of message. It should be able to summarise in its own way for its own purposes. Summarisation should only take place at this one location. From this location it is possible to summarise, not based on a set of messages that have not been sent, but on any other currently-accessible data. For instance, a summariser can be written that always sends the current state of a resource rather than the state it had at the first change after the last successful delivery. This doesn't require any extra state information (except for changed/not-changed) if the summariser has access to a common representation of the origin resource. The summariser's program can be as simple as:

  1. Wait for change to occur
  2. Wait until a message can be sent, ignoring further changes
  3. Send a copy of the current resource representation
  4. Repeat
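
A minimal sketch of that loop follows. The channel and resource objects are hypothetical stand-ins for whatever middleware and shared state representation a real system provides; the only state the summariser itself keeps is the changed/not-changed flag.

    # Minimal sketch of the end-to-end summariser loop above. The channel and
    # resource objects are hypothetical stand-ins for the real middleware and
    # shared state representation.
    import threading

    class CurrentStateSummariser:
        def __init__(self, resource, channel):
            self.resource = resource          # assumed to expose .current_representation()
            self.channel = channel            # assumed to expose .wait_until_sendable(), .send()
            self.changed = threading.Event()  # the only extra state: changed/not-changed

        def notify_change(self):
            """Called by the resource on every change; repeated changes coalesce."""
            self.changed.set()

        def run(self):
            while True:
                self.changed.wait()                 # 1. wait for a change to occur
                self.channel.wait_until_sendable()  # 2. wait until a message can be
                self.changed.clear()                #    sent, ignoring further changes
                self.channel.send(                  # 3. send a copy of the current
                    self.resource.current_representation())  #    representation
                                                    # 4. repeat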

More complicated summarisation is likely to be useful on complex resources such as lists. Avoiding sending the whole list over and over again can make significant inroads into reducing bandwidth and processing costs. This kind of summariser requires more sophisticated state, including the difference between the current and previously-transmitted list contents.

Relying on end-to-end summarisers on finite buffers allows systems to operate efficiently under extreme load conditions.

Benjamin

Sat, 2005-Oct-01

REST Trust Relationships

Rohit Khare's 2003 paper Decentralizing REST introduces the ARREST architectural style for routed event notifications when agency conflicts exist. The theory is that it can be used according to R+REST principles to establish communication channels between network nodes that don't trust each other but which are both trusted by a common client.

Rohit has this to say about that three-way trust relationship in chapter 5.3.2:

Note that while a subscription must be owned by the same agency that owns S, the event source, it can be created by anyone that S's owner trusts. Formally, creating a subscription does not even require the consent of D's owner [the owner of the resource being notified by S], because any resource must be prepared for the possibility of unwanted notifications ("spam").

If Rohit was only talking about R+REST's single routed notifications I would agree. One notification of the result of some calculation should be dropped by D as spam. Certainly no unauthorised alterations to D should be permitted by D, and this is the basis of Rohit's claim that D need not authorise notifications. Importantly, however, this section is not referring to a one-off notification but to a notification stream. It is essential in this case to have the authority of D before sending more than a finite message sequence to D. Selecting a number of high-volume event sources and directing them to send notifications to an unsuspecting victim is a classic denial of service attack technique. It is therefore imperative to establish D's complicity in the subscription before pummeling the resource with an arbitrary number of notifications.

The classic email technique associated with mailing lists is to send a single email first, requesting authorisation to send further messages. If a positive confirmation is received to that email (either as a return email, or a web site submission) then further data can flow. Yahoo has the concept of a set of email addresses which a user has demonstrated are their own, and any new mailing list subscriptions can be requested by the authorised user to be sent to any of those addresses. New addresses require individual confirmation.

I believe that a similar technique is required for HTTP or XMPP notifications before a flood arrives. The receiving resource must acknowledge successful receipt of a single notification message before the subsequent flood is permitted. This avoids the notifying server becoming complicit in the nefarious activities of its authorised users. In the end it may come down to what those users are authorised to do and who they are authorised to do it with. Since many sites on the internet are effectively open to any user, authorised or not, the importance of handling how much trust your site has in its users may be important in the extreme.
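
Here is a sketch of what that confirm-before-flood rule might look like on the notifying server's side, with invented function names and HTTP details: a single probe notification is sent, and only a positive acknowledgement from the destination upgrades the subscription to a full stream.

    # Sketch of requiring the destination's complicity before streaming
    # notifications to it. URIs and HTTP details are invented for illustration.
    import requests

    def activate_subscription(destination_uri, probe_body):
        """Send one probe notification; only a positive acknowledgement from
        the destination turns this subscription into a full stream."""
        response = requests.post(destination_uri, data=probe_body, timeout=10)
        # A 2xx status is taken as D acknowledging the single probe message;
        # anything else and we refuse to become a denial-of-service tool.
        return 200 <= response.status_code < 300

    def notify(destination_uri, subscription, body):
        """Deliver a notification, but only flood confirmed destinations."""
        if not subscription.get("confirmed"):
            subscription["confirmed"] = activate_subscription(destination_uri, body)
            return
        requests.post(destination_uri, data=body, timeout=10)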

Benjamin

Sat, 2005-Oct-01

Routed REST

I think that Rohit Khare's contribution with his 2003 paper on Decentralizing REST holds many valuable insights. In some areas, though, I think he has it wrong. One such area is his chapter on Routed REST (R+REST), chapter 5.2.

The purported reason for deriving this architectural style is to allow agencies that don't trust each other to collaborate in implementing useful functions. Trust is not uniform across the Internet. I may trust my bank and my Internet provider, but they may not trust each other. Proxies on the web may or may not be trusted, so authentication must be done in an end-to-end way between network nodes. Rohit wants to build a collaboration mechanism between servers that don't trust each other implicitly, but are instructed to trust each other in specific ways by a client which trusts each service along the chain.

Rohit gives the example of a client, a printer, a notary watermark, and an accounting system. The client trusts its notary watermark to sign and authenticate the printed document, and trusts the printer. The printer trusts the accounting system and uses it to bill the client. Rohit tries to include the notary watermark and the accounting system not as communication endpoints in their own right, but as proxies that transform data that passes through them. To this end he places the notary watermark between the client and printer to transform the request, but places the accounting system between printer and client on an alternate return route. He seems to get very excited about this kind of composition and starts talking about MustUnderstand headers and about SOAP- and WS-routing as existing implementations. The summary of communications is as follows:

  1. Send print job from client to notary
  2. Forward notarised job from notary to printer
  3. Send response not back to the notary, but to the printer's accounting system
  4. The accounting system passes the response back to the client

I think that in this chapter he's off the rails. His example can be cleanly implemented in a REST style by maintaining state at relevant points in the pipeline. Instead of transforming the client's request in transit, the client should ask the notary to sign the document and have it returned to the client. The client should only submit a notarised document to the printer. The printer in turn should negotiate with the accounting system to ensure proper billing for the client before returning any kind of success code to the client. The summary of communications is as follows:

  1. Send print job from client to notary
  2. Return notarised job from notary to client
  3. Send notarised job from client to printer
  4. Send request from printer to accounting system
  5. Receive response from accounting system back to printer
  6. Send response from printer back to client

Each interaction is independent of the others and only moves across justified trust relationships.

Rohit cites improved performance from a routing-based system. He says that only four message transmissions need to occur, instead of six using alternate approaches. His alternate approach is different to mine, but both his alternate and my alternate have the same number of messages needing transmission: six. But let's consider that TCP/IP startup requires at least three traversals of the network and also begins transmission slowly. Establishing a TCP/IP connection is quite expensive. If we consider the number of connections involved in his solution (four) compared to the number in my solution and his non-ideal alternative (three), it starts to look more like the routing approach is the less efficient one. This effect should be multiplied over high latency links. When responses are able to make use of the same TCP/IP connection the request was made on, the system as a whole should actually be more efficient and responsive. Even when it is not more efficient, I would argue that it is significantly simpler.

Rohit uses this style to build the ARREST style, however using this style as a basis weakens ARREST. He uses routing as a basis for subscription, however in practice whether subscription results come back over the same TCP/IP connection or are routed to a web server using a different TCP/IP connection is a matter of tradeoff of server resources and load.

Benjamin

Sat, 2005-Oct-01

The Estimated Web

Rohit Khare's dissertation describes the ARREST+E architectural style for building estimated systems on a web-like foundation. He derives this architecture from styles developed to reduce the time it takes for a client and server to reach consensus on a value (that is, to hold the same value). The styles he derives from are based on leases of values, which mirror the web's cache expiry model. An assumption of this work is that servers must not change their value until all caches have expired, which is to say until the most recent expiry date provided to any client has passed. He doesn't explicitly cover the way the web already works as an estimated system: by lying about the expiry field.

Most applications on the web do not treat the expiry date as a lease that must be honoured. Instead, they change whenever they want to change and provide clients with an expiry date much shorter than the interval at which they expect the real page to change. Most DNS records remain static for at least months at a time, so an expiry model that permits caching for 24 hours saves bandwidth while still ensuring that clients have a high probability of having records that are correct. Simply speaking, if a DNS record changes once every thirty days then a one-day cache expiry gives clients something like a 29 in 30 chance of having data that is up to date. In the mean time caching saves each DNS server from having to answer the same queries every time someone wants to look up a web page or send some email.

The question on the web of today becomes "What will it cost me to have clients with the wrong data?". If you're managing a careful transition then both old and new DNS records should be valid for the duration of the cache expiry. In this case it costs nothing for clients to have the old data. Because it costs them nothing it also costs you nothing, excepting the cost of such a careful transition. When we're talking about web resources the problem becomes more complicated because of the wide variety of things that a web resource could represent. If it is time-critical data of commercial importance to your clients then even short delays in getting the freshest data can be a problem for clients. Clients wishing to have a commercial advantage over each other are likely to poll the resource rapidly to ensure they get the new result first. Whether cache expiry dates are used or not clients will consume your bandwidth and processing power in proportion to their interest.

The ARREST+E style provides an alternative model. Clients no longer need to poll because you're giving them the freshest data possible as it arrives. ARREST+E involves subscription by clients and notification back to the clients as data changes. It allows for summarisation of data to prevent irrelevant information (such as stale updates) from being transmitted, and also for prediction on the client side to try to reduce the estimated error in their copy of the data. If your clients simply must know the result and are willing to compete with each other by effectively launching denial of service attacks on your server then the extra costs of the ARREST+E style may be worth it. Storing state for each subscription (including summariser state) may be cheaper than handling the excess load.

On the other hand, most of the Internet doesn't work this way. Clients are polite because they don't have a strong commercial interest in changes to your data. Clients that aren't polite can be denied access for a while until their manners improve. A server-side choice of how good an estimate your copy of the resource representation will be is enough to satisfy everyone.

Whether or not you use ARREST+E with its subscription model, some aspects of the architecture may still be of use. I periodically see complaints about how inefficient HTTP is at accessing RSS feeds. A summariser could reduce the bandwidth associated with transferring large feeds. It would be nice to be able to direct feed reader clients to a resource specifically tailored for them each time they arrive. Consider the possibility of returning today's feed data to a client, but directing them next time to a resource that represents only feed items after today. Each time they get a new set of items they would be directed onto a resource that represents items since that set.
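
Sketching that feed idea with invented data and URIs: each response carries the items newer than the client's last snapshot plus a pointer to the resource it should fetch next time, so the full feed never needs retransmitting.

    # Sketch of pointing a feed reader at "items since X" resources. The data
    # model, timestamps and URI scheme are all invented for illustration.
    from datetime import datetime

    FEED = [
        {"published": "2005-10-14T09:00:00+00:00", "title": "Older item"},
        {"published": "2005-10-16T08:30:00+00:00", "title": "Newer item"},
    ]

    def feed_since(iso_timestamp):
        """Return items newer than the client's last snapshot, plus the URI the
        client should fetch next time instead of the full feed."""
        since = datetime.fromisoformat(iso_timestamp)
        items = [item for item in FEED
                 if datetime.fromisoformat(item["published"]) > since]
        next_marker = max((item["published"] for item in FEED), default=iso_timestamp)
        return {"items": items, "next": "/feed/since/" + next_marker}

    print(feed_since("2005-10-15T00:00:00+00:00"))  # only the newer item, plus a
                                                    # pointer for the next visit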

Benjamin

Sat, 2005-Oct-01

Consensus on the Internet Scale

I've had a little downtime this week and have used some of it to read a 2003 paper by Rohit Khare, Extending the REpresentational State Transfer (REST) Architectural Style for Decentralized Systems. As of 2002, Rohit was a founder and Chief Technology Officer of KnowNow, itself a commercialisation of technology originally developed for mod_pubsub. I noticed this paper through a Danny Ayers blog entry. I have an interest in publish subscribe systems and this paper lays the theoretical groundwork for the behaviour of such systems on high-latency networks such as the internet. Rohit's thesis weighs in at almost three hundred pages, including associated bibliography and supporting material. As such, I don't intend to cover all of my observations in this blog entry. I'll try to confine myself to the question of consensus on high-latency networks and on the "now" horizon.

One of the first things Rohit does in his thesis is establish a theoretical limit relating the latency between two nodes and the rate at which data can change at a source node while still being agreed at the sink node. He bases his model around the HTTP concept of expiry and calculates that if it takes at most d seconds for the data to get from point A to point B, then A must hold its value constant for at least d seconds and note this period as an expiry in the message it sends. Clearly, if A wants to change its value more frequently the data will have expired before it reaches B. This result guides much of the subsequent work as he tries to develop additional REST-like styles to achieve as close as possible to the theoretical result in practice. He ends up with A being able to change its value about every 2d seconds if appropriate architectural components are included.

Two times maximum latency may not be terribly significant on simple short distance networks, however even on ethernet d is theoretically unbounded. It follows that over a network it isn't always possible to have a fresh copy of an origin server's value. This is where things get interesting. Rohit defines the concept of an "estimated" system. He contrasts this with a centralised system by saying that an estimated system does not guarantee consensus between network nodes. Instead of saying "I know my copy of the variable is the same as the server holds, and thus my decisions will be equivalent to everyone else's", we say "I have a P% chance that my copy is the same as the server holds, and thus my decisions may differ from those of other network nodes". A "now" horizon exists between those components that have a low enough latency to maintain consensus with the origin server's data and those that are too far from the source of data to track changes at the rate they occur.

In SCADA systems we measure physical phenomena. Even if we only sample a phenomenon at a certain rate, the actual source change can occur at any frequency. If we consider the device that monitors an electric current (our RTU) as the origin server, we could attempt to synchronise with the RTU's data before it resamples. If latencies are low enough we could possibly indicate to the user of our system the actual value as we saw it within the "now" horizon, and thus in agreement with the RTU's values. However, another way to see the RTU is as a proxy. Because we can't control the frequency of change of the current, the "now" horizon is arbitrarily small. All SCADA systems are estimated rather than consensus-based.

Rohit doesn't directly cover the possibility of an infinitely-small "now" horizon, however his ultimate architecture for estimated systems generally is ARREST+E (Asynchronous Routed REST+Estimation). It contains a client that retries requests to the field device when they fail in order to traverse unreliable networks. It includes credentials for the client to access the field device. It includes a trust manager that rejects bad credentials at the field device, and includes the capability to drop duplicate messages from clients that have retried when a retry wasn't required. The field device would send updates back to the client as often as the network permitted. To keep latency low it would avoid passing on stale data and may perform other summarisation activities to maximise the relevance of data returned to the client. On the other side of this returned data connection the client would include a predictor to provide the best possible estimate of actual state based on the stale data it has at its disposal. Finally, this return path also includes credential transmission and trust management.

This grand unifying model is practical when the number of clients for our field device is small. Unlike the base REST and HTTP models, this architecture requires state to be maintained in our field device to keep track of subscriptions and to provide appropriate summaries. The predictor on the client side can be as simple as the null predictor (we assume that the latest value at our disposal is still current) or could include domain-specific knowledge (we know the maximum rate at which this piece of equipment can change state and we know the direction it is moving, so we perform a linear extrapolation based on past data input). As I mentioned in my previous entry, state may be too expensive to maintain on behalf of the arbitrarily large client base that exists on the Internet at large. There must be some means of acquiring the money to scale servers up appropriately. If that end of things is secure, however, I think it is feasible.
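
For illustration, here is a sketch of the two predictors just mentioned, using made-up samples: a null predictor that assumes the latest value is still current, and a linear extrapolation over the last two timestamped samples.

    # Sketch of client-side prediction over stale subscription data. Samples
    # are (timestamp_seconds, value) pairs; everything here is illustrative.

    def null_predictor(samples, now):
        """Assume the latest value at our disposal is still current."""
        return samples[-1][1]

    def linear_predictor(samples, now):
        """Extrapolate from the last two samples; fall back to the null
        predictor when there is not enough history."""
        if len(samples) < 2:
            return null_predictor(samples, now)
        (t1, v1), (t2, v2) = samples[-2], samples[-1]
        slope = (v2 - v1) / (t2 - t1)
        return v2 + slope * (now - t2)

    history = [(0.0, 10.0), (5.0, 12.0)]      # value rose 2 units in 5 seconds
    print(null_predictor(history, now=8.0))   # -> 12.0
    print(linear_predictor(history, now=8.0)) # -> 13.2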

Importantly, I think it is necessary. In practice I think that most mutable resources on the web are representing physical phenomena of one kind or another. Whether it is "the state the user left this document in last time they wrote it" or "the current trading price of MSFT on wall street", people are physical phenomena who can act on software at any time and in any way. The existing system on the web of using expiry dates is actually a form of estimation that is driven by the server deciding how often a client should poll or recheck their data. This server-driven approach may not suit all clients, which is why web browsers need a "reload" button!

I'll talk about some specific issues I had with the thesis in a subsequent post.

Benjamin