Sound advice - blog

Tales from the homeworld

My current feeds

Sat, 2005-Oct-01

Bootchart under Debian

My wife has been complaining for a while that my computer is a little slow to boot. It's an aging PC with an 800MHz celeron processor and 384 meg of RAM. Actually, the motherboard on this machine was changed at one point and cat /proc/cpuinfo claims the chip runs at 565MHz so I'm suspicious something was diddled at the time.

I pulled down the bootchart package to have a look at how long things were really taking. Lo and behold she was not exaggerating. My machine was taking 154 seconds to boot. Further investigation showed that hotplug seemed to account for much of the time. hotplug ran from just before the 24 second mark to just before 122 seconds. Following some advice found on the internet I decided to mess with hotplug a bit. The first step was just backgrounding hotplug during boot so that other operations could run in parallel by modifying /etc/init.d/hotplug. This yielded a good speedup with boot time down to 102 seconds, a 52 second improvement. The only ill effect I got from this is my net didn't get initialised. I suspect that the separate hotplug net script didn't like running before hotplug had finished its startup, however I haven't fully explored this problem yet. It's easy enough to do an "ifup eth1" once the boot is finished. It also probably wouldn't have worked except that I still have a reasonably complete /etc/modules file that loads my mouse drivers and the like.

The new bootchart still showed a problem. Hotplug started just before the 30 second mark and finished (possibly was truncated) at the 102 second mark. For that entire time the CPU was maxed out, just as it had been during the hotplug startup when it was not executing in parallel with other activities. I decided to solve the hotplug startup issue conclusively by adding a sleep before it executed in /etc/init.d/hotplug. /etc/init.d/hotplug now has the effect of returning immediately, sleeping for 60 seconds in parallel, and starting hotplug at the expiry of those 60 seconds. This yeilded the best results so far. My machine now boots in 66 seconds, an 88 second or 57% reduction. I wonder if my wife will notice...

Clearly, hotplug's startup is a problem on low-end (cpu-bound) hardware. I don't recommend running it (or at least I don't recommend starting it up as part of your normal boot process) on slow a CPU. This may improve in the future with alternate implementations to the current bash shell scripting approach starting to emerge. A couple of sample bootcharts are available comparing the bash version to a rewrite in perl. I've even seen a compiled version written in C proposed.

A couple of notes on bootchart itself: It works well. I use grub to boot and manually modified the kernel command line each time I wanted bootchart to run. This was just a matter of pressing "e" before the boot process began, positioning the cursor over the kernel command-line, pressing "e" again, appending "init=/sbin/bootchartd" and using "b" to get the boot process started. My only frustration with the bootchart program that generates the pretty bootchart diagrams is that if you stuffed up step one and don't have any bootchart data from which to produce the diagram it returns silenty without explaining what went wrong or where it is looking for the information. Once the information was actually there it worked without any hassles. Well done to its contributors.

Ziga Mahkovec wrote to me via email on Monday October 3, 2005 regarding the silent return when no data was available:

This was actually a bug that I fixed in CVS now -- the renderer will report a "/var/log/bootchart.tgz not found" error.

Great job!

I've upgraded to the latest Debian unstable udev as of October 15, 2005. This package replaces the hotplug scripts with its own event generation system. My bootchart now weighs in at 95s. This is a significant improvement over the older hotplug time of 154s, and only adds about 50% over the theoretical minimum of not doing any hotplug work. The bootchart still shows my CPU being maxed out for this time with no I/O. This might be the ideal time to start preloading files into memory to improve the performance of the subsequent boot load process. I'm having trouble with correct initialisation a few devices in this setup. My sound card device isn't being created despite relevant modules being loaded correctly, so I had to tweak /etc/udev/links.conf. My usb-connected motorola sb4200 cable modem also gets its modules loaded but doesn't automatically ifup at the moment. I have just been executing ifup eth1 after boot, but I may go searching for the right place to hack this into the standard boot process.

With this performance work well underway and clear improvements showing I'm confident that the linux boot process will continue to be refined and streamlined. Good work to the udev developers, and also to those on the gnome camp who seem to be getting very busy around the topic of gnome startup for the upcoming 2.14 release.


Sat, 2005-Oct-01

Use of HTTP verbs in ARREST architectural style

This should be my last post on Rohit Khare's Decentralizing REST thesis. I apologise to my readers for somewhat of a blogging glut around this paper but there have been a number of topics I wanted to touch apon. This post concerns the use of HTTP verbs for subscription and notification activities.

In section 5.1 Rohit describes the A+REST architectural style. It uses a WATCH request and and NOTIFY response. By the time he reaches the R+REST and ARREST styles of sections 5.2 and 5.3 he is using SUBSCRIBE requests and POST responses. I feel that the jump to use POST (a standard HTTP request) is unfortunate.

I think Rohit sees POST as the obvious choice here. The server wants to return something to the client, therefore mutating the state of the client, therefore POST is appropriate. rfc2616 has this to say about POST:

The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions:

  • Annotation of existing resources;
  • Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
  • Providing a block of data, such as the result of submitting a form, to a data-handling process;
  • Extending a database through an append operation.

POST is often used outside of these kinds of context, especially as means of tunnelling alternate protocols or architectural styles over HTTP. In this case though, I think that its use is particularly aggregious. Consider this text from section 9.1.1 of rfc2616:

the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.

When the server issues a POST it should be acting on behalf of its user. Who its user is is a little unclear at this point. There is the client owned by some agency, the server owned by another, and finally the POST destination possibly owned by an additional agency. If the server is acting on behalf of its owner it should do so with extreme care and be absolutely sure it is willing to take responsibilty for the actions taken in response to the POST. It should provide its own credentials and act in a responsible manner, operating in accordance with the policy its owner sets forward for it.

I see the use of POST in this way as a great security risk. If the server generating POST requests is trusted by anybody then by using POST as a notification it is transferring that trust to its client. Unless client and server are owned by the same organisation or individual an interagency conflict exists and an unjustified trust relationship is created. Instead, the server must provide the client's credentials only or notify the destination server in a non-accountable way. It is important that the server not be seen to be requesting any of the sideeffects the client may generate in response to the notification but instead that those sideeffects are part of the intrinsic behaviour of the destination when provided with trustworthy updates.

Ultimately I don't think it is possible or reasonable for a server to present its client's credentials as its own. There is too much risk that domain name or IP address records will be taken into account when processing trust relationships. I think therefore that a new method is required, just as is provided in the GENA protocol which introduces NOTIFY for the purpose.

It's not a dumb idea to use POST as a notification mechanism. I certainly thought that way when I first came to this area. Other examples also exist. Rohit himself talks about the difficulty of introducing new methods to the web and having to work around this problem in mod_pubsub using HTTP headers. In the end though, I think that the introduction of subscription to the web is something worthy of at least one new verb.

I'm still not sure whether an explicit SUBSCRIBE verb is required. Something tells me that a subscribable resource will always be subscribed to anyway and that GET is ultimately enough to setup a subscription if appropriate headers are supplied. I'm still hoping I'll be able to do something in this area to reach a standard approach.

The established use of GENA in UPnP may tip the cards. The fact that it exists and is widely deployed may outweigh its lack of formal specification and standardisation. Its defects mostly appear asthetic rather than fundamental, and it still may be possible to extend it in a backwards-compatible way.


Sat, 2005-Oct-01

Infinite Buffering

Rohit Khare's 2003 paper on Decentralizing REST makes an important point in section 7.2.4 during his discussion of REST+E (REST with Estimates):

It is impossible to sustain an information transfer rate in excess of a channel's capacity. Unchecked, guaranteed message delivery can inexorably increase latency. To ensure this does not occur - ensuring that buffers, even if necessary, remain finite - information buffered for transmission may need to be summarized, updated, or even dropped while still queued.

It is a classic mistake in message-oriented middleware to combine guaranteed delivery of messages with guaranteed acceptance of messages. Just when the system is overloaded or stressed, everything starts to unravel. Latency increases, and can even reach a point where feedback loops are created in the middleware: Messages designed to keep the middleware operating normally are delayed so much that they cause more messages to be generated that can eventually consume all available bandwidth and resources.

Rohit cites summarisation as a solution, and it is. On the other hand it is important to look at where this summarisation should take place. Generic middleware can't summarise data effectively. It has few choices but to drop the data. I favour an end-to-end solution for summarisation: The messaging middleware must not accept a message unless it is ready to deliver it into finite buffers. The source of the message must wait to send and when it becomes possible to send should be able to alter its choice of message. It should be able to summarise in its own way for its own purposes. Summarisation should only take place at this one location. From this location it is possible to summarise, not based on a set of messages that have not been sent, but on any other currently- accessible data. For instance, a summariser can be written that always sends the current state of a resource rather than the state it had at the first change after the last successful delivery. This doesn't require any extra state information (except for changed/not-changed) if the summariser has access to a common representation of the origin resource. The summariser's program can be as simple as:

  1. Wait for change to occur
  2. Wait until a message can be sent, ignoring further changes
  3. Send a copy of the current resource representation
  4. Repeat

More complicated summarisation is likely to be useful on complex resources such as lists. Avoiding sending the whole list over and over again can make significant inroads to reducing bandwidth and processing costs. This kind of summariser requires more sophisticated state, including the difference between the current and previosly-transmitted list contents.

Relying on end-to-end summarisers on finite buffers allows systems to operate efficiently under extreme load conditions.


Sat, 2005-Oct-01

REST Trust Relationships

Rohit Khare's 2003 paper Decentralizing REST introduces the ARREST architectural style for routed event notifications when agency conflicts exist. The theory is that it can be used according R+REST principles to establish communication channels between network nodes that don't trust each other but which are both trusted by a common client.

Rohit has this to say about that three-way trust relationship in chapter 5.3.2:

Note that while a subscription must be owned by the same agency that owns S, the event source, it can be created by anyone that S's owner trusts. Formally, creating a subscription does not even require the consent of D's owner [the owner of the resource being notified by S], because any resource must be prepared for the possibilty of unwanted notifications ("spam").

If Rohit was only talking about R+REST's single routed notifications I would agree. One notification of the result of some calculation should be dropped by D as spam. Certainly no unauthorised alterations to D should be permitted by D, and this is the basis of Rohit's claim that D need not authorise notifications. Importantly, however, this section is not referring to a one-off notification but to a notification stream. It is essential in this case to have the authority of D before sending more than a finite message sequence to D. Selecting a number of high-volume event sources and directing them to send notifications to an unsuspecting victim is a classic denial of service attack technique. It is therefore imperative to establish D's complicity in the subscription before pummeling the resource with an arbitrary number of notifications.

The classic email technique associated with mailing lists is to send a single email first, requesting authorisation to send further messages. If a positive confirmation is recieved to the email (either as a return email, or a web site submission) then further data can flow. Yahoo has the concept of a set of email addresses which a user has demonstrated are their own, and any new mailing list subscriptions can be requested by the authorised user to be sent to any of those addresses. New addresses require individual confirmation.

I believe that a similar technique is required for HTTP or XMPP notifications before a flood arrives. The receiving resource must acknowledge successful receipt of a single notification message before the subsequent flood is permitted. This avoids the notifying server becoming complicit in the nefarious activities of its authorised users. In the end it may come down to what those users are authorised to do and who they are authorised to do it with. Since many sites on the internet are effectively open to any user, authorised or not, the importance of handling how much trust your site has in its users may be important in the extreme.


Sat, 2005-Oct-01

Routed REST

I think that Rohit Khare's contribution with his 2003 paper on Decentralizing REST holds many valuable insights. In some areas, though, I think he has it wrong. One such area is his chapter on Routed REST (R+REST), chapter 5.2.

The purported reason for deriving this architectural style is to allow agencies that don't trust each other to collaborate in implementing useful functions. Trust is not uniform across the Internet. I may trust my bank and my Internet provider, but they may not trust each other. Proxies on the web may or may not be trusted, so authentication must be done in an end-to-end way between network nodes. Rohit wants to build a collaboration mechanism between servers that don't trust each other implicitly, but are instructed to trust each ohter in specific ways by their client which trusts each service along the chain.

Rohit gives the example of a client, a printer, a notary watermark, and an accounting system. The client trusts its notary watermark to sign and authenticate the printed document, and trusts the printer. The printer trusts the accounting system and uses it to bill the client. Rohit tries to include the notary watermark and the accounting system not as communication endpoints in their own right, but as proxies that transform data that passes through them. To this end he places the notary watermark between the client and printer to transform the request, but places the accounting system betwen printer and client on an alternate return route. He seems to get very excited about this kind of composition and starts talking about MustUnderstand headers and about SOAP- and WS- routing as existing implementations. The summary of communications is as follows:

  1. Send print job from client to notary
  2. Forward notarised job from notary to printer
  3. Send response not back to the notary, but to the printer's accounting system
  4. The accounting system passes the response back to the client

I think that in this chapter he's off the rails. His example can be cleanly implemented in a REST style by maintaining state at relevant points in the pipeline. Instead of transforming the client's request in transit, the client should ask the notary to sign the document and have it returned to the client. The client should only submit a notarised document to the printer. The printer in turn should negotiate with the accounting system to ensure proper billing for the client before returning any kind of success code to the client. The summary of communications is as follows:

  1. Send print job from client to notary
  2. Return notarised job from notary to client
  3. Send notarised job from client to printer
  4. Send request from printer to accounting system
  5. Receive response from accounting system back to printer
  6. Send response from printer back to client

Each interaction is independent of the others and only moves across justified trust relationships.

Rohit cites improved performance out of a routing based system. He says that only four message transmissions need to occur, instead of six using alternate approaches. His alternate approach is different to mine, but both his alternate and my alternate have the same number of messages needing transmission: six. But let's consider that TCP/IP startup requires at least three traversals of the network and also begins transmission slowly. Establishing a TCP/IP connection is quite expensive. If we consider the number of connections involved in his solution (four) compared to the number in my solution and his non-ideal alternative (three) it starts to look more like the routing approach is the less efficient one. This effect should be multiplied over high latency links. When responses are able to make use of the same TCP/IP connection as the request was made on, the system as a whole should actually be more efficient and responsive. Even when it is not more efficient, I would argue that it is signficantly simpler.

Rohit uses this style to build the ARREST style, however using this style as a basis weakens ARREST. He uses routing as a basis for subscription, however in practice whether subscription results come back over the same TCP/IP connection or are routed to a web server using a different TCP/IP connection is a matter of tradeoff of server resources and load.


Sat, 2005-Oct-01

The Estimated Web

Rohit Khare's dissertation describes the ARREST+E architectural style for building estimated systems on a web-like foundation. He derives this architecture based on styles developed to reduce the time it takes for a client and server to reach consensus on a value (they have the same value). The styles he derives from are based on leases of values which mirror the web's cache expiry model. An assumption of this work is that servers must not change their value until all caches have expired, which is to say that the most recent expiry date provided to any client has passed. He doesn't explicitly cover the way the web already works as an estimated system: by lying about the expiry field.

Most applications on the web do not treat the expirty date as a lease that must be honoured. Instead, they change whenever they want to change and provide an expiry date to clients that represents a value much less than they expect the real page to change at. Most DNS server records remain static for at least months at a time, so an expiry model that permits caching for 24 hours saves bandwith while still ensuring that clients have a high probability of having records that are correct. Simply speaking, if a DNS record changes once every thirty days then a one-day cache expiry gives clients something like a 29 in 30 chance of having data that is up to date. In the mean time caching saves each DNS server from having to answer the same queries every time someone wants to look up a web page or send some email.

The question on the web of today becomes "What will it cost me to have clients with the wrong data?". If you're managing a careful transition then both old and new DNS records should be valid for the duration of the cache expiry. In this case it costs nothing for clients to have the old data. Because it costs them nothing it also costs you nothing, excepting the cost of such a careful transition. When we're talking about web resources the problem becomes more complicated because of the wide variety of things that a web resource could represent. If it is time-critical data of commercial importance to your clients then even short delays in getting the freshest data can be a problem for clients. Clients wishing to have a commercial advantage over each other are likely to poll the resource rapidly to ensure they get the new result first. Whether cache expiry dates are used or not clients will consume your bandwidth and processing power in proportion to their interest.

The ARREST+E style provides an alternative model. Clients no longer need to poll because you're giving them the freshest data possible as it arrives. ARREST+E invovles subscription by clients and notification back to the clients as data changes. It allows for summarisation of data to avoid irrelevant information from being transmitted (such as stale updates) and also for prediction on the client side to try and reduce the estimated error in their copy of the data. If your clients simply must know the result and are willing to compete with each other by effectively launching denial of service attacks on your server then the extra costs of ARREST+E style may be worth it. Storing state for each subscription (including summariser state) may be cheaper than handling the excess load.

On the other hand, most of the Internet doesn't work this way. Clients are polite because they don't have a strong commercial interest in changes to your data. Clients that aren't polite can be denied access for a while until their manners improve. A server-side choice of how good an estimate your copy of the resource representation will be is enough to satisfy everyone.

Whether or not you use ARREST+E with its subscription model some aspects of the architecture may still be of some use. I periodically see complaints about how inefficient HTTP is at accessing RSS feeds. A summariser could reduce the bandwith associated with transferring large feeds. It would be nice to be able to detect feed reader clients to a resource specifically tailored for them each time they arrive. Consider the possibility of returning today's feed data to a client, but directing them next time to a resource that represents only feed items after today. Each time they get a new set of items they would be directed onto a resource that represents items since that set.


Sat, 2005-Oct-01

Consensus on the Internet Scale

I've had a little downtime this week and have used some of it to read a 2003 paper by Rohit Khare, Extending the REpresentational State Transfer (REST) Architectural Style for Decentralized Systems. As of 2002, Rohit was a founder and Chief Technology Officer of KnowNow, itself a commercialisation of technology originally developed for mod_pubsub. I noticed this paper through a Danny Ayers blog entry. I have an interest in publish subscribe systems and this paper lays the theoretical groundwork for the behaviour of such systems on high-latency networks such as the internet. Rohit's thesis weighs in at almost three hundred pages, including associated bibliography and supporting material. As such, I don't intend to cover all of my observations in this blog entry. I'll try to confine myself to the question of consensus on high-latency networks and on the "now" horizon.

One of the first things Rohit does in his thesis is establish a theoretical limit to the latency between two nodes and the rate at which data can change at a source node while being agreed at the sink node. He bases his model around the HTTP concept of expiry and calculates if it takes at most d seconds for the data to get from point A to point B, then A must hold its value constant until at least d seconds and note this period as an expiry in the message it sends. Clearly if A wants to change its value more frequently the data will be expired before it reaches B. This result guides much of the subsequent work as he tries to develop additional REST-like styles to achieve as close as possible to the theoretical result in practice. He ends up with A being able to change its value about every 2d if appropriate architectural components are included.

Two times maximum latency may not be terribly significant on simple short distance networks, however even on ethernet d is theoretically unbounded. It follows then that over a network it isn't possible to always have a fresh copy of an origin server's value. This is where things get interesting. Rohit defines the concept of an "estimated" system. He contrasts this from a centralised system by saying that an estimated system does not guarantee consensus between network nodes. Instead of saying "I know my copy of the variable is the same as the server holds, and thus my decisions will be equivalent to everyone else's", we say "I have a P% chance that my copy is the same as the server holds, and thus my decisions may differ from those of other network nodes". A "now" horizon exists between those components that have a low enough latency to maintain consensus with the origin server's data and those that are too far from the source of data to track changes at the rate they occur.

In SCADA systems we measure physical phenomenon. Even if we only sample that phenomenon at a certain rate, the actual source change can occur at any frequency. If we consider the device that monitors an electric current (our RTU) as the origin server we could attempt to synchronise with the RTU's data before it resamples. If latencies are low enough we could possibly indicate to the user of our system the actual value as we saw it within the "now" horizon and thus in agreement with the RTU's values. However, another way to see the RTU is as a proxy. Because we can't control the frequency of change of the current, the "now" horizon is arbitrarily small. All SCADA systems are estimated rather than consensus-based.

Rohit doesn't directly cover the possibility of an infinitely-small "now" horizon, however his ultimate architecture for estimated systems generally is ARREST+E (Asynchronous Routed REST+Estimation). It contains a client that retries requests to the field device when they fail in order to traverse unreliable networks. It includes credentials for the client to access the field device. It includes a trust manager that rejects bad credentials at the field device, and includes the capability to drop duplicate messages from clients that have retried when a retry wasn't required. The field device would send updates back to the client as often as the network permitted. To keep latency low it would avoid passing on stale data and may perform other summarisation activities to maximise the relevance of data returned to the client. On the other side of this returned data connection the client would include a predictor to provide the best possible estimate of actual state based on the stale data it has at its disposal. Finally, this return path also includes credential transmission and trust management.

This grand unifying model is practical when the number of clients for our field device is small. Unlike the base REST and HTTP models this architecture requires state to be maintained in our field device to keep track of subscriptions and to provide appropriate summaries. The predictor on the client side can be as simple as the null predictor (we assume that the latest value at our disposal is still current) or could include domain-specific knowledge (we know the maximum rate at which this piece of equipment can change state and we know the direction it is moving so we perform a linear extrapolation based on past data input). As I mentioned in my previous entry state may be too expensive to maintain on behalf of the arbitrarily large client base that exists on the Internet at large. There must be some means of aquiring the money to scale servers up appropraitely. If that end of things is secure, however, I think it is feasible.

Importantly, I think it is necessary. In practice I think that most mutable resources on the web are represnting physical phenomenon of one kind or another. Whether it is "the state the user left this document in last time they wrote it" or "the current trading price of MSFT on wall street", people are physical phenomenon who can act on software at any time and in any way. The existing system on the web of using expiry dates is actually a form of estimation that is driven by the server deciding how often a client should poll or recheck their data. This server-driven approach may not suit all clients, which is why web browsers need a "reload" button!

I'll talk some specific issues I had with the thesis in a subsequent post.