Sound advice - blog

Tales from the homeworld

Sat, 2005-Nov-19

HTTP in Control Systems

HTTP may not be the first protocol that comes to mind when you think SCADA, or when you think of other kinds of control systems. Even the Internet Protocol is not a traditional SCADA component. SCADA traditionally works off good old serial or radio communications with field devices, and uses specialised protocols that keep bandwidth usage to an absolute minimum. SCADA has two sides, though, and I don't just mean the "Supervisory Control" and the "Data Acquisition" sides. A SCADA system is an information concentration system for operational control of your plant. Having already gotten your information into a concentrated form and place, it makes sense to feed summaries of that data into other systems. In the old parlance of the corporation I happen to work for this was called "Sensor to Boardroom".

One of my drivers in trying to understand some of the characteristics of the web as a distributed architecture has been in trying to expose the data of a SCADA system to other ad hoc systems that may need to utilise SCADA data. SCADA has also come a long way over the years, and now stands more for integration of operational data from various sources than simple plant control. It makes sense to me to think about whether the ways SCADA might expose its data to other systems may also work within a SCADA system composed of different parts. We're in the land of ethernet here, and fast processors. Using a heavier-weight protocol such as HTTP shouldn't be a concern from a performance perspective, but what else might we have to consider?

Let's draw out a very simple model of a SCADA system. In it we have two server machines running redundantly, plus one client machine seeking information from the servers. This model is effectively replicated over and over for different services and extra clients. I'll quickly summarise some possible issues and work through them one by one:

  1. Timely arrival of data
  2. Deciding who to ask
  3. Quick failover between server machines
  4. Dealing with redundant networks

Timely Data

When I use the word timely, I mean that our client would not get fresher data by polling rapidly. The simplest implementation of this requirement would be... well... to poll rapidly. However, that loads the network and all CPUs unnecessarily and should be avoided in order to maintain adequate system performance. Timely arrival of data in the SCADA world is all about subscription, either ad hoc or preconfigured. I have worked fairly extensively on the appropriate models for this. A client requests a subscription from a server. The subscription is periodically renewed and may eventually be deleted. While the subscription is active it delivers state updates to a client URL over some appropriate protocol. Easy. The complications start to appear in the next few points.
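
To make the shape of this concrete, here is a rough Python sketch of the subscription request as I imagine it. The SUBSCRIBE verb and the Callback, Timeout and Subscription-Id headers are GENA-style assumptions of mine rather than a settled protocol:

import http.client

def create_subscription(server, resource, callback_url, timeout_s=60):
    # Ask the server to deliver state updates for 'resource' to callback_url.
    conn = http.client.HTTPConnection(server)
    conn.request("SUBSCRIBE", resource, headers={
        "Callback": "<%s>" % callback_url,    # where updates get delivered
        "Timeout": "Second-%d" % timeout_s,   # renew before this expires
    })
    response = conn.getresponse()
    response.read()
    conn.close()
    # The server names the subscription resource it created for us.
    return response.getheader("Subscription-Id")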

Who is the Master?

Deciding who to ask for subscriptions and other services is not as simple as you might think. You could use DNS (or a DNS-like service) in one of two ways. You could use static records, or you could change your records as the availability of servers changes. Dynamic updates would work through some DNS updater application running on one or more machines. It would detect the failure of one host and nominate the other as the IP address to connect to for your service. Doing it dynamically has the problem that you're working from pretty much a single point of view: what the dynamic DNS modifier sees may not be the same as what all clients see. In addition you have the basic problem of the static DNS: where do you host it? In SCADA everything has to be redundant and robust against failure. No downtime is acceptable. The static approach also pushes the failure detection problem onto clients, which may be a problem they aren't capable of solving due to their inherent "dumb" generic functionality.

Rather than solving the problem at the application level you could rely on IP-level failover, however this works best when machines are situated on the same subnet. It becomes more complex to design when main and backup servers are situated in separate control centres for disaster recovery.

Whichever way you turn there are issues. My current direction is to use static DNS (or equivalent) that specifies all IP addresses that are or may be relevant for the name. Each server should forward requests on to the master if it is not currently the master itself, meaning that it doesn't matter which one is chosen when both servers are up (apart from a slight additional lag should the wrong server be chosen). Clients should connect to all IP addresses simultaneously if they want to get their request through quickly when one or more servers are down. They should submit their request to the first connected IP, and be prepared to retry on failure to get their message through. TCP/IP has timeouts tuned for operating over the Internet, but these kinds of interactions between clients and servers on the same network are typically much faster. It may be important to ping hosts you have connections to in order to ensure they are still responsive.
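
A sketch in Python of what I mean by connecting to everything at once. It assumes the static DNS name resolves to every candidate server address and leans on non-blocking connects; the retry-on-failure logic for the request itself would sit on top of this:

import errno
import selectors
import socket

def connect_first(hostname, port, timeout=2.0):
    # Start a non-blocking connect to every address behind the name and
    # return the first socket whose handshake completes, closing the rest.
    selector = selectors.DefaultSelector()
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            hostname, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.setblocking(False)
        code = sock.connect_ex(sockaddr)
        if code in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            selector.register(sock, selectors.EVENT_WRITE)
        else:
            sock.close()
    winner = None
    for key, _ in selector.select(timeout):
        sock = key.fileobj
        if winner is None and sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
            winner = sock
    for key in list(selector.get_map().values()):
        if key.fileobj is not winner:
            selector.unregister(key.fileobj)
            key.fileobj.close()
    selector.close()
    if winner is None:
        raise OSError("no server reachable for %s" % hostname)
    winner.setblocking(True)
    return winner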

It would be nice if TCP/IP timeouts could be tuned more finely. Most operating systems allow tuning of the entire system's connections. Few support tuning on a per-connection basis. If I know the connection I'm making is going to a host that is very close to me in terms of network topology it may be better to declare failures earlier using the standard TCP/IP mechanisms rather than supplementing them with ICMP. Also, the ICMP method for supplementing TCP/IP in this way relies on not using an IP-level failover technique between servers.
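
For the record, Linux does expose per-socket keepalive knobs that get part of the way there. The figures below are plucked out of the air just for illustration, and other platforms only offer system-wide tuning or spell the options differently:

import socket

def keepalive_socket(idle_s=5, interval_s=2, probes=3):
    # Detect a dead peer on this one connection within roughly
    # idle_s + interval_s * probes seconds, without touching system-wide settings.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)       # Linux-specific
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)  # Linux-specific
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)        # Linux-specific
    return sock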

Client Failover

Quick failover follows on from discovering who to talk to. The same kinds of failure detection mechanisms are required. Fundamentally, clients must be able to quickly detect any premature loss of their subscription resource and recreate it. This is made more complicated by the different server-side implementations that may make subscription loss more or less likely, and thus the corrective actions that clients may need to take. If a subscription is lost when a single server host fails, it is important that clients check their subscriptions often and also monitor the state of the host that is maintaining their subscription resource. If the host goes down then the subscription must be reestablished as soon as this is discovered. As such, the subscription must be periodically tested for existence, preferably through a RENEW request. Regular RENEW requests over an ICMP-supported TCP/IP connection as described above should be sufficient for even a slowly-responding server application to adequately inform clients that their subscriptions remain active and that they should not reattempt creation.
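
A bare-bones sketch of the client-side loop, again in Python. The RENEW verb is borrowed from the subscription sketch above, and the resubscribe callback stands in for whatever recreation logic the client actually needs:

import http.client
import time

def keep_subscription_alive(server, subscription_path, resubscribe, period_s=5):
    # Periodically prove the subscription still exists; recreate it on any doubt.
    while True:
        try:
            conn = http.client.HTTPConnection(server, timeout=period_s)
            conn.request("RENEW", subscription_path)
            status = conn.getresponse().status
            conn.close()
            if status >= 400:             # server is up but has forgotten us
                subscription_path = resubscribe()
        except OSError:                   # host down or network failure
            subscription_path = resubscribe()
        time.sleep(period_s)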

Redundant Networks

SCADA systems typically utilise redundant networks as well as redundant servers. Not only can clients access the servers on two different physical media, the servers can do the same to clients. Like server failover, this could be dealt with at the IP level... however your IP stack would need to work in a very well-defined way with respect to the packets you send. I would suggest that each packet be sent over both networks with duplicates discarded on the receiving end. This would very neatly deal with temporary outages in either network without any delays or network hiccups. Ultimately the whole system must be able to run over a single network, so trying to load balance while both are up may be hiding inherent problems in the network topology. Using them both should provide the best network architecture overall.
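
Lacking stack support, here is roughly how the duplicate-and-discard idea looks at the application level in Python, with made-up addresses for the two LANs and a sequence number prefixed to each datagram:

import socket
import struct

NETWORKS = [("10.0.1.10", "10.0.1.20"), ("10.0.2.10", "10.0.2.20")]  # (local, peer) per LAN
PORT = 9000

def make_senders():
    senders = []
    for local, peer in NETWORKS:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((local, 0))   # route out the right interface, one subnet per LAN assumed
        senders.append((sock, (peer, PORT)))
    return senders

def send_everywhere(senders, sequence, payload):
    packet = struct.pack("!I", sequence) + payload
    for sock, destination in senders:
        sock.sendto(packet, destination)

class Deduplicator:
    # Receiving end: keep the highest sequence seen, drop anything at or below it.
    def __init__(self):
        self.highest = -1
    def accept(self, packet):
        sequence = struct.unpack("!I", packet[:4])[0]
        if sequence <= self.highest:
            return None         # duplicate arriving via the other network
        self.highest = sequence
        return packet[4:]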

Unfortunately, I'm not aware of any network stacks that do what I would like. Hey, if you happen to know how to set it up feel free to drop me a line. In the meantime this is usually dealt with at the application level with two IP addresses per machine. I tell you what: this complicates matters more than you'd think. You end up needing a DNS name for the whole server pair with four IP addresses. You then need an additional DNS name for each of the servers, each with two IP addresses. When you subscribe to a resource you specify the whole-server-pair DNS name on connection, but the subscription resource may only exist on one server. It would be returned with only that server's DNS name, but that's still two IP addresses to deal with and ping. All the way through your code you have to deal with this multiple address problem. In the end it doesn't cause a huge theoretical problem to deal with this at the application level, but it does make development and testing a pain in the arse all around.

Conclusion

Because this is all SIL2 software you end up having to write most of it yourself. I've been developing HTTP client and server software in spurts over the last six months or so, but concertedly over the last few weeks. The beauty is that once you have the bits that need to be SIL2 in place you can access them with off-the-shelf implementations of both interfaces. Mozilla and curl both get a big workout on my desktop. I expect Apache, and maybe Tomcat or WebSphere, will start getting a workout soon. By rearchitecting around existing web standards it should become easier for me to produce non-SIL2 implementations of the same basic principles. Parts of the SCADA system that are not safety-related could be built out of commodity components while the ones that are can still work through carefully-crafted proprietary implementations. It's also possible that off-the-shelf implementations will eventually become so accepted in the industry that they can be used where safety is an issue. We may one day think of Apache like we do the operating systems we use: they provide a commodity service that we understand and have validated very well in our own industry and environment, helping us to write only the software that really adds value to our customers.

On that note, we do have a few jobs going at Westinghouse Rail Systems Australia's Brisbane office to support a few projects that are coming up. Hmm... I don't seem to be able to find them on Seek. Email me if you're interested and I'll pass them on to my manager. You'd be best to use my ben.carlyle at invensys.com address for this purpose.

Benjamin

Sun, 2005-Nov-13

The Makings of a Good HTTP API

I've had the opportunity over the last few weeks to develop my ideas about how to build APIs for interfacing over HTTP. Coming from the REST world view I don't see a WSDL- or IDL-derived header file or baseclass definition as a fundamentally useful level of abstraction. I want to get at the contents of my HTTP messages in the way the HTTP protocol demands, but I may also want to do some interfacing to other protocols.

The first component of a good internet-facing API is decent URI parsing. Most URI parsing APIs of today use the old rfc2396 model of a URI. This model was complex, allowing only a very basic level of URI parsing without knowledge of the URI scheme. For example, a http URI reference such as http://example.com:8080/some/path?query#fragment could be broken into:

scheme: http
authority: example.com:8080
path: /some/path
query: query
fragment: fragment

while an unknown URI could only be deconstructed into "scheme" and "scheme-specific-part". A URI parser that understood HTTP and another that did not would produce different results!

January 2005's rfc3986 maps out a solution to URI parsing that doesn't depend on whether you understand the URI scheme or not. All URIs must now conform to the generic syntax of (scheme, authority, path, query, fragment), but all elements of the URI except the path are strictly optional. This is great for API authors who want to provide a consistent interface, however most APIs for URI handling were developed before 2005 and feel clunky in light of the newer definitions. A good API is necessarily a post-January-2005 API.
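
The generic parse is simple enough that rfc3986 gives a regular expression for it in Appendix B. Here it is wrapped up in Python; the same five components come out whether or not the parser has ever heard of the scheme:

import re

# The regular expression from rfc3986 Appendix B.
URI_PATTERN = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def split_uri(uri):
    match = URI_PATTERN.match(uri)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }

# split_uri("http://example.com:8080/some/path?query#fragment")
# => {'scheme': 'http', 'authority': 'example.com:8080', 'path': '/some/path',
#     'query': 'query', 'fragment': 'fragment'}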

Once you have your URI handling API in place, the next thing to consider is how your client and server APIs work. Java makes a cardinal error on both sides of this equation by defining a set of HTTP verbs it knows how to use, and effectively prohibiting the transport of other verbs. In fact, the set of HTTP verbs has changed over time and may continue to change. Extensions like WEBDAV and those required to support subscription are important considerations in designing a general purpose interface of this kind. rfc2616 is clear that extension methods are part of the HTTP protocol, and that there is a natural expectation that methods defined outside the core standard will be seen in the wild. A client API should behave like a proxy that passes through requests it does not understand. It should invalidate any cache entries it may have associated with the named resource, but otherwise trust that the client code knows what it is doing.
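
Python's http.client is an example of an API that already behaves this way: the method is just a string, so a WEBDAV verb like PROPFIND, or anything newer, passes straight through where java.net.HttpURLConnection would refuse it. The host and path here are purely illustrative:

import http.client

conn = http.client.HTTPConnection("example.com")
conn.request("PROPFIND", "/calendars/ben/", headers={"Depth": "1"})  # an extension verb
response = conn.getresponse()
print(response.status, response.reason)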

On the server side the option to handle requests that your API never dreamed of is just as important. Java embeds the operations "GET", "HEAD", "OPTIONS", "POST", "PUT", "DELETE", and "TRACE" into its HttpServlet class, but this is a mistake. If anything this is a REST resource, rather than a simple HTTP resource. The problem is that your view of REST and mine may differ. REST only says that a standard set of methods be used. It doesn't say what those methods are. GET, HEAD, OPTIONS, POST, PUT, DELETE, and TRACE have emerged from many years of standardisation activity and from use in the wild... however other methods have been tried along the way and more will be tried in the future. HttpServlet should be what it says it is and let me get at any method tried on me. I should be able to define my own "RestServlet" class with my own concept of the set of standard verbs if I like. Using this Java interface I have to override the service method and make an ugly call up to the parent class to finish the job if one of my own methods isn't hit. Python (and various other languages, such as Smalltalk) actually allow the neatest solution to this problem: just call the method and get an exception thrown if one doesn't exist. No need to override anything but the methods you understand.
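
Here is a sketch of what that looks like for a hypothetical RestServlet-style base class in Python: dispatch on the verb by name and let the absence of a handler fall out as 405, rather than baking the verb list into the parent. The Response class and the request.method attribute are assumptions of the sketch:

class Response:
    # Minimal stand-in for a response message type.
    def __init__(self, status, headers=None, body=b""):
        self.status = status
        self.headers = headers or {}
        self.body = body

class Resource:
    def handle(self, request):
        # Look the handler up by verb name; no hard-coded method list.
        handler = getattr(self, "do_" + request.method, None)
        if handler is None:
            return Response(405, headers={"Allow": self.allowed()})
        return handler(request)

    def allowed(self):
        return ", ".join(sorted(name[3:] for name in dir(self) if name.startswith("do_")))

class PumpState(Resource):
    def do_GET(self, request):
        return Response(200, body=b"RUNNING")

    def do_SUBSCRIBE(self, request):    # an extension verb; no base class change needed
        return Response(201, headers={"Location": "/subscriptions/42"})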

Another thing I've found useful is to separate the set of end-to-end headers from those that are hop-by-hop. When developing a HTTP proxy it is important that some headers be stripped from any request before passing it on. I've found that putting those headers into a separate map from the end-to-end headers makes life simpler, and since these headers usually carry a level of detail that regular clients don't need to be involved with they can be handed into the request formatting and transmission process separately. That way API-added headers and client-added headers don't have to be combined.
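
The hop-by-hop set is spelled out in rfc2616 section 13.5.1, plus whatever the Connection header itself lists. A small Python sketch of the split:

HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}

def split_headers(header_pairs):
    # header_pairs: (name, value) tuples as they arrived off the wire
    header_pairs = list(header_pairs)
    listed = set()
    for name, value in header_pairs:
        if name.lower() == "connection":
            listed.update(token.strip().lower() for token in value.split(","))
    end_to_end, hop = {}, {}
    for name, value in header_pairs:
        key = name.lower()
        (hop if key in HOP_BY_HOP or key in listed else end_to_end)[key] = value
    return end_to_end, hop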

I guess this brings me to my final few criticisms of the j2se and j2ee HTTP APIs. I think it's worthwhile having a shared concept of what a HTTP message looks like between client and server. Currently the servlet model requires HttpServletRequest and HttpServletResponse objects, however the client API has a HttpURLConnection class that has no relationship to either object. Also, the HttpURLConnection class itself looks nothing like a servlet. If we had started from a RESTful perspective, I would suggest that the definition of a servlet (a resource) and the definitions of the messages that pass between resources would be the first items on the list. It would certainly make writing HTTP proxies in Java easier, and should be more consistent overall. In fact there is very little difference between HTTP request and response messages, so they could share a common baseclass. There is very little difference between HTTP and SMTP messages, once you boil away the hop-by-hop headers. There are even some good synergies with FTP, and any other protocol that uses a URI for location. Transferring data between these different protocols shouldn't be difficult with a basic model of resources in place internal to your program.
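
A sketch of the kind of shared model I'm arguing for; the class names are mine, and only the start lines differ between the two message types:

class Message:
    # What proxies and protocol bridges actually care about: headers and a body.
    def __init__(self, headers=None, body=b""):
        self.headers = headers or {}
        self.body = body

class Request(Message):
    def __init__(self, method, uri, **kwargs):
        super().__init__(**kwargs)
        self.method = method
        self.uri = uri

class Response(Message):
    def __init__(self, status, reason="", **kwargs):
        super().__init__(**kwargs)
        self.status = status
        self.reason = reason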

I think that ultimately the most successful APIs will attempt to model the web and the internet within your program rather than simply provide onramps for access to different protocols. The web does not have a tightly-controlled data model, even at the protocol level. It's important to keep things light and easy rather than tying them down in an overly strict and strongly-typed Object-Oriented way. The web isn't like that, and to some extent I believe that our programming styles should be shifting away also. There's always going to be a need to express something that two objects in an overall system will understand completely, but the objects in between that have to handle requests and responses will only ever have a sketchy picture of it.

Benjamin

Sun, 2005-Nov-06

Microformats

I have been reading about microformats in various blogs for a while, but only recently decided to go and see what they actually were. I'm a believer. Here is an example from the hCalendar microformat:

Web 2.0 Conference: October 5-7, at the Argent Hotel, San Francisco, CA

It's just a snippet of xhtml, but it has embedded machine-readable markup, as follows:

<span class="vevent">
 <a class="url" href="https://www.web2con.com/">
  <span class="summary">Web 2.0 Conference</span>: 
  <abbr class="dtstart" title="2005-10-05">October 5</abbr>-
  <abbr class="dtend" title="2005-10-08">7</abbr>,
 at the <span class="location">Argent Hotel, San Francisco, CA</span>
 </a>
</span>

The same information could have been encoded in a separate calendar file or into hidden metadata in the xhtml, however the microformat approach allows the data to be written once in a visually verifiable way rather than repeating it in several different places. Using this method the human and the machine are looking at the same input and processing it in different ways.

Here is my quick summary of how to use a microformat in your html document, summarised from the hCalendar design principles:

  1. Use standard xhtml markup, just as you would if you weren't applying a microformat
  2. Add <span> or <div> tags for data that isn't naturally held within appearance-affecting markup
  3. Use class attributes on the relevant xhtml nodes for your data
  4. Where the machine-readable data really can't be displayed as is, use <abbr> tags and insert the machine-readable form in the title attribute

Ian Davis has been working on a microformat for rdf. This neatly allows the microformat approach to be applied to foaf data and other rdf models. To demonstrate how cool this is I've embedded some foaf and dublin core metadata into my blog main page. You can access this data directly with an appropriate parser, or take advantage of Ian's online extractor to read the metadata in a more traditional rdf-in-xml encoding.

Benjamin

Tue, 2005-Nov-01

Open Source Capitalism, or "How to run your project as a business"

I wrote recently about the AJ Market, which allows people or organisations with a few dollars to spare to influence the focus of AJ's contribution to open source software development. If I contradict myself this time around please forgive me. I was running a little low on the sleep tank during my first attempt at a brain dump on the subject. This time around I'll try to stick to a few fundamentals.

Who writes the source?

If an open source software developer is to make money writing open source, he or she must be paid up front rather than making up the up-front costs in license fees. There are different motivations for funding open source. The contributor may be able to make direct use of the software produced. They may feel they can make money out of complementary products such as services. They may be trying to curry favour with individuals in the software's target audience, leading to a return of good faith. The contributor may or may not be the same person as the developer. Traditionally the two have consistently been the same person. Developers wrote software to "scratch an itch" that affected them personally. This is a great model of software development where the software's users are the people most actively involved in driving the product as a whole. I see the possibility of opening up this market to also include people who can't scratch their itch directly but have the money to pay someone to do it.

The choice of license

Firstly, I think it is important to have a license that promotes trust in the community of users. My experience is that the GPL does this effectively in many cases by guaranteeing contributions are not unfairly leveraged by non-contributors. Eric Raymond chants that the GPL is no longer necessary because business sees open source as the most productive way forward and that businesses who fail to see this will fail to prosper. I disagree on the basis that through all of nature there are always some cheats. What works on the grand economic scale doesn't always make immediate business sense. The search for short-term gain can wipe out trust and cooperation too quickly to give up on the protections that the GPL provides. When the global community of nation states no longer needs treaties to prevent or limit the use of tariffs I'll start to look again at whether the GPL is required.

Voting with your wallets

My view of an economically vibrant project starts with Bugzilla. I intuitively like the concept of a bounty system in free software and think it ties in nicely with eXtreme Programming (XP) concepts. When you provide bounties for new work you should increase the supply of developers willing to do the work associated with the bug. When you allow the contribution of bounties to be performed by the broader user base you may align the supply of software development hours to the needs of the customer base. Bugzilla already has a concept of voting where registered users indicate bugs they really care about by incrementing a counter. If that counter were powered by dollars rather than clicks the figure may be both a more accurate statement of the desirability of a fix and a more motivating incentive for developers to contribute.

The tie-in with XP for me is in breaking down the barrier between developer and customer. As I mentioned earlier, they are often already the same people in open source. An open flow of ideas about which work is most important and what the work's success criteria are is proving important in software development generally. In XP a customer representative is available at all times for developers to speak to. In open source, bugzilla is an open channel for discussion outside regular mailing lists. Adding money to the process may be a natural evolution of the bug database's role.

A question of scale

As I mentioned in my earlier article, the biggest problem in setting up a marketplace is ensuring that there is enough money floating around to provide meaningful signals. If a software developer can't make a living wage working on a reasonable number of projects, this form of monetary open source is never going to really work. Open source also has a way of reducing duplication that might otherwise occur in proprietary software, so there is potentially far less work out there for software developers in a completely free software market. Whether it could really work or not is still a hypothetical question, but starting work now on systems that support the model may still be exciting enough to produce reasonable information. Even if this model could only pay for one day a week of full-time software development it would be a massive achievement.

Governance and Process

If your project is a business you can pretty much run it any way you choose, but the data you collect and transmit through a closed system will be of lower quality than that of a free market. To run your project as a market I think you need a significant degree of transparency in your process. I suggest having open books in your accounting system. When someone contributes to the resolution of a bug it should be recorded as money transferred from a bug-specific liability account to your cash at bank account. When the conditions for payout are met, money should be transferred from a developer-specific payable account back to the bug. Finally, payment should be made from the cash at bank account to the payable account to close the loop. If the project has costs or is intended to run at a profit, some percentage of the original transfer should come from an appropriate income account rather than being transferred wholly from the liability. This approach clearly tracks the amount invested by users in each bug and transparently indicates the amounts being paid to each developer. I suggest that while contributions may be anonymous, developer payments should be clearly auditable.
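
To make that flow concrete, here is a toy double-entry sketch in Python. The account names and the $50 figure are made up for illustration:

from collections import defaultdict

balances = defaultdict(int)

def transfer(amount, source, destination):
    balances[source] -= amount
    balances[destination] += amount

# 1. A user contributes $50 towards bug 1234.
transfer(50, "liability:bug-1234", "asset:cash-at-bank")
# 2. The payout conditions are met, so the bug's funds become payable to the developer.
transfer(50, "payable:developer-ben", "liability:bug-1234")
# 3. The developer is paid from cash at bank, closing the loop.
transfer(50, "asset:cash-at-bank", "payable:developer-ben")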

Estimates and Bug Lifecycle

I'm going back on my previous suggestion to provide work estimates on each bug. I'm now suggesting that the amount of interest on the supply side be the main feedback mechanism for users who want to know how much a bug resolution is worth. More factors than money alone contribute to the amount of supply available for fixing a bug. There is the complexity of the code that needs to be worked with to consider. There is the fun factor, as well as the need to interface with other groups and spend additional time. I would also suggest that different phases in a bug's lifecycle may be worth contributing to explicitly. If a bug is in the NEW state then money may be made available to investigate and specify the success criteria of later stages. Contributions may be separate for the development phase and a separate review phase. Alternatively, fixed percentages may be supplied to allow different people to become involved during different stages.

Bug Assignment Stability

Stability of bug assignments is important as soon as money comes into the equation. There's no point in having all developers working individually towards the bug with the biggest payoff, only to find the bug is already fixed and paid out when they go to commit. Likewise, showing favouritism in assigning high-value bugs to the same people every time could be deadly to project morale. I would take another leaf out of the eXtreme Programming book and suggest that leases be placed on bug assignments. The developer willing to put in the shortest lease time should win a lease bidding war. Once the lease is won the bug is left with them for that period. If they take longer then reassignment may occur.

Benjamin