Sound advice - blog

Tales from the homeworld

Sun, 2011-May-22

Scanning data with HTTP

As part of my series on REST and SCADA, I'll talk in this article a little about doing telemetry with REST. There are a couple of approaches, and most SCADA protocols accidentally incorporate at least some elements of REST theory. I'll take a very web-focused approach and talk about how HTTP can be used directly for telemetry purposes.

HTTP is by no means designed for telemetry. Compared to Modbus, DNP, or a variety of other contenders it is bandwidth-hungry and bloated. However, as we move towards higher available bandwidth with Ethernet communications and wider networks that already incorporate HTTP for various other purposes it becomes something of a contender. It exists. It works. It has seen off every other contender that has come its way. So, why reinvent the wheel?

HTTP actually has some fairly specific benefits when it comes to SCADA and DCS. As I have already mentioned it works well with commodity network components due to its popularity on the Web and within more confined network environments. In addition to that, it brings with it the benefits that made it the world's favourite protocol.

So how do we bridge this gap between grabbing web pages from a server to grabbing analogue and digital values from a field device? Well, I'll walk down the naive path first.

Naive HTTP-based telemetry

The simplest way to use HTTP for telemetry is to give each input its own URL on the field device, have the master issue a GET request to that URL, and interpret the response body as the current value of the point.

So for example, if I want to scan the state of a circuit breaker from a field device I might issue the following HTTP request:

GET https://rtu20.prc/CB15 HTTP/1.1
Accept: text/plain

The response could be:

HTTP/1.1 200 OK
Content-Type: text/plain

CLOSED

... which in this case we would take to mean circuit breaker 15 closed. Now this is a solution that has required us to do a fair bit of configuration and modelling within the device itself, but that is often reasonable. An interaction that moves some of that configuration back into the master might be:

GET https://rtu20.prc/0x13,2 HTTP/1.1
Accept: application/xsd-int

The response could be:

HTTP/1.1 200 OK
Content-Type: application/xsd-int

2

This could mean, "read 2 bits from protocol address 13 hex" with a response of "bit 0 is low and bit 1 is high" resulting in the same closed status for the breaker.

HTTP is not fussy about the exact format of URLs. Whatever appears in the path component is up to the server, and ends up acting as a kind of message from the server to itself to decide what the client actually wants to do. More context or less context could be included in order to ensure that the response message is what was expected. Different devices all using HTTP could have different URL structures and, so long as the master knew which URL to look up for a given piece of data, they would continue to interoperate correctly with the master.
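
To make this concrete from the master's side, here is a minimal sketch in Python of the naive per-point scan, using the widely available requests library. The host name rtu20.prc, the point URL, and the plain-text representation are the hypothetical ones from the examples above; a real master would read these from its point configuration rather than hard-coding them.

import requests

# Hypothetical point list: point name -> URL on the field device.
# These URLs come from the examples above and are not real hosts.
POINTS = {
    "CB15": "https://rtu20.prc/CB15",
}

def scan_point(url):
    """Read the current value of one point as plain text."""
    response = requests.get(url, headers={"Accept": "text/plain"}, timeout=5.0)
    response.raise_for_status()
    return response.text.strip()   # e.g. "CLOSED"

for name, url in POINTS.items():
    print(name, "is", scan_point(url))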

Scanning a whole device

So scanning individual inputs is fine if you don't have too many. When you use pipelined HTTP requests this can be a surprisingly effective way of querying an ad hoc set of inputs. However, in SCADA we usually do know ahead of time what we want the master to scan. Therefore it makes sense to return multiple objects in one go.

This can again be achieved simply in HTTP. You need one URL for every "class" of scan you want to do, and then the master can periodically scan each class as needed to meet its requirements. For example:

GET https://rtu20.prc/class/0 HTTP/1.1
Accept: application/xsd-int+xml

The response could be:

HTTP/1.1 200 OK
Content-Type: application/xsd-int+xml

<ol>
	<li>2</li>
	<li>1</li>
	<li>0</li>
</ol>

Now, I've thrown in a bit of XML there, but HTTP can handle any media type that you would like to throw at it. That includes binary types, so you could even reuse elements of existing SCADA protocols as content for these kinds of requests. That said, the use of media types for even these simple interactions is probably the key weakness of the current state of standardisation for the use of HTTP in this kind of setting. This is not really HTTP's fault, as it is designed to be able to evolve independently of the set of media types in use. See my earlier article on how the facets of a REST uniform contract are designed to fit together and evolve. However, this is where standardisation does need to come into the mix to ensure long-term interoperability of relevant solutions.

The fundamental media type question is, how best to represent the entire contents of an I/O list in a response message. Now, the usual answer on the Web is XML and I would advocate a specific XML schema with a specific media type name to allow the client to select it and know what it has when the message is returned.

In this case, once the client has scanned class 0, it is likely to want to scan classes 1, 2, and 3 at a more rapid rate. To avoid needing to configure all of this information into the master, the content returned on the class 0 scan could even include this information. For example, the response could have been:

HTTP/1.1 200 OK
Content-Type: application/io-list+xml

<ol>
	<link rel="class1" href="https://rtu20.prc/class/1"/>
	<link rel="class2" href="https://rtu20.prc/class/2"/>
	<link rel="class3" href="https://rtu20.prc/class/3"/>
	<li>2</li>
	<li>1</li>
	<li>0</li>
</ol>

The frequency of scans could also be included in these messages. However, I am a fan of using cache control directives to determine scan rates. Here is an example of how we can do that for the class 0 scan.

HTTP/1.1 200 OK
Date: Sun, 22 May 2011 07:31:08 GMT
Cache-Control: max-age=300
Content-Type: application/io-list+xml

<ol>
	<link rel="class1" href="https://rtu20.prc/class/1"/>
	<link rel="class2" href="https://rtu20.prc/class/2"/>
	<link rel="class3" href="https://rtu20.prc/class/3"/>
	<li>2</li>
	<li>1</li>
	<li>0</li>
</ol>

This particular response would indicate that the class 0 scan does not need to be repeated for five minutes. What's more, caching proxies along the way will recognise this information and can return it on behalf of the field device for this duration. If the device has many different master systems scanning it then the proxy can take some of the workload off the device itself in responding to requests. Master systems can still cut through any caches along the way for a specific integrity scan by specifying "Cache-Control: no-cache".
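
As a sketch of how a master might act on these directives, the following Python loop (again using the requests library against the hypothetical rtu20.prc device from the examples above) re-scans class 0 whenever the advertised max-age expires, and can force an integrity scan past any intermediate caches by sending Cache-Control: no-cache. Real scan scheduling would of course be more careful than a simple sleep.

import re
import time
import requests

CLASS0_URL = "https://rtu20.prc/class/0"   # hypothetical device from the examples

def scan(url, integrity=False):
    """Fetch a scan class; an integrity scan cuts through any caching proxies."""
    headers = {"Accept": "application/io-list+xml"}
    if integrity:
        headers["Cache-Control"] = "no-cache"
    response = requests.get(url, headers=headers, timeout=10.0)
    response.raise_for_status()
    return response

def scan_period(response, default=60.0):
    """Use the server's max-age directive as the time until the next scan."""
    match = re.search(r"max-age=(\d+)", response.headers.get("Cache-Control", ""))
    return float(match.group(1)) if match else default

while True:
    response = scan(CLASS0_URL)
    print(response.text)            # a real master would parse the I/O list here
    time.sleep(scan_period(response))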

Delta Encoding

Although using multiple scan classes can be an effective way of keeping up to date with the latest important changes to the input of a device, a more general model can be adopted. This model provides a general protocol for saying "give me the whole lot", and "now, give me what's changed".

Delta encoding can be applied to general scanning, but is particularly appropriate to sequence of events (SOE) processing. For a sequence of events we want to see all of the changes since our last scan, and usually we also want these events to be timestamped. Some gas pipeline and other intermittently-connected systems have similar requirements to dump their data out onto the network and have the server quickly come up to date, but not lose its place for subsequent data fetches. I have my own favourite model for delta encoding, and I'll use that in the examples below.

GET https://rtu20.prc/soe HTTP/1.1
Accept: application/sequence-of-events+xml

The response could be:

HTTP/1.1 200 OK
Content-Type: application/sequence-of-events+xml
Link: <https://rtu20.prc/soe?from=2011-05-22T08:00:59Z>; rel="Delta"

<soe>
	<event
		source="https://rtu20.prc/cb20"
		updated="2011-05-22T08:00:59Z"
		type="application/xsd-int+xml"
		>2</event>
</soe>

The interesting part of this response is the link to the next delta. This response indicates that the device is maintaining a circular buffer of updates, and so long as the master fetches the deltas often enough it will be able to continue scanning through the updates without loss of data. The next request and response in this sequence are:

GET https://rtu20.prc/soe?from=2011-05-22T08:00:59Z HTTP/1.1
Accept: application/sequence-of-events+xml

The response could be:

HTTP/1.1 200 OK
Content-Type: application/sequence-of-events+xml
Link: <https://rtu20.prc/soe?from=2011-05-22T08:03:00Z>; rel="Delta"

<soe>
	<event
		source="https://rtu20.prc/cb20"
		updated="2011-05-22T08:02:59.98Z"
		type="application/xsd-int+xml"
		>0</event>
	<event
		source="https://rtu20.prc/cb20"
		updated="2011-05-22T08:03:00Z"
		type="application/xsd-int+xml"
		>1</event>
</soe>

The master has therefore seen the circuit breaker transition from a state where the closed contact was indicating true, through a state where neither the closed nor the open contact was asserted, and within 20ms to a state where the open contact is lit up.
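
A master-side sketch of this delta-following loop might look like the following Python, again assuming the hypothetical rtu20.prc device, the application/sequence-of-events+xml representation, and the "Delta" link relation used in the examples above. It fetches the event list, records each event, then follows the link to the next delta so that nothing is missed between scans.

import re
import time
import requests
import xml.etree.ElementTree as ET

SOE_URL = "https://rtu20.prc/soe"   # hypothetical device from the examples

def next_delta(response):
    """Pull the URL of the next delta out of the Link response header."""
    match = re.search(r'<([^>]+)>;\s*rel="Delta"', response.headers.get("Link", ""))
    return match.group(1) if match else None

def fetch_events(url):
    """Fetch one sequence-of-events document and the link to the next delta."""
    response = requests.get(
        url,
        headers={"Accept": "application/sequence-of-events+xml"},
        timeout=10.0,
    )
    response.raise_for_status()
    soe = ET.fromstring(response.text)   # the <soe> element with <event> children
    return list(soe), next_delta(response)

url = SOE_URL
while True:
    events, delta_url = fetch_events(url)
    for event in events:
        print(event.get("updated"), event.get("source"), event.text)
    url = delta_url or SOE_URL   # fall back to a full fetch if no delta link is given
    time.sleep(5.0)              # poll often enough not to overflow the device's buffer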

Conclusion

Although a great deal of effort has been expended in trying to bring SCADA protocols up to date with TCP and the modern Internet, we would perhaps have been better off spending our time leveraging the protocol that the Web has already produced for us and concentrating our efforts on standardising the media types needed to convey telemetry data in the modern networking world.

There is still plenty of time for us to make our way down this path, and many benefits in doing so. It is clearly a feasible approach, comparable to conventional SCADA protocols, and is likely to be a fundamentally better solution due primarily to its deep acceptance across a range of industries.

Benjamin

Tue, 2011-May-03

The REST Constraints (A SCADA perspective)

REST is an architectural style that lays down a predefined set of design decisions to achieve desirable properties. Its most substantial application is on the Web, and it is commonly confused with the architecture of the Web. The Web consists of browsers and other clients, web servers, proxies, caches, the Hypertext Transfer Protocol (HTTP), the Hypertext Markup Language (HTML), and a variety of other elements. REST is a foundational set of design decisions that co-evolved with Web architecture to both explain the Web's success and to guide its ongoing development.

Many of the constraints of REST find parallels in the SCADA world. The formal constraints are:

Client-Server

The architecture consists of clients and servers that interact with each other via defined protocol mechanisms. Clients are generally anonymous and drive the communication, while servers have well-known addresses and process each request in an agreed fashion.

This constraint is pretty ubiquitous in modern computing and is in no way specific to REST. In service-oriented architecture the terms client and server are usually replaced with "service consumer" and "service provider". In SCADA we often use terms such as "master" and "slave".

The client-server constraint allows clients and servers to be upgraded independently over time so long as the contract remains the same, and limits coupling between client and server to the information present in the agreed message exchanges.

Stateless

Servers are stateless between requests. This means that when a client makes a request the server side is allowed to keep track of that client until it is ready to return a response. Once the response has been returned the server must be allowed to forget about the client.

The point of this constraint is to allow the architecture to scale up to the size of the World Wide Web, and to improve overall reliability. Scalability is improved because servers only need to keep track of clients they are currently handling requests for, and once they have returned the most recent response they are clean and ready to take on another request from any client. Reliability is improved because the server side only has to be available when requests are being made, and does not need to ensure continuity of client state information from one request to another across restart or failover of the server.

Stateless is a key REST constraint, but is one that needs to be considered carefully before applying it to any given architecture. In terms of data acquisition it means that every interaction has to be a polling request/response message exchange as we would see in conventional telemetry. There would be no means to provide unsolicited notifications of change between periodic scans.

The benefits of stateless on the Web are also more limited within an industrial control system environment, where we are more likely to see one concurrent client for a PLC or RTU's information rather than the millions we might expect on the Web. In these settings stateless is often applied in practice for reasons of reliability and scalability. It is much easier to implement stateless communications within a remote terminal unit than it is to support complex stateful interactions.

Cache

The cache constraint is designed to counter some of the negative impact that comes about through the stateless constraint. It requires that the protocol between client and server contain explicit cacheability information either in the protocol definition or within the request/response messages themselves. It means that multiple clients or the same polling client can reuse a previous response generated by the server under some circumstances.

The importance of the cache constraint depends on the adherence of an architecture to the stateless constraint. If clients are being explicitly notified about changes to the status of field equipment then there is little need for caching. The clients will simply accept the updates as they come in and perform integrity scans at a rate they are comfortable with.

Cache is not a common feature of SCADA systems. SCADA is generally built around the sampling of inputs that can change at any time, or at least can change very many times per second. In this environment the use of caching doesn't make a whole lot of sense, but we still see it in places such as data concentrators. In this setting a data concentrator device scans a collection of other devices for input. A master system can then scan the concentrator for its data rather than reaching out to individual servers. Cache can have significant benefits as systems get larger and as interactions between devices become more complex.

Layered System

The layered constraint is where we design in all those proxies that have become so troublesome, but much of the trouble has come from SCADA protocols not adhering well to this constraint. It says that when a client talks to a server, that client should not be able to tell whether it is talking to the "real" server or only to a proxy. Likewise, a server should not be able to tell whether it is talking to a "real" client or a proxy. Clients and servers should not be able to see past the layer they are directly interacting with.

This is a constraint that explicitly sets out to do a couple of things. First of all it is intended to let proxies at important locations get involved in the communication in ways that they otherwise could not. We could have a proxy that is aggregating data together for a particular section of a factory, railway line, power distribution network, etc. It could be acting as a transparent data concentrator, the sole device that is scanning the PLCs and RTUs in that area, ensuring that each one only has to deal with the demands of a single client. However, that aggregator could itself answer to HMIs and other subsystems all over the place. In a REST architecture that aggregator would be speaking the same protocol to both the PLCs and to its own clients, and clients would use the same protocol address to communicate with the proxy as they would the real device. This transparency allows the communications architecture to be modified in ways that were not anticipated in early system design. Proxies can easily be picked up, duplicated, reconfigured, and reused elsewhere to do a similar job, without needing someone to reimplement them from scratch and without clients needing to explicitly modify their logic to make use of them.

The second thing it sets out to do is allow proxies to better scrutinise communication that passes through them based on policies that are important to the owner of the proxy. The proxy can be part of a firewall solution that allows some communication and blocks other communication with a high degree of understanding of the content of each message. Part of the success of HTTP can be put down to the importance of the Web itself, but one view of the success of HTTP in penetrating firewalls is that it gives just the right amount of information to network owners to allow them to make effective policy decisions. If a firewall wants to wall off a specific set of addresses it can easily do so. If it wants to prevent certain types of interactions then this is straightforward to achieve.

Code on demand

There are really two variants of REST architecture: one that includes the code on demand constraint, and one that does not. The variant of REST that uses code on demand requires that clients include a virtual machine execution environment for server-provided logic as part of processing response messages.

On the Web you can read this constraint as directives like "support JavaScript" and "support Flash" as well as more open-ended directives such as "allow me to deploy my application to you at runtime". The constraint is intended to allow more powerful and specific interactions between users and HMIs than the server would have otherwise been able to make happen. It also allows more permanent changes to be deployed over the network, such as upgrading the HMI software to the latest version.

Code on demand arguably has a place in SCADA environments for tasks like making HMI technology more general and reusable, as well as allowing servers of every kind to create more directed user interactions such as improving support for remotely reconfiguring PLCs or remotely deploying new configuration.

Uniform Interface

Uniform Interface is the big one. That's not only because it is the key constraint that differentiates REST from other styles of architecture, but also because it is the feature that REST and SCADA have most in common. I covered the uniform interface constraint previously from a SCADA perspective. It is central to both the REST and SCADA styles of architecture, but is a significant departure from conventional software engineering. It is what makes it possible to plug PLCs and RTUs together in ways that are not possible with conventional software systems. It is the core of the integration maturity of SCADA systems and of the Web that is missing from conventional component and services software.

Benjamin

Sat, 2011-Apr-23

The REST Uniform Contract

One of the key design decisions of REST is the use of a uniform contract. Coming from a SCADA background it is hard to imagine a world without a uniform contract. A uniform contract is a common protocol for accessing a variety of devices, software services, or other places where I/O, logic, or data storage happens. The whole point of SCADA is acquiring data from diverse sources and sending commands and information to the same without having to build custom protocol converters for each individual one. Surprisingly, this is a blind spot for most software engineering. It's a maturity hole that normally requires every service consumer to implement specific code to talk to each service in the architecture.

Conventional SOA and Web Services are built on this style of software architecture, where every service in the system has a unique protocol to access the capabilities of the service. There is no common protocol mechanism to go and fetch information. There is no common mechanism to store information. What commonality exists between the protocols of different services exists at a lower level. Services define a variety of read and write operations. SOAP ensures these custom operation names are encoded into XML in a consistent way that can be encapsulated for transport across a variety of network architectures, and WSDL ensures there is a common way for the service to embed this protocol information into the integrated development environments for service consumers, as well as into the service consumers themselves.

The contract mechanism simplifies the task of processing messages sent between service and consumer, but still couples service and consumer together at the network level and at the software code level so that each consumer can only work with the one service that implements the contract.

OPC-UA and OPC are built on SOAP and COM, respectively. SOAP and COM both share this low level of protocol abstraction, and both OPC and OPC-UA compensate for this by defining a service contract that is implemented not by only one service but by every OPC DA Server or related server, so that consumers are able to communicate with them without custom per-service message processing logic and without a custom per-service protocol. For this reason they are a good case study to contrast the features of SOAP and HTTP for industrial control purposes.

HTTP is the current standard protocol for one aspect of the REST uniform contract. In fact, there are two other key aspects. A REST uniform contract is a triangle of resource identifiers (URLs), methods (supplied today by HTTP), and media types.

As all SCADA systems use some form of uniform contract, it is useful to understand the key design feature of a REST uniform contract compared to a conventional SCADA contract. In a conventional bandwidth-conservative SCADA protocol it is common to define fetch, store, and control operations that are each able to handle a defined set of types. These types might include a range of integer and floating point values, bit-fields, and other types. As I look back over the protocols I have used over my career I consider that some of the protocol churn we have seen over time has been because of the limited range of types available. Each time we need a new type we either have to change the protocol, start using a different protocol, or start to tunnel binary data through our existing protocol in ways that are custom or special to the particular interaction needed.

REST takes a different approach where the protocol "methods" are decoupled from the set of media types. This adds a little protocol overhead where we need to insert an identifier for the media type along with every message we send, and a long one at that. Examples of media type identifiers on the Web include text/plain, text/html, image/jpeg, image/svg+xml, and application/xhtml+xml. These type names are long, and they have to be to ensure uniqueness. We wouldn't normally tolerate type identifiers of this length in bandwidth-conservative SCADA protocols, but where we can assume the use of Ethernet comms and other fast communication bearers the massive inefficiency in these identifiers can be tolerated.

The reason we would want to tolerate identifiers like this is because they allow our main protocol to be independent of the types that are transferred across it. There is no need to change protocol just because you need to send new types of information. The set of types can evolve separately from the main protocol, and experience on the Web and in SCADA environments suggests that this is an excellent property for the application protocol to have. The types of data that need to be moved around have to be changed, extended, and customised far more often than the ways that the information needs to be moved around. You can essentially think of the REST uniform interface constraint as a decision to use a SCADA-like protocol but to explicitly separate out the types of information to ensure longevity of the protocol in use.
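
As a small, hypothetical illustration of that separation, the Python sketch below asks the hypothetical rtu20.prc device from the earlier post for the same point in two different representations purely by varying the Accept header, and dispatches on the Content-Type of the response. It assumes the device is willing to serve either representation of the same point. Supporting a new representation changes only this dispatch logic; the protocol and the request code stay exactly as they are.

import requests

POINT_URL = "https://rtu20.prc/CB15"   # hypothetical device from the earlier post

def read_point(url, media_type):
    """Same method and URL; only the requested representation varies."""
    response = requests.get(url, headers={"Accept": media_type}, timeout=5.0)
    response.raise_for_status()
    content_type = response.headers.get("Content-Type", "")
    if content_type.startswith("text/plain"):
        return response.text.strip()     # e.g. "CLOSED"
    if content_type.startswith("application/xsd-int"):
        return int(response.text)        # e.g. 2
    raise ValueError("unsupported media type: " + content_type)

print(read_point(POINT_URL, "text/plain"))
print(read_point(POINT_URL, "application/xsd-int"))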

This brings us back to OPC and OPC-UA. Although they are layered on top of COM and SOAP they bring back some of the uniform contract constraint. They allow some variation of media type through the use of VARIANT to convey custom types. However, they don't go all the way. In a REST environment we would not have a special protocol for data acquisition, another for alarms and events, and another for historical data. We would be looking to define one application protocol that could be used for all of these purposes in conjunction with specific media types. Perhaps not all of the features of that protocol would be used for all of these purposes, but they would be available and consistent across the architecture.

On the Web the application protocol is HTTP. It has features to GET, and to PUT, and to do all the basic things you would expect of a master/slave protocol. It is relatively efficient, especially when compared to a solution that tunnels SOAP messages over HTTP, and then OPC messages over the SOAP. A simpler solution would see OPC make use of HTTP directly, and tie its future evolution to that of HTTP rather than to a three-layer hierarchy.
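
For completeness, commanding a device fits the same uniform interface. As a hedged sketch only, again using the hypothetical rtu20.prc device and plain-text representation from the earlier post, a master might command the breaker by PUTting a new state to the URL it reads from. Whether a given device exposes control this way, and how select-before-execute semantics would be layered on top, is a design decision for that device rather than anything HTTP mandates.

import requests

BREAKER_URL = "https://rtu20.prc/CB15"   # hypothetical device from the earlier post

def command_breaker(state):
    """Replace the breaker's state representation, e.g. 'OPEN' or 'CLOSED'."""
    response = requests.put(
        BREAKER_URL,
        data=state,
        headers={"Content-Type": "text/plain"},
        timeout=5.0,
    )
    response.raise_for_status()

command_breaker("OPEN")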

It is conceivable that HTTP would require further work or some extension before it is completely suitable for use as a SCADA protocol, and I'll put together a few observations on this front in a later post. However, if HTTP can be adopted as the foundation for future SCADA systems that have reasonable bandwidth available to them then it will result in a system that is both more efficient than something like OPC-UA and more at home in a world of web proxies and firewalls. HTTP is the protocol of the Web, and REST is the foundation behind HTTP. HTTP is and will remain more at home in complex internetworking environments than COM, SOAP, or any other custom contract definition mechanism. I would predict that disciplined application of the REST uniform interface constraint in conjunction with HTTP will produce a consistently better and more robust technical solution to the problems of SCADA systems.

Benjamin

Wed, 2011-Apr-13

Industrial REST

REST is the foundation of the Web, and is increasingly relevant to enterprise settings. I hail from a somewhat different context of industrial control systems. I have been thinking of putting together a series of articles about REST within this kind of setting to share a bit of experience and to contrast various approaches.

REST is an architectural style that lays down a set of design constraints that are intended to create various desirable properties in an architecture. It is geared towards the architecture of the Web, but has many other applications. REST makes an excellent starting point for the development of SCADA and other industrial control systems.

SCADA systems are usually built around SCADA protocols such as Modbus or DNP. Exactly which protocol is used will depend on a variety of factors such as the particular control industry we happen to be working in, the preferences of particular customers, and the existing installed base.

The SCADA protocol plays the same role in a SCADA system as HTTP plays on the Web. It is pitched at about the same level, and has many similar properties. If we are to reimagine the SCADA system as a REST-compliant architecture then the SCADA protocol would be the application protocol we would have in use.

SCADA protocols have typically been developed over a long period of time to be very bandwidth-efficient and to solve specific problems well. However, we have been seeing for a long time now across our industries the transition from slow serial connections to faster Ethernet. We have been seeing the transition from modem communication to broadband between distant sites. Many of the benefits of existing protocols are being eaten away as they are shoehorned into internet-based environments and are needing to respond to new security challenges and the existence of more complex intermediary components such as firewalls and proxies. We see protocols such as OPC responding by adopting SOAP over HTTP as a foundation layer and then implementing a new SCADA protocol on top of this more complex stack.

I would like to make the case for a greater understanding of REST in the industrial communications world, a new vision of how industrial communications interacts with intranet environments, and to identify some of the areas where HTTP as the main REST protocol of today is not quite up to snuff for the needs of a modern control systems world.

Benjamin