A position paper for the
W3C Workshop on Web of Services for Enterprise Computing,
by Benjamin Carlyle of
Westinghouse Rail Systems Australia.
Introduction
The Web and traditional SCADA technology are built on similar principles and
have great affinity. However, the Web does not solve all of the problems that
the SCADA world faces. This position paper consists of two main sections:
The first section describes the SCADA world view as a matter of context for
readers who are not familiar with the industry; the second consists of a series
of "Tier 1" and "Tier 2" positions that contrast with the current Web. Tier 1
positions are those that have a direct and immediate impact on our
business. Tier 2 positions are more general in nature and may only affect our
business in the longer term.
The SCADA World View
Supervisory Control and Data Acquisition (SCADA) is the name for a broad
family of technologies across a wide range of industries. It has traditionally
been contrasted with Distributed Control Systems (DCS): DCS components operate
autonomously, while SCADA systems typically operate under direct human control
from a central location.
The SCADA world has evolved so that most systems are now hybrids with traditional
DCS systems, and the term's meaning has expanded further. When we talk about SCADA in the
modern era, we might be talking about any system that acquires and concentrates
data on a soft real-time basis for centralised analysis and operation.
SCADA systems or their underlying technologies now underpin most operational
functions in the railway industry. SCADA has come to mean "Integration" as
traditional vertical functions like train control, passenger information,
traction power, and environmental control exchange ever more information.
Our customers' demands for more flexible, powerful, and cost-effective
control over their infrastructure are ever increasing.
Perhaps half of our current software development effort can be attributed to
protocol development in pursuit of our integration
aims. This figure is unacceptable, unworkable, and unnecessary. We tend to
see a wide gap between established SCADA protocols and one-off protocols
developed completely from scratch. SCADA protocols tend to already follow many
of the REST constraints. They have limited sets of methods, identifiers that
point to specific pieces of information to be manipulated, and a small set of
content types. The one-off protocols tend to need more care before they can be
integrated, and often there is no architectural model to be found in the
protocol at all.
We used to think of software development to support a protocol as the
development of a "driver", or a "Front End Processor (FEP)". However, we have
begun to see this consistently as a "protocol converter". SCADA systems are
typically distributed, and the function of protocol support is usually to map
an externally-defined protocol onto our internal protocols. Mapping from ad hoc
protocols to an internally-consistent architectural style turns out to be a
major part of this work. We have started to work on "taming" HTTP for use on
interfaces where we have sufficient control over protocol design, and we hope
to be able to achieve Web-based and REST-based integration more often than not
in the future. Our internal protocols already closely resemble HTTP.
The application of REST-based integration has many of the same motivations
and goals as the development of the Semantic Web. The goal is primarily to
integrate information from various sources. However, it is not integration
with a view to query but with a view to performing system functions. For this
reason it is important to constrain the vocabularies in use down to a set that
in some way relate to system functions.
I would like to close this section with the observation that there seems to
be a spectrum between the needs of the Web at large, and the needs of the
enterprise. Probably all of my Tier 1 issues could be easily resolved within
a single corporate boundary, and continue to interoperate with other parts of
the Web. The solutions may also be applicable to other enterprises. In fact,
as we contract to various enterprises, I can say this with some certainty.
However, it seems difficult to get momentum behind proposals that are not
immediately applicable to the real Web. I will mention pub/sub in particular, which
is quickly dismissed as being unable to cross firewalls easily. However, this
is not a problem for the many enterprises that could benefit from a standard
mechanism. Once acceptance of a particular technology is established within the
firewall, it would seem that crossing the firewall would be a more
straightforward proposition. Knowing that the protocol is proven may encourage
vendors and firewall operators to make appropriate provisions when use cases
for the technology appear on the Web at large.
Tier 1: A HTTP profile for High Availability Cluster clients is required
My first Tier 1 issue is the use of HTTP to communicate with High
Availability (HA) clusters. In the SCADA world, we typically operate with no
single
point of failure anywhere in a critical system. We typically have
redundant operator workstations, each with redundant Network Interface Cards
(NICs), and so on and so forth, all the way to a HA cluster.
There are two basic ways to design the intervening network: either create two
separate networks for traffic, or interconnect them. One approach yields multiple
IP addresses to connect to across the NICs of a particular server; the other
yields a single IP. Likewise, it is possible to perform IP takeover and have
either a single IP shared between multiple server hosts, or multiple IPs.
In addition to high availability itself, we typically have a constraint on failover
time. Typically, any single point of failure must be detected in less than five
seconds, with a small amount of additional time allocated for the actual
recovery. Demands vary: while some customers will be happy with a ten or
thirty second total failover time, others will demand a "bumpless" transition.
The important thing about this constraint is that it is not simply a matter of
a new server being able to accept new requests. Clients of the HA cluster also
need to make their transition in the specified bounded time.
HTTP allows for a timeout if a request takes too long, typically around
forty seconds. If this value were tuned to the detection time, we could see that
our server had failed and attempt to reconnect. However, this would reduce the
window in which valid responses must be returned. It would be preferable to
send periodic keepalives down the same TCP/IP connection on which
the HTTP request was made. This keepalive mechanism would allow server
death detection to be handled independently of a fault that causes the HTTP
server not to respond quickly or at all.
We are experimenting with configuring
TCP/IP keepalives on HTTP connections to achieve HA client behaviour.
The first question in such a system is about when the keepalive should be
sent, and when it should be disabled. For HTTP the answer is simple. When a
request is outstanding on a connection, keepalives should be sent by a HA
client. When no requests are outstanding, keepalives should be disabled. In
general, keepalives need to be sent whenever a client expects responses
on the TCP/IP connection it established. This general case affects the
pub/sub model that I will describe in the next section. If pub/sub updates can
be delivered down a HA client's TCP/IP connection, the client must send
keepalives for the duration of its subscriptions. It is the server that must
send keepalives if it connects back to the client to deliver
notifications. Such a server would
only need to do so while notification requests are outstanding, but would need
to persist the subscription in a way that left the client with confidence that
the subscription would not be lost.
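As an illustration, the following sketch shows the kind of keepalive
configuration we are experimenting with. It assumes Python and a Linux-style
TCP keepalive interface; the function names and timing values are ours and
purely illustrative.

    import socket

    def enable_ha_keepalive(sock, idle=1, interval=1, probes=3):
        """Enable aggressive TCP keepalives on an established HTTP connection
        while a request (or subscription) is outstanding, so that server death
        is detected within roughly idle + interval * probes seconds."""
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        # The TCP_KEEP* constants below are Linux-specific; other platforms
        # expose equivalent settings under different names.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

    def disable_ha_keepalive(sock):
        """Disable keepalives once no requests are outstanding."""
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 0)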
Connection establishment is also an issue in a high availability environment. A HA client
must not try to connect to one IP, then move on to the others after a timeout.
It should normally connect to all addresses in parallel, then drop all but
the first successful connection. This process should also take place when a
failover event occurs.
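The following sketch, again illustrative only, connects to every advertised
address in parallel and keeps the first connection that succeeds; the five
second timeout stands in for whatever detection time the customer requires.

    import socket
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def connect_ha(addresses, timeout=5.0):
        """Connect to all (host, port) pairs of the HA cluster in parallel
        and return the first successful connection, closing the rest."""
        winner = None
        with ThreadPoolExecutor(max_workers=len(addresses)) as pool:
            futures = [pool.submit(socket.create_connection, addr, timeout)
                       for addr in addresses]
            for future in as_completed(futures):
                try:
                    sock = future.result()
                except OSError:
                    continue            # this address is currently unreachable
                if winner is None:
                    winner = sock       # first successful connection wins
                else:
                    sock.close()        # drop the slower duplicates
        if winner is None:
            raise OSError("no cluster address reachable")
        return winner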
Tier 1: A Publish/Subscribe mechanism for HTTP resources is required
One of the constants in the ever-changing SCADA world is that we perform
soft real-time monitoring of real-world state. That means that data can change
unexpectedly and that we need to propagate that data immediately when we detect
the change. A field unit will typically test an input every few milliseconds,
and on change will want to notify the central system. Loose coupling will often
demand that a pub/sub model be used rather than a push to a set of URLs
configured in the device.
I have begun drafting a specification that I think will solve most pub/sub
problems, with a preliminary name of
SENA.
It is loosely based on the GENA protocol, but has undergone significant
revision to attempt to meet the security constraints of the open Web while also
meeting the constraints of a SCADA environment. I would like to continue
working on this protocol or a similar protocol, helping it reach a status where
it is possible to propose it for general use within enterprise boundaries.
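To give a flavour of the style of exchange involved, the hypothetical sketch
below issues a GENA-style SUBSCRIBE request from Python. The method and header
names merely follow GENA's general shape; the actual SENA draft may differ in
every detail.

    import http.client

    def subscribe(host, resource, callback_url, timeout_seconds=300):
        """Illustrative GENA-style subscription to an HTTP resource."""
        conn = http.client.HTTPConnection(host)
        conn.request("SUBSCRIBE", resource, headers={
            "Callback": "<" + callback_url + ">",
            "Timeout": "Second-" + str(timeout_seconds),
        })
        response = conn.getresponse()
        # A successful response would carry a subscription identifier
        # (GENA uses a SID header) used later to renew or cancel.
        return response.status, response.getheader("SID")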
We are extremely sensitive to overload problems in the SCADA world. This
leads us to view summarisation as one of the core features of a subscription
protocol. We normally view pub/sub as a way to synchronise state between two
services. We view the most recent state as the most valuable. If we have to
process a number of older messages before we get to the newest value,
latency and operator response time both increase. We are also highly
concerned with situations, permanent or temporary, where state changes
occur at a rate beyond what the system can adequately handle. We dismiss,
with prejudice, any proposal that involves infinite or arbitrary buffering at
any point in the system. We also expect a subscription model to be able to make
effective use of intermediaries, such as web proxies that may participate in
the subscription.
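The sketch below illustrates the summarising behaviour we have in mind, with
at most one pending notification per resource, always carrying the most recent
state. It is our own illustration rather than part of any published
specification.

    from collections import OrderedDict

    class LatestValueQueue:
        """Bounded delivery queue: one pending entry per resource,
        holding only that resource's most recent state."""

        def __init__(self):
            self._pending = OrderedDict()   # resource URI -> latest state

        def publish(self, resource, state):
            # Overwriting an existing entry replaces stale state without
            # growing the queue, so memory is bounded by the number of
            # resources rather than by the rate of change.
            self._pending[resource] = state

        def next_notification(self):
            # Deliver the longest-waiting resource with its newest state.
            return self._pending.popitem(last=False) if self._pending else None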
Tier 2: One architectural framework with a spectrum of compatible architectures
I believe that the architectural styles of the Web can be applied to the
enterprise. However, local conventions need to be permitted. Special methods,
content types, and other mechanisms should all be permitted where required.
I anticipate that the boundary between special and general will shift over
time, and that the enterprise will act as a proving ground for new features of
the wider Web. Once such features are established in the wider Web, I would
also expect the tide to flow back into enterprises that are doing the same
thing in proprietary ways.
If properly nurtured, the enterprise can serve as a nursery for ideas
that the Web itself is less and less able to experiment with. I suspect that
the bodies that govern the Web should also be involved with ideas that are
emerging in the enterprise. These bodies can help those involved with
smaller-scale design keep an eye on the bigger picture.
Tier 2: Web Services are too low-level
Web Services are not a good solution space for Web architecture because
they attack integration problems at too low a level. It is unlikely that two
services independently developed against the WS-* stack will interoperate.
That is to say, they will only interoperate if their WSDL files match.
Ironically, HTTP is a higher-level protocol than the protocols that are layered
on top of it.
That said, we do not rule out interoperating with such systems if the right
WSDL and architectural styles are placed on top of the WS-* stack.
We anticipate a
"HTTP" WSDL eventually being developed for WS-*, and expect to write a protocol
converter back to our internal protocols for systems that implement this WSDL.
The sheer weight of expectation behind Web Services suggests that it will be
simpler for some organisations to head down this path than down a path based on
HTTP directly.
Tier 2: RDF is part of the problem, not the solution
We view RDF as a non-starter in the machine-to-machine communications space,
though we see some promise in ad hoc data integration within limited enterprise
environments. Large scale integration based on HTTP relies on clear,
well-defined, evolvable document types. While RDF allows XML-like document types
to be created, it provides something of an either/or dilemma. Either use
arbitrary vocabulary as part of your document, or limit your vocabulary to that
of a defined document type.
In the former case you can embed rich
information into the document, but unless the machine on the other side expects
this information as part of the standard information exchange, it will not be
understood. It also increases document complexity by blowing out the number of
namespaces in use. In practice it makes more sense to define a single cohesive
document type with a single vocabulary that includes all of the information you
want to express. However,
in this case you are worse off than if you were to start with XML.
You cannot relate a single cohesive RDF
vocabulary to any other without complex model-to-model transforms. In short,
it is
easier to extract information from a single-vocabulary XML document than
from a single-vocabulary RDF document. RDF does not appear to solve any part
of the system integration problem as we see it. However, again, it may assist
in the storage and management of ad hoc data in some enterprises in place of
traditional RDBMS technology.
We view the future of the semantic web as the development of specific XML
vocabularies that can be aggregated and subclassed. For example, the atom
document type can embed the html document type in an aggregation relationship. This
is used for elements such as <title>. The must-ignore semantics of atom
also allow subclassing by adding new elements to atom. The subclassing
mechanism can be used to produce new versions of the atom specification that
interoperate with old implementations. The mechanism can also be used to
produce jargonised forms of atom rather than inventing a whole new vocabulary
for a particular problem domain.
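The sketch below illustrates must-ignore processing of a jargonised atom entry;
the scada extension namespace is invented for the example.

    import xml.etree.ElementTree as ET

    ATOM_NS = "{http://www.w3.org/2005/Atom}"

    # A jargonised entry: scada:quality is an extension element in a
    # hypothetical namespace that a standard feed reader does not know.
    entry_xml = """
    <entry xmlns="http://www.w3.org/2005/Atom"
           xmlns:scada="urn:example:scada-jargon">
      <title>Pump 7 discharge pressure</title>
      <updated>2007-02-27T10:15:00Z</updated>
      <scada:quality>good</scada:quality>
    </entry>
    """

    entry = ET.fromstring(entry_xml)
    for child in entry:
        if child.tag.startswith(ATOM_NS):
            print(child.tag[len(ATOM_NS):], "=", child.text)
        # else: must-ignore -- unknown jargon is silently skipped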
We see the development, aggregation, and jargonisation of XML document types
as the key mechanisms in the development of the semantic web. The graph-based
model used by RDF has not yet demonstrated value in the
machine-to-machine data integration space; however, higher-level abstractions
expressed in XML vocabularies are a proven technology set. We anticipate the
formation of communities around particular base document types that work
on resolving their jargon conflicts and folding their jargon back into the
base document types.
We suspect this social mechanism for vocabulary development and evolution will
continue to be cancelled out in the RDF space by RDF's reliance on URI
namespaces for vocabulary and by its overemphasis on the graph model.
Tier 2: MIME types are more effective than URI Namespaces
On the subject of XML, we have some concerns over the current direction in
namespaces. The selection of a parser for a document is typically based on its
MIME type. Some XML documents will contain sub-documents; however, there is no
standard way to specify the MIME type of a sub-document. We view MIME as more
fully-featured than arbitrary URIs, particularly due to the explicit subclassing
mechanism available.
In MIME we can explicitly indicate that a particular document type is based
on xml: application/some-type+xml. Importantly, we can continue this explicit
sub-typing: application/type2+some-type+xml. We consider this an important
mechanism in the evolution of content types, especially when jargonised
documents are passed to standard processors. It is normal to expect that the
standard processor would ignore any jargon and extract the information
available to it as part of standard vocabulary.
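As an illustration, the sketch below expands such a chained type into the
progressively more general types a receiver could try; the chaining convention
beyond the registered +xml suffix is the proposal made here, not an existing
IANA mechanism.

    def processors_for(mime_type):
        """Return candidate types from most specific to most general,
        e.g. 'application/type2+some-type+xml' ->
        ['application/type2+some-type+xml',
         'application/some-type+xml',
         'application/xml']."""
        major, _, subtype = mime_type.partition("/")
        parts = subtype.split("+")
        return [major + "/" + "+".join(parts[i:]) for i in range(len(parts))]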
While MIME also has its weaknesses, the explicit subclassing mechanism is
not available in URI namespaces at all. To use the atom example again,
atom has an application/atom+xml MIME type but an XML namespace of
<http://www.w3.org/2005/Atom>. We view the former as more useful than
the latter in the development of the Semantic Web and in general
machine-to-machine integration problems.
Tier 2: Digital Signatures are likely to be useful
We regard the protection of secret data by IP-level or socket-level security measures
as being sufficient at this time. Secret data is known and communicated by few
components of the architecture, so it is usually not a scalability issue.
We do not think that secret data should have a significant impact on Web
architecture; however, we do view the ability to digitally sign non-secret
data as a likely enabler for future protocol features.
Conclusion
Web technology and architectural style are proven useful tools for systems
integration, but are incomplete. A scalable summarising Publish/Subscribe
mechanism is an essential addition to the suite of tools, as is a client
profile for operating in High Availability environments. These tools must be
defined and standardised in order to gain wide enough participation to be useful
to the enterprise.
We have concerns about some current trends in Web Architecture. These relate
to namespaces in XML, Web Services, and RDF. All of these trends appear to
work against the goal of building integrated architectures from multi-vendor
components. Our desired outcomes would also appear to be those of the
Semantic Web, so we have some hope that these trends will begin to reverse in
the future.