Sound advice - blog

Tales from the homeworld

My current feeds

Tue, 2007-Feb-27

The Architectural Spectrum

I see the ideal software architecture of the world as a spectrum between the most widely- and the most narrowly-defined. Sitting at the widest end today is the architecture of the Web. Sitting at the narrowest end is code being written as part of a single program. But what are the steps in between, and how should our view of architectural constraints change between the two extremes?

Architecture and the Architectural Spectrum

Firstly, let me define architecture for the purposes of this article. An architecture consists of components that communicate by exchanging messages in interactions. A single component can participate in multiple architectures of differing scales, so we need a way of distinguishing one architecture from another in a fundamental way. I suggest using Metcalfe's law, which states that the value of a telecommunications network is proportional to the square of the number of users in the system. In software architecture the network effect is bounded to sets of components that each understand a common set of interactions. Each interaction consists of one or more messages being sent from component to component, and each message in the interaction is understood by its recipient.
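
As a rough sketch in my own notation (not Metcalfe's original formulation), the value V of a network of n mutually-intelligible components grows as:

    V(n) \propto n^2, \qquad \text{where } \binom{n}{2} = \frac{n(n-1)}{2} \text{ is the number of possible component pairs}

The square comes from counting the conversations that could happen; components that do not share a common set of interactions contribute nothing to that count.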

A specific software architecture is a collection of components that each understand the same set of messages. Messages that are not understood by some component are part of a different architecture. A spectrum of architectures typically involves a base architecture in which a small number of messages are understood by a large number of components, then successively narrows into sub-architectures that understand a more diverse set of messages. While some components may participate in several sub-architectures, we can conceptualise the spectrum from any particular component's point of view as a set of narrowing architectures that it participates in. Metcalfe's law does not apply when interactions are not understood between components, so whenever a new interaction is introduced a new architecture is created within the spectrum.

An Architectural Spectrum Snapshot

The largest software architecture in the world today is the HTML Web. Millions of components participate in this architecture every day, and each time a request is sent or a response returned, the message is understood by the component that receives it. The Web is a beachhead for the development of architecture everywhere. The Web is actually a collection of architectures defined around specific document types. With HTTP as its foundation it defines a meta-architecture that can be realised whenever a particular document type gains enough support.

The components of the various Web architectures are operated by millions of agencies. The representatives of these agencies can be found in organisations such as the w3c and the ietf, and it takes a great deal of effort to move them in any particular direction. Architectures operated by a smaller number of agencies may be more easily influenced. For example, participants in a particular industry or in a particular supply chain may be able to negotiate new interactions to build architectures relevant to them.

On a smaller scale again we might have a single agency in control of an architecture in an enterprise or a corporate department. These architectures are easier to influence than those that require the balancing of more diverse competing interests; however, they may still be constrained. An enterprise architecture will still typically consist of separate configuration items... separate components that are not typically upgraded or redeployed as a unit, and perhaps never can be. You generally can't just pull the whole system out and plug another one in without significant upfront planning and cost. The enterprise architecture consists of several configuration items that must continue to work with other architecture components post-upgrade.

That leaves the architecture of a configuration item. This is where I would stop using the word architecture and start using the word design. At every point until this one, as we scale down from the Web, we must normally be able to deal with old versions of the components that we interact with. We must deal with today's architecture as well as yesterday's and tomorrow's. This temporal aspect to architecture participation disappears for architecture defined within a particular configuration item. It becomes more important to ensure that a consistent whole is deployed than that a component continues to work with all architectures when it is upgraded.

Characteristics of the Spectrum

As we move down from the Web we see changes in a number of areas:

With the reduction in the number of participants, the momentum behind existing interactions declines. It is easier to add new interactions and easier to eventually eliminate old ones. Importantly, the network effects also decline. Many of the constraints of REST can be relaxed in these environments.

Network effects are still important, even in relatively small architectures. This means that it is still worthwhile following constraints such as the uniform interface. There is no point splitting your architecture up into point-to-point integration pairs when you could just as easily have ten or twenty components participating in an architecture and working together for the same cost. The main areas where REST constraints can be relaxed involve scalability and evolvability, and even there you have something of a Newtonian-versus-Einsteinian issue. You may not see the effects of relativity when you are travelling at 60km/h, but they are there. Sure enough, when you really get up to speed at a quarter of the speed of light you'll know it. Every architect should be aware of the constraints and of the effect of bending them.

Evolving the Spectrum of Architectures

One particularly interesting aspect of REST is how it evolves. I noted earlier that the Web is really a meta-architecture that is realised in the form of the HTML Web and other Webs. This is a characteristic of the REST style. Instead of deciding on the message format and leaving it at that, REST builds its messages up out of several moving parts. A message consists of verbs (including response codes), content types, and identifier schemes. Each time you change the set of verbs, the set of content types, or the way you identify resources you are creating a new architecture. Different kinds of changes have differing effects. Changing the identifier scheme could be a disaster, depending on how radically you change it. Changing the set of methods will affect many components that are agnostic to content type changes. For example, a web proxy is not remotely interested in the content type of a message. It can cache anything. HTTP libraries are similarly agnostic to the messages they send, either in whole or in part.
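
To make the "moving parts" idea concrete, here is a minimal sketch in Python. It is purely my own illustration: the class, field names, and the example sets of verbs and content types are not taken from any specification.

    # Sketch only: model an architecture as the set of moving parts its
    # components share. Names and values here are illustrative.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Architecture:
        verbs: frozenset          # request methods plus response codes
        content_types: frozenset  # document types components understand
        id_schemes: frozenset     # ways resources are identified

    web = Architecture(
        verbs=frozenset({"GET", "PUT", "POST", "DELETE", "200", "404"}),
        content_types=frozenset({"text/html", "application/atom+xml"}),
        id_schemes=frozenset({"http"}),
    )

    # Adding a single method produces a *different* architecture; components
    # that only understand `web` are not full participants in `extended`.
    extended = Architecture(
        verbs=web.verbs | {"LOADCONFIGURATION"},
        content_types=web.content_types,
        id_schemes=web.id_schemes,
    )

    assert web != extended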

REST is actively designed around the notion that while all messages need to be understood to be part of the same architecture, architectures must change and be replaced over time. New versions of particular document types can be easily introduced, so long as they are backwards-compatible. New document types can be introduced, so long as they don't require new methods. Even new methods can be introduced occasionally.

It is my view that the most appropriate way to create the smaller architectures of the world is to extend the base architecture of the Web. New methods can be added. New document types can be added. Existing document types can be extended. I think that while REST provides key technical facilities to allow architectures to evolve, there is also a human side to this evolution. Someone must try out the new content type. Someone must try out the new methods. I see these smaller architectures as proving grounds for new ideas and technology that the Web is less and less able to experiment with directly.

It is in these architectures that communities develop around the Web meta-architecture. It is in these architectures that extensions to standard document types will be explored. It is in these architectures that new document types will be forged. Community is the central ingredient to evolution. The most important thing that technology can do is avoid getting in the way of this experimentation. Once the experiments are gaining traction we need simple ways of merging their results back into the wider architecture. We need simple ways of allowing atom extensions to be experimented with and then rolled back into atom itself without introducing esoteric namespaces. In short, we need to be developing a Web where context defines how we interpret some information rather than universal namespaces. When these extensions are moved into the main context of the Web they will define architecture for everyone. Until then, they incubate in a sub-architecture.

Conclusion

I still don't see where Waka fits into the world, or even a HTTP-compatible SOAP for that matter. Rolling out a new protocol at the scale of the Web has already been demonstrated to be close to impossible over the short term; HTTP/1.1 and IPv6 are examples. The Web has reached a point where it takes decades to bring about substantial change, even when the change appears compelling. HTTP can't be unmade at this point, but perhaps it can be extended. So long as their use remains Web-compatible, sub-architectures can extend HTTP and its content types to suit their individual needs. They may even be able to build a second-tier Web that eventually supplants the original Web.

I don't see a place for RDF. I see the Web as a world of mime types and namespace-free xml. I think you need to build communities around document types. I think the sub-architectures that (mis)use and extend the content types of the Web contribute to it, and that XML encourages this more than RDF does. Today we have HTML, atom, pdf, png, svg, and a raft of other useful document types. In twenty years time we will probably have another handful that are so immensely useful to the wider Web that we can't imagine how we ever lived without them. I predict that this will be the way to the semantic web: hard-fought victories over specific document types that solve real-world problems. I predict that the majority of these document types will be based around the tree structure of XML, but define their own structure on top of it. I don't foresee any great number being built around the graph structure of RDF, as defined on top of XML in present-day RDF/XML serialisations. If RDF is still around in that timeframe it will be used behind the firewall to store data acquired through standard non-RDF document types in a way that replaces present day RDBMS and SQL.

Benjamin

Sun, 2007-Feb-25

Remixing REST: Verbs and Interaction Patterns

I have been interested in the boundaries between classical object-orientation and REST for many years. This article attempts to explore the boundaries in one particular area. One of REST's core tenets is that of the uniform interface. Is the uniform interface as important as REST suggests? Could it be done any differently?

A significant proportion of the work that I do involves integrating software components or physical devices from different vendors into a single architecture. This usually involves writing a protocol converter for the purpose, often a one-off converter for a particular customer contract. Internally, we have needed to do this kind of thing less and less as we have embraced the REST style. Instead of inventing a new protocol or new IDL whenever we write a new application we have been tending for some time now to reuse an existing HTTP-derived protocol. We can then focus on document types. Do we need a new one, or will one of the ones we already have in use do the job?

The need to limit verbs has long been a teaching of REST proponents, but the motivation isn't always abundantly clear. We can look at the Web and see that the nouns greatly outnumber the verbs, and that the Web seems to work well because of it. So let me have a go at coming up with a simple line of reasoning:

Ad hoc interoperation between two components of an architecture relies on those components being capable of participating in a particular common interaction pattern. An interaction pattern between a client and a server involves one or more request messages being sent from the client to the server, and one or more response messages being sent back to the client. Today's Web constrains the interactions to one request and one response per interaction. The interaction is decomposed into the request verb and document type, and the response verb and document type. Headers are sometimes also important parts of the interaction.
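
A minimal sketch of one such interaction using Python's standard http.client; the host name and path are placeholders, not a real service.

    # Sketch: one request/response interaction decomposed into verb,
    # identifier, and document type. Standard library only.
    import http.client

    conn = http.client.HTTPConnection("example.com", 80, timeout=10)
    conn.request("GET", "/reports/daily",            # request verb + identifier
                 headers={"Accept": "text/html"})    # acceptable document types
    response = conn.getresponse()

    print(response.status)                     # the "response verb", e.g. 200
    print(response.getheader("Content-Type"))  # the document type returned
    body = response.read()
    conn.close()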

In traditional Object-Orientation we are used to writing code every time we write a new class or interface. We write new code to implement the classes, and write new code to interact with the classes. Two objects are unlikely to interoperate unless we plan for that interoperation. Interface classes and design patterns can help us decouple classes from each other, however we must still typically design and choose an interaction pattern for a specific functional purpose.

This is all well and good when we control both end-points of the conversation, or when the interface is encapsulated in an industry standard such as the servlet interface. However the Web introduces a broader problem set. We start to need an interface that decouples components from each other, even though they belong to different industries. We need standards that are more generic, lest we have to start writing new browser code every time a web site is added to the Internet.

Let's inspect the Web interaction pattern some more. We have roughly four to eight request verbs to work with, with about... urgh... forty-three response verbs. That gives you around 172 to 344 possible request/response interactions on the Web. You also need to multiply that out by the number of content types, so theoretically we have thousands or even tens of thousands of possible interactions happening on the Web. That's probably too many.
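
The back-of-envelope arithmetic behind those numbers, with a content-type count of fifty assumed purely for illustration:

    # Back-of-envelope only; the content-type figure is an assumption.
    response_codes = 43
    for request_verbs in (4, 8):
        pairs = request_verbs * response_codes
        print(pairs)            # 172 and 344 request/response combinations
        print(pairs * 50)       # thousands, once ~50 content types are factored in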

In practice only a few response verbs are used, and hopefully Waka will make some sort of headway in this respect. If we reduced the response verbs to their basic types we would be left with only about twenty important interaction patterns on the Web, and only clients get the really raw deal. A server needs to understand all of the possible requests that make sense, but doesn't need to understand any response that it doesn't plan on using. A client should understand all of the requests that are meaningful based on its own specification, but should also understand all of the responses that might be returned to it.

If client and server both implement the request and response verbs that make sense and they both know how to exchange the same document types, they should be able to be configured rather than coded to work together. This is hugely important in big architecture, where it is rarely possible to influence the other side of a conversation into following your individual, corporate, or even industry-specific specifications.
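
As a sketch of "configured rather than coded": the peer's address and resource come from configuration, and behaviour is selected by document type rather than by knowledge of the particular server. The host, path, and handler table below are all illustrative assumptions.

    # Sketch: nothing here is specific to one peer; change the configuration,
    # not the code, to talk to a different server.
    import http.client

    HANDLERS = {
        "text/plain": lambda body: body.decode(),
        "application/atom+xml": lambda body: "atom feed, %d bytes" % len(body),
    }

    def fetch(host, path):                     # host and path come from configuration
        conn = http.client.HTTPConnection(host, timeout=10)
        conn.request("GET", path)
        resp = conn.getresponse()
        ctype = (resp.getheader("Content-Type") or "").split(";")[0].strip()
        handler = HANDLERS.get(ctype)
        if handler is None:
            raise ValueError("no handler configured for %s" % ctype)
        return handler(resp.read())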

To my mind the difference between design and architecture is one of configuration control. At one extreme we have design. Design is controlled by a single agency, and deployed with a single version number. You can construct a design in a very freeform way, because you test and deploy it as a unit. It makes sense to maintain rigid control over typing. You would rather find inconsistency problems at compile time than have to pick them up during testing.

Pure architecture is the other extreme. An architecture component is deployed as a single entity, but when it is upgraded none of the other architecture components are redeployed. Consistency is no longer a concern, and checking for consistency is extremely counterproductive. It is much more important to interoperate with a range of components and component versions built and deployed by different agencies.

In the middle of these two extremes is a kind of half-design, half-architecture scenario. I'll call it system design. You might version or deploy different components of a system separately, but you own all of the components and can do a big upgrade if you need to. System design has characteristics of both design and of architecture. Like architecture, you want to avoid enforcing consistency at build time between components; they might be deployed against various versions of the other components. Like design, you can add special interactions and local conventions. You control both ends of the conversation, so you can be sure that your special conventions will be understood correctly.

Another way to look at system design is as a sub-architecture. Your system may participate in a wider architecture over which you have no control, in a smaller architecture over which you have some control, and yet another in which you have significant or total control. The ideal implementation of these architectures would use the interactions that are standard in the widest architecture whenever they are applicable, then scale down to specifics as special semantics are required. An example of this might be to use a HTTP GET request whenever a client wants to retrieve any kind of data from a server, but still allow special interactions such as LoadConfiguration when nothing from the HTTP sphere is a good match.
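
A sketch of that "widest architecture first" approach in Python. LoadConfiguration comes from the example above; sending it uppercased on the wire, and the hosts, paths, and content type, are my own choices for illustration.

    # Sketch: prefer the widest architecture's interactions (plain GET) and fall
    # back to a narrower, locally-agreed method only when nothing standard fits.
    # "LOADCONFIGURATION" is a locally-defined method, not part of HTTP.
    import http.client

    def retrieve(host, path):
        conn = http.client.HTTPConnection(host, timeout=10)
        conn.request("GET", path)              # standard interaction, widest reach
        return conn.getresponse().read()

    def load_configuration(host, path, config_document):
        conn = http.client.HTTPConnection(host, timeout=10)
        conn.request("LOADCONFIGURATION", path, body=config_document,
                     headers={"Content-Type": "application/xml"})
        resp = conn.getresponse()
        if resp.status >= 400:
            raise RuntimeError("special interaction rejected: %d" % resp.status)
        return resp.read()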

The widest possible architecture today is the Web, making HTTP and its methods hard to ignore. It seems they should be the de facto standard whenever they are appropriate. However, SOAP appears to be solving real-world problems in smaller architectures or designs today. The two are clearly not compatible on the wire; however, gateways between the two protocols may be viable when a WS-*-based architecture facilitates interactions that can be cleanly mapped to HTTP. Two approaches could be used to create a mapping in a WS-* architecture: you could define a single WSDL that covers HTTP-compatible interaction, or you could construct an individual WSDL for each interaction that HTTP supports. Given these interfaces it would be straightforward for components of the WS-* architecture to also participate in the wider architecture.

While gateways are a short-term technique that can be used to bring these architectures closer together, they don't really solve the longer-term issues. We should be prepared to identify a longer-term objective that allows the needs of both architectures to be met with a single technology set. This could be achieved by starting out every conversation as HTTP, but quickly upgrading to a more sophisticated protocol whenever it is supported. Fielding has suggested this will be a technique used by his Waka protocol, and it could likewise be adopted for a HTTP-compatible SOAP mechanism. However, with both Waka and SOAP the advantages of the new protocol would have to significantly outweigh the costs of effectively replacing the architecture of the Web. I see any such protocol spending decades incubating in the enterprises of this world before it becomes a remotely important component of the actual Web.

Benjamin

Thu, 2007-Feb-22

SCADA, Architectural Styles, and the Web

A position paper for the W3C Workshop on Web of Services for Enterprise Computing, by Benjamin Carlyle of Westinghouse Rail Systems Australia.

Introduction

The Web and traditional SCADA technology are built on similar principles and have great affinity. However, the Web does not solve all of the problems that the SCADA world faces. This position paper consists of two main sections: The first section describes the SCADA world view as a matter of context for readers who are not familiar with the industry; the second consists of a series of "Tier 1" and "Tier 2" positions that contrast with the current Web. Tier 1 positions are those that are based on a direct and immediate impact on our business. Tier 2 positions are more general in nature and may only impact business in the longer term.

The SCADA World View

Supervisory Control and Data Acquisition (SCADA) is the name for a broad family of technologies across a wide range of industries. It has traditionally been contrasted with Distributed Control Systems (DCS), where distributed systems operate autonomously and SCADA systems typically operate under direct human control from a central location.

The SCADA world has evolved so that most systems are now hybrids with traditional DCS, but the term's meaning has expanded further. When we talk about SCADA in the modern era, we might be talking about any system that acquires and concentrates data on a soft real-time basis for centralised analysis and operation.

SCADA systems or their underlying technologies now underpin most operational functions in the railway industry. SCADA has come to mean "Integration" as traditional vertical functions like train control, passenger information, traction power, and environmental control exchange ever more information. Our customers' demands for more flexible, powerful, and cost-effective control over their infrastructure are ever increasing.

Perhaps half of our current software development can be attributed to protocol development to achieve our integration aims. This figure is unacceptable, unworkable, and unnecessary. We tend to see a wide gap between established SCADA protocols and one-off protocols developed completely from scratch. SCADA protocols tend to already follow many of the REST constraints: they have limited sets of methods, identifiers that point to specific pieces of information to be manipulated, and a small set of content types. The one-off protocols tend to need more care before they can be integrated, and often there is no architectural model to be found in the protocol at all.

We used to think of software development to support a protocol as the development of a "driver", or a "Front End Processor (FEP)". However, we have begun to see this consistently as a "protocol converter". SCADA systems are typically distributed, and the function of protocol support is usually to map an externally-defined protocol onto our internal protocols. Mapping from ad hoc protocols to an internally-consistent architectural style turns out to be a major part of this work. We have started to work on "taming" HTTP for use on interfaces where we have sufficient control over protocol design, and we hope to be able to achieve Web-based and REST-based integration more often than not in the future. Our internal protocols already closely resemble HTTP.
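
A minimal sketch of what I mean by a protocol converter: an ad hoc field-device message is mapped onto an internal HTTP-like interaction. The device message format and the resource naming scheme below are invented for illustration, not taken from any real device.

    # Sketch of a protocol converter from an ad hoc device protocol to an
    # internally-consistent, HTTP-like style.
    def convert_device_report(raw_line):
        # e.g. "POINT 1203 VALUE 1" from a hypothetical field unit
        _, point_id, _, value = raw_line.split()
        return {
            "method": "PUT",
            "uri": "/points/%s/state" % point_id,
            "content_type": "text/plain",
            "body": value,
        }

    print(convert_device_report("POINT 1203 VALUE 1"))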

The application of REST-based integration has many of the same motivations and goals as the development of the Semantic Web. The goal is primarily to integrate information from various sources. However, it is not integration with a view to query but with a view to performing system functions. For this reason it is important to constrain the vocabularies in use down to a set that in some way relates to system functions.

I would like to close this section with the observation that there seems to be a spectrum between the needs of the Web at large, and the needs of the enterprise. Probably all of my Tier 1 issues could be easily resolved within a single corporate boundary, and continue to interoperate with other parts of the Web. The solutions may also be applicable to other enterprises. In fact, as we contract to various enterprises I can say this with some certainty. However, it seems difficult to get momentum behind proposals that are not immediately applicable to the real Web. I will mention pub/sub in particular, which is quickly dismissed as being unable to cross firewalls easily. However, this is not a problem for the many enterprises that could benefit from a standard mechanism. Once acceptance of a particular technology is established within the firewall, it would seem that crossing the firewall would be a more straightforward proposition. Knowing that the protocol is proven may encourage vendors and firewall operators to make appropriate provisions when use cases for the technology appear on the Web at large.

Tier 1: A HTTP profile for High Availability Cluster clients is required

My first Tier 1 issue is the use of HTTP to communicate with High Availability (HA) clusters. In the SCADA world, we typically operate with no single point of failure anywhere in a critical system. We typically have redundant operator workstations, each with redundant Network Interface Cards (NICs), and so on and so forth, all the way to a HA cluster. There are two basic ways to design the network in between: either create two separate networks for traffic, or interconnect them. One approach yields multiple IP addresses to connect to across the NICs of a particular server, and the other yields a single IP. Likewise, it is possible to perform IP takeover and have either a single IP shared between multiple server hosts, or multiple IPs.

As well as HA itself, we typically have a constraint on failover time. Typically, any single point of failure is detected in less than five seconds and a small amount of additional time is allocated for the actual recovery. Demands vary, and while some customers will be happy with a ten or thirty second total failover time, others will demand a "bumpless" transition. The important thing about this constraint is that it is not simply a matter of a new server being able to accept new requests. Clients of the HA cluster also need to make their transition in the specified bounded time.

HTTP allows for a timeout if a request takes too long, typically around forty seconds. If this value was tuned to the detection time, we could see that our server had failed and attempt to reconnect. However, this would reduce the window in which valid responses must be returned. It would be preferable to send periodic keepalive requests down the same TCP/IP connection as the HTTP request was established on. This keepalive would allow server death detection to be handled independently of a fault that causes the HTTP server not to respond quickly or at all. We are experimenting with configuring TCP/IP keepalives on HTTP connections to achieve HA client behaviour.
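
A sketch of that experiment in Python. The host name and the timing values are illustrative, the TCP_KEEP* socket options are Linux-specific, and reaching into conn.sock is an implementation detail of http.client rather than a documented interface.

    # Sketch: enable TCP keepalives on the socket underneath an HTTP connection
    # so that server death is detected within a bounded time, independently of
    # how long the HTTP request itself takes.
    import http.client
    import socket

    conn = http.client.HTTPConnection("scada-server.example", timeout=10)
    conn.connect()                      # establish the TCP connection explicitly

    sock = conn.sock                    # implementation detail of http.client
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 1)   # idle seconds before probing
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)  # seconds between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)    # probes before declaring death

    conn.request("GET", "/points/1203/state")
    response = conn.getresponse()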

The first question in such a system is about when the keepalive should be sent, and when it should be disabled. For HTTP the answer is simple. When a request is outstanding on a connection, keepalives should be sent by a HA client. When no requests are outstanding keepalives should be disabled. In general theory, keepalives need to be sent whenever a client expects responses on the TCP/IP connection they established. This general case affects the pub/sub model that I will describe in the next section. If pub/sub updates can be delivered down a HA client's TCP/IP connection, the client must send keepalives for the duration of its subscriptions. It is the server who must send keepalives if the server connects back to the client to deliver notifications. Such a server would only need to do so while notification requests are outstanding, but would need to persist the subscription in a way that left the client with confidence that the subscription would not be lost.

Connection establishment is also an issue in a high availability environment. A HA client must not try to connect to one IP and only move on to the others after a timeout. It should normally connect to all addresses in parallel, then drop all but the first successful connection. This process should also take place when a failover event occurs.
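
A minimal sketch of that parallel-connect behaviour, using only the Python standard library; the cluster addresses are illustrative.

    # Sketch: attempt connections to all of a cluster's addresses in parallel
    # and keep whichever completes first, closing the rest.
    import socket
    from concurrent.futures import ThreadPoolExecutor, as_completed

    ADDRESSES = [("10.0.1.10", 80), ("10.0.2.10", 80)]   # one per server NIC/network

    def connect_first(addresses, timeout=5.0):
        with ThreadPoolExecutor(max_workers=len(addresses)) as pool:
            futures = {pool.submit(socket.create_connection, addr, timeout): addr
                       for addr in addresses}
            winner = None
            for future in as_completed(futures):
                try:
                    sock = future.result()
                except OSError:
                    continue                  # that address is unreachable
                if winner is None:
                    winner = sock             # first successful connection wins
                else:
                    sock.close()              # drop the slower duplicates
            if winner is None:
                raise OSError("no address in the cluster could be reached")
            return winner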

Tier 1: A Publish/Subscribe mechanism for HTTP resources is required

One of the constants in the ever-changing SCADA world is that we perform soft real-time monitoring of real-world state. That means that data can change unexpectedly and that we need to propagate that data immediately when we detect the change. A field unit will typically test an input every few milliseconds, and on change will want to notify the central system. Loose coupling will often demand that a pub/sub model be used rather than a push to a set of URLs configured in the device.

I have begun drafting a specification that I think will solve most pub/sub problems, with a preliminary name of SENA. It is loosely based on the GENA protocol, but has undergone significant revision to attempt to meet the security constraints of the open Web while also meeting the constraints of a SCADA environment. I would like to continue working on this protocol or a similar protocol, helping it reach a status where it is possible to propose it for general use within enterprise boundaries.

We are extremely sensitive to overload problems in the SCADA world. This leads us to view summarisation as one of the core features of a subscription protocol. We normally view pub/sub as a way to synchronise state between two services. We view the most recent state as the most valuable: if we have to process a number of older messages before we get to the newest value, latency and operator response time both increase. We are also highly concerned with situations, permanent or temporary, where state changes occur at a rate beyond what the system can adequately deal with. We dismiss with prejudice any proposal that involves infinite or arbitrary buffering at any point in the system. We also expect a subscription model to be able to make effective use of intermediaries, such as web proxies that may participate in the subscription.
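
To illustrate the summarisation idea (this is my own sketch, not anything from a SENA or GENA draft), a delivery queue can keep only the most recent state per resource, so memory is bounded by the number of resources rather than by the rate of change:

    # Sketch: latest-state-wins queue; an unread update is superseded rather
    # than buffered behind newer ones.
    from collections import OrderedDict

    class SummarisingQueue:
        def __init__(self):
            self._latest = OrderedDict()          # resource URI -> newest state

        def publish(self, resource, state):
            self._latest.pop(resource, None)      # discard any stale value
            self._latest[resource] = state        # re-queue at the tail

        def next_update(self):
            if not self._latest:
                return None
            return self._latest.popitem(last=False)  # oldest resource, newest state

    q = SummarisingQueue()
    q.publish("/points/1203/state", "0")
    q.publish("/points/1203/state", "1")          # supersedes the unread "0"
    print(q.next_update())                        # ('/points/1203/state', '1')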

Tier 2: One architectural framework with a spectrum of compatible architectures

I believe that the architectural styles of the Web can be applied to the enterprise. However, local conventions need to be permitted. Special methods, content types, and other mechanisms should all be permitted where required. I anticipate that the boundary between special and general will shift over time, and that the enterprise will act as a proving ground for new features of the wider Web. Once such features are established in the wider Web, I would also expect the tide to flow back into enterprises that are doing the same thing in proprietary ways.

If properly nurtured, I see the enterprise as a nursery for ideas that the Web is less and less able to experiment with itself. I suspect that the bodies that govern the Web should also be involved with ideas that are emerging in the enterprise. These bodies can help those involved with smaller-scale design keep an eye on the bigger picture.

Tier 2: Web Services are too low-level

Web Services are not a good solution space for Web architecture because they attack integration problems at too low a level. It is unlikely that two services independently developed against the WS-* stack will interoperate. That is to say, they will only interoperate if their WSDL files match. HTTP is ironically a higher-level protocol than the protocol that is layered on top of it.

That said, we do not rule out interoperating with such systems if the right WSDL and architectural styles are placed on top of the WS-* stack. We anticipate a "HTTP" WSDL eventually being developed for WS-*, and expect to write a protocol converter back to our internal protocols for systems that implement this WSDL. The sheer weight of expectation behind Web Services suggests that it will be simpler for some organisations to head down this path, than down a path based on HTTP directly.

Tier 2: RDF is part of the problem, not the solution

We view RDF as a non-starter in the machine-to-machine communications space, though we see some promise in ad hoc data integration within limited enterprise environments. Large scale integration based on HTTP relies on clear, well-defined, evolvable document types. While RDF allows XML-like document types to be created, it provides something of an either/or dilemma. Either use arbitrary vocabulary as part of your document, or limit your vocabulary to that of a defined document type.

In the former case you can embed rich information into the document, but unless the machine on the other side expects this information as part of the standard information exchange, it will not be understood. It also increases document complexity by blowing out the number of namespaces in use. In practice it makes more sense to define a single cohesive document type with a single vocabulary that includes all of the information you want to express. However, in this case you are worse off than if you were to start with XML.

You cannot relate a single cohesive RDF vocabulary to any other without complex model-to-model transforms. In short, it is easier to extract information from a single-vocabulary XML document than from a single-vocabulary RDF document. RDF does not appear to solve any part of the system integration problem as we see it. However, again, it may assist in the storage and management of ad hoc data in some enterprises in place of traditional RDBMS technology.

We view the future of the semantic web as the development of specific XML vocabularies that can be aggregated and subclassed. For example, the atom document type can embed the html document type in an aggregation relationship. This is used for elements such as <title>. The must-ignore semantics of atom also allow subclassing by adding new elements to atom. The subclassing mechanism can be used to produce new versions of the atom specification that interoperate with old implementations. The mechanism can also be used to produce jargonised forms of atom rather than inventing a whole new vocabulary for a particular problem domain.
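
A small sketch of must-ignore processing in Python: a generic consumer extracts the atom elements it knows and silently skips the jargon. The rail extension namespace and trainId element are invented for illustration.

    # Sketch: a standard atom processor still extracts the title even when a
    # jargonised entry carries extension elements it has never seen.
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"

    entry = ET.fromstring(
        '<entry xmlns="http://www.w3.org/2005/Atom" '
        'xmlns:rail="http://example.com/rail-jargon">'
        '<title>Platform 3 departure</title>'
        '<rail:trainId>8201</rail:trainId>'   # jargon a generic reader ignores
        '</entry>'
    )

    print(entry.find(ATOM + "title").text)    # Platform 3 departure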

We see the development, aggregation, and jargonisation of XML document types as the key mechanisms in the development of the semantic web. The graph-based model used by RDF has currently not demonstrated value in the machine-to-machine data integration space, however higher-level abstractions expressed in XML vocabularies are a proven technology set. We anticipate the formation of communities around particular base document types that work on resolving their jargon conflicts and folding their jargon back into the base document types. We suspect this social mechanism for vocabulary development and evolution will continue to be cancelled out in the RDF space by RDF's reliance on URI namespaces for vocabulary and by its overemphasis of the graph model.

Tier 2: MIME types are more effective than URI Namespaces

On the subject of XML, we have some concerns over the current direction in namespaces. The selection of a parser for a document is typically based on its MIME type. Some XML documents will contain sub-documents, however there is no standard way to specify the MIME type of the sub-document. We view MIME as more fully-featured than arbitrary URIs, particularly due to the explicit subclassing mechanism available.

In MIME we can explicitly indicate that a particular document type is based on xml: application/some-type+xml. Importantly, we can continue this explicit sub-typing: application/type2+some-type+xml. We consider this an important mechanism in the evolution of content types, especially when jargonised documents are passed to standard processors. It is normal to expect that the standard processor would ignore any jargon and extract the information available to it as part of standard vocabulary.
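
A sketch of how a processor might exploit that explicit subclassing, deriving a chain of fallbacks from the type name. The chained "+" suffixes follow the convention described above, not a registered IANA scheme, and the function is my own illustration.

    # Sketch: derive fallback processors from an explicitly subclassed MIME type.
    def fallback_chain(mime_type):
        major, _, subtype = mime_type.partition("/")
        parts = subtype.split("+")
        # application/type2+some-type+xml -> some-type+xml -> xml
        return ["%s/%s" % (major, "+".join(parts[i:])) for i in range(len(parts))]

    print(fallback_chain("application/type2+some-type+xml"))
    # ['application/type2+some-type+xml', 'application/some-type+xml', 'application/xml']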

While MIME also has its weaknesses, the explicit subclassing mechanism is not available in URI namespaces at all. To use the atom example again, atom has an application/atom+xml MIME type but an XML namespace of <http://www.w3.org/2005/Atom>. We view the former as more useful than the latter in the development of the Semantic Web and in general machine-to-machine integration problems.

Tier 2: Digital Signatures are likely to be useful

We regard the protection of secret data by IP-level or socket-level security measures as being sufficient at this time. Secret data is known and communicated by few components of the architecture, so is usually not a scalability issue. We do not think that secret data should have significant impact on Web architecture, however, we do view the ability to digitally sign non-secret data as a likely enabler for future protocol features.

Conclusion

Web technology and architectural style are proven useful tools for systems integration, but are incomplete. A scalable, summarising Publish/Subscribe mechanism is an essential addition to the suite of tools, as is a client profile for operating in High Availability environments. These tools must be defined and standardised in order to gain wide enough participation to be useful to the enterprise.

We have concerns about some current trends in Web Architecture. These relate to namespaces in XML, Web Services, and RDF. All of these trends appear to work against the goal of building integrated architectures from multi-vendor components. Our desired outcomes would also appear to be those of the Semantic Web, so we have some hope that these trends will begin to reverse in the future.

Sun, 2007-Feb-18

REST in short form

I have been working on a restwiki article called REST in Plain English, inspired by a conversation on rest-discuss some time ago. It is still a work in progress, but you might get some mileage out of it. An executive summary is this:

SOA is an architecture that attempts to enforce as few constraints on developers as possible. While that is all well and good in small, well-controlled environments, it doesn't scale up. Unconstrained architecture is another way of saying "none of the pieces can talk to each other without prior planning". REST constrains the architecture down to a set of uniform interactions using uniform document types. Whenever two components of the architecture support the same interaction pattern (GET, PUT, POST, DELETE) and the same document type (html, atom, plain text) they can be configured to communicate without prior planning and without writing new code.

REST increases the likelihood that arbitrary components of the architecture can talk to each other, but also addresses issues of how the architecture can evolve over decades or more of changing demands and how parts of the architecture can scale to huge sizes. It allows for horizontal scalability by limiting the amount of state different cluster members should share. It allows for vertical scalability by layering caches and other intermediaries between clients and servers. It even scales socially, allowing a huge number of both client- and server-side implementations of its protocols to work together.

At the same time, SOA is trying to solve non-web problems. It is trying to solve problems of a single business or a pair of businesses communicating. It is trying to deal with special problems and special use cases. I think that we are on the verge of seeing the architecture of the web combined with the WS-* understanding of enterprise problems. I think we will see a unified architecture that easily scales between these extremes.

Are the IETF and the w3c still the right forums to solve our special problems, or do we need industry and other special interest groups to figure out what best practice is? Once the practice is established these groups can come back and see if their solutions can be applied to the broader Web. I think the spectrum between pure constrained REST and unconstrained enterprise computing needs some shaking up at both ends. I'm happy to see others excited about the possibilities ahead, too.

Benjamin

Tue, 2007-Feb-06

Software Factories - Raising the Level of Abstraction

Today's take-homes:

I have been reading a book during recent business trips to Melbourne called Software Factories. It is written by Jack Greenfield, Keith Short, Steve Cook, and Stuart Kent. It can be found as ISBN 0-471-20284-3. I borrowed my copy from WRSA V&V heavy, Brenton Atchison.

The main premise of the book is that we need to be developing better, more reliable, and more industrial software through reuse. It notes the failure to date of Object-Oriented approaches to reuse, and attempts to formulate a path out of the wilderness based on domain specific languages and modelling techniques. It uses a quote from Michael Jackson:

Because we don't talk about problems, we don't analyze or classify them, and we slip into the childish belief that there can be universal development methods, suitable for solving all development problems.

In its chapter on "Dealing with Complexity" this book nails a design principle I had so far never quite expressed clearly. It talks about refinement and abstraction, implementation and requirements as part of a continuum. If you start at the top of a development with a set of requirements and end up with an implementation, the difference between these two specifications can be called an abstraction gap. If the requirements were complete and consistent, why can't they be executed? Simply because they are not code?

Consider that what we think of as code today is not what executes on our physical machines. Software Factories suggests that we should think of our code as a specification for a compiler. The compiler automatically constructs machine code from that specification, making a number of design decisions about how to optimise for space and time along the way. It also transforms our input to improve its efficiency. In other words, it is an automated way of crossing the abstraction gap between our code and the machine's code.

This sets out a general principle for good design, whether the design be encapsulated in a Domain Specific Language or a General Purpose Language: the purpose of design is to provide language constructs, classes, and other features that lift the level of abstraction from the basic language and library you start with to specific concepts in the requirements domain. The closer you can get to the concepts in the requirements domain, the better.

I happened to overhear a conversation between two co-workers early on in the SystematICS software development. One was trying to explain the difference between a design I had proposed and the implementation the other was trying to write. They said that when I talked about a particular concept being in the code I meant that it was a literal class, not a fuzzy concept held across several classes. It is important in any kind of design to ensure that the constructs you define map directly to concepts in the requirements specification, or enable those direct mappings to be made in other constructs.

Benjamin

Sat, 2007-Feb-03

Copyright and Orphan Works

Lawrence Lessig writes an interesting article on what he thinks should happen to US copyright law to deal with orphan works.

The requirement it imposes after the 14/5 year delay is registration... like a DNS for copyright... Any work subject to the OWMR and failing to register within the proper period shall:

  • [ALTERNATIVE 1]: lose copyright protection
  • [ALTERNATIVE 2]: have its copyright remedies curtailed.

The effect of the proposal is that any copyright holder who fails to register their new work gets fourteen years copyright before their work effectively falls into the public domain. If they are still making a buck from the content or care for some other reason, registration grants them either the full copyright period or copyright protection until they fall off the registry.

I know that Lessig is something of a free culture extremist, but this proposal is interesting in how it relates to property-based views of copyright in modern culture.

I think the general feeling of people today is that if someone created a work they have the right to control that work. That control could be limited to what is required to make a dollar, but stronger control is now often accepted through use of content licensing. DRM combined with the DMCA is an extreme end of this, where a copyright owner can curtail even fair use rights just by stating in a software contract that they want the content to be used in a particular way.

Whether you take a free culture or non-free culture viewpoint, I think Lessig's approach makes sense. Orphaned content reverts to the copyright periods that were envisaged when copyright law was first written. Content that still matters to the author for financial or non-financial reasons gets a copyright period consistent with the kind of investment/return ratio that Disney expects from its creations.

I'm not sure exactly how this proposal would deal with DRM per se. Orphaned DRM content that isn't registered will still not be available for use unless DRM-cracking technology is both available and permitted by the applicable law. In an era when big content producers are increasingly tightening control over all produced content using DRM technologies, this isn't really an issue that can be ignored.

Benjamin