Sound advice - blog

Tales from the homeworld


Fri, 2007-Jan-19

Breaking Down Barriers to Communication

When the cut-and-paste paradigm was introduced to the desktop, it was revolutionary. Applications that had no defined means of exchanging data suddenly could. A user cuts or copies data from one application, and pastes it into another. Instead of focusing on new baseclasses or IDL files in order to make communication work, the paradigm broke the communication problem into three separate domains: Identification, Methods, and Document Types. A single identification mechanism for a cut or paste point, combined with a common set of methods and document types, allows ad hoc communication to occur. So why isn't all application collaboration as easy and ad hoc as cut-and-paste?
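The three-domain decomposition can be sketched in miniature. This is a hypothetical toy clipboard, not any real desktop API: one well-known cut/paste point, a fixed two-method set, and multiple document types per copy.

```python
# Sketch of the cut-and-paste decomposition (hypothetical API).
# Identification: the clipboard is the single well-known cut/paste point.
# Methods: copy and paste are the entire method set.
# Document types: one copy can carry several representations; the
# pasting application picks the richest type it understands.

clipboard = {}

def copy(representations):
    """Replace the clipboard contents with a media-type -> data mapping."""
    clipboard.clear()
    clipboard.update(representations)

def paste(accepted_types):
    """Return the first representation the pasting app understands."""
    for media_type in accepted_types:  # in the app's preference order
        if media_type in clipboard:
            return media_type, clipboard[media_type]
    return None

copy({"text/html": "<b>bold</b>", "text/plain": "bold"})
# A plain-text editor still gets something useful:
result = paste(["text/plain"])
```

Adding a new document type requires no new methods and no new identification scheme, which is exactly why the paradigm scales across applications that have never heard of each other.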

The Importance of Architectural Style

The constraints of REST and of the cut-and-paste paradigm overlap significantly. REST also breaks communication down into a single identification scheme for the architecture, a common set of methods for the architecture, and a common set of document types that can be exchanged as part of method invocation. The division is designed to allow an architecture to evolve. It is nigh impossible to change the identification scheme of an architecture, though the addition of new identifiers is an everyday occurrence. The set of methods rarely changes, because of the impact such a change would have on all components in the architecture. The most commonly-evolving component is the document type, because new kinds of information and new ways of transforming this information into data are created all of the time.
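The same decomposition can be sketched as a toy in-memory resource store. The names here are hypothetical and this is not a real HTTP library; it only illustrates where identifiers, methods, and document types sit relative to one another.

```python
# A minimal sketch of REST's three-domain decomposition (hypothetical names).
# Identification: every resource has a URI.
# Methods: one small, uniform set (here just GET and PUT).
# Document types: each representation is tagged with a media type.

class ResourceStore:
    def __init__(self):
        self._resources = {}  # URI -> (media_type, body)

    def put(self, uri, media_type, body):
        # Uniform method: replace the representation at this URI.
        self._resources[uri] = (media_type, body)

    def get(self, uri):
        # Uniform method: fetch the representation and its media type.
        return self._resources[uri]

store = ResourceStore()
store.put("/reports/2007", "text/html", "<p>Annual report</p>")
media_type, body = store.get("/reports/2007")
```

Note that adding a new resource, or evolving what a report looks like, touches only the most flexible domain: at worst a new document type is agreed. The URI scheme and the method set stay fixed.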

The web is to a significant extent an example of the application of REST principles. It is for this reason that I can perform ad hoc integration between my web browser and a host of applications across thousands of Internet servers. It is comparable to the ease of cut-and-paste, and the antithesis of systems that focus on the creation of new baseclasses to exchange new information. Each new baseclass is in reality a new protocol. Two machines that share common concepts cannot communicate at all if their baseclasses don't match exactly.

The Importance of Agreement

Lee Feigenbaum writes about the weakness of REST:

This means that my client-s[i]de code cannot integrate data from multiple endpoints across the Web unless those endpoints also agree on the domain model (or unless I write client code to parse and interpret the models returned by every endpoint I'm interested in).

Unfortunately, to do large-scale information integration you have to have common, agreed ways of representing that information as data. This includes agreeing on a particular kind of encoding, but it goes further than that. It requires a common vocabulary with a common understanding of the semantics associated with that vocabulary. In short, every machine-to-machine information exchange relies on humans agreeing on the meaning of the data being exchanged. Machines cannot negotiate or understand data. They just know what to do with it. A human told them that, and made the decision as to what to do with the data based on human-level intelligence and agreement.

Every time two programs exchange information there is a human chain from the authors of those programs to each other. Perhaps they agreed on the protocol directly. Perhaps a standards committee agreed, and both human parties in the communication read and followed those standards. Either way, humans have to understand and agree on the meaning of data in order for information to be successfully encoded and extracted.

Constraining the Number of Agreements

In a purely RESTful architecture we constrain the number of document types. This directly implies a constraint on the number of agreements in the architecture to a number that grows more slowly than the number of components participating in it. On a temporal scale, we constrain the number of agreements to grow more slowly than time itself progresses. If we can't achieve this we won't be able to understand the documents of the previous generation of humanity, a potential disaster. But is constraining the number of agreements practical?

On the face of it, I suspect not. Everywhere there is a subculture of people operating within an architecture there will be local conventions, extensions, and vocabulary. This is often necessary because concepts that are understood within the context of a subculture may not translate to other subcultures. They may be local rather than universal concepts. This suggests that what we will actually have over any significant scale of architecture is a kind of main body which houses universal concepts above an increasingly fragmented set of sub-architectures. Within each sub-architecture we may be able to ensure that REST principles hold.

Solving the Fragmentation Problem

This leaves us, I think, with two outs. One is to accept the human fragmentation intrinsic to a large architecture, and look for ways to make the sub-architectures work with wider architectures. The other is to forget direct machine-to-machine communication, and involve humans in the loop.

We do both already on the web in a number of ways. In HTML we limit the number of universal concepts, such as "paragraph" and "heading 3", but allow domain-specific information to be encoded into class attributes, and allow even more specific semantics to be conveyed in plain text. The class attributes need to work with the local conventions of a web site, but could convey semantics to particular subcultures through microformat-like specifications. The human-readable text conveys no information to a machine, but a person connected to the subculture the text came from can apply human-level intelligence to provide an ad hoc interpretation of the data into information.
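As a sketch of the class-attribute mechanism: a machine that is party to the relevant agreement can pull structured fields out of otherwise human-oriented markup. The `vevent`, `summary`, and `dtstart` class names below follow the hCalendar microformat; the document and the parser are illustrative only.

```python
# Sketch: extracting subculture-specific semantics from HTML class
# attributes, in the spirit of microformats.
from html.parser import HTMLParser

doc = """
<div class="vevent">
  <span class="summary">REST meetup</span>
  <abbr class="dtstart" title="2007-01-19">January 19th</abbr>
</div>
"""

class MicroformatParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {}      # agreed class name -> extracted value
        self._current = None  # class name whose text we are awaiting

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class", "")
        if cls in ("summary", "dtstart"):
            self._current = cls
            if "title" in attrs:
                # dtstart carries its machine-readable value in @title
                self.fields[cls] = attrs["title"]
                self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = data.strip()
            self._current = None

parser = MicroformatParser()
parser.feed(doc)
# parser.fields now maps the agreed class names to values.
```

A browser without this agreement still renders the page perfectly well for a human reader; the universal HTML concepts and the subculture's vocabulary coexist in one document.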

We see this on the data submission side of things, too. Protocols such as atompub convey semantics via agreement, but we also have HTML forms, which can perform ad hoc information submission when a human is in the loop. The human uses their cultural ties to interpret the source document and fill it out for submission back to the server.
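The human-in-the-loop submission path might be sketched as follows, with hypothetical form and field names: the server's markup names the fields, the human supplies the meaning of each value, and the machine merely encodes the result for transport.

```python
# Sketch: ad hoc submission via a form (illustrative field names).
from urllib.parse import urlencode

# The field names come from the server's hypothetical form markup:
#   <form action="/comments" method="post">
#     <input name="author"><textarea name="body"></textarea>
#   </form>
# Only a human can decide what belongs in each field.
filled_by_human = {"author": "alice", "body": "Agreed entirely."}

# The machine's only job is mechanical: encode the filled-in fields
# as the application/x-www-form-urlencoded document the browser
# would POST back to /comments.
encoded = urlencode(filled_by_human)
```

No prior machine-level agreement about "author" or "body" exists between client and server authors; the human's interpretation of the rendered form bridges the gap.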


I don't think we can ignore either end of an architectural picture fragmented by human subcultures. Without universal concepts that have standard encodings and vocabulary to convey them, we can't perform broad-scale information integration across the architecture. Without the freedom to perform ad hoc agreement, the architecture opens itself up to competition. Without a bridge between these two extremes, vocabularies that should simply be a few local jargon expressions thrown into a widely-understood conversation will become languages that only the locals understand. The RDF propensity to talk about mapping between vocabularies is itself a barrier to communication. It will always be cheaper to have a conversation when no translator is required between the parties for concepts both parties understand.