Sound advice - blog

Tales from the homeworld

Sun, 2006-Oct-22

SOA does not simplify communication

I had the opportunity to attend Sun Developer Day last week, held in Brisbane, Australia. The speakers were fairly good. My manager and I attended together. As a RESTafarian I found it funny to watch. On one hand the speakers were talking about Web 2.0, the power of community, and how developers driving demand for internet services would help Sun sell hardware and reap their profits. On the other hand they seemed to equate the success of the web with their vision of the future.

The presentation that brought this into the sharpest focus was that of Ashwin Rao, "SOA, JBI, BPEL : Strategy, Design and Best Practices". Ashwin walked us through the SOA architectural big rules. I made notes, crossing out the errors and replacing them with the relevant principles. I have left my notepad at work this weekend so can't bring you the exact list that Ashwin presented. In fact, it is interesting to look over the web and see how different the big rules are between presentations by the same company over the course of only a few years. I'll pick up from a similar list:

Coarse-grained Services (corrected to: Resource-grained Services)
Summary: Services should be objects with lots of methods and represent effectively a whole application.
Discussion: Look at resources, instead. They are at the granularity they need to be based on the content types they support and the application state they demarcate. The granularity of a service doesn't matter, and can evolve. Resources, on the other hand, are the unit of communication and remain stable.
Mostly Asynchronous Interactions
Summary: Everything goes through an intermediary that performs the actual interaction for you.
Discussion: I can see some value in this, however the complexity is significantly increased. Centering an architecture around this concept seems fraught with problems of restricted evolution. Instead, this capability should be a service in its own right that can be replaced.
Conversational Services (corrected to: Stateless Services)
Summary: Conversation state is maintained by a coordinator.
Discussion: Stateless services scale better. Conversations should always be short (one request, one response) in order to combat internet-scale latency.
Reliable Messaging (corrected to: Observable Messaging)
Summary: You can tell your coordinator to deliver your message at most once, at least once, etc.
Discussion: This kind of reliable messaging is often not necessary. When it is necessary it is probably better to put it into the HTTP layer rather than the SOAP layer. Most HTTP requests are idempotent, so the problem is not as big as it might initially seem. There are also reasonable techniques already in place for the web to avoid duplicate submissions. At-least-once semantics are straightforward for a bit of code in the client to implement, rather than needing to push things through a coordinator. Just keep trying until you get a result. This also allows the client more freedom as to when it might want to give up. If the client wants to exit early it could still pass this on to another service. Again, putting the message bus in the middle of the architecture seems like a mistake. It should be at the edge for both performance and evolvability reasons.
Orchestrated
Summary: You can use BPEL to invoke methods, handle exceptions, etc.
Discussion: Everything is orchestrated. Whether you use BPEL or some other programming language would seem to matter little.
Registered and Discovered (corrected to: Uniform)
Summary: Service descriptions are available alongside services.
Discussion: In SOA you write client code every time someone develops a new WSDL file that you want to use in interactions. In REST you write client code every time someone develops a new content type that you want to use in interactions. Either way, the final contract is held in code, not in the specification. In REST we have a uniform interface. All of the methods mean essentially the same thing when applied to any resource. Resources themselves are discovered through hyperlinks. Content types are where the real specifications are needed. So far there is little evidence that specifications of this kind, structured so stringently that machines can read them, have any special advantage over human-oriented text.
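
The at-least-once point from the messaging discussion above can be sketched in a few lines of client code. This is only an illustration under stated assumptions, not anything from the presentation: `send_at_least_once` and `flaky_put` are hypothetical names, and the request is assumed to be idempotent (a PUT, say) so that repeating it does no harm.

```python
import time


def send_at_least_once(request, max_attempts=5, delay_seconds=0.0):
    """Keep calling `request` until it returns a result.

    `request` is a hypothetical zero-argument callable that performs one
    idempotent HTTP request and raises ConnectionError when no response
    arrives. The client, not a coordinator, decides when to give up.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return request()
        except ConnectionError as error:
            last_error = error
            time.sleep(delay_seconds)
    # Every attempt failed: surface the last failure to the caller.
    raise last_error


# A stand-in for a flaky network: fails twice, then succeeds.
calls = {"count": 0}


def flaky_put():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("no response")
    return "204 No Content"


result = send_at_least_once(flaky_put)
```

The retry policy lives at the edge, in the client, so it can be changed or handed off to another service without touching anything in the middle of the architecture.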

SOA seems to be fundamentally about a message bus. This is supposed to ease the burden of communications by moving functionality from the client into the bus. While this is not necessarily a bad thing, it does nothing to solve the two big issues: scalability and communication. The message bus does nothing for scalability, and real communication is just as distant in this model as in any earlier RPC-based model.

REST presents an almost completely distinct view of communication to the SOA big rules. You could use one or both or neither. They barely cross paths. REST is about solving the scalability and communications problems. Scalability is dealt with by statelessness and caching. That is one thing, but communications is where REST really makes inroads when compared to RPC.

It separates the concerns of communication into nouns, verbs, and content. It provides a uniform namespace for all resources. It requires a limited set of verbs be used in the architecture that everyone can agree on. Finally, it requires a limited set of content types in the architecture that everyone who has a reason to understand does understand. It reduces the set of protocols on the wire, rather than providing tools and encouragement to increase the set.
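
As a rough sketch of that separation, the toy below keeps nouns (URLs), verbs (a fixed set of methods), and content (a content type plus body) apart. It is an in-memory stand-in rather than a real HTTP implementation; `ResourceSpace` and the example URLs are hypothetical.

```python
class ResourceSpace:
    """In-memory stand-in for a uniform namespace of resources."""

    def __init__(self):
        # noun (URL) -> content (content type, body)
        self._resources = {}

    # The verbs: a small fixed set that means the same thing for every noun.
    def put(self, url, content_type, body):
        """Replace the state of the resource named by `url`."""
        self._resources[url] = (content_type, body)

    def get(self, url):
        """Return the current (content type, body) of the resource."""
        return self._resources[url]

    def delete(self, url):
        """Remove the resource; deleting twice is harmless."""
        self._resources.pop(url, None)


web = ResourceSpace()

# Two unrelated resources, handled with exactly the same verbs.
web.put("http://example.com/lights/kitchen", "text/plain", "on")
web.put("http://example.com/notes/1", "text/html", "<p>Hello</p>")

kitchen = web.get("http://example.com/lights/kitchen")
note_type, note_body = web.get("http://example.com/notes/1")
```

Adding a third kind of resource requires no new method. The only thing a client might need to learn is a new content type.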

Tim Bray is wrong when he talks about HTTP verbs being a red herring. REST's separation and then constraint of verbs and content types is what makes it a foray into the post-RPC world.

SOA has no equivalent concept. Instead, it concentrates on the transfer of arbitrary messages belonging to arbitrary protocols. It promotes the idea that object-orientation and RPC, with their arbitrary methods on arbitrary objects with arbitrary parameter lists, are a suitable means of addressing the communications issue. It seems to accept that there will be a linear growth in the set of WSDL files in line with the size of the network, cancelling the value of participation down to at best a linear curve.

Object-Orientation is known to work within a single version of a design controlled by a single agency, but across versions and across agencies it quickly breaks down. REST addresses the fundamental question of how programmers agree and evolve their agreements over time. It breaks down the areas of disagreement, solving each one in as generic a way as possible. Each problem is solved independently of the other two. It is based around a roughly constant number of specifications compared to the size of the network, maximising the value of participation. By restricting the freedom of programmers in defining new protocols we move towards a world where communication itself is uniform and consistent.

We know this works. Despite all of the competing interests involved in building the web of HTML, it has become stronger and less controversial over time. It has evolved to deal with the changing demands of its user base and will continue to evolve. RESTafarians predict the introduction of higher-level semantics to this world following the same principles with similarly successful results. SOA still has no internet-scale case study to work from, and I predict it will continue to fail beyond the boundaries of a single agency.

Benjamin

Sat, 2006-Oct-07

RESTful Moving and Swapping

Last week I posted my responses to a number of common REST questions. Today I realised that one of my responses needs some clarification. I wrote:

On swapping: This is something of an edge case, and this sort of thing comes up less often than you think when you are designing RESTfully from the start. The canonical approach would be to include the position of the resource as part of its content. PUTting over the top of that position would move it. This is messy because it crosses between noun and content spaces. Introducing a SWAP operation is also a problem. HTTP operates on a single resource, so there is no unmunged way to issue a SWAP request. Any such SWAP request would have to assume both of the resources of the unordered list are held by the same server, or that the server of one of these resources was able to operate on the ordered list.

I think a useful data point on this question can be found in this email I sent to the rest-discuss list today:

Here, I think the answer is subtly wrong due to the question itself containing a subtle bug. The question assumes that it is meaningful to move a resource from one URL to another: that the resource has one canonical name at one time and another at another time. However, cool URLs don't change.

The question reveals a bug in the underlying URL-space. If it is possible for a vehicle to move from one fleet to another, then its name should not include the fleet it belongs to. The fleet it belongs to should instead be part of the content. That way, changing the fleet is the same kind of operation as changing any other attribute of the vehicle.

The long and the short of it is that anything not forming the identity of a resource should not be part of that resource's URL. The URL's structure should not be used to imply relationships between resources. That is what hyperlinking is for. Whenever you think you have to move a resource from one part of the URI-space to another, you should reconsider your URI-space. It contains a bug.

Ordered lists can exist, but these are resources in their own right and either contain URLs in their representations or include the representation of the list contents. A PUT is the correct way to update such a list, replacing its content. This should not change the name of any resource. For optimisation or collision-avoidance reasons it may be appropriate to perform your PUT to a resource that represents only the subset of the list you intend to modify. Alternatively, it may also be appropriate to consider reviving the PATCH HTTP method as a form of optimised PUT.

The fact that PATCH was effectively never used does tell us that this kind of issue comes up rarely in existing REST practice. Perhaps as REST moves beyond the simple HTML web of today the question of content restructuring will become more important. I don't know. What is important, I think, is not to forget the cool URI lesson. Don't include information in a URL no matter how structural unless you can convince yourself that changing or removing that information would cause the URL to refer to a different resource.
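
To make the vehicle and list points concrete, here is a small sketch using a dictionary in place of a server. All of the URLs, field names, and the `put`/`get` helpers are hypothetical; the sketch only illustrates that the fleet lives in the vehicle's content and that a list is reordered by PUTting a new representation of the list resource.

```python
import json

store = {}  # URL -> JSON representation, standing in for a server


def put(url, representation):
    """Replace the state of the resource at `url`."""
    store[url] = json.dumps(representation)


def get(url):
    """Fetch the current representation of the resource at `url`."""
    return json.loads(store[url])


# The vehicle's URL carries only its identity, not its fleet.
vehicle_url = "/vehicles/1234"
put(vehicle_url, {"registration": "ABC-123", "fleet": "/fleets/north"})

# "Moving" the vehicle is an ordinary attribute change: GET, edit, PUT.
vehicle = get(vehicle_url)
vehicle["fleet"] = "/fleets/south"
put(vehicle_url, vehicle)

# An ordered list is a resource in its own right that holds URLs.
queue_url = "/queues/service"
put(queue_url, ["/vehicles/1234", "/vehicles/5678"])

# Swapping two entries is a PUT of the reordered list; no resource is renamed.
queue = get(queue_url)
queue[0], queue[1] = queue[1], queue[0]
put(queue_url, queue)
```

After both updates the set of URLs in `store` is unchanged. Only the content behind them differs.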

Benjamin

Sun, 2006-Oct-01

The Preconditions of Content Type definition (and of the semantic web)

The microformats crowd have an admirable mechanism for defining standards. It relies on existing innovation. New innovation and the ego that goes with it are kept to a minimum. The process essentially amounts to this:

  1. Find the closest match, and use that as your starting point.
  2. Cover the 80% case and evolve rather than trying to dot every "i" all at once.
  3. Sort out any namespace clashes with previously-ratified formats to ensure that terms line up as much as possible and that namespaces do not have to be used.
  4. Allow extensions to occur in the wild without forcing them into their own namespaces.

I have already blogged about REST being the underlying model of the semantic web. Programs exchange data using a standard set of verbs and content types (i.e. ontologies). All program state is demarcated into resources that are represented in one of the standard forms and operated on using standard methods.

This is a new layer in modern software practice. It is the information layer. Below it is typically an object layer, then a modular or functional layer for implementation of methods within an object. The information layer is crucial because while those layers below work well within a particular product and particular version, they do not work well between versions of a particular product or between products produced by different vendors. The information layer described by the REST principles is known to scale across agency boundaries. It is known to support forwards- and backwards- compatible evolution of interaction over time and space.

I think that the microformats model sets the basic preconditions under which standardisation of content type can be achieved, and thus the preconditions under which the semantic web can be established:

  1. There must be sufficient examples of content available, produced without reference to any standard. These must be based on application need only, and must imply a common schema.
  2. There must be sufficient examples of existing attempts to standardise within the problem space. No one is smart enough to get it right the first time, and relying on experience with the earlier attempts is a necessary facet of getting it right next time.

I think there need to be in the order of a dozen diverse examples from which an implied schema is extracted, and I think in the order of half a dozen existing formats. The source documents are likely to be extracted from thousands in order to achieve an appropriately diverse set. This means that there is a fixed minimum scale to machine-to-machine information transfer on the inter-agency Internet scale that can't be forced or worked around. Need is not sufficient to produce a viable standard.

My predictions about the semantic web:

  1. The semantic web will be about network effects relating to data which is already published with an implied schema
  2. Information that is of an obscure and ad hoc nature or structure will continue to be excluded from machine understanding
  3. The semantic web will spawn from the microformats effort rather than any RDF-related effort.
  4. The nature of machine understanding will have to be simplified in order for the semantic web to be accepted for what it is, at least for the first twenty years or so

RDF really isn't the cornerstone of the semantic web. RDF is too closely aligned to artificial intelligence and high ideals as to how information can be reasoned with generically to be really useful as an information exchange mechanism. Machine understanding will have to be accepted as something which relies primarily on human understanding in the future. It will be more about which widget a program puts a particular data element into than what other data it can infer automatically from the information at hand. One is simply useful. The other is a dead end.

The semantic web is here today, with or without RDF. Even when simple HTML is exchanged, clients and servers understand each other's notations about paragraph marks and other information. The level of semantics that can be exchanged fundamentally relies on a critical mass of people and machines implicitly exchanging those semantics before standardisation and shared understanding begin. The microformat community is right: chase identification, time and date, and location. Those semantics are huge and enough formats exist already to pick from. The next round of semantics may have to wait another ten or twenty years, until more examples with their implied schemas have been built up.

Benjamin