Sound advice - blog

Tales from the homeworld


Sun, 2008-Feb-10

Mortal Bloggers

Tim Bray brings up the issue of what happens to private web sites after we die. It may be self-important of me, but I think the state has some responsibility for preserving the public works of its citizens. I first wrote about this problem back in January 2007.

My theory is that digital works such as personal web sites should be considered analogous to other published works, such as books. Libraries should be responsible for performing restoration work on the data by moving it to their servers, and by maintaining it and its associated domain registrations thereafter. The cost of maintaining this store in terms of storage and bandwidth should be easily minimised. It seems therefore reasonable that the state should attempt to preserve all the published digital works of its citizens.

On the other hand, perhaps we vanity bloggers would all be better off moving to free public sites hosted by existing large companies. If the private sector is already meeting the need, there is little chance that government will step in to stop the rot.


Sat, 2008-Feb-09

SQLite and Polling locks


I agree that the SQLITE_BUSY return code is insane. The root cause is that SQLite does not use blocking operating system locks, so it has to poll to determine whether it can obtain access. By the time it reports SQLITE_BUSY back to you, it has already retried its locks a number of times internally.
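For readers who have not run into it, here is a minimal sketch of the behaviour using Python's standard sqlite3 module. The choice of Python, the table name t, and the helpers try_write and demo are assumptions made for illustration only. One connection holds an exclusive lock while a second tries to write; the second keeps retrying until its busy timeout runs out, and only then does SQLITE_BUSY surface (in Python, as an OperationalError reading "database is locked").

```python
import os
import sqlite3
import tempfile


def try_write(path, timeout_s):
    """Attempt one INSERT; return True on success, False if SQLite reported busy."""
    # timeout is the busy timeout: SQLite sleeps and retries the lock
    # for up to this many seconds before giving up.
    conn = sqlite3.connect(path, timeout=timeout_s)
    try:
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        return True
    except sqlite3.OperationalError as exc:
        # SQLITE_BUSY is reported by Python as "database is locked".
        if "locked" in str(exc):
            return False
        raise
    finally:
        conn.close()


def demo():
    """Return (result while locked, result after the lock is released)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE t (x INTEGER)")
        holder.execute("BEGIN EXCLUSIVE")      # take and hold the lock
        while_locked = try_write(path, timeout_s=0.1)
        holder.execute("COMMIT")               # release the lock
        after_release = try_write(path, timeout_s=0.1)
        holder.close()
        return while_locked, after_release
```

Calling demo() should report a failed write while the lock is held and a successful one afterwards. Raising the timeout only makes the polling loop run longer; it does not turn the lock into a blocking one.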

I hacked SQLite 2 for WRSA's internal use to use blocking locks. Unfortunately, I have never gotten around to figuring out whether blocking locks can be introduced to v3 without causing problems. The v3 locking model is much more complex. Most of this feature is held in a single source file (os.c, iirc), so it should be possible for a single human to get their head around it. If anyone does get it together, perhaps it would be worth submitting a patch back to DRH via the mailing list.


Sat, 2007-Jul-14

The War between REST and WS-*

David Chappell posits that the "war" between REST and WS-* is over. The evidence for this is that platforms such as .NET and Java that have traditionally had strong support for WS-* are now providing better tools for working with REST architectures.

The War

Is the war over? I think we are getting towards the end of the first battle. WS-* proponents who see no value in REST are now a minority, even if some of those who accept REST's value are only beginning to understand it (both REST, and its value). Moreover, I think that this will in the long run be seen as a battle between Object-Orientation and REST. That battle will be fought along similar lines to the battle between Structured Programming and Object-Orientation.

In the end, both Object-Orientation and Structured Programming were right for their particular use case. Object-Orientation came to encapsulate structured programming, allowing bigger and better programs to be written. We still see structured programming in the methods of our objects. The loops and conditionals are still there. However, objects carve up the state of an application into manageable and well-controlled parts.

My view is that the REST vs Object-Orientation battle will end in the same way. I believe that REST architecture will be the successful style in large information systems consisting of separately-upgradable parts. I take the existing Web as evidence of this. It is already the way large-scale software architecture works. REST accommodates the fact that different parts of this architecture are controlled by different people and agencies. It deals with very old software and very new software existing in the same architecture. It codifies a combination of human and technical factors that make large-scale machine cooperation possible.

The place for Object-Orientation

We will still use Object-Orientation at the small scale, specifically to build components of REST architecture. Behind the REST facade we will build up components that can be upgraded as a consistent whole. Object-Orientation has been an excellent tool for building up complex applications when a whole-application upgrade is possible. Like traditional relational database technology, it is a near perfect solution where the problem domain can be mapped out as a well-understood whole.

Hard-edged Object-Orientation with its crinkly domain-specific methods finds it hard to work between domains, and nearly impossible to work across heterogeneous environments where interacting with unexpected versions of unexpected applications is the norm. Like Structured Programming before it, the Object-Orientated abstraction can only take us so far. An architectural style like REST is required to build much larger systems with better information hiding.


To me, the "war" is over. REST wins on the big scale. Object-Orientation and RDBMS win at the small scale. The remaining battlefield is the area between these extremes. Do we create distributed technology based on Object-Oriented principles using technology like CORBA or WS-*, or do we construct it along REST lines?

Like David, I see the case for both. Small well-controlled systems may benefit from taking the Object-Oriented abstraction across language, thread, process, and host boundaries. However, I see the ultimate value of this approach as limited. I think the reasons for moving to a distributed environment often relate to the benefits which REST delivers at this scale, but Object-Orientation does not.


Mark Baker picks up on a specific point in David's article. David says that REST is fine for CRUD-like applications. Mark essentially counters with "incorporate POST into your repertoire". This is where I have to disagree. I think that any operation that is sensible to do across a network can be mapped to PUT, DELETE, GET, or SUBSCRIBE on an appropriate resource. I see the argument that POST can be used to do interesting things as flawed. It is an unsafe method with undesirable characteristics for communication over an unreliable network.

My CRUD mapping is:

C or U: PUT
R: GET
D: DELETE

I then add trigger support as SUBSCRIBE.

Mark's example is the placement of a purchase order. I would frame this request as PUT my-purchase-order. This request is idempotent. My order will be placed once, no matter how many times I execute it. All that is needed is for the server to tell me which URL to use before I make my request, or for us to share a convention on how to select new URLs for particular kinds of resources. Using POST for this kind of thing also has the unfortunate effect of creating multiple ways to say the same thing, something that should be avoided in any architecture based on agreement between many individuals and groups.
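The difference under retry can be sketched with a toy in-memory server. Everything here (the OrderServer class, the retry helper, the URL /orders/alice-0001 and the status codes chosen) is hypothetical and exists only to illustrate the argument; it is not any real framework's API.

```python
class OrderServer:
    """Toy in-memory server; URLs are plain strings."""

    def __init__(self):
        self.orders = {}
        self._next_id = 1

    def put(self, url, order):
        # PUT: "make the state of this URL be this order".
        # Repeating the request leaves exactly one order at that URL.
        status = 200 if url in self.orders else 201
        self.orders[url] = dict(order)
        return status

    def post(self, collection_url, order):
        # POST: "do something with this order".
        # Here the server mints a fresh URL on every request.
        url = f"{collection_url}/{self._next_id}"
        self._next_id += 1
        self.orders[url] = dict(order)
        return url


def retry(send, attempts=3):
    """Simulate a client on a flaky network that resends the same request."""
    return [send() for _ in range(attempts)]
```

With an order such as {"item": "widget", "qty": 2}, three retried PUTs to /orders/alice-0001 leave one order on the server, while three retried POSTs to /orders leave three. The client that lost a reply cannot tell which of those situations it is in unless the method itself guarantees it.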

In my view, the main problem with the CRUD analogy is that it implies some kind of dumb server. REST isn't like this. While your PUT interaction may do something as simple as create a file, it will be much more common for it to initiate a business process. This doesn't require a huge horizontal shift for the average developer. They end up with the same number of url+method combinations as they would have if they implemented a non-standard interface. All they have to do is replace their "do this" functions with "make your state this" PUT requests to an appropriate url. REST doesn't change what happens behind the scenes in your RDBMS or Object-Orientated implementation domain.
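As a rough sketch of what that replacement looks like, consider a shipment that can be driven either by a "do this" function or by a "make your state this" request. The Shipment class, its state names, and the 409 response for an unsupported state are assumptions made up for this example.

```python
class Shipment:
    """Toy resource with a business process behind it."""

    def __init__(self):
        self.state = "pending"
        self.log = []

    def ship_now(self):
        # "Do this" style: every call runs the process again.
        self._dispatch()

    def put_state(self, desired):
        # "Make your state this" style: the client names the state it wants.
        if desired == self.state:
            return 200            # already there, nothing more to do
        if desired == "shipped":
            self._dispatch()      # the PUT initiates the business process
            return 200
        return 409                # a state we cannot move to from here

    def _dispatch(self):
        self.log.append("courier booked")
        self.state = "shipped"
```

The server is far from dumb in either case. The difference is that calling ship_now() twice books two couriers, while sending put_state("shipped") twice books one.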


Sun, 2007-Jun-17

On ODBMS versus O/R mapping

Debate: ODBMS sometimes a better alternative to O/R Mapping?

Objects see databases as memento and object-graph storage. Databases see objects as data exposed in table rows. RDF databases see objects as data exposed in schema-constrained graphs. The private of one is the public of the other. The benefits of each conflict with the design goals of the other.

Perhaps REST is the middle ground that everyone can agree on. Objects interface easily using REST. They simply structure their mementos as standard document types. Now their state can easily be stored and retrieved. Databases interface easily using REST. They just map data to data. So the data in an object and the data in a database don't necessarily have precisely-matched schemas. They just map to the same set of document types and these document types define the O-R mapping. The document type pool can evolve over time based on Web and REST principles, meaning that tugs from one side of the interface don't necessarily pull the other side in exactly the same direction.
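A small sketch may make this concrete. It assumes a made-up JSON document type with the fields name and emails; the Customer class and the two row-mapping functions are equally hypothetical. The object side and the relational side never see each other's structure, only the shared document.

```python
import json


class Customer:
    """Object side: the memento is written as the shared document type."""

    def __init__(self, name, emails):
        self.name = name
        self.emails = list(emails)

    def to_document(self):
        return json.dumps({"name": self.name, "emails": self.emails},
                          sort_keys=True)

    @classmethod
    def from_document(cls, text):
        doc = json.loads(text)
        return cls(doc["name"], doc["emails"])


def document_to_rows(text):
    """Relational side: one (name, email) row per address.

    A customer with no addresses yields no rows in this simplistic shape.
    """
    doc = json.loads(text)
    return [(doc["name"], email) for email in doc["emails"]]


def rows_to_document(rows):
    """Map the rows for one customer back to the same document type."""
    name = rows[0][0]
    return json.dumps({"name": name, "emails": [email for _, email in rows]},
                      sort_keys=True)
```

Neither side's schema has to match the other's. Each only has to agree with the document type, and that is the one thing that needs to evolve carefully.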

If O-R mapping is the Vietnam of computer science, perhaps we should stop mapping between our object and our relational components. Perhaps we should start interfacing between them, instead.


Thu, 2007-May-17

Simplifying Communication

Udi Dahan quotes an email I sent him some time ago when I was trying to get to grips with the fundamentals of SOA in contrast to the fundamentals of REST. He refers to it in a corresponding blog entry: Astoria, SDO, and irrelevance

I concur that adding a REST-like front end to a database isn't a particularly useful thing to do. HTTP is not SQL. It doesn't have transactions. Attempts to add them are unRESTful by the definition of REST's statelessness constraint, or at least to be approached with caution. Udi says that getting data out the REST way is fine... but updating it using a PUT requires a higher level of abstraction. Where I differ from Udi is that he says a higher level of abstraction is required than PUT. I suggest that a PUT to a resource that is pitched at a higher level of abstraction is what is usually required.

Let's take an example. You have a database with a couple of tables. Because we are in a purely relational environment, our customer information is split across these tables. We might have several addresses for each customer, lists of items the customer has bought recently, etc.

Exposing any one row or even a collection of rows from any one of these tables as a single resource is fraught with problems. You might need to GET several aspects of the customer's data set in order to form a complete picture, and the GETs could occur across transaction boundaries. You will very likely one day end up with a data set that is inconsistent.

PUT requests to such a low-level object also run us into problems. Any update that requires multiple PUT requests to be successful runs the risk of leaving the database in a temporarily or permanently inconsistent state.

The answer here is to raise the level of abstraction. We could introduce transactions to our processing, but this increases complexity and reduces scalability. While it may be the right approach in many situations, it is usually better in client/server environments to expose a simplified API to clients. We don't really want them to know too much about our internal database structure, so we give them a higher-level abstraction to work with.

In this case the starting point would likely be the creation of a customer object or customer resource. In the SOA world where methods and parameter lists are unconstrained, we might have a getTheDataIWantForThisCustomer method and corresponding updateThisDataIHaveForThisCustomer method. In REST, you would do pretty much the same thing. Except in REST, the methods would be GET and PUT to a URL of a widely-understood content type.
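Here is one way the customer resource might look, sketched with Python's standard sqlite3 module. The two-table schema, the document fields name and addresses, and the function names are all assumptions for illustration. The point is that each GET or PUT touches both tables in one step, so clients never see or create a half-updated customer.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    db.execute("CREATE TABLE address (customer_id INTEGER NOT NULL, line TEXT NOT NULL)")
    return db


def put_customer(db, customer_id, doc):
    """PUT /customers/<id>: replace the whole representation atomically."""
    with db:  # one transaction: commit on success, roll back on any exception
        db.execute("INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
                   (customer_id, doc["name"]))
        db.execute("DELETE FROM address WHERE customer_id = ?", (customer_id,))
        db.executemany("INSERT INTO address (customer_id, line) VALUES (?, ?)",
                       [(customer_id, line) for line in doc["addresses"]])


def get_customer(db, customer_id):
    """GET /customers/<id>: a single query, so both tables are read together."""
    rows = db.execute(
        "SELECT c.name, a.line FROM customer c "
        "LEFT JOIN address a ON a.customer_id = c.id "
        "WHERE c.id = ? ORDER BY a.rowid",
        (customer_id,),
    ).fetchall()
    if not rows:
        return None
    return {"name": rows[0][0],
            "addresses": [line for _, line in rows if line is not None]}
```

If a PUT fails part-way (for example because one address is missing), the transaction rolls back and the next GET still returns the previous, complete customer.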

So which is better? I would suggest that the REST approach is usually the best one. It can take a little time and research to come up with or to adopt the right content type, but you will be set up for the long-term evolution of your architecture. In the SOA world you'll need to change your base class eventually, leading to a proliferation of methods and parameter lists. In the constrained REST world we use well-understood mechanisms for evolving the set of methods, urls, and content types independently.

In the end, REST is very much like SOA. Whatever you are about to do in your SOA you can usually do the same thing with REST's standard messaging rather than by inventing new ad hoc messages for your architecture. Your REST architecture will evolve and perform better, and require less code to be written or generated on both the client and server sides of your interface. For me, the fundamental constraint of REST is to work towards uniform messaging by decoupling the method, data, and address parts of each message. Most other constraints of REST (such as statelessness) are good guidelines that any architect should instinctively apply wherever they are appropriate, and nowhere else.

While we are not using the same terms and are not applying technology in the same way, I don't think that Udi and I are thinking all that differently.


Tue, 2005-Aug-30

Arbitrary Methods in HTTP

Thanks to Robert Collins for your input on my previous blog entry about using the method name in an HTTP request as effectively a function call name. Robert, I would have contacted you directly to have a chat before publicly answering your comments, but I'm afraid I wasn't able to rustle up an email address I was confident you were currently prepared to accept mail on. Robert is right that using arbitrary methods won't help you get to where you want to go on the Internet. Expecting clients to learn more than a vocabulary of "GET" is asking a lot already, so as soon as you move past the POST functions available in web forms you are pretty much writing a custom client to consume your web service. The approach is not RESTful, doesn't fit into large scale web architecture, and doesn't play nice with firewalls that don't expect these oddball methods.

My angle of attack is really from one of a controlled environment such as a corporate intranet or a control system in some kind of industrial infrastructure. The problems of large scale web architecture and firewalls are easier to control in this environment, and that's why CORBA has seen some level of success in the past and SOAP may fill a gap in the future. I'm not much of a fan of SOAP, and the opportunities that come from dealing with a function call as (method, resource, headers, body), or to my mind as (function call, object, meta, parameter data), are intriguing to me. Of particular interest is how to deal with backwards- and forwards-compatibility of services through a unified name and method space, and the ability to transmit parameter data and return "return" data in various representations depending on the needs and age of the client software.
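To show what I mean by the method name acting as the function name, here is a minimal sketch using only Python's standard http.server and http.client modules. The RESTART method, the /pumps/7 path and the call helper are invented for the example. The standard library handler dispatches each request to a do_<METHOD> function, so a non-standard method needs nothing more than a matching handler, and unknown methods are refused with 501.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_RESTART(self):
        # (method, resource, headers, body) read as
        # (function call, object, meta, parameter data).
        length = int(self.headers.get("Content-Length", 0))
        params = self.rfile.read(length).decode("utf-8")
        body = f"restarted {self.path} with {params}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def call(method, path, body):
    """Start a throwaway local server, send one request, return (status, text)."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request(method, path, body=body.encode("utf-8"),
                     headers={"Content-Type": "text/plain"})
        resp = conn.getresponse()
        result = (resp.status, resp.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Here call("RESTART", "/pumps/7", "mode=soft") is answered by do_RESTART, while a method the server has never heard of gets a 501 response. Because the method is out in the open on the request line, an intermediary can see exactly what is being asked for.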

I'm also interested in whether the REST approach (or variants of it) can be scaled down to less-than-internet scale, and indeed less-than-distributed scale. I'm curious as to what can happen when you push the traditional boundaries between these domains about a little. I think it's clear that the traditional object model doesn't work on the Internet scale, so to my mind if we are to have a unified model it will have to come back down from that scale and meet up with the rest of us somewhere in the middle. I think the corporate scale is probably where that meeting has to first take place.

My suggestion is therefore that at the corporate scale a mix of RESTful and non-RESTful services could coexist more freely if they could use HTTP directly as their RPC mechanism. Just a step to the left is actual REST, so it is possible to use it wherever it works. A step to the right is traditional Object-Orientation, and maybe that helps develop some forms of quick and dirty software. More importantly from my viewpoint it might force the two world views to acknowledge each other, in particular the strengths and weaknesses possessed by both. I like the idea that on both sides of the fence clients and servers would both be fully engaged with HTTP headers and content types.

I'm somewhat reluctant to use a two-method approach (GET and POST only). I don't like POST. As a non-cacheable "do something" method I think it too often turns into a tunnelling activity rather than a usage of the HTTP protocol. When POST contains SOAP the tunnelling effect is clear. Other protocols have both preceded and followed SOAP's lead by allowing a single URI to do different things when posted to based on a short string in the payload. I am moderately comfortable with POST as a DOIT method when the same URI always does the same thing. This is effectively doing the same thing as Python does when it makes an object callable. It consistently represents a single behaviour. When it becomes a tunnelling activity, however, I'm less comfortable.
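The contrast can be sketched in a few lines of Python. Both the Reindex class and the tunnel_endpoint function, along with the "action:argument" payload format, are hypothetical and stand in for a URI that accepts POST.

```python
class Reindex:
    """A resource whose POST always means the same thing, like a callable object."""

    def __init__(self):
        self.runs = 0

    def __call__(self, payload):
        self.runs += 1
        return f"reindexed {payload}"


def tunnel_endpoint(payload):
    """One URI whose meaning depends on a short string hidden in the payload."""
    action, _, arg = payload.partition(":")
    handlers = {
        "reindex": lambda a: f"reindexed {a}",
        "purge": lambda a: f"purged {a}",
    }
    return handlers[action](arg)
```

Anyone looking at the first resource from the outside knows what a POST to it will do. With the second, the real function name only appears once you open the body, which is the tunnelling I am uncomfortable with.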

Robert, you mention the activity of firewalls in preventing unknown methods passing through them. To be honest I'm not sure this is a bad thing. In fact, I think that hiding the function name in the payload is counter-productive, as the next thing you'll see is firewalls that understand SOAP and still don't allow unknown function names to pass through them. You might as well be up-front about these things and let IT policy be dictated by what functionality is required. I don't think that actively trying to bypass firewalling capabilities should be the primary force for how a protocol develops, although I understand that in some environments it can have pretty earth-shattering effects.

In the longer term my own projects should end up with very few remaining examples of non-standard methods. As I mentioned in the earlier post I would only expect to use this approach where I'm sending requests to a gateway onto an unfixably non-RESTful protocol. REST is the future as far as I am concerned, and I will be actively working towards that future. This is a stepping-off point, and I think a potentially valuable one. The old protocols that HTTP may replace couldn't be interrogated by firewalls, couldn't be diverted by proxies, and couldn't support generic caching.

Thanks to Peter Hardy for your kind words also. I'll be interested to hear your thoughts on publish/subscribe. Anyone who can come up with an effective means of starting with a URI you want to subscribe to and ending up with bytes dribbling down a TCP/IP connection will get my attention, especially if they can do it without opening a listening socket.