Sound advice - blog

Tales from the homeworld


Sun, 2005-Apr-03

Wouldn't it be nice to have one architecture?

We seem to be evolving towards a more open model of interaction between software components. Client and server are probably speaking HTTP to each other to exchange data in the form of XML, or some other well-understood content. Under the REST architectural style the server is filled with objects that have well-known names in the form of URIs. Clients GET and sometimes PUT, but will more likely POST, to the server.

The URI determines five things: the protocol, the host, the port number, the object name, and the query. The protocol looks something like http://. The host and port number look something like my.host:80, or maybe my.host:http. The object name looks like /foo/bar/baz, and the query part looks like ?myquery. That's fine for HTTP over a network with a well-known host name and port number. I think it might fall down a little in more tightly-coupled environments.
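To make the breakdown concrete, here is a minimal sketch using Python's standard urllib.parse. The function name describe and the dictionary keys are my own labels for the five parts, not anything standard, and the simple split on ":" assumes the host is not an IPv6 literal.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Split a URI into the five parts listed above."""
    parts = urlsplit(uri)
    # netloc is "host" or "host:port"; the port is kept as text because
    # it may be a number ("80") or a service name ("http").
    host, _, port = parts.netloc.partition(":")
    return {
        "protocol": parts.scheme,
        "host": host,
        "port": port or None,
        "object": parts.path,
        "query": parts.query,
    }

print(describe("http://my.host:80/foo/bar/baz?myquery"))
```

Note that the port comes back as plain text. Nothing in the URI itself says whether "http" or "dict" in that position can be turned into a number; that knowledge has to come from somewhere else.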

Let's take it down a notch from something you talk to over the internet to something on the local LAN. A single server might offer multiple services. Perhaps it provides not just regular web data to clients but also information about system status, such as the state of various running processes. Perhaps it has an application that provides access to time data to replace the old unix time service. Perhaps it has an application to provide a quote of the day, or a REST version of SMTP. The server is left with an unpleasant set of options. It can let the programs run independently, each opening distinct ports in the classic UNIX style. The client must then know the port numbers, and needs to negotiate them out of band (IANA has helped us do this in the past). If that's no good, and you want to have a single port open to all of these applications, you start to introduce coupling. You either operate like inetd or a CGI script and exec the relevant process after opening the connection, or you make all of your processes literally part of the one master process using servlets.

Not so bad, you say. There are still options there, even if the traditional web approach and the traditional unix approaches differ. You can even argue that they don't differ and that unix only ever intended to open different ports when different protocols are in use. We've now agreed on a simple standard protocol that everyone can understand the gist of, even if you need to know the XML format being transported intimately to actually extract useful data out of the exchange.

In a way, the REST concept introduces many more protocols than we are used to dealing with. Like other advances in architecture development it takes out the icky bits and says: "Right! This is how we'll do message exchange". It then leaves the content of the messages to the implementors in each problem domain to work out. It builds an ecosystem with a common foundation rather than trying to design a city all in one go.

Anyway, back to the options. When you have multiple applications within a single server the uncoupled options look grim. How do I let my client know that the service they want is available on port 8081? DNS allows me to map my.host to an IP address, but an ordinary address lookup does not cover the resolution of port identifiers. That's left to explicit client-side knowledge, so a client can only reasonably query http://my.host:dict/ if we have previously agreed that dict should appear in their /etc/services file. It's much more likely that we can agree to a URI of http://my.host/dict on the standard HTTP port of 80.
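The client-side lookup I'm describing might be sketched like this. It assumes a Unix-like machine where Python's socket.getservbyname consults the local services database (/etc/services). The function name resolve_port is mine.

```python
import socket

def resolve_port(port_text, protocol="tcp"):
    """Turn the port part of a URI into a number, or None if unknown.

    Numeric text is used directly. Anything else is looked up in the
    local services database, so the answer depends on what this
    particular machine has been told in advance.
    """
    if port_text.isdigit():
        return int(port_text)
    try:
        return socket.getservbyname(port_text, protocol)
    except OSError:
        # The name is not in this machine's services database.
        return None

for name in ("8081", "http", "dict"):
    print(name, "->", resolve_port(name))
```

What the names resolve to will vary from machine to machine, which is exactly the problem: the mapping lives on each client rather than being published by the server.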

This leaves us with the options of either having an inetd-equivalent process start a new process for each connection made to the server, or making the application a servlet. The first option is unsatisfying because it doesn't allow a single long-running program to answer the requests, and we need to introduce other interprocess communication mechanisms such as shared memory if forked instances of the same process want to share or distribute processing. You can see this conflict in an application like SAMBA. You get a choice between executing via inetd for simplicity and ease of administration or executing as standalone processes for improved performance. The second option is to me fairly unsatisfying because it introduces coupling between otherwise unrelated applications. In fact, there's a third option. You could have the server process answer the queries by itself querying back-end applications in weird and wonderful ways. That approach is limited because the server may become both a bottleneck and a single point of failure. When all of the data in your system flows through a single process... well... you get the point.

You can see where I'm headed. If I'm uncomfortable with how you would offer a range of different services in a small LAN scenario, imagine my disquiet over how applications should talk to each other within the desktop environment!

I think the REST architecture remains sound. You really want to be able to identify objects, some of which may be applications and others of which may represent your files or other data. You want to be able to send a request that reads something like local://mydesktop/mytaxreturn.rdf?respondwith=taxable-income. There's some sensitive data in this space, so you may feel as I do that opening network sockets to access it is a bit of a worry. Even opening a port on 127.0.0.1 may allow other users of the current machine to access your data. A unix named pipe might work, but may not be portable outside of the unix sphere and may be hard to specify in the URL. After all, how do you say "speak http to file:///home/me/desktopdata, and request the tax return uri you find there"? You also start running into the set of options for serving your data that you had with the small LAN server. How do you decouple all of the services behind the access-point name in your URI?
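For what it's worth, speaking HTTP over a filesystem path is not hard to sketch. The example below swaps in a Unix domain socket rather than a true named pipe (FIFO), because a domain socket already gives each client its own connection. It assumes a Unix-like system, and every name in it (DesktopHandler, UnixHTTPConnection, fetch, demo, the "desktopdata" file name) is hypothetical.

```python
import http.client
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler

class DesktopHandler(BaseHTTPRequestHandler):
    """Toy server object: echoes back the path it was asked for."""

    def do_GET(self):
        body = f"you asked for {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # default logging expects a (host, port) client address

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP client that connects to a filesystem path, not host:port."""

    def __init__(self, socket_path):
        super().__init__("localhost")  # only used for the Host header
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

def fetch(socket_path, target):
    """GET target from the server listening on socket_path."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.read().decode()
    finally:
        conn.close()

def demo():
    # mkdtemp creates a directory only the current user can enter,
    # which keeps other local users away from the socket inside it.
    directory = tempfile.mkdtemp()
    socket_path = os.path.join(directory, "desktopdata")
    server = socketserver.UnixStreamServer(socket_path, DesktopHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(socket_path, "/mytaxreturn.rdf?respondwith=taxable-income")
    finally:
        server.shutdown()
        server.server_close()
        os.unlink(socket_path)
        os.rmdir(directory)

if __name__ == "__main__":
    print(demo())
```

The transport works, but notice that fetch takes the socket path and the request target as two separate arguments. The sketch sidesteps the real question, which is how to fit both into a single URI.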

So, let's start again and try to abstract that REST architecture. To me it appears decomposable into the following applications:

  1. A client with a request verb and a URI including protocol, access point, and object identifier
  2. An access point broker that can interpret the access point specification and return a file descriptor
  3. A server with a matching URI

It seems that DNS is a fine access point broker for web servers that all live on the same port. An additional mechanism might still be useful for discovering the port number to connect to by name when multiple uncoupled services are on offer. A new access point broker would be needed for the desktop. A new URI encoding scheme might be avoidable if the access point broker is able to associate a particular named pipe with a simpler name such as "desktop:kowari", making a whole address look like http://desktop:kowari/mydatabase. Clients would need to be updated to talk to the appropriate access point broker, which I suggest would have to be provided through a shared library like the one we currently use with DNS. Servers would need to open named pipes instead of network sockets, and may need an additional protocol to ensure one file descriptor is created per local "connection".
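A rough sketch of such a broker, as a client-side library function, might look like the following. The registry contents, the path /tmp/kowari.sock, and the function names resolve and connect are all assumptions for illustration. It uses a Unix domain socket in place of a named pipe, handles only numeric ports in the network case, and assumes a Unix-like system.

```python
import socket
from urllib.parse import urlsplit

# Hypothetical registry: access point name -> (address family, address).
# A real broker would presumably be populated by the desktop session.
ACCESS_POINTS = {
    "desktop:kowari": (socket.AF_UNIX, "/tmp/kowari.sock"),
}

def resolve(uri, registry=ACCESS_POINTS):
    """Map the access point part of a URI to something connectable."""
    netloc = urlsplit(uri).netloc
    if netloc in registry:
        return registry[netloc]
    # Not a registered local name: treat it as an ordinary host[:port]
    # and let the usual DNS lookup happen at connect time.
    host, _, port = netloc.partition(":")
    return (socket.AF_INET, (host, int(port) if port else 80))

def connect(uri, registry=ACCESS_POINTS):
    """Return a connected socket (a file descriptor) for the URI."""
    family, address = resolve(uri, registry)
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        sock.connect(address)
    except OSError:
        sock.close()
        raise
    return sock

print(resolve("http://desktop:kowari/mydatabase"))
print(resolve("http://my.host/dict"))
```

The client code above never needs to know whether it ended up with a local or a network connection, which is the transparency I'm after.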

The definition of the access point is interesting in and of itself. What happens when access point data changes? Can that information be propagated to clients so they know to talk to the new location rather than the old? Can you run the same service redundantly so that when one fails the information of the second instance is propagated and clients fail over without spending more than a few seconds thinking about it?
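The simplest client-side version of that failover is to try each known instance in turn with a short timeout. This is only a sketch under the assumption that the broker hands back an ordered list of addresses. The function name and its connect parameter are mine, and it does nothing about propagating changes to clients that are already connected.

```python
import socket

def connect_with_failover(addresses, timeout=2.0,
                          connect=socket.create_connection):
    """Return (address, connection) for the first instance that answers.

    addresses is an ordered list of (host, port) pairs. The connect
    parameter exists so the transport can be swapped out.
    """
    errors = []
    for address in addresses:
        try:
            return address, connect(address, timeout)
        except OSError as exc:
            # This instance is unreachable; remember why and move on.
            errors.append((address, exc))
    raise ConnectionError(f"no instance reachable: {errors}")
```

With a two second timeout and two instances, a client spends at most a couple of seconds on a dead primary before reaching the backup.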

REST is an interesting model for coordinating application interactions. It seems to work well in the loosely-coupled, large-scale environments it was developed for. I'd like to see it work on the smaller scale just as well, and to see the difference made transparent to both client and server.

Benjamin

P.S. Is it just me, or is there no difference between SOAP over HTTP and REST POST? In fact, it seems to me that an ideal format for messages to and from the POST request could be SOAP. Am I missing some things about REST? I think I understand the GET side fine, but the POST I'm really not sure about...