Sound advice - blog

Tales from the homeworld


Sun, 2005-Dec-11

The Semantic Spectrum

In software we work with a spectrum of semantics, from the general to the application specific. General semantics gives you a big audience and user base but can be lacking in detail. Application-specific semantics explain everything you need to know in a dialect only a few software components can speak. This is the divide between Internet technologies and Enterprise software.

Encapsulating the bad

In the beginning we worked with machine code. We abstracted up through assembly languages and into structured programming techniques. Structured programming uses loops and function calls to decompose a linear problem into a sequence of repeating steps. Structured programming ruled the roost until we found that when function 'A' and function 'B' both operated on the same data structures, letting different programmers modify those functions independently tended to break our semantic data models.

Object-orientation added a new level of abstraction, but preserved the structured model for operations that occur within an object. It took the new approach of encapsulating and building on the structured programming layer below it rather than trying to create an entirely new abstraction. Object-orientation allowed us to decompose software in new ways (I mean the technique here, rather than any language that claims to be O-O). We could describe the semantics of an object separately from its implementation, and could even share semantics between differing implementations. The world was peachy. That is, until we found that CORBA doesn't work on the Internet.
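To make the split between semantics and implementation concrete, here is a minimal Python sketch. The `Stack` interface and both implementations are hypothetical illustrations rather than anything from a particular library: the abstract class states the semantics once, and the client function `reverse` works with either implementation without knowing which one it was handed.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """Hypothetical interface: the semantics, with no implementation."""

    @abstractmethod
    def push(self, item):
        """Add an item to the top of the stack."""

    @abstractmethod
    def pop(self):
        """Remove and return the most recently pushed item."""


class ListStack(Stack):
    # One implementation, backed by a Python list.
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


class LinkedStack(Stack):
    # A different implementation, backed by a chain of (item, next) tuples.
    def __init__(self):
        self._head = None

    def push(self, item):
        self._head = (item, self._head)

    def pop(self):
        item, self._head = self._head
        return item


def reverse(items, stack: Stack) -> list:
    # Client code relies only on the shared Stack semantics.
    for item in items:
        stack.push(item)
    return [stack.pop() for _ in items]


print(reverse([1, 2, 3], ListStack()))
print(reverse([1, 2, 3], LinkedStack()))
```

Both calls give the same answer because both classes honour the same declared semantics, which is exactly the property the network-facing version of this model later struggled to keep.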

CORBA on the Internet

CORBA was an excellent attempt to extend the Object-Oriented model to the network. Its wire protocol (IIOP) was binary, and some claim that is the reason it failed to gain traction. Others blame its getting bogged down in standardisation committees. Meanwhile, two other technologies exploded in use on the Internet. The use of XML documents to describe ad hoc semantics was a powerful groundswell, but the real kicker was always the web server and web browser.

What was the problem? Why wasn't the Object-Oriented model working? Why weren't people browsing the web with a CORBA browser instead?

I think it is a question of semantics. Object-Orientation ties down semantics in interface classes and other tightly-defined constructs. This leads to problems both with evolvability and with applicability.


Tightly-defined interface classes support efficient programming models well, but this seems to have come at the cost of evolvability. Both HTTP and HTML attach must-ignore semantics to any headers, tags, or values that software fails to understand. This means that new semantics can be introduced without breaking backwards-compatibility, so long as you aren't relying on those semantics being understood by absolutely everyone. In Object-Oriented terms, this is like allowing new methods to be added to an interface class and called without breaking binary compatibility. XML gives developers a tool to take this experience on board and apply it to their own software, but there is a bigger picture. XML has not been particularly successful on the Internet yet, either. To see success we must look at the web browser.
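A minimal sketch of must-ignore processing, assuming headers arrive as a plain Python dict (real HTTP parsing is more involved). The set of known headers and the `X-New-Feature` header are hypothetical. The must-ignore consumer keeps working when a newer sender adds a header it doesn't recognise; the strict consumer, which behaves more like a tightly-defined interface class, fails.

```python
# Headers a hypothetical version-1 client was written to understand.
KNOWN_HEADERS = {"content-type", "content-length"}


def process_must_ignore(headers: dict) -> dict:
    """Keep the headers we understand and silently ignore the rest."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() in KNOWN_HEADERS
    }


def process_strict(headers: dict) -> dict:
    """A tightly-coupled consumer: anything unexpected is an error."""
    unknown = [name for name in headers if name.lower() not in KNOWN_HEADERS]
    if unknown:
        raise ValueError(f"unknown headers: {unknown}")
    return {name.lower(): value for name, value in headers.items()}


# A newer sender adds a header the old client has never heard of.
message = {
    "Content-Type": "text/html",
    "Content-Length": "42",
    "X-New-Feature": "on",
}

print(process_must_ignore(message))  # the old client still gets what it needs
try:
    process_strict(message)
except ValueError as err:
    print("strict client failed:", err)
```

The new header only helps senders and receivers that understand it, but it costs nothing for those that don't, which is the evolvability trade-off described above.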


A web browser turns our interface class inside out. Instead of communicating application semantics it is based on semantics with a much wider applicability: presentation. The web became successful because an evolvable uniform interface has been available to transport presentation semantics that are good enough to conduct trade and transfer information between a machine and a human.

Looking at the early web it might be reasonable to conclude that general semantics need to be in the form of presentation. This could be modelled as a baseclass with a single method: "renderToUser(Document d)". However, this early concept has started to evolve in curious ways. The semantic xhtml movement has started to hit its mark. The "strict" versions of html 4.01 and xhtml 1.0 shun any kind of presentation markup. Instead, they focus on the structure of an html document and leave presentation details to css. This has benefits for a range of users. Speech synthesizer software is likely to be less confused when it sees a semantic html document, improving accessibility. Devices with limited graphical capability can adapt their rendering techniques. Search engines may also find the site easier to navigate and process.
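The "renderToUser(Document d)" baseclass might be sketched in Python as below. The class names and the tiny html fragment are hypothetical simplifications (real xhtml elements carry a namespace, which this sketch leaves out). Because the document marks up only structure, a heading and a paragraph, two quite different user agents can each present it in their own way.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class UserAgent(ABC):
    """Hypothetical baseclass with the single general-purpose method."""

    @abstractmethod
    def render_to_user(self, document: ET.Element) -> str:
        ...


def _text(element: ET.Element) -> str:
    # All text inside an element, with surrounding whitespace trimmed.
    return "".join(element.itertext()).strip()


class PlainTextAgent(UserAgent):
    """A limited-graphics device: headings are underlined, paragraphs printed."""

    def render_to_user(self, document):
        lines = []
        for element in document:
            text = _text(element)
            if element.tag == "h1":
                lines += [text, "=" * len(text)]
            else:
                lines.append(text)
        return "\n".join(lines)


class SpeechAgent(UserAgent):
    """A speech front end: structure is announced rather than drawn."""

    def render_to_user(self, document):
        spoken = []
        for element in document:
            text = _text(element)
            if element.tag == "h1":
                spoken.append(f"Heading: {text}.")
            else:
                spoken.append(text)
        return " ".join(spoken)


# Structure only: no fonts, colours or layout in the markup itself.
doc = ET.fromstring("<body><h1>Offers</h1><p>Widgets are half price.</p></body>")

print(PlainTextAgent().render_to_user(doc))
print(SpeechAgent().render_to_user(doc))
```

Neither agent needs to know anything about the other; the shared structural semantics are enough for both.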

We can see in the web that presentation semantics are widely applicable, and this contributes to the success of the web. To see widely applicable non-presentation semantics we have to move above the level of simple semantic xhtml into the world of microformats, or outside of the html world completely. We already see widely applicable semantics emerging out of formats like rss and atom. They move beyond pure presentation and into usefully specific and generally-applicable semantics. This allows for innovative uses such as podcasting.
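As a rough illustration of how a general format leaves room for new uses, the sketch below pulls podcast audio out of an Atom feed using Atom's `link rel="enclosure"` (RSS 2.0 uses an `<enclosure>` element for the same purpose). The feed is a made-up, stripped-down example: it omits required Atom elements such as `id` and `updated`, and the example.org URLs are placeholders.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A made-up feed: one entry with an audio enclosure, one plain post.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example podcast</title>
  <entry>
    <title>Episode 1</title>
    <link rel="alternate" href="http://example.org/ep1.html"/>
    <link rel="enclosure" type="audio/mpeg" length="1234"
          href="http://example.org/ep1.mp3"/>
  </entry>
  <entry>
    <title>A text-only post</title>
    <link href="http://example.org/post.html"/>
  </entry>
</feed>"""


def enclosures(feed_xml: str) -> list:
    """Return (entry title, media URL) pairs for every enclosure link."""
    root = ET.fromstring(feed_xml)
    found = []
    for entry in root.iter(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        for link in entry.findall(f"{ATOM}link"):
            # In Atom, a link with no rel attribute means rel="alternate".
            if link.get("rel", "alternate") == "enclosure":
                found.append((title, link.get("href")))
    return found


for title, url in enclosures(FEED):
    print(f"{title}: {url}")
```

A feed reader that knows nothing about audio can still list both entries; a podcast client simply pays attention to one more link relation.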

Worlds apart

The semantics of html or atom and the semantics of your nearest Object-Oriented interface class are light years apart, but I think if we can all learn each other's lessons we'll end up somewhere in the middle together. On one hand we have children who grew up in an Object-Oriented mindset. These folk start from a point of rigidly-defined application-specific semantics and try to make their interface classes widely applicable enough to be useful for new things. On the other hand we have children who grew up in the mindset of the web. They start from a point of widely applicable and general tools and try to make their data formats semantically rich enough to be useful for new things. Those on our left created SOAP. Those on our right created microformats. Somewhere in the middle we have the old-school RDF semantic web folk. These guys created a model of semantics and haven't really managed to take things any further. I think this is because they solve neither the application-specific semantics problems nor the generally-applicable presentation problems. Without a foothold in either camp they can act as an independent umpire, but have yet to really make their own mark.


It looks like the dream of a semantic web is a long way off. That isn't because building a mathematical model of knowledge is unsolvable; good inroads have been made by semwebbers. It's just that such a model isn't useful in and of itself, at least not today. The things that are useful are the two extremes: web browsers and tightly-coupled object-oriented programming models. Both are proven, but neither defines a semantic web. The trouble is that the dual goals of general semantics and useful semantics are usually at odds with each other. The places where these goals meet are not in ivory-tower OWL models, but in real application domains. Without a problem to solve there can be no useful semantics. Without a problem that many people face there can be no general semantics. Over the next ten years, building the semantic web will be a process of finding widely-applicable problems and solving them. It will require legwork more than the development of philosophy, and people will need to be in the loop for most problem domains. True machine learning and useful machine-to-machine interaction are still the domain of Artificial Intelligence research, and won't come into being until we have convincingly solved the human problems first.