Sound advice

The microformats crowd have an admirable mechanism for defining standards. It relies on existing innovation. New innovations and the ego that goes with it are kept to a minimum. The process essentially amounts to this:

Looking for existing formats that overlap the subject area you are attempting to standardise
Looking for already-published data (or perhaps software that already produces data) linked to the subject area

Find the closest match, and use that as your starting point. Cover the 80% case and evolve rather than trying to dot every "i" all at once. Sort out any namespace clashes with previously-ratified formats to ensure that terms line up as much as possible and that namespaces do not have to be used. Allow extensions to occur in the wild without forcing them into their own namespaces.

I have already blogged about REST being the underlying model of the semantic web. Programs exchange data using a standard set of verbs and content types (i.e. ontologies). All program state is demarcated into resources that are represented in one of the standard forms and operated on using standard methods.

This is a new layer in modern software practice. It is the information layer. Below it is typically an object layer, then a modular or functional layer for implementation of methods within an object. The information layer is crucial because while those layers below work well within a particular product and particular version, they do not work well between versions of a particular product or between products produced by different vendors. The information layer described by the REST principles is known to scale across agency boundaries. It is known to support forwards- and backwards- compatible evolution of interaction over time and space.

I think that the the microformats model sets the basic preconditions under which standardisation of content type can be achieved, and thus the preconditions under which the semantic web can be established:

There must be sufficient examples of content available, produced without reference to any standard. These must be based on application need only, and must imply a common schema.
There must be sufficient examples of existing attempts to standardise within the problem space. Noone is smart enough to get it right the first time, and relying on experience with the earlier attempts is a necessary facet to getting it right next time

I think there need to be in the order of a dozen diverse examples from which an implied schema is extracted, and I think in the order of half a dozen existing formats. The source documents are likely to be extracted from thousands in order to achieve an appropriately diverse set. This means that there is a fixed minimum scale to machine-to-machine information transfer on the inter-agency Internet scale that can't be forced or worked around. Need is not sufficient to produce a viable standard.

My predictions about the semantic web:

The semantic web will be about network effects relating to data which is already published with an implied schema
Information that is of an obscure and ad hoc nature or structure will continue to be excluded from machine understanding
The semantic web will spawn from the microformats effort rather than any RDF-related effort.
The nature of machine understanding will have to be simplified in order for the semantic web to be accepted for what it is, at least for the first twenty years or so

RDF really isn't the cornerstone of the semantic web. RDF is too closely aligned to artificial intelligence and high ideals as to how information can be reasoned with generically to be really useful as an information exchange mechanism. Machine understanding will have to be accepted as something which relies primarily on human understanding in the future. It will be more about which widget a program puts a particular data element into than what other data it can infer automatically from the information at hand. One is simply useful. The other is a dead end.

The semantic web is here today, with or without RDF. Even when simple HTML is exchanged, clients and servers understand each other's notations about paragraph marks and other information. The level of semantics that can be exchanged fundamentally rely on a critical mass of people and machines implicitly exchanging those semantics before standardisation and shared understanding begin. The microformat community is right: Chase identification, time and date, and location. Those semantics are huge and enough formats exist already to pick from. The next round of semantics may have to wait another ten or twenty years, until more examples with their implied schemas have been built up.

Benjamin

Sound advice - blog

Lifesigns

Subscribe

RDF

Feedback and Social Software

Support Software

Site Statistics

License

My recent bookmarks

The Preconditions of Content Type defintion (and of the semantic web)