Sound advice - blog

Tales from the homeworld

My current feeds

Sat, 2005-Dec-31

Paid Free Software

hAtom is currently stalled, awaiting a meeting of minds on the subject of css class names. I have been using the progress of hAtom and the related hAtom2Atom.xsl (Luke's repo, Robert's repo) to soak up excess energy over the first two weeks of my three-week holiday. With that on hold I have energy to burn on two further projects. One is my old attempt at building a home accounting package. The other is trying to reason out a business model for paid free software.

Free Software is Efficient Software

Free software is a movement where the users of a piece of software are also its contributors. Contributors are secure in knowing that the benefits they provide for others by their contribution will flow back to them in the form of contributions from others. With a reciprocal license such as the GPL, a contributor knows their code won't be built upon by their competitors in ways that give those competitors an advantage over them.

If everyone contributes to the same baseline the community of contributors benefits. The interests of the majority of contributors are served the majority of the time. When the interests of the group diverge, forking appears. These forks are also produced efficiently by the body of contributors who remain on each side of the forked divide. No one has to start from scratch. Everyone still has access to the original code base. Patches can still be transferred between the forks in areas where common need is still felt. If the reasons for divergence disappear, the contributor pool can even fold back together and form a common base once again.

Free software is not just a hippy red socialist cult. It is a free market of excess developer hours. Developers contribute, and gain benefits for themselves and for others. Projects are run efficiently along lines contributors define either explicitly or implicitly. The fact that so much free software has been developed and used based primarily on the contribution of individuals' spare time is a testament to how much can be achieved with so little input.

Contributors of Money

Contributions in the free software world are almost exclusively made in terms of working code, or theoretical groundings for working code. There are areas such as translation for internationalisation and web site design and maintenance that come into it as well... but this is primarily a market for excess skills that are not exhausted in the contributor's day job. I see free software's efficiency as a sign that it can and should displace closed software development models in a wide range of fields. One way in which this might be accelerated is if a greater excess of skills could be produced. Perhaps your favourite web server would get the feature you want it to have faster if you paid to give a software developer an extra few hours off their day job.

There are reasons to be wary of accepting money for free software, but I think we can work around them. Full-time paid staff on a free software project can drive gratis developers away, so we should pay our developers in proportion to the needs of money contributors that they meet. Boiled down to the essence of things, a money contributor should pledge to and pay the developer who fixes the bugs that contributor wants fixed. That's capitalism at work!

Running a Macro-money system (the business model)

In practice, the issue of resolving which individual should be paid what amount is more complex than matching a contributor to a developer. Perhaps the reviewer or the applier of the patch should get a slice. Perhaps the project itself should, or the verifier of the bug. I delved into some of the issues involved in my earlier article, "Open Source Capitalism". I think there is a business to be made here, but it isn't going to be done by making decisions at this level. This is a project decision that should be made by projects. The business model at our global level is simple:

  1. Set up an eBay-style website
  2. Allow free software projects to sign on to be part of it (sign some up before your launch!)
  3. Allow followers of the registered projects to identify bugs, and attach money to them
  4. Monitor the bug tracking systems of each project, and pay the project out when the bug is verified (a rough sketch of this bookkeeping follows the list)
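
To make steps 3 and 4 a little more concrete, here is a rough sketch in Python of the bookkeeping I have in mind. The field names, the fee percentage, and the "verified" state are all assumptions for illustration, not a worked-out design.

    # A rough sketch of the bookkeeping behind steps 3 and 4 above. The field
    # names, the fee percentage, and the "verified" state are assumptions for
    # illustration, not a worked-out design.
    from dataclasses import dataclass

    FEE = 0.05   # assumed cut taken by the site to cover its own expenses

    @dataclass
    class Pledge:
        contributor: str
        project: str
        bug_id: int
        amount: float        # paid up front and held by the site
        paid_out: bool = False

    def pay_out(pledge: Pledge, bug_state: str) -> float:
        """Release a pledge to the project once its tracker marks the bug verified."""
        if bug_state != "verified" or pledge.paid_out:
            return 0.0
        pledge.paid_out = True
        return pledge.amount * (1 - FEE)

    p = Pledge("alice", "favourite-web-server", bug_id=1234, amount=20.0)
    print(pay_out(p, "verified"))   # 19.0 to the project, 1.0 kept to cover costs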

Two sources of income are available: percentages and earned interest. It seems reasonable to take a small cut of any money transferred through the site to cover expenses. Requiring money to be paid up front by contributors has several advantages, including that the project doesn't achieve its goal only to find no reward waiting for it. The benefit to us, however, is the income stream that can be derived from interest earned on unpaid contributions. Depending on the average hold time of a contributed dollar this could account for a reasonable portion of costs, and if excessive it might have to be paid back in part to projects. If the business model is workable at all, keeping costs low from the outset will stave off competition and build project and contributor loyalty.
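
As a back-of-the-envelope illustration of that interest stream, here is a small sketch. Every figure below is invented; the point is only that the float grows with how long a pledged dollar sits before being paid out.

    # Back-of-the-envelope sketch of the interest income stream. Every figure
    # here is an invented assumption; the point is only that the float grows
    # with how long a pledged dollar sits before being paid out.
    dollars_per_year = 500_000   # pledges flowing through the site (assumed)
    hold_days = 60               # average days a dollar is held (assumed)
    annual_rate = 0.05           # interest earned on the held float (assumed)

    average_float = dollars_per_year * hold_days / 365
    print(round(average_float * annual_rate, 2))   # about 4109.59 dollars a year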

Projects define policy on who (if anyone) receives money acquired in this way. Projects also define policy on which bugs pledges can be placed upon (typically, only accepted bugs would be allowed), and when a bug is considered "verified". Policies must be lodged with the website and available for contributor perusal before any contributions are made. Contributors who feel their bugs have been resolved well can leave positive feedback. Contributors who feel their bugs have been resolved poorly or not resolved can leave negative feedback. These ratings are designed to ensure projects provide clear policy and stick to it; however, all contributions are non-refundable barring special circumstances.

I don't know the legal status of all of this, particularly the tax implications or export control implications of dealing with various countries around the globe. Expert advice would certainly be required and money invested up front. Community outrage at the suggestion would also be a bad thing. Discussions and negotiations should occur early, and the project probably can't proceed without gnome, kde, and mozilla all signed up before you get into it. Another angle of attack would be to sign up sourceforge; however, they may see the system as a competitor of sorts to their paypal donations revenue stream. If you were to sign them up I think payments to sf projects would have to go via the sf paypal.

Conclusion

Consider this an open source idea. I would love to be involved in making something like this a reality, but I don't have the resources to do it myself. In fact, I don't necessarily bring any useful expertise or contacts to the table. Nevertheless, if you are in a position to make something like this happen I would like to hear about it. I might want to buy a share, if nothing else. Dear Lazyweb: Does anyone have a wiki space that could be used to develop this concept?

Benjamin

Wed, 2005-Dec-21

RESTful Blogging

Tim Bray and Ken MacLeod disagree (Tim's view, Ken's view) on how to define an HTTP interface for posting, updating, and deleting blog entries. The two approaches are to use GET and POST to do everything on one resource, or to use GET, PUT and DELETE on individual entry resources. They are both horribly wrong. The correct approach is this:

GET on feed resource: GETs the feed
PUT on feed resource: replaces the entire feed with a new feed representation
DELETE on feed resource: deletes the feed
POST on feed resource: creates a new entry, and returns its URI in the Location header
GET on entry resource: GETs the entry
PUT on entry resource: replaces the entry with a new entry representation
DELETE on entry resource: deletes the entry
POST on entry resource: creates a new comment, and returns its URI in the Location header
GET on comment resource: GETs the comment
PUT on comment resource: replaces the comment with a new comment representation
DELETE on comment resource: deletes the comment
POST on comment resource: creates a new sub-comment, and returns its URI in the Location header

Do you see the pattern? Everything you operate on directly gets a URI. That thing can be replaced by PUTting its new value. It can be DELETEd using the obvious method. You can GET it. If it is a container for children it supports the POST of a new child. You don't confuse PUT and POST, and everyone is happy.
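
To make the pattern concrete, here is a minimal client-side sketch. The server address, the payloads, and the Python requests library are my own assumptions for illustration; any HTTP client that can speak GET, PUT, POST and DELETE against plain URIs would do.

    # A minimal sketch of the pattern above. The server at example.org, the
    # payloads, and the requests library are assumed for illustration only.
    import requests

    FEED = "http://example.org/feed"
    ATOM = {"Content-Type": "application/atom+xml"}
    new_entry = b"<entry xmlns='http://www.w3.org/2005/Atom'>...</entry>"

    # POST to the feed creates a new entry; its URI comes back in Location.
    entry_uri = requests.post(FEED, data=new_entry, headers=ATOM).headers["Location"]

    # The entry then behaves just as the feed did: GET it, replace it with PUT,
    # remove it with DELETE, or POST a child (a comment) to it.
    requests.get(entry_uri)
    requests.put(entry_uri, data=new_entry, headers=ATOM)
    comment_uri = requests.post(entry_uri, data=b"<entry>a comment</entry>",
                                headers=ATOM).headers["Location"]
    requests.delete(comment_uri)
    requests.delete(entry_uri)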

I don't know what Tim thinks is clear and obvious, and I don't know what Ken thinks is REST, but isn't this both?

In fairness to both parties their original blog entries both date back to 2003. The reason this has come across my desk is a "REST"-tagged del.icio.us link to this intertwingly.net poll.

Benjamin

Wed, 2005-Dec-21

Integrate, don't go Orange!

Tim Bray has adopted the orange feed link on his website, but isn't happy about it:

Despite that, it is a bad idea; a temporary measure at best. Based on recent experience with my Mom and another Mac newbie, this whole feed-reading thing ain’t gonna become mainstream until it’s really really integrated.

Emphasis Tim's. Tim suggests having a standard button somewhere on the browser for a one-click subscribe. I take a different tack. I think we should be going really really really integrated. After all, what is a feed reader, except a smart web browser?

When presented with a web page that is associated with atom data, a real web browser could keep track of which entries you have read and mark them as such. Given a list of feeds to subscribe to, it could periodically cache them like a feed reader does so that you can read them without waiting. Perhaps you could even view portions of the sites using a standard font and layout engine like the feed readers do. There really isn't any reason why the two applications have to be separate. Feed readers today are used fairly differently to web browsers, but it needn't necessarily be that way forever. I can see more and more of the web being consumed in this feed-centric way of viewing and keeping up with changes.

One of the technical challenges today is that an atom feed and a web site are in different formats. It is difficult to relate an atom entry with a section of a web page. Someday soon, however, hAtom of microformats.org fame will reach maturity. This is an evolution of the Atom specification that is embeddable within a web page. A browser will soon be able to identify blog entries within a page, and a feed reader will soon be able to use the same page as your web browser sees as input.
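
As a rough sketch of what that could look like from the feed reader's side, the following Python scans a page for hAtom-style markup. The class names used here (hentry, entry-title) are assumptions for illustration, since the vocabulary is exactly what is still being settled.

    # A rough sketch of a feed reader consuming the same page the browser sees:
    # scan the HTML for elements carrying hAtom-style class names. The class
    # names (hentry, entry-title) are assumptions for illustration.
    from html.parser import HTMLParser

    class HAtomScanner(HTMLParser):
        def __init__(self):
            super().__init__()
            self.entries = 0        # number of blog entries found in the page
            self.titles = []        # their titles, in document order
            self.in_title = False

        def handle_starttag(self, tag, attrs):
            classes = (dict(attrs).get("class") or "").split()
            if "hentry" in classes:
                self.entries += 1
            if "entry-title" in classes:
                self.in_title = True

        def handle_data(self, data):
            if self.in_title and data.strip():
                self.titles.append(data.strip())
                self.in_title = False

    scanner = HAtomScanner()
    scanner.feed("<div class='hentry'><h2 class='entry-title'>Hello</h2>...</div>")
    print(scanner.entries, scanner.titles)   # 1 ['Hello']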

So let's not talk about integrated web browsing and feed reading, then in the same breath about the two being provided by separate applications. The time of separation will one day come to a close. The web is changing in a way that is becoming more semantic, and I for one see the web browser staying near the forefront of that change. What was not possible a few years ago when browser innovation had gone stale is now becoming a reality again: An interactive web. A read/write web. A web 2.0? I think we will experiment with new semantics in feed readers and other new ways of seeing the web, but that these will eventually be folded back into the one true web. The data formats will be folded back, and eventually so will the applications themselves.

Benjamin

Sun, 2005-Dec-11

The Semantic Spectrum

In software we work with a spectrum of semantics, from the general to the application specific. General semantics gives you a big audience and user base but can be lacking in detail. Application-specific semantics explain everything you need to know in a dialect only a few software components can speak. This is the divide between Internet technologies and Enterprise software.

Encapsulating the bad

In the beginning we worked with machine code. We abstracted up through assembly languages and into structured programming techniques. Structured programming, the use of loops and function calls, allowed us to decompose a linear problem into a sequence of repeating steps. It ruled the roost until we found that when function 'A' and function 'B' both operate on the same data structures, allowing them to be modified by different programmers tended to break our semantic data models.

Object-orientation added a new level of abstraction, but preserved the structured model for operations that occur within an object. It took the new approach of encapsulating and building on the structured programming layer below it rather than trying to create an entirely new abstraction. Object orientation (the technique, rather than any particular O-O language) allowed us to decompose software in new ways. We could describe the semantics of an object separately from its implementation, and could even share semantics between differing implementations. The world was peachy. That is, until we found that CORBA doesn't work on the Internet.

CORBA on the Internet

CORBA was an excellent attempt to extend the Object-Oriented model to the network. It was a binary format, and some claim that is the reason it failed to gain traction. Others blame its bogging down in standardisation committees. Two technologies exploded in use on the Internet. The use of XML documents to describe ad hoc semantics was a powerful groundswell; however, the real kicker was always the web server and web browser.

What was the problem? Why wasn't the Object-Oriented model working? Why weren't people browsing the web with a CORBA browser instead?

I think it is a question of semantics. Object-Orientation ties down semantics in interface classes and other tightly-defined constructs. This leads to problems both with evolvability and with applicability.

Evolvability

Tightly-defined interface classes support efficient programming models well, but this seems to have been at the cost of evolvability. Both HTTP and HTML have must-ignore semantics attached whenever software fails to understand headers, tags, or values. This means that new semantics can be introduced without breaking backwards compatibility, so long as you aren't relying on those semantics being understood by absolutely everyone. In terms of Object-Orientation this is like allowing an interface class to have new methods added and called without breaking binary compatibility. The use of XML gives developers a tool to help take this experience on board and apply it to their own software, but there is a bigger picture. XML has not been particularly successful on the Internet yet, either. To see success we must look at that web browser.
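
Here is a toy illustration of must-ignore processing. The header names and the code are my own, not from any specification: the consumer handles what it recognises and silently skips the rest, so a sender can add new headers without breaking deployed software.

    # A toy illustration of must-ignore processing (header names and code are
    # assumptions for illustration, not from any specification). The consumer
    # handles what it recognises and silently skips the rest.
    KNOWN_HEADERS = {"content-type", "content-length", "etag"}

    def process(headers):
        understood = {}
        for name, value in headers.items():
            if name.lower() in KNOWN_HEADERS:
                understood[name.lower()] = value
            # Anything unrecognised falls through untouched: ignored, not rejected.
        return understood

    # An old client meets a response carrying a header invented after it shipped.
    response = {"Content-Type": "text/html", "X-Future-Extension": "whatever"}
    print(process(response))   # {'content-type': 'text/html'}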

Applicability

A web browser turns our interface class inside out. Instead of communicating application semantics it is based on semantics with a much wider applicability: presentation. The web became successful because an evolvable uniform interface has been available to transport presentation semantics that are good enough to conduct trade and transfer information between a machine and a human.

Looking at the early web it might be reasonable to conclude that general semantics need to be in the form of presentation. This could be modelled as a baseclass with a single method: "renderToUser(Document d)". However, this early concept has started to evolve in curious ways. The semantic xhtml movement has started to hit its mark. The "strict" versions of html 4.01 and xhtml 1.0 shun any kind of presentation markup. Instead, they focus on the structure of a html document and leave presentation details to css. This has benefits for a range of users. Speech synthesizer software is likely to be less confused when it sees a semantic html document, improving accessibility. Devices with limited graphical capability may alter their rendering techniques. Search engines may also find the site easier to navigate and process.

We can see in the web that presentation semantics are widely applicable, and this contributes to the success of the web. To see widely applicable non-presentation semantics we have to move above the level of simple semantic xhtml into the world of microformats, or outside of the html world completely. We already see widely applicable semantics emerging out of formats like rss and atom. They move beyond pure presentation and into usefully specific and generally applicable semantics. This allows for innovative uses such as podcasting.

Worlds apart

The semantics of html or atom and the semantics of your nearest Object-Oriented interface class are light years apart from each other, but I think if we can all learn each other's lessons we'll end up somewhere in the middle together. On one hand we have children who grew up in an Object-Oriented mindset. These folk start from a point of rigidly-defined application-specific semantics and try to make their interface classes widely applicable enough to be useful for new things. On the other side we have children who grew up in the mindset of the web. They start from a point of widely applicable and general tools and try to make their data formats semantically rich enough to be useful for new things. Those on our left created SOAP. Those on our right created microformats. Somewhere in the middle we have the old school RDF semantic web folk. These guys created a model of semantics and haven't really managed to take things any further. I think this is because they solve neither the application-specific semantics problems nor the generally-applicable presentation problems. Without a foothold in either camp they can act as independent umpire, but have yet to really make their own mark.

Conclusion

It looks like the dream of a semantic web is a long way off. It isn't because building a mathematical model of knowledge is unsolvable. Good inroads have been made by semwebbers. It's just that it isn't useful in and of itself, at least not today. The things that are useful are the two extremes of web browsers and of tightly-coupled object-oriented programming models. Both are proven, but neither defines a semantic web. The trouble is that the dual goals of having general semantics and useful semantics are usually at odds with each other. The places these goals meet are not in ivory tower OWL models, but in real application domains. Without a problem to solve there can be no useful semantics. Without a problem that many people face there can be no general semantics. Over the next ten years building the semantic web will be a process of finding widely-applicable problems and solving them. It will require legwork more than the development of philosophy, and people will need to be in the loop for most problem domains. True machine learning and useful machine to machine interaction are still the domain of Artificial Intelligence research, and won't come into being until we have convincingly solved the human problems first.

Benjamin