Sound advice - blog

Tales from the homeworld


Sun, 2006-Jan-29

Launch of the Efficient Software Project Plan, and Call for Participation

Free Software is Efficient Software, but Efficient Software won't come into being overnight. The Efficient Software Initiative has hit another (small) milestone, in releasing its first project plan:

  1. Conceive of idea
  2. Promote idea via blogs and other viral media (we are here)
  3. Build a base of discussion between individuals to agree on the right models for putting money into open source, including solid discussions on whether it is necessary or even useful
  4. Build a base of three or more people across disciplines to begin developing concrete policy, business plans, and site mockups. Continue promoting and building awareness until at least one slashdot effect is noted.
  5. Use wider promotion to get feedback on initial discussions and established policies. Learn which parts are defensible and which are not. Adjust as required, and begin building a core implementation group. This group will have access to sufficient funds between them to make the business a reality, and will likely have strong overlap with earlier core team.
  6. Launch business based on developed plans

This plan is likely to require at least twelve months of concerted attention, and will likely stretch over several years if successful. The ultimate goal is to build not only a business, but a business model that can be replicated into an industry: one that funds open source development and directly demonstrates the cost differential to closed source software. If you have any comments or feedback on the plan, please let us know via the mailing list.

With it comes a call for participation on the project front page:

You can help us get off the ground!

Do you have a background in law?
Help us understand the Legal Implications of the Business
Do you have a background in business or accounting?
Help us understand the Profit Implications of the Business
Do you have a background in website design?
Help us develop possible website designs (this will help everyone stay grounded!)
Do you have a background in running large websites?
Help us understand how to set everything up
Do you have roots in open source communities?
Talk about Efficient Software! Get feedback! Bring people on board with us, and help us come on board with them

So, why should you get involved? Who am I, and why should you trust me to get a project like this off the ground? Well, I am not asking anyone to trust me at this stage. We are in the discussion stages only. The next stage is to develop a core group of people to push things forward. Perhaps that group will include you. If you really want to know about me, well... here I am:

I have lived in Brisbane, Queensland for most of my life now. I graduated from the University of Queensland with a computer science honours degree back in 1998. Since the time of my course I have been involved in HUMBUG. My open source credentials are not strong. I have done a few bits here and there at the edges of various communities. I was involved with the sqlite community for a time and have a technical editor credit on the book of the same name. I have recently been involved with the microformats community, particularly with recent hAtom developments. I'm a RESTafarian. I have used debian linux as my sole desktop install for at least five years, and saw the a.out to ELF transition. I'm a gnome user and advocate. I'm something of a GPL bigot, or at least it is still my default license. I work in control systems, and have spent the last few years trying to "figure out" the web. I'm a protocols and bits-and-bytes kind of guy. I use the terms "open source" and "free software" interchangeably. I'm afraid of license incompatibility in the open source world leading to a freedom that is no larger than what we see in closed source software. I'm a father-to-be, with less than seven weeks to go until maybe I won't have much time to do anything anymore. I'm a full-time software developer who does a lot of work in C++. I work more hours at my job than I am contractually obliged to, so I often don't have time for other things. I have a failed open source accounting package behind me, where I discovered for the first time that user interfaces are hard and boring and thankless things. That project came out of my involvement with gnucash, which I still think would be easier to replace than to fix. I think python needs static checking, because the most common bug in computer software is not the off-by-one error but the simple typo. I haven't used ruby on rails because I think that any framework that requires me to learn a whole new language needs a few years for the hype to cool down before it gets interesting. I render my blog statically using blosxom, partly because I didn't have the tools at hand for dynamic pages under my old hosting arrangements, but partly because I'm more comfortable authoring from behind my firewall and not allowing submissions from anywhere on the web. I used to be a late sleeper, but I have taken the advice of Steve Pavlina and have now gotten up at 6am for the last six weeks straight. With all the extra time I still haven't mowed the lawn. I'm not perfect, and I certainly don't have all the skills required to create a new open source business model on my own. That is why I want this to be a community. I know how effective a community can be at reaching a common goal, and I think that the Efficient Software Initiative represents a goal that many can share.

So sign up for the mailing list. Get into some angry conversations about how open source doesn't need an income stream, or how selling maintenance is the perfect way to make money from open source software. I want to have the discussion about whether the kind of business Efficient Software is trying to build will create a new kind of free software market, or whether it just puts a middleman in the way of an already free market. Let's have the discussions and see where we end up. Efficient Software is something I can get excited about, and I hope it is something you'll be able to get a bit excited about, too.

Benjamin

Sat, 2006-Jan-28

Internet-scale Client Failover

Failover is the process of electing a new piece of hardware to take over the role of a failed piece of hardware (or sometimes software), and the process of bringing everyone on board with the new management structure. Detecting failure and electing a new master are not hard problems. Telling everyone about it is hard. You can attack the problem at various levels. You can have the new master take over the IP address of the old and broadcast a gratuitous ARP reply so that traffic for that address reaches its MAC address. You can even have the new master take over both the IP and MAC address of the old. If new and old are not on the same subnet, you can try to solve the problem through DNS. The trouble with all of these approaches is that while they solve the problem for new clients that may come along, they don't solve it for clients with existing cached DNS entries or existing TCP/IP connections.

Imagine you are a client app, and you have sent an HTTP request to the server. The server fails over, and a new piece of hardware is now serving that IP address. You can still ping it. You can still see it in DNS. The problem is that it doesn't know about your TCP/IP connection to it, or about the connection's "waiting for HTTP response" state. Until a new TCP/IP packet associated with the connection hits the new server, it won't know you are there. Only when that happens, and the server returns a packet to that effect, will the client learn that its connection state is not reflected on the server side. Such a packet won't usually be generated until new request data is sent by the client, and often that just won't ever happen.

Under high load conditions clients should wait patiently to avoid putting extra strain on the server. If a client knows that a response will eventually be forthcoming it should be willing to wait for as long as it takes to generate the response. With the possibility of failover, the problem is that a client cannot know whether the server's state reflects its own, and so cannot know whether a response really will be forthcoming. How often it must sample the remote state is determined by the desired failover time. In industrial applications that time may be as low as two or four seconds, and sampling must take place several times as quickly to allow for lost packets. If sampling is not possible, the desired failover time represents the maximum time a server has to respond to its clients, plus network latency; another means must be used to return the results of processing if any single request takes longer, and clients must use the desired failover time as their request timeout.
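As a rough illustration of the sampling arithmetic, here is a minimal sketch of client-side liveness sampling. The /status URL, the four-second failover window, and the four-samples-per-window figure are assumptions chosen for the example, not anything prescribed above.

    # Minimal sketch of client-side liveness sampling. The status URL, the
    # failover window, and the samples-per-window figure are illustrative
    # assumptions, not part of any published interface.
    import time
    import urllib.error
    import urllib.request

    DESIRED_FAILOVER_TIME = 4.0            # seconds; the client's hard limit
    SAMPLES_PER_WINDOW = 4                 # several samples per window to tolerate lost packets
    SAMPLE_INTERVAL = DESIRED_FAILOVER_TIME / SAMPLES_PER_WINDOW

    def server_is_alive(status_url):
        """Probe the server; a timeout or connection error counts as a missed sample."""
        try:
            with urllib.request.urlopen(status_url, timeout=SAMPLE_INTERVAL) as response:
                return response.status == 200
        except (urllib.error.URLError, OSError):
            return False

    def wait_while_alive(status_url):
        """Keep waiting for a long-running request while the server keeps answering
        probes; give up once it has been silent for the whole failover window."""
        missed = 0
        while missed < SAMPLES_PER_WINDOW:
            missed = 0 if server_is_alive(status_url) else missed + 1
            time.sleep(SAMPLE_INTERVAL)
        raise TimeoutError("server presumed failed; re-issue the request to the new master")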

If you take the short-request route, HTTP permits you to return 202 Accepted to indicate that a request has been accepted for processing, without indicating its success or failure. If this were used as a matter of course, conventions could be set up to return the HTTP response via a request back to a callback URL. Alternatively, the response could be modelled as a resource on the server which is periodically polled by the client until it exhibits a success or failure status. Neither of these approaches is directly supported by today's browser software; however, the latter could be performed using a little meta-refresh magic.
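For a non-browser client, the poll-a-response-resource variant might look something like the sketch below. The use of the Location header to name the status resource, and the convention that the status resource answers 202 while processing and 200 when finished, are assumptions of this example; HTTP leaves both to the application.

    # Sketch of the poll-the-response-resource pattern. The Location-header
    # convention and the 202-while-pending / 200-when-done status codes are
    # assumptions of this example, not requirements of HTTP.
    import time
    import urllib.request

    def submit_and_poll(request_url, body, poll_interval=2.0):
        req = urllib.request.Request(request_url, data=body, method="POST")
        with urllib.request.urlopen(req, timeout=poll_interval) as response:
            if response.status != 202:
                return response.read()          # the server answered immediately
            status_url = response.headers["Location"]

        # Poll the status resource until it exhibits a final status.
        # A 4xx or 5xx answer propagates as an HTTPError, signalling failure.
        while True:
            with urllib.request.urlopen(status_url, timeout=poll_interval) as poll:
                if poll.status == 200:          # processing finished; body holds the result
                    return poll.read()
            time.sleep(poll_interval)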

You may not have sufficient information at the application level to support sampling at the TCP/IP level. You would need to know the current sequence numbers of the stack in order to generate a packet that would be rejected by the server in an appropriate way. In practice what you need is a closer vantage point. Someone who is close in terms of network topology to both the old and the new master can easily tell when a failover occurs, and can publish that information for clients to monitor. On the face of it this is just moving the problem around; however, a specialised service can more easily ensure that it never spends a long time responding to requests. This allows us to employ the techniques which rely on quick responses.

Like the state of HTTP subscriptions, the state of HTTP requests must be sampled if a client is to wait indefinitely for a response. How long it should wait depends on the client's service guarantees, and has little to do with what the server considers an appropriate timeframe. Nevertheless, the client's demands put hard limits on the profile of behaviour acceptable on the server side. In subscription the server can simply renew whenever a renew is requested of it, and time a subscription out after a long period. It seems that the handling of a simple request/response couples clients and servers together more closely than even a subscription does, because of the hard limits the client's timeout puts onto the server side.

Benjamin

Mon, 2006-Jan-16

On XML Language Design

Just when I'm starting to think seriously about how to fit a particular data schema into XML, Tim Bray is writing on the same subject. His advice is chiefly, "don't invent a new language... use xhtml, docbook, odf, ubl, or atom". His advice continues with "put more effort into extensibility than everything else".

His column was picked up all over the web, including by Danny Ayers. He dives into discussion about how to build an RDF model, rather than an XML language:

When working with RDF, my current feeling (could be wrong ;-) is that in most cases it’s probably best to initially make up afresh a new representation that matches the domain model as closely as possible(/appropriate). Only then start looking to replacing the new terms with established ones with matching semantics. But don’t see reusing things as more important than getting an (appropriately) accurate model. (Different approaches are likely to be better for different cases, but as a loose guide I think this works.)

I've been following more of the Tim/microformats approach, which is to start with an established model and extend minimally. I think Tim's stated advantages to this approach are compelling, with the increased likelihood that software you didn't write will understand your input. When your machine interfaces to my machine, I want both to require minimal change in order for one to understand the other. I'm not sure the same advantages are available to an rdf schema that simply borrows terms from established vocabularies. Borrowing predicate terms and semantics is useful, but the most useful overlaps between schemas will be terms for specific subject types and instances.

From Tim,

There are two completely different (and fairly incompatible) ways of thinking about language invention. The first, which I’ll call syntax-centric, focuses on the language itself: what the tags and attributes are, and which can contain which, and what order they have to be in, and (even more important) on the human-readable prose that describes what they mean and what software ought to do with them. The second approach, which I’ll call model-centric, focuses on agreeing on a formal model of the data objects which captures as much of their semantics as possible; then the details of the language should fall out.

I think I fall on Tim's syntax-centric side of the fence. I understand the utility of defining a model as part of language design, however I think this will rarely be the model that software speaking the new language will use internally. I think that any software that actually wants to do anything with documents in your language will transform the data into its own internal representation. Sometimes this will be so that it can support more than one language. Liferea understands rss, atom, and a number of other formats. Sometimes it will be related to the way a program maps your data onto its graphical elements. It may be more useful to refer to a list or map than a graph.

I think a trap one could easily fall into with rdf is to think that the model is important and the syntax is not. This changes a syntax->xml-dom-model->internal-model translation in an app that implements the language to a syntax->xml-dom-model->rdf-graph-model->internal-model translation. With the variety of possible rdf encodings (even just considering the variation allowed for xml) it isn't really possible to parse an xml document based on its rdf schema. It must first be handled by rdf-specific libraries, then transformed. I think that transforming from the lists and maps and hierarchy representation of an XML dom is typically easier than transforming from the graphs and triples representation of an RDF model in code.
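To make the comparison concrete, here is a minimal sketch of the direct syntax->xml-dom-model->internal-model step, flattening a made-up feed-like document into the lists and maps an application actually wants. The element names are illustrative stand-ins rather than any real language.

    # Minimal sketch of the syntax -> dom -> internal-model translation.
    # The <feed>/<entry>/<title>/<updated> vocabulary is a made-up stand-in
    # for a real language such as Atom; the point is only that the DOM's
    # hierarchy maps naturally onto lists and dicts.
    import xml.etree.ElementTree as ET

    DOCUMENT = """
    <feed>
      <entry><title>First post</title><updated>2006-01-03</updated></entry>
      <entry><title>Second post</title><updated>2006-01-05</updated></entry>
    </feed>
    """

    def to_internal_model(xml_text):
        root = ET.fromstring(xml_text)
        entries = []
        for entry in root.findall("entry"):
            # Elements the application never asks for are simply ignored.
            entries.append({
                "title": entry.findtext("title", default=""),
                "updated": entry.findtext("updated", default=""),
            })
        return entries

    print(to_internal_model(DOCUMENT))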

From Danny:

This [starting with your own model, then seeing which terms you can exchange for more general ones already defined] is generally the opposite of what Tim suggests for XML languages, but there is a significant difference. Any two (or however many) RDF vocabularies/models/syntaxes can be used together and there will be a common interpretation semantics. Versioning is pretty well built in through schema annotations (esp. with OWL).

There isn’t a standard common interpretation semantics for XML beyond the implied containership structure. The syntax may be mixable (using XML namespaces and/or MustIgnore) but not interpretable in the general case.

Extensibility has to be built into the host language in XML. It should be possible to add extension elements with a defined meaning for anyone who understands both the host language and the extension. I don't think aggregation is an important concept yet for XML, although if Google Base proves useful I may start to revise that view. I think that aggregation is presently still something you do from the perspective of a particular host language or application domain, such as atom or "syndication". From that perspective there is currently little value in common interpretation semantics for XML, as it will only be parsed by software that understands the specific XML semantics.

I have not yet seen a use I consider compelling for mustUnderstand to support extensibility, however I am completely convinced by the need for mustIgnore semantics. I am also convinced that one should start with established technologies and extend them minimally wherever there is a good overlap. While this might not always be possible, I think it will be in a reasonable proportion of cases.

Benjamin

Sun, 2006-Jan-15

The Efficient Software Mailing List

Subject: [es] Free Software is Efficient Software

The Efficient Software initiative is growing slowly, but surely. We now have a wiki <https://efficientsoftware.pbwiki.com/>, an irc channel (#efficientsoftware on irc.freenode.net), and this mailing list. The post address is efficientsoftware at rbach.priv.at, and archives can be found at <https://rbach.priv.at/Lists/Archive/efficientsoftware/>.

We are looking forward to making positive connections with software projects, as well as lawyers, businesspeople, accountants, web site maintainers, and many more. To make this thing a reality we need to form a diverse community with a broad skill set.

The thing that will bring us all together is a desire for more efficient software production and maintenance. We can undercut the current players in the industry, and make a profit doing it. We can turn the weekend free software soldiers into a lean regular army with full-time pay. We can match customer needs and pain to a speedy resolution. These are the goals of the Efficient Software Initiative.

Welcome to the community!

To join the mailing list, see the EfficientSoftware Info Page. A big thank-you goes out to Robert Bachmann for offering to host the list.

Benjamin

Thu, 2006-Jan-12

Machine Microformats

I find microformats appealing. They solve the problem of putting data on the web simply, without having to create extra files at extra URLs and provide extra links to go and find the files. The data is in the same page as the human-readable content you provide. Like HTML itself, microformats allow you to put your own custom data into the gaps between the standard data. They effectively plug a gap in the semantic spectrum between universally-applicable and individually-applicable terms. I have been working on various data formats during my first week back from annual leave, and the question has occurred to me: "How do I create machine-only data that plugs the gap in a similar way?".

It doesn't make sense to use microformats directly in a machine-only environment. They are designed for humans first and machines second. However, it does make some sense to try to learn the lessons of HTML and microformats. When XML became the way people did machine-to-machine comms, a strange thing happened. Instead of learning from HTML and other successful sgml applications, we jumped straight into strongly-typed thinking. HTML allows new elements to be added to its schema implicitly, with "must-ignore" semantics for anything a parser does not understand. This allows for forwards-compatibility of data structures. New elements and attributes can be added to represent new things without breaking the models that existing parsers use. Instead of following this example in XML we defined schemas that do not assume must-ignore semantics. We defined namespaces, and schema versions. When we introduce version 3.0 of our schema, we expect existing parsers to discard the data and raise an error. This is the way we're used to doing things in the world of remote procedure calls and baseclasses. In fact, it is the wrong way.

My approach so far has been to think of an XML document as a simple tree. A parser should follow the tree down as far as it knows how to interpret the data, and should ignore data it does not understand. Following the microformat lead, I'm attempting to reuse terminology from existing standards before inventing my own. The data I've been presenting is time-oriented, so most terms and structure have been borrowed from iCalendar. The general theory is that it should be possible to represent universal (cross-industry, cross-application), general (cross-application), and local (application-specific) data in a freely-mixed way. Where there is a general term that could be used instead of a local term, you use it. Where there is a universal term that could be used instead of a general one, you do. The further left you push things, the more parsers will understand all of your data.
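A minimal sketch of that must-ignore tree walk follows. The vevent/dtstart/summary names are borrowed loosely from iCalendar for illustration, and the x-local-priority element stands in for a local term that older parsers simply skip.

    # Sketch of the "follow the tree as far as you understand it" rule.
    # Known terms are collected; unknown terms are skipped, never errors.
    # The vocabulary is loosely borrowed from iCalendar for illustration only.
    import xml.etree.ElementTree as ET

    DOCUMENT = """
    <schedule>
      <vevent>
        <dtstart>2006-02-01T09:00:00</dtstart>
        <summary>Planned outage</summary>
        <x-local-priority>7</x-local-priority>
      </vevent>
    </schedule>
    """

    KNOWN_TERMS = {"dtstart", "summary"}

    def parse_event(event):
        fields = {}
        for child in event:
            if child.tag in KNOWN_TERMS:
                fields[child.tag] = (child.text or "").strip()
            # Anything else falls through: must-ignore, not must-understand.
        return fields

    root = ET.fromstring(DOCUMENT)
    print([parse_event(e) for e in root.findall("vevent")])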

At present, I am also following the microformat lead of not jumping into the world of namespaces. I am still not convinced at this stage that they are beneficial. One possible end-point for this development would be to use no namespace for universal terms, and progressively more precise namespaces for general and local terms. Microformats themselves only deal in universal terms so they should be able to continue to get away without using namespaces.

By allowing universal and local terms to mingle freely it is possible to make use of universal terms wherever they apply. I suppose this has been the vision of rdf all along. In recent years the semantic web seems to have somehow transformed into an attempt to invent a new prolog, but I think a view of the semantic web as a meeting place for universal and local terms is of more immediate use. I think it would be useful to forget about rdf schemas for the most part and just refer to traditional standards documentation such as rfcs when dealing with ontology. I think it would be useful to forget about trying to aggregate rdf data for now, and think about a single format for the data rather than about multiple rdf representations. Perhaps thinking less about the data model rdf provides and thinking more about a meeting of semantic terms would make rdf work for the people it has so far disenfranchised.

Benjamin

Thu, 2006-Jan-05

Efficient Software FAQ

Efficient Software has launched its FAQ, currently still on the main page. From the wiki:

Why start this initiative?

Too much money is being funnelled into a wasteful closed source software industry. Initially it is investors' money, but then customers pay and pay. Profits to major software companies are uncompetitively high compared to other industries. We want to funnel money away from that wasteful industry and towards a more productive system of software development. Free software can be developed, forked, changed, and distributed without waiting on unresponsive vendors. Free software is open to levels of competition that cannot be matched by the closed source world. Free software contributors don't have to be on any payroll in order to fix the problems they care about. Free software does not maintain the artificial divide between customers and investors. The people who contribute to the development of a free software project are its customers, and all customers benefit when bugs are fixed or enhancements are carried out.

What do projects have to gain?

Our goal is to increase the money supply to projects. Money is not a necessary or sufficient factor in developing free software, but it cannot hurt. Projects often accept donations from users, but it is unclear how much users should give or what their motivations are. Efficient Software aims to tie a contribution to services rendered. Whether the services are rendered immediately or a year from now is inconsequential. Efficient Software maintains a money supply that can be tapped by projects when they fix the nominated bugs.

Won't this drive the wrong bugs to be fixed?

Projects will nominate which state a bug has to be in for Efficient Software to accept payment. Bugs whose fix would contradict project goals should never be put into eligible states and will never receive contributions. One way of thinking about the money involved is as bugzilla votes. The difference is that modern world currencies tend to have low inflation rates and limited supply. There is evidence across a number of fields that when people commit money to a goal they tend to make decisions more carefully, even if the amount is small. If your project's money supply has a wide base, the dollar value next to each bug should be a reasonable estimate of the value to users of getting it fixed. This information system alone could make it worth your while to become involved.

What should projects do with the money?

Efficient Software was conceived around the idea that projects would pay developers for the fixes they contribute through a merit-based mechanism. We have some thoughts about how this could work in practice, but we will need to develop them over time. In the end, projects are required to make their own "opt in" decision with Efficient Software and their own decision about how to distribute the money. This policy will be made available to contributors in case it may affect their investment decisions.

What if a project marks bugs verified just to get a payout?

Projects are free to mark bugs verified under their own policy guidelines. We do not get involved, except to publish those guidelines to investors alongside other policies. However, beware that any investor who has contributed any amount towards a bug will have their say on whether they believe the resolution was on the whole positive, neutral, or negative. Cumulative scores will be available to potential investors in case this may affect their investment decision.

Benjamin

Tue, 2006-Jan-03

The Efficient Software Initiative

The Efficient Software wiki has been launched. From the wiki:

This wiki is designed to concentrate efforts to create a new software industry. It is an industry that does not rely on delivering parallel services in order to fund software development or on the payment of license fees, but instead yields return as software is developed. It leverages the ability of a project's own bug database to bring customer and developer together. Efficient Software is intended to become a business that helps put money into the model. Its business model will be developed in the open and will be free for anyone to adopt. The product is more important than any one implementation of the business model. Cooperation is good, but the threat of competition is healthy too.

The fundamental goal of the Efficient Software initiative is to increase the money available to free software projects and free software developers. Contributors currently make donations to software projects, but it is unclear how much they should give and what their motivation is beyond "these are nice guys, I like them". Efficient Software is designed to create a market in which customers and projects can participate to meet customer needs and project needs.

Have at it!

Also, I would love to hear from anyone who is prepared to host a mailing list for the initiative.

Update: The wiki now has edit capability, and there is an irc channel. Join #efficientsoftware on irc.freenode.net.

Benjamin