Sound advice - blog

Tales from the homeworld


Fri, 2004-May-21

Avoiding Data Islands

I've been working on models of accounting for the useful things in a future accounting system. I'm pretty happy with my understanding of most basic accounting functions, but am still a little unclear on the handling of multiple commodities and the like. On the whole things are progressing well, usually during my bouts of insomnia at around 12:30.

I still feel like my biggest problem is coming up with an accessible technology base.

I'm comfortable with the notion of quite a simple accounting data model of transactions and accounts. Each transaction lists a number of entries and each entry lists the identifier of the account it affects. What I'm not comfortable with is how to selectively expose this model to applications, generally. What API should be provided? What kind of query and update language should be used? How can the data in this island be combined with data from other islands?
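
A minimal sketch of that data model in Python may make it concrete. The class names, the account identifiers and the sign convention (positive for debits, negative for credits) are assumptions made for illustration, not a settled design:

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    account_id: str  # identifier of the account this entry affects
    amount: Decimal  # assumed convention: positive = debit, negative = credit


@dataclass
class Transaction:
    description: str
    entries: list[Entry] = field(default_factory=list)

    def is_balanced(self) -> bool:
        # Assumed double-entry rule: the entries of a transaction sum to zero.
        return sum((e.amount for e in self.entries), Decimal(0)) == 0


def account_balance(transactions: list[Transaction], account_id: str) -> Decimal:
    """Sum every entry, across all transactions, that touches one account."""
    return sum(
        (e.amount for t in transactions for e in t.entries
         if e.account_id == account_id),
        Decimal(0),
    )


# Hypothetical example: buying shares with money from a bank account.
purchase = Transaction(
    "Buy 100 shares",
    [
        Entry("assets:shares", Decimal("1250.00")),
        Entry("assets:bank", Decimal("-1250.00")),
    ],
)
```

Something like `account_balance([purchase], "assets:shares")` is about the simplest query an application could ask of this model. The open questions about API and query language start where this function stops.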

Again, I'm still trying to work out the details of this. If any accountant-type readers are tuning in right now I'd love to hear your advice on anything I might be getting a little wrong. I think the following is a clear case of wanting to bring data together from different data mines:

Say I own some shares. GAAP requires that I report the value of these shares at the "lesser of cost and market value". I can account for shares as I would inventory, that is to say in Australian dollars at cost basis instead of as share counts. That provides the "cost" part of my query, but if I want to combine this information with current market value to fill out my report I have to know the following:

  1. The number of units in my possession, and
  2. the current market value of those units
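
The calculation itself is small once all three numbers are in one place. Here is a sketch, assuming the cost basis comes from the general ledger and the unit count and price come from somewhere else. The function name and the figures are hypothetical:

```python
from decimal import Decimal


def reported_value(cost_basis: Decimal, units: int, unit_price: Decimal) -> Decimal:
    """Return the lesser of cost and current market value for one holding."""
    market_value = units * unit_price
    return min(cost_basis, market_value)


# Hypothetical holding: 100 units carried in the ledger at a cost of $1250.00.
below_cost = reported_value(Decimal("1250.00"), 100, Decimal("11.80"))
above_cost = reported_value(Decimal("1250.00"), 100, Decimal("13.10"))
print(below_cost)  # market value (1180.00) is lower, so it is reported
print(above_cost)  # cost (1250.00) is lower, so it is reported
```

The arithmetic is trivial. The hard part is that `units` and `unit_price` do not live in the ledger at all.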

Suddenly I have to know about a lot more than what lives in my general ledger, and I need a general interface to query that information when generating reports. It would also be useful to have that information stored in such a way that the backup operations I apply to my accounting information also cover the other information I might run reports over.

I might want to run less directed queries. I might want to compare the share price of a company with the rainfall statistics that affect that business. I might want to pull in the data of my purchases and sales of the stock and compare my profit or loss to the profit I might have made in an alternative scenario.

My feeling of how something like this must work is as follows:

Many of the objectives I have appear to be best met by some XML technologies. Others appear to be best met by existing relational database technologies.

As I mentioned earlier, I'm having real trouble trying to find a technology base that's really applicable. Essentially I'm in the market for a transplantable platform that covers all major data handling functions in a cross-platform, beautifully-integrated manner.

I heard a cute quote a while back. Just long enough ago that I don't recall where I saw it or the attribution, but the quote itself was as follows: "XML is like violence. If it doesn't solve your problem, you're not using enough of it". I kind of feel that way. I really like where the XML world is heading in many ways, but in terms of data management (as opposed to data exchange) XML still appears to be in a confused place. At the same time traditional database technologies are looking outmoded and unagile. I think it's a question of unsolved problems.

AJ discussed loss of diversification due to competition in this blog entry. If you follow it through to the "see more" part of his post he discusses the fact that competition doesn't seem to have killed off the various email servers of the internet. We essentially have a "big four". AJ refers to email being somewhat of a solved problem where competition is not really required anymore.

I'm of a mind to think that there's always a money angle. Whereas I think AJ is leaning towards technical issues when he talks about solved problems, I would lean towards the economic issues. Mail servers don't suffer a lot of competition because they've already reached a price point where they're a commodity. The fact that none can gain an effective foothold over the others on a technical basis maintains the commodity status. I think the fact that several offerings are free software helps contribute to the commoditisation of the solutions and therefore the continuing diversity of choice.

Five years ago it looked like data management was a solved problem, too. Relational databases were and still are king, and back then it looked like they would stay king. The XML hype has put question marks over everything. XML has become the standard way of doing data interchange, so the data storage has to become more and more XML friendly. At the same time we've also been transitioning from a world of big backend monolithic databases to a world of loosely-coupled, distributed data and data more closely tied to an individual user and their desktop than to the machine. We want to carry more data around with us so we can look at our data at work as easily as we can at home. We want to be able to look at it again while we're on the train.

I think that some form of XML technology will eventually be involved with filling the gap between what the big databases currently provide and what we actually need. We've had several attempts to fill it so far. SQLite is awesome for little things but it's still hard to get the data in and out. Web services are starting to lean away from the big iron and onto the desktop, especially with Longhorn's Indigo offerings coming in a few years. Actually, I suspect that the only technology we'll still be betting our businesses on in five years' time in the data management arena will be XPath, which has already survived quite a few major changes of hosting environment. XPath has even been implemented in silicon. It's really hard to pick what's going to happen above that level. Will XQuery really pick up? Will it be superseded by something more geared towards querying and collating results from multiple web services? Again, I don't really know.
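
To give a feel for XPath as a query layer over ledger data, here is a small sketch using Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The XML layout, element names and figures are invented for illustration and are not a proposed format:

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical ledger document: transactions containing entries against accounts.
LEDGER_XML = """
<ledger>
  <transaction date="2004-05-03">
    <entry account="assets:shares" amount="1250.00"/>
    <entry account="assets:bank" amount="-1250.00"/>
  </transaction>
  <transaction date="2004-05-17">
    <entry account="assets:shares" amount="400.00"/>
    <entry account="assets:bank" amount="-400.00"/>
  </transaction>
</ledger>
""".strip()

root = ET.fromstring(LEDGER_XML)

# XPath-style query: every entry, at any depth, that affects the shares account.
share_entries = root.findall(".//entry[@account='assets:shares']")

# Collate the result outside XPath, since ElementTree has no sum() function.
shares_at_cost = sum((Decimal(e.get("amount")) for e in share_entries), Decimal(0))
print(shares_at_cost)
```

The path expression selects the data, but the summing still happens in the host language. That division of labour is roughly where the question of what sits above XPath begins.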

In the end I think the data management world has some catching up to do before it can fill the new niches and still claim to be mature technology. What we do now will influence that process. As for my usage, I'm still undecided but I'm watching the stars and the blogs and the news for signs that a uniform approach is starting to emerge.