Sound advice - blog

Tales from the homeworld

Sun, 2004-Jul-18

Status update

I was asked tonight at humbug about the status of my accounting software concepts and where I thought they were going. To the bemusement of some onlookers I had to say that it hadn't come very far.

I'm coming to terms with a number of concepts I haven't been directly exposed to before, and I'm the kind of person who has to understand a bit of the philosophy before continuing on to the real deal.

Here is the summary of my thinking so far:

Firstly, the technology stack. It's short. It's sqlite. I want to create a database schema in sqlite that could become a unifying force rather than a divisive force in free software accounting circles. Ideally, I would build a back-end API on top of it for use in both gnucash and kmymoney2, so that disparate apps could extract and enter accounting data through it. Although lacking in at least one essential feature, sqlite as a database-in-a-file appeals terrifically to me for various reasons. I'm also intimately familiar with much of the sqlite source code, so I could modify it as necessary to suit my goals. This might mean a branch for my proposed works, or may be something I can feed back to the author, D. Richard Hipp.

Secondly, there's the database schema. I don't have visio, or anything else I've encountered that I'd consider up to scratch, for drawing even this simple diagram, so I'll describe it instead. You have a transaction entity. You have a transaction entry entity. You have an account entity. You have a commodity entity. The rules are these: a transaction can have multiple entries. An account can have multiple entries. Each entry belongs to a single transaction and a single account. Each account has a single commodity (its currency). The final rule is that the amounts of the transaction entries associated with a single transaction must sum to zero for each respective currency. Positive amounts indicate debit. Negative amounts indicate credit.
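
To make that last rule concrete, here is a minimal sketch of a single purchase recorded against the schema that appears later in this post. The date, amounts, account ids, memo, and reconcile flag are all made up for illustration:

-- One transaction header...
INSERT INTO 'ded2f12d-8fd5-4490-bb9e-3e3b31c46b22' VALUES(1, '2004-07-18');
-- ...and two entries that share it: a 50.00 debit against expense account 10
INSERT INTO '7ba6ce04-66d2-4f32-9d40-31f5838d5bd4' VALUES(1, 1, 50.00, 10, 'groceries', 'n');
-- ...balanced by a 50.00 credit against bank account 20
INSERT INTO '7ba6ce04-66d2-4f32-9d40-31f5838d5bd4' VALUES(2, 1, -50.00, 20, 'groceries', 'n');

The two entries sum to zero, so the transaction balances.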

There are a couple of special fields in the data model. Each transaction has a date. I'm considering also giving either transactions or their entries a lot number. Each transaction entry has a memo field, and a reconciled status. Accounts are a special case unto themselves.

Since SQL doesn't handle graphs very well, I'd really like to be able to use some kind of bridge into rdf data to manage the relationships between accounts. As it is, I plan to create a table with an rdf feel to it. It will have three columns: subject, predicate, and object. The obvious predicate is something like "parent", where the subject is any account and the object is another account. It's a pity I'll still only be able to use sql over the structure (although it might be possible in alternate realities and future worlds to splice an rdf query mechanism in above the sqlite btree layer...).
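
As a rough sketch of what that looks like in practice, here's a query over the triple table from the schema below that lists the immediate children of a hypothetical account 20, assuming a "parent" predicate:

-- Subjects whose parent is account 20
SELECT Subject FROM '58426fd9-6b99-4e2c-8f5f-975b5508ae93'
WHERE Predicate = 'parent' AND Object = 20;

Walking the whole account tree still means issuing one query per level, which is exactly the awkwardness I'm complaining about.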

Now... since the database effectively "is" the API of the backend I'm proposing, we need to consider how changes to the database structure will affect applications. Since I plan for this structure to be used and reused across applications and for ad-hoc query and reporting applications to be written independent of the application that created the accounting data, I need to consider a versioning scheme. Not only that, but I need to consider what other data might live alongside the accounting data, and be able to ensure that my data doesn't interfere with it or vice-versa while still allowing applications to query across the combination.

My current thoughts on this aren't well-formed, but they wander along the lines of COM's versioning ideas. Essentially, you never allow something that is different from an earlier version to be called by the same name. You do this by assigning unique ids to the things you care about being the same. I haven't prototyped this concept at all, but my thinking is that all table names would be uuids gathered from a program like uuidgen. Here's one: f013170f-b8ff-419f-abb7-81306e2ccbdb. When the structure or semantics of that data change, I create a new table to hold the new structure and call it 09c14549-3ab0-4517-a052-aba00af2c30d. I would probably also create a table with a well-known uuid to map between uuids, sensible names, and version numbers, for applications that don't care about certain minor revisions. My thinking is that a minor revision would be one that doesn't force queries that select specific columns to be altered, for example adding a new column. A major change would be one where columns changed names or meaning. Any application that inserts data would likely be sensitive to all schema changes.
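
As a sketch of how a reporting application that only cares about major revisions might use that mapping table (the uuid and column names are the ones from the schema below; the lookup policy is just my current guess):

# Resolve the friendly name "Account" at major version 1 to its current uuid,
# picking the newest minor revision, then query the table under that uuid.
ACCOUNT=$(sqlite foo.sqlite "SELECT UUID FROM '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' WHERE Name = 'Account' AND Major = 1 ORDER BY Minor DESC LIMIT 1;")
sqlite foo.sqlite "SELECT AccountName FROM '$ACCOUNT';"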

Migration programs could be written to take advantage of the explicit versioning structure. When such a program finds old data it could move it or replicate it into the new form. Additionally, multiple schemas could live alongside each other: in addition to the accounts themselves, a small payroll system might be included, or a table to track the market value of your shares.
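
As a sketch of what such a migration might look like, suppose the Account table gains a hypothetical Description column. Reusing the 09c14549... uuid from above as the new table name (the other uuids are the ones from the schema below), and registering it as minor revision 1.1 since existing column-selecting queries still work:

BEGIN TRANSACTION;
CREATE TABLE '09c14549-3ab0-4517-a052-aba00af2c30d'(
        AccountId INTEGER PRIMARY KEY,
        AccountName,
        CommodityId,
        Description
        );
-- Replicate the old Account rows into the new structure
INSERT INTO '09c14549-3ab0-4517-a052-aba00af2c30d'
        SELECT AccountId, AccountName, CommodityId, NULL
        FROM '08fd9a02-1497-4f31-8bcf-dc9d4fed74fd';
-- Register the new table alongside (or in place of) the 1.0 entry
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        '09c14549-3ab0-4517-a052-aba00af2c30d', 'https://transactionsafe.org', 'Account', 1, 1
        );
END TRANSACTION;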

We end up with a database schema that looks something like this:

BEGIN TRANSACTION;
-- Well-known mapping table: uuid to friendly name and version
CREATE TABLE '5ac164f0-78d2-4461-bb6f-12bbb32b39f6'(
        UUID, Source, Name, Major, Minor
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "5ac164f0-78d2-4461-bb6f-12bbb32b39f6", "https://transactionsafe.org", "Schema", 1, 0
        );
 
-- TransactionHeader: one row per transaction, carrying its date
CREATE TABLE 'ded2f12d-8fd5-4490-bb9e-3e3b31c46b22'(
        TransactionHeaderId INTEGER PRIMARY KEY,
        Date
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "ded2f12d-8fd5-4490-bb9e-3e3b31c46b22", "https://transactionsafe.org", "TransactionHeader", 1, 0
        );
CREATE INDEX '11da5cb6-c03f-43c4-917a-c0f2502c5bc2' ON 'ded2f12d-8fd5-4490-bb9e-3e3b31c46b22' (Date);

-- TransactionEntry: one debit or credit against an account
CREATE TABLE '7ba6ce04-66d2-4f32-9d40-31f5838d5bd4'(
        TransactionEntryId INTEGER PRIMARY KEY,
        TransactionHeaderId,
        Amount,
        AccountId,
        Memo,
        ReconcileStatus
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "7ba6ce04-66d2-4f32-9d40-31f5838d5bd4", "https://transactionsafe.org", "TransactionEntry", 1, 0
        );
CREATE INDEX 'c234ebb4-11e5-4b09-9036-32a1486fd5fa' ON '7ba6ce04-66d2-4f32-9d40-31f5838d5bd4' (AccountId);
 
-- Account
CREATE TABLE '08fd9a02-1497-4f31-8bcf-dc9d4fed74fd'(
        AccountId INTEGER PRIMARY KEY,
        AccountName,
        CommodityId
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "08fd9a02-1497-4f31-8bcf-dc9d4fed74fd", "https://transactionsafe.org", "Account", 1, 0
        );
CREATE INDEX '686a4d47-6cd8-48fb-a8ba-e844e13d85a2' ON '08fd9a02-1497-4f31-8bcf-dc9d4fed74fd' (AccountName);
 
-- Commodity: the currency or other unit an account is denominated in
CREATE TABLE '31580110-8eb8-42a1-909a-9aa72cb9534a'(
        CommodityId INTEGER PRIMARY KEY,
        CommodityName,
        Units
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "31580110-8eb8-42a1-909a-9aa72cb9534a", "https://transactionsafe.org", "Commodity", 1, 0
        );
CREATE INDEX 'dfbfa695-dee2-4e61-90a0-2000d72e6e2d' ON '31580110-8eb8-42a1-909a-9aa72cb9534a' (CommodityName);
 
-- Relationships: rdf-style subject/predicate/object triples between accounts
CREATE TABLE '58426fd9-6b99-4e2c-8f5f-975b5508ae93'(
        Subject, Predicate, Object
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "58426fd9-6b99-4e2c-8f5f-975b5508ae93", "https://transactionsafe.org", "Relationships", 1, 0
        );
CREATE INDEX 'd08d9b0a-3391-4e60-84c7-b8af312b1ad7' ON '58426fd9-6b99-4e2c-8f5f-975b5508ae93' (Subject, Predicate);
CREATE INDEX '151b627b-af70-43df-959d-9dd43301f6e7' ON '58426fd9-6b99-4e2c-8f5f-975b5508ae93' (Object, Predicate);
END TRANSACTION;

An obvious flaw in this as a database model for the time being is that transactions are not checked to ensure they sum to zero. I'm going to have to think some more about that.
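
In the meantime, an ad-hoc check is at least possible. Here's a sketch of a query that lists any transaction whose entries fail to sum to zero for a given commodity (the e and a aliases are just shorthand):

SELECT e.TransactionHeaderId, a.CommodityId, SUM(e.Amount)
FROM '7ba6ce04-66d2-4f32-9d40-31f5838d5bd4' e
JOIN '08fd9a02-1497-4f31-8bcf-dc9d4fed74fd' a USING (AccountId)
GROUP BY e.TransactionHeaderId, a.CommodityId
HAVING SUM(e.Amount) != 0;

An empty result means every transaction balances.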

Now, let's see an example reporting application:

Account="'08fd9a02-1497-4f31-8bcf-dc9d4fed74fd'"
TransactionEntry="'7ba6ce04-66d2-4f32-9d40-31f5838d5bd4'"
sqlite foo.sqlite "SELECT Amount FROM $Account JOIN $TransactionEntry USING (AccountId);"

It's just like a bought one.

That's where I'm at. No further. I'm still prototyping in shell scripts with sqlite. Perhaps I'll get some more done soonish, but no more tonight.

Sun, 2004-Jul-04

Thin client, Thick client

A humbug post has, kind of by proxy, raised the issue of thin and thick clients for me.

After reading that the project mentioned relies on PHP and javascript on a web server for its workings, I began drafting a response asking opinions from humbug members about what platforms they think are worth developing software for in the modern age.

It's not really something anyone can answer. I'm sure that at present there is no good language, api, or environment for GUI software that's well supported across multiple operating systems and hardware configurations. Rather than replying to humbug general I thought I'd just vent a little here instead.

The web server thing bugs me at the moment. You can write a little javascript to execute on the client, but basically you're restricted to the capabilities of HTML. Unless you dive into the land of XUL and friends, or svg and flash, there is not really any way to create complex, responsive, and engaging interfaces for users to interact with. Worse, it's very difficult to interact with concepts the users themselves are familiar with in a desktop environment:

As soon as you put both a web browser and a web server between your software and your user it's very difficult to accomplish anything that the user can understand. If you save or operate on files they tend to be on the server-side... and what the hell kind of user understands what that is, where it is, and how to back the damn thing up?

You might have guessed that I'm not a fan of thin clients... but that's not exactly true. I think it's fundamentally a problem of layering, and that's where things get complicated and go wrong.

I think you should be able to access your data from anywhere. Preferably this remote interface looks just like your local one. Additionally, you should be able to take your data anywhere and use it in a disconnected fashion without a lot of setup hassle. Thirdly, I think you need to be able to see where your files are and know how to get them from where they are to where you want them to be.

What starts to develop is not a simple model of client and server, but of data and the ways of getting at it. If we think about it as objects, it's data and the operations you can perform on it. Like a database, the filesystem consists of these funny little objects that have rules about how you can operate on them. Posix defines a whole bunch of interesting rules for dealing with the content of files, while user-level commands such as cp and mv define high-level operations for operating on the file objects. Other user-level applications such as grep can pull apart data and put it back together in ways that the original developer didn't expect. Just as with databases, it's important that everyone has the same perspective on the content and structure of files, and that ad hoc things can be done with them.

To me, it is the files more than the client programs that need to be operated on remotely. You need to be able to put a file pretty much anywhere, and operate on it in the ways it defines for itself.

To start on the small scale, I think it's important that you can see all your data files in your standard data file browser. This means they shouldn't be hidden away in database servers like postgres or mysql. On the other hand, you should be able to do all the normal filesystem operations on them as well, so sqlite is a little deficient also. You can't safely copy an sqlite file unless you first get a read lock on exactly the appropriate place. Ideally, you should be able to use cp to copy your database and ensure that neither the source database nor the copied database are corrupt when you're done.
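
One interim way around it, as a sketch and assuming the .dump output reflects a consistent view of the database, is to copy through sqlite itself rather than through cp:

# Dump-and-rebuild copy: the reads go through sqlite's own locking,
# unlike a raw cp of the database file.
sqlite foo.sqlite .dump | sqlite foo-copy.sqlite

It works, but it's exactly the kind of special-case knowledge about one file type that I'd rather see behind a standard api.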

I feel there is a great wilderness here, where we're a step away from having a way of ensuring files are operated on in a consistent manner. It's like every file itself has an API and needs hidden code behind it to ensure appropriate action is always taken. Standard apis, like the ones used to copy files or to issue sql queries or greps against them, must exist, and there are some fundamental apis every file must implement... but I keep feeling that we need more. Everyone needs to know how to operate on your file both locally and remotely. That includes nautilus. That includes your rdf analysis tool.

To my mind the fundamental layer we need to put in place is this architecture of file classification and apis to access the files. Past that, thick and thin clients can abound in whatever ways are appropriate. I think thick clients will still win in the end, but they will resemble thin clients so closely in the clarity of definition of their environment that we won't really be able to tell the difference.