Sound advice - blog

Tales from the homeworld

My current feeds

Sat, 2004-Jul-17

Status update

I was asked tonight at humbug about the status of my accounting software concepts and where I thought they were going. To some bemusement of the onlookers I had to say that it hadn't come very far.

I'm coming to terms with a number of concepts I haven't had to be directly exposed to before, and I'm the kind of person who has to understand a bit of philosophy before continuing onto the real deal.

Here is the summary of my thinking so far:

Firstly, the technology stack. It's short. It's sqlite. I want to create a database schema in sqlite that could become a unifying force rather than a device force in free software accounting circles. Ideally, I would build a back-end API for use in both gnucash and kmymoney2 which could be used by disparate apps to extract and enter accounting data. Although lacking in at least one essential feature, sqlite as a database-in-a-file appeals terrifically to me for various reasons. I'm also intimately familiar with much of the sqlite source code, so could modify it as necessary to suit my goals. This might mean a branch for my proposed works, or may be something I can feed back to the author D Hipp.

Secondly, there's the database schema. I'm without a way of running visio or anything I've encountered that I'd consider up-to-scratch for even this diagram, so I'll describe it. You have a transaction entity. You have a transaction entry entity. You have an account entity. You have a commodity entity. The rules are these: A transaction can have multiple entries. An account can have multiple entries. Each entry has a single transaction and account. Each account has a single currency. The final rule is that all the amount of the transaction entries associated with a single transaction must sum to zero for their respective currency. Positive amounts indicate debit. Negative amounts indicate credit.

There are a couple of special fields in the data model. Each transaction has a date. I'm considering also giving either transactions or their entries a lot number. Each transaction entry has a memo field, and a reconciled status. Accounts are a special case unto themselves.

Since SQL doesn't handle graphs very well, I'd really like to be able to use some kind of bridge into rdf data to manage the relationships between accounts. As it is, I plan to create a table with an rdf feel to it. It will have three columns: subject, predicate, and object. The obvious predicate is something like "parent", where the subject is any account and the object is another account. It's a pity I'll still only be able to use sql over the structure (although it might be possible in alternate realities and future worlds to splice an rdf query mechanism in above the sqlite btree layer...).

Now... since the database effectively "is" the API of the backend I'm proposing, we need to consider how changes to the database structure will affect applications. Since I plan for this structure to be used and reused across applications and for ad-hoc query and reporting applications to be written independent of the application that created the accounting data, I need to consider a versioning scheme. Not only that, but I need to consider what other data might live alongside the accounting data, and be able to ensure that my data doesn't interfere with it or vice-versa while still allowing applications to query across the combination.

My current thoughts aren't well-formed on this issue, but wander along the COM versioning ideas. Essentially you never allow something that is different to the earlier version to be called by the same name. You do this by assigning unique ids to things you care about being the same. I haven't prototyped this concept at all, but my thinking is that all table names would be uuids gathered from a program like uuidgen. Here's one: f013170f-b8ff-419f-abb7-81306e2ccbdb. When the structure or semantics of that data changes, I create a new table to hold the new structure and call it 09c14549-3ab0-4517-a052-aba00af2c30d. I probably also create a table with a well-known uuid to map between uuids and sensible names and version numbers for applications that don't care about certain minor revisions. My thinking is that a minor revision would be one that doesn't cause queries that select specific columns to have to be altered, for example adding a new column. A major change would be one where columns changed names or meaning. Any application that inserts data would likely be sensitive to all schema changes.

Migration programs could be written to take advantage of the explicit versioning structure. When the program finds old data it could move it or replicate it into the new form. Additionally, multiple schemas could live alongside each other. In addition to the accounts themselves, a small payroll system might be included or a table to track market value of your shares.

We end up with a database schema that looks something like this:

BEGIN TRANSACTION;
CREATE TABLE '5ac164f0-78d2-4461-bb6f-12bbb32b39f6'(
        UUID, Source, Name, Major, Minor
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "5ac164f0-78d2-4461-bb6f-12bbb32b39f6", "https://transactionsafe.org", "Schema", 1, 0
        );
 
CREATE TABLE 'ded2f12d-8fd5-4490-bb9e-3e3b31c46b22'(
        TransactionHeaderId INTEGER PRIMARY KEY,
        Date
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "ded2f12d-8fd5-4490-bb9e-3e3b31c46b22", "https://transactionsafe.org", "TransactionHeader", 1, 0
        );
CREATE INDEX '11da5cb6-c03f-43c4-917a-c0f2502c5bc2' ON 'ded2f12d-8fd5-4490-bb9e-3e3b31c46b22' (Date);

CREATE TABLE '7ba6ce04-66d2-4f32-9d40-31f5838d5bd4'(
        TransactionEntryId INTEGER PRIMARY KEY,
        TransactionHeaderId,
        Amount,
        AccountId,
        Memo,
        ReconcileStatus
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "7ba6ce04-66d2-4f32-9d40-31f5838d5bd4", "https://transactionsafe.org", "TransactionEntry", 1, 0
        );
CREATE INDEX 'c234ebb4-11e5-4b09-9036-32a1486fd5fa' ON '7ba6ce04-66d2-4f32-9d40-31f5838d5bd4' (AccountId);
 
CREATE TABLE '08fd9a02-1497-4f31-8bcf-dc9d4fed74fd'(
        AccountId INTEGER PRIMARY KEY,
        AccountName,
        CommodityId
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "08fd9a02-1497-4f31-8bcf-dc9d4fed74fd", "https://transactionsafe.org", "Account", 1, 0
        );
CREATE INDEX '686a4d47-6cd8-48fb-a8ba-e844e13d85a2' ON '08fd9a02-1497-4f31-8bcf-dc9d4fed74fd' (AccountName);
 
CREATE TABLE '31580110-8eb8-42a1-909a-9aa72cb9534a'(
        CommodityId INTEGER PRIMARY KEY,
        CommodityName,
        Units
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "31580110-8eb8-42a1-909a-9aa72cb9534a", "https://transactionsafe.org", "Commodity", 1, 0
        );
CREATE INDEX 'dfbfa695-dee2-4e61-90a0-2000d72e6e2d' ON '31580110-8eb8-42a1-909a-9aa72cb9534a' (CommodityName);
 
CREATE TABLE '58426fd9-6b99-4e2c-8f5f-975b5508ae93'(
        Subject, Predicate, Object
        );
INSERT INTO '5ac164f0-78d2-4461-bb6f-12bbb32b39f6' VALUES(
        "58426fd9-6b99-4e2c-8f5f-975b5508ae93", "https://transactionsafe.org", "Relationships", 1, 0
        );
CREATE INDEX 'd08d9b0a-3391-4e60-84c7-b8af312b1ad7' ON '58426fd9-6b99-4e2c-8f5f-975b5508ae93' (Subject, Predicate);
CREATE INDEX '151b627b-af70-43df-959d-9dd43301f6e7' ON '58426fd9-6b99-4e2c-8f5f-975b5508ae93' (Object, Predicate);
END TRANSACTION;

An obvious flaw with this as a database model for the time-being is that transactions are not checked to ensure they sum to zero. I'm going to have to think some more about that

Now, let's see an example reporting application:

Account="'08fd9a02-1497-4f31-8bcf-dc9d4fed74fd'"
TransactionEntry="'7ba6ce04-66d2-4f32-9d40-31f5838d5bd4'"
sqlite foo.sqlite "SELECT Amount FROM $Account JOIN $TransactionEntry USING (AccountId);"

It's just like a bought one.

That's where I'm at. No further. I'm still prototyping in shell scripts with sqlite. Perhaps I'll get some more done soonish, but no more tonight.