Sound advice - blog

Tales from the homeworld

Sun, 2005-Jan-09

TransactionSafe turns 0.4

After more than four months of very occasional half-hearted hacking TransactionSafe has reached version 0.4. This version is titled "Basic Register" or alternatively "I was too embarrassed to release yet another version which still didn't support basic transaction entry!".

This version includes a few interesting design points, although they may be obscured by a fairly rushed and stream-of-consciousness flow of python programming. As with previous versions I've based the user Load/Save paradigm around a commit-as-soon-as-you-can model. There is no save button because the file is always saved. Pity there is no undo. Well, that can come later ;)

Another feature is my attempt to use "processes" sort-of-modelled after the Erlang message passing system. Essentially, each object gets told when to execute by an evaluation scheduler. At that time their input is already there (as it has been passed on by another process) and ready to be processed into output. I think I've been able to achieve a reasonable amount of decoupling between components using this model, although it is still highly experimental (meaning non-obvious and hard to read).
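As a rough illustration of this style (a minimal sketch, not the TransactionSafe code itself), a scheduler can deliver a message to a process's inbox and only then mark that process as ready to run. The `Process` and `Scheduler` classes and the two example processes are hypothetical names invented for this sketch.

```python
from collections import deque


class Process:
    """A hypothetical process: owns an inbox and a handler function."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.inbox = deque()


class Scheduler:
    """Runs any process that has input waiting, one message at a time."""

    def __init__(self):
        self.ready = deque()

    def send(self, process, message):
        # Deliver the input first, then mark the receiver as runnable,
        # so the input is already there when the process executes.
        process.inbox.append(message)
        self.ready.append(process)

    def run(self):
        while self.ready:
            process = self.ready.popleft()
            message = process.inbox.popleft()
            process.handler(self, message)


log = []
display = Process("display", lambda sched, msg: log.append(msg))
doubler = Process("doubler", lambda sched, msg: sched.send(display, msg * 2))

scheduler = Scheduler()
scheduler.send(doubler, 21)
scheduler.run()
print(log)
```

Neither process calls the other directly; they only know how to hand a message to the scheduler, which is where the decoupling comes from.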

Third on my list of novelties is the way the program interacts with the sqlite database. The database itself (with some transformations applied) is considered the model in my MVC, and I have walked a path of updating everything using the same inefficient simple algorithm rather than custom-coding update features. You can see this in the difference between the account tree (which is based on the micro-update model) and the register (which is the newer code based on a single update mechanism). The round-trip to the database occurs often (every time a single field is updated), and I even went so far as to use the python difflib to determine which parts of the gtk tree model that makes up my register to update. That is to say, the gtk tree model likes to be updated minimally so that the user is still in the same place they were in the tree before the update. I've taken the raw tree model from disk each time, and passed it through a diff algorithm filter to determine what that minimal update is, without the need for custom code to infer it.
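The diff step can be sketched roughly as follows. This is an illustration under stated assumptions rather than the register code: rows are plain tuples, a Python list stands in for the gtk tree model, and `minimal_update` and `apply_edits` are hypothetical helper names. Only `difflib.SequenceMatcher` and its `get_opcodes()` method come from the standard library.

```python
import difflib


def minimal_update(current_rows, new_rows):
    """Return the edits that turn current_rows into new_rows.

    Each edit is (action, index, rows). Indexes are valid when the
    edits are applied in the order returned (highest index first).
    """
    matcher = difflib.SequenceMatcher(a=current_rows, b=new_rows, autojunk=False)
    edits = []
    # Walk the opcodes in reverse so earlier indexes stay valid.
    for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
        if tag in ("delete", "replace"):
            edits.append(("remove", i1, current_rows[i1:i2]))
        if tag in ("insert", "replace"):
            edits.append(("insert", i1, new_rows[j1:j2]))
    return edits


def apply_edits(rows, edits):
    """Apply the edits to a plain list standing in for the tree model."""
    rows = list(rows)
    for action, index, changed in edits:
        if action == "remove":
            del rows[index:index + len(changed)]
        else:
            rows[index:index] = changed
    return rows


on_screen = [("2005-01-01", "Rent", 500), ("2005-01-02", "Food", 40)]
from_disk = [("2005-01-01", "Rent", 500), ("2005-01-02", "Food", 45),
             ("2005-01-03", "Fuel", 30)]
edits = minimal_update(on_screen, from_disk)
assert apply_edits(on_screen, edits) == from_disk
```

Rows that compare equal never appear in the edit list, so the unchanged first row here would be left alone in a real tree model.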

This version makes use of triggers for database updates, particularly to maintain certain database invariants such as "all transactions balance". I've waited this long to do it because for a long time I wasn't sure exactly how I was going to enforce that constraint. Now I maintain a Balance account that must always exist as transactionEntryId 0, and is referred to in every transaction at least once (multiple currencies each get their own balance entry). This meshes with the current user interface/database interaction paradigm nicely.
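To give a feel for the approach, here is a minimal sketch of a balance-maintaining trigger. It is not the actual TransactionSafe schema: it assumes a simplified `TransactionEntry` table, a Balance account with the hypothetical `AccountId` 0, integer amounts in cents, and a single currency. Only the insert case is covered; updates and deletes would need equivalent triggers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE TransactionEntry(
    TransactionEntryId INTEGER PRIMARY KEY,
    TransactionHeaderId INTEGER NOT NULL,
    AccountId INTEGER NOT NULL,
    Amount INTEGER NOT NULL
);

CREATE TRIGGER BalanceOnInsert AFTER INSERT ON TransactionEntry
WHEN NEW.AccountId <> 0
BEGIN
    -- Create the balancing entry the first time a transaction is touched.
    INSERT INTO TransactionEntry(TransactionHeaderId, AccountId, Amount)
    SELECT NEW.TransactionHeaderId, 0, 0
    WHERE NOT EXISTS (SELECT 1 FROM TransactionEntry
                      WHERE TransactionHeaderId = NEW.TransactionHeaderId
                        AND AccountId = 0);
    -- Make the balancing entry cancel out every other entry.
    UPDATE TransactionEntry
    SET Amount = -(SELECT COALESCE(SUM(Amount), 0) FROM TransactionEntry
                   WHERE TransactionHeaderId = NEW.TransactionHeaderId
                     AND AccountId <> 0)
    WHERE TransactionHeaderId = NEW.TransactionHeaderId AND AccountId = 0;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

insert = ("INSERT INTO TransactionEntry(TransactionHeaderId, AccountId, Amount) "
          "VALUES (?, ?, ?)")
conn.execute(insert, (1, 10, 1000))   # debit 10.00 to account 10
conn.execute(insert, (1, 11, -400))   # credit 4.00 to account 11

total = conn.execute(
    "SELECT SUM(Amount) FROM TransactionEntry WHERE TransactionHeaderId = 1"
).fetchone()[0]
balance = conn.execute(
    "SELECT Amount FROM TransactionEntry "
    "WHERE TransactionHeaderId = 1 AND AccountId = 0"
).fetchone()[0]
print(total, balance)
```

Because the invariant lives in the database file, any application that inserts entries gets the balancing behaviour without knowing about it.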

Now that the triggers are in place, it should be possible for other applications to touch the database without too much risk of trashing the underlying conceptual model. This is core to my intentions for this work and is a step up from the gnucash line of a closed file format under an open (but subject-to-change) C api deep inside the gnucash codebase.

This version is still based on a "core" financial data model. As yet it doesn't even include a default currency for accounts, let alone account types that might direct the register to call the Debit and Credit columns something friendlier. I'm hoping that I'll get around to designing and demonstrating the use of multiple rdf schemas in the same SQL database at some point to allow expansion for various application needs (share price download could use another schema integrated into the same database, for example).

In the end this version is significant because it represents my second motivation point in the development of this software. My current list is:

  1. Actually contribute some open source software, instead of just bitching (complete, v0.1)
  2. Try not to be completely full of hot air by getting transaction entry working (complete, v0.4)
  3. Make this program a usable alternative to Quicken for my own finances (todo)
  4. Establish a small user-base (todo)
  5. Establish a developer community (todo)
  6. Profit! (todo. damn.)

Hopefully the energy required to overcome these peaks of resistance will not be too great :) At present I'm still just a little pissant who can't manage to play with any of the big boys and who has therefore unreasonably set out on his own. The best I could hope for at present is to fracture an already anemic developer and user home and small business financial software community. <sigh/>


Sat, 2004-Sep-25

TransactionSafe release 0.1

Well, the first revision is out. It's buggy, non-functional, and poorly-designed. Oh, and it doesn't really handle any error or exceptional cases, yet. You've got to start somewhere, I suppose. Currently the application only supports creation of account trees (not even deletion of accounts). You can't enter transaction data or get it back out again in the form of reports. The design is a first-cut pulled together from a lot of examples on the web rather than one that was planned. I also have a bug that I haven't gotten to the bottom of, where renaming an account sometimes looks like it's changing the name of a completely different account.

I decided to go with python for this revision, and have found no reason yet that I will move away from it in future revisions. I've had to learn python (which I'd never programmed in before), and gtk (which I'd never programmed with before), but at least I was familiar with the sqlite interface. Given the hurdles I think I've done ok for a first cut.

I made the choice of python because although it isn't the shortest path between sqlite and gtk, it does appear to be the path of least resistance. I haven't had to even think about creating a "database abstraction layer" because the python database model of returning row objects with attributes already in place to do the relevant things is exactly what I would have wanted to put together anyway. I really like the way python deals with databases.

I intend for my next revision to focus on refactoring the GUI code, which is currently in a terrible state. I might even start to put together a basic register (general ledger). Soon after that point I'll have to go back to my data models and work out how to get predicates from different rdf schemas to play nicely in the same database. At that point the current data model is likely to be replaced by a new one, so don't go using the formats I've specified so far for anything important :)

To those scared off reading my previous documentation revisions by the staroffice format they were released in, fear no longer. I have a pdf in the 0.1 release tarball which I hope will make for somewhat interesting reading, even though I haven't really put the time into it to clean it up or to cover what I'm doing with the python.

Anyway, that about does it. The quick link is here

Sat, 2004-Sep-11

A draft in time for HUMBUG

I haven't received any feedback on drafts of my proposal so far, which perhaps isn't surprising given that it's 26 pages long and doesn't even get to talking about design for code or definition of what I'm actually building, yet. Anyway, I thought I'd get another draft (TransactionSafe_Proposal,1.1C.sxw) out before going to HUMBUG for the evening.

This draft tightens up a few things in terms of the schema and conversions. It writes up some requirements for what kind of RDF schemas I expect to work with and how they should be translated to sqlite queries. It also incorporates lot support into the RDF and SQLite queries.

This is the last revision I intend to write before starting to hammer out an API for dealing with the data. I'm still leaning towards doing things in C++, as old habits die hard... but I'm not really thrilled about any current alternatives (C++ included). Anyway... it's more the conceptual framework I want to map out: the ideas of how you want to access database data inside a software application. My basic requirements are that you should be able to write sql queries to your heart's content, but also be able to work with and navigate between objects without having to do any string concatenation in the process.

Have fun reading


Sun, 2004-Sep-05

My sourceforge submission

I've put together a project submission for sourceforge and if accepted I intend to upload a copy of my current proposal document. Any advice, suggestions, or feedback from Humbug members or other readers of this blog are welcome. It's that feedback I hope a sourceforge presence will elicit.

The project name I've chosen for now is TransactionSafe. It keeps a focus on the practical outcomes I want from this project in terms of a better personal accounting system. It perhaps detracts a little from my grander schemes of universal data access, but I think I need to remain focused on the single task if anything greater is going to happen anyway.

The proposal document I have is in openoffice format (which is a package I personally feel stinks like the goats, but few free alternatives exist for what I'm trying to do). I have two revisions which roughly correspond to two weekends' musings that can be found at TransactionSafe_Proposal,1.1A.sxw, and TransactionSafe_Proposal,1.1B.sxw respectively. I haven't done much this weekend except a marginal amount of cleanup on the 1.1B version.

The first revision walks a fairly naive road and tries to essentially reinvent the wheel wherever possible. 1.1B is my first attempt to map my thinking to available technologies and at least attempts to use an rdf schema to describe the data model. I'm not sure that the rdfs is actually correct, as I haven't actively investigated any validation tool alternatives.

I've laid out much of my technical and philosophical thinking in the document and hopefully it is scrubbed up enough that it won't get too many derisive giggles. There are a couple of inconsistencies already emerging with 1.1B retaining the 1.1A section on lots but not actually including them into its data model (I'm still thinking about how best to implement them). I also expect changes to the data model with the addition of things like account types (is it an asset or a liability, for example?) and the obligatory challenges as to what should be left to the chart of accounts if required and what should be dealt with explicitly by the data model. I may also go back to some existing accounting data model definitions to see if I can straighten out some terms and make them more consistent in my own model (xbrl-gl is an obvious source of material for comparison).

I'm hoping that another side-effect of sourceforge exposure will be that I'm pushed to actually producing something concrete (however insignificant) at least once every couple of weekends to get into the swing of this thing. I'm far too prone to dropping things I get bored of if they're not over quickly, and I don't want this itch to go unscratched.

Sun, 2004-Jul-18

Status update

I was asked tonight at humbug about the status of my accounting software concepts and where I thought they were going. To some bemusement of the onlookers I had to say that it hadn't come very far.

I'm coming to terms with a number of concepts I haven't had to be directly exposed to before, and I'm the kind of person who has to understand a bit of philosophy before continuing onto the real deal.

Here is the summary of my thinking so far:

Firstly, the technology stack. It's short. It's sqlite. I want to create a database schema in sqlite that could become a unifying force rather than a divisive force in free software accounting circles. Ideally, I would build a back-end API for use in both gnucash and kmymoney2 which could be used by disparate apps to extract and enter accounting data. Although lacking in at least one essential feature, sqlite as a database-in-a-file appeals terrifically to me for various reasons. I'm also intimately familiar with much of the sqlite source code, so could modify it as necessary to suit my goals. This might mean a branch for my proposed works, or may be something I can feed back to the author D Hipp.

Secondly, there's the database schema. I'm without a way of running visio or anything I've encountered that I'd consider up-to-scratch for even this diagram, so I'll describe it. You have a transaction entity. You have a transaction entry entity. You have an account entity. You have a commodity entity. The rules are these: A transaction can have multiple entries. An account can have multiple entries. Each entry has a single transaction and account. Each account has a single currency. The final rule is that the amounts of all the transaction entries associated with a single transaction must sum to zero for their respective currency. Positive amounts indicate debit. Negative amounts indicate credit.

There are a couple of special fields in the data model. Each transaction has a date. I'm considering also giving either transactions or their entries a lot number. Each transaction entry has a memo field, and a reconciled status. Accounts are a special case unto themselves.

Since SQL doesn't handle graphs very well, I'd really like to be able to use some kind of bridge into rdf data to manage the relationships between accounts. As it is, I plan to create a table with an rdf feel to it. It will have three columns: subject, predicate, and object. The obvious predicate is something like "parent", where the subject is any account and the object is another account. It's a pity I'll still only be able to use sql over the structure (although it might be possible in alternate realities and future worlds to splice an rdf query mechanism in above the sqlite btree layer...).
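For illustration, here is one way such a table can be walked with plain SQL, sketched in Python with the standard `sqlite3` module. It relies on recursive common table expressions (`WITH RECURSIVE`), which SQLite gained in later releases (3.8.3 onwards), so treat it as a present-day sketch rather than something available at the time of writing. The account names and the `descendants` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Relationships(Subject, Predicate, Object);
INSERT INTO Relationships VALUES ('Assets', 'parent', 'Root');
INSERT INTO Relationships VALUES ('Bank', 'parent', 'Assets');
INSERT INTO Relationships VALUES ('Savings', 'parent', 'Bank');
INSERT INTO Relationships VALUES ('Expenses', 'parent', 'Root');
""")


def descendants(conn, account):
    """Return every account below `account` by following 'parent' triples."""
    rows = conn.execute("""
        WITH RECURSIVE Tree(Account) AS (
            -- Direct children: subjects whose parent is the given account.
            SELECT Subject FROM Relationships
            WHERE Predicate = 'parent' AND Object = ?
            UNION
            -- Then the children of anything already found.
            SELECT r.Subject FROM Relationships AS r
            JOIN Tree ON r.Object = Tree.Account
            WHERE r.Predicate = 'parent'
        )
        SELECT Account FROM Tree ORDER BY Account
    """, (account,))
    return [row[0] for row in rows]


print(descendants(conn, "Assets"))
```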

Now... since the database effectively "is" the API of the backend I'm proposing, we need to consider how changes to the database structure will affect applications. Since I plan for this structure to be used and reused across applications and for ad-hoc query and reporting applications to be written independent of the application that created the accounting data, I need to consider a versioning scheme. Not only that, but I need to consider what other data might live alongside the accounting data, and be able to ensure that my data doesn't interfere with it or vice-versa while still allowing applications to query across the combination.

My current thoughts aren't well-formed on this issue, but wander along the COM versioning ideas. Essentially you never allow something that is different to the earlier version to be called by the same name. You do this by assigning unique ids to things you care about being the same. I haven't prototyped this concept at all, but my thinking is that all table names would be uuids gathered from a program like uuidgen. Here's one: f013170f-b8ff-419f-abb7-81306e2ccbdb. When the structure or semantics of that data changes, I create a new table to hold the new structure and call it 09c14549-3ab0-4517-a052-aba00af2c30d. I probably also create a table with a well-known uuid to map between uuids and sensible names and version numbers for applications that don't care about certain minor revisions. My thinking is that a minor revision would be one that doesn't cause queries that select specific columns to have to be altered, for example adding a new column. A major change would be one where columns changed names or meaning. Any application that inserts data would likely be sensitive to all schema changes.

Migration programs could be written to take advantage of the explicit versioning structure. When the program finds old data it could move it or replicate it into the new form. Additionally, multiple schemas could live alongside each other. In addition to the accounts themselves, a small payroll system might be included or a table to track market value of your shares.
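The naming scheme above could be exercised along these lines. This is only a sketch of the idea: the helper names `quote`, `register`, and `resolve` are hypothetical, the `Notes` column is invented to stand in for a minor revision, and fresh uuids come from Python's `uuid` module rather than uuidgen. The well-known schema-table uuid is the one used in the schema listing in this post.

```python
import sqlite3
import uuid

# Well-known uuid of the table that maps uuids to names and versions.
SCHEMA_TABLE = "5ac164f0-78d2-4461-bb6f-12bbb32b39f6"


def quote(identifier):
    """Quote a table name so the hyphens in a uuid are accepted."""
    return '"' + identifier.replace('"', '""') + '"'


def register(conn, name, major, minor, columns):
    """Create a table under a fresh uuid and record it in the schema map."""
    table = str(uuid.uuid4())
    conn.execute(f"CREATE TABLE {quote(table)} ({columns})")
    conn.execute(f"INSERT INTO {quote(SCHEMA_TABLE)} VALUES (?, '', ?, ?, ?)",
                 (table, name, major, minor))
    return table


def resolve(conn, name, major):
    """Find the newest minor revision of a named table for one major version."""
    row = conn.execute(
        f"SELECT UUID FROM {quote(SCHEMA_TABLE)} "
        "WHERE Name = ? AND Major = ? ORDER BY Minor DESC LIMIT 1",
        (name, major)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {quote(SCHEMA_TABLE)} (UUID, Source, Name, Major, Minor)")
old = register(conn, "Account", 1, 0, "AccountId INTEGER PRIMARY KEY, AccountName")
new = register(conn, "Account", 1, 1, "AccountId INTEGER PRIMARY KEY, AccountName, Notes")

# A reader that only cares about major version 1 gets the newest revision.
print(resolve(conn, "Account", 1) == new)
```

Both revisions stay in the file side by side, which is what would let a migration program find old data and copy it forward.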

We end up with a database schema that looks something like this:

-- Columns other than the primary keys are assumed from the indexes
-- and the description above.
CREATE TABLE "5ac164f0-78d2-4461-bb6f-12bbb32b39f6"(
        UUID, Source, Name, Major, Minor
        );
INSERT INTO "5ac164f0-78d2-4461-bb6f-12bbb32b39f6" VALUES(
        '5ac164f0-78d2-4461-bb6f-12bbb32b39f6', '', 'Schema', 1, 0
        );

CREATE TABLE "ded2f12d-8fd5-4490-bb9e-3e3b31c46b22"(
        TransactionHeaderId INTEGER PRIMARY KEY,
        Date
        );
INSERT INTO "5ac164f0-78d2-4461-bb6f-12bbb32b39f6" VALUES(
        'ded2f12d-8fd5-4490-bb9e-3e3b31c46b22', '', 'TransactionHeader', 1, 0
        );
CREATE INDEX "11da5cb6-c03f-43c4-917a-c0f2502c5bc2" ON "ded2f12d-8fd5-4490-bb9e-3e3b31c46b22" (Date);

CREATE TABLE "7ba6ce04-66d2-4f32-9d40-31f5838d5bd4"(
        TransactionEntryId INTEGER PRIMARY KEY,
        TransactionHeaderId, AccountId, Amount, Memo, ReconcileStatus
        );
INSERT INTO "5ac164f0-78d2-4461-bb6f-12bbb32b39f6" VALUES(
        '7ba6ce04-66d2-4f32-9d40-31f5838d5bd4', '', 'TransactionEntry', 1, 0
        );
CREATE INDEX "c234ebb4-11e5-4b09-9036-32a1486fd5fa" ON "7ba6ce04-66d2-4f32-9d40-31f5838d5bd4" (AccountId);

CREATE TABLE "08fd9a02-1497-4f31-8bcf-dc9d4fed74fd"(
        AccountId INTEGER PRIMARY KEY,
        AccountName, CommodityId
        );
INSERT INTO "5ac164f0-78d2-4461-bb6f-12bbb32b39f6" VALUES(
        '08fd9a02-1497-4f31-8bcf-dc9d4fed74fd', '', 'Account', 1, 0
        );
CREATE INDEX "686a4d47-6cd8-48fb-a8ba-e844e13d85a2" ON "08fd9a02-1497-4f31-8bcf-dc9d4fed74fd" (AccountName);

CREATE TABLE "31580110-8eb8-42a1-909a-9aa72cb9534a"(
        CommodityId INTEGER PRIMARY KEY,
        CommodityName
        );
INSERT INTO "5ac164f0-78d2-4461-bb6f-12bbb32b39f6" VALUES(
        '31580110-8eb8-42a1-909a-9aa72cb9534a', '', 'Commodity', 1, 0
        );
CREATE INDEX "dfbfa695-dee2-4e61-90a0-2000d72e6e2d" ON "31580110-8eb8-42a1-909a-9aa72cb9534a" (CommodityName);

CREATE TABLE "58426fd9-6b99-4e2c-8f5f-975b5508ae93"(
        Subject, Predicate, Object
        );
INSERT INTO "5ac164f0-78d2-4461-bb6f-12bbb32b39f6" VALUES(
        '58426fd9-6b99-4e2c-8f5f-975b5508ae93', '', 'Relationships', 1, 0
        );
CREATE INDEX "d08d9b0a-3391-4e60-84c7-b8af312b1ad7" ON "58426fd9-6b99-4e2c-8f5f-975b5508ae93" (Subject, Predicate);
CREATE INDEX "151b627b-af70-43df-959d-9dd43301f6e7" ON "58426fd9-6b99-4e2c-8f5f-975b5508ae93" (Object, Predicate);

An obvious flaw with this as a database model for the time being is that transactions are not checked to ensure they sum to zero. I'm going to have to think some more about that.
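Short of enforcing the rule, it can at least be audited after the fact. The following sketch finds every transaction whose entries fail to sum to zero within a commodity. It uses readable table names instead of uuids, integer amounts in cents, a text `Commodity` column on the account, and made-up sample data; all of these are simplifying assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account(AccountId INTEGER PRIMARY KEY, AccountName, Commodity);
CREATE TABLE TransactionEntry(
    TransactionEntryId INTEGER PRIMARY KEY,
    TransactionHeaderId, AccountId, Amount);

INSERT INTO Account VALUES (1, 'Bank', 'AUD');
INSERT INTO Account VALUES (2, 'Food', 'AUD');
INSERT INTO Account VALUES (3, 'US Broker Cash', 'USD');
INSERT INTO Account VALUES (4, 'US Stocks', 'USD');

INSERT INTO TransactionEntry(TransactionHeaderId, AccountId, Amount) VALUES
    (1, 2, 4000), (1, 1, -4000),   -- balanced in AUD
    (2, 4, 5000), (2, 3, -4900),   -- 100 short in USD
    (3, 2, 1000), (3, 3, -1000);   -- nets to zero only across currencies
""")

UNBALANCED = """
SELECT e.TransactionHeaderId, a.Commodity, SUM(e.Amount) AS Total
FROM TransactionEntry AS e JOIN Account AS a USING (AccountId)
GROUP BY e.TransactionHeaderId, a.Commodity
HAVING SUM(e.Amount) <> 0
ORDER BY e.TransactionHeaderId, a.Commodity
"""

problems = conn.execute(UNBALANCED).fetchall()
for header_id, commodity, total in problems:
    print(header_id, commodity, total)
```

Transaction 3 is the interesting case: its amounts add up to zero overall, yet it is flagged twice because neither currency balances on its own.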

Now, let's see an example reporting application:

sqlite foo.sqlite "SELECT Amount FROM $Account JOIN $TransactionEntry USING (AccountId);"

It's just like a bought one.

That's where I'm at. No further. I'm still prototyping in shell scripts with sqlite. Perhaps I'll get some more done soonish, but no more tonight.

Sun, 2004-May-23

Accounting for Commodities

I'm gathering confidence that my understanding of international accounting or accounting for commodities or for inventory is the correct one. This kind of accounting is where you carry shares, foreign currency, bicycles, whatever that can't be described strictly in the (Australian) dollar sense. Nonetheless, your accounts must track their value in order to be correct.

I've been using Accounting 3[1] as my main reference tome. It has sections on inventory and international business, but always talks about them in dollar terms. This is clearly correct, but is not the whole picture. The book explains how when you purchase inventory at a particular cost price all the costs of that inventory go into your inventory asset account. When you sell the item you take the cost of the item out of your inventory, and record the income event in the same transaction:

Inventory purchase, 5 books

Sales Revenue     $150 CR
Cost of Sales     $100 DR
Inventory sale, 5 books

This is actually really simple. You spend some money to get something, but it's actually going to bring in some income pretty soon. You don't want to record the expense event until you can match it to the income event. It brings some balance to your chaotic accounting system. It can get more complicated when you don't sell all your books at once, and it can also get more complicated when you buy some more books before you've finished selling the old ones. You might buy and sell some of the books at different prices, so it can be tricky to keep track of things.

One way of keeping track is to take every individual book and record its individual cost and sale price. That can get overly wearisome, so mostly you take your identical books and put them in a single pool. You use some standard methods for guessing the actual cost of each single book you sell, based on how many books are left and how much the whole lot cost you. You can see what's missing, though.

We haven't actually recorded the number of books we have on hand in a machine-readable way!
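To make the pooled-cost idea concrete, here is a sketch of weighted-average costing, which is one of the standard pooling methods (the post does not say which methods the book describes). The `Pool` class and the figures are hypothetical. Note that the calculation needs the quantity on hand as an input, which is exactly the figure the dollar accounts don't hold.

```python
from fractions import Fraction


class Pool:
    """A pool of identical items costed by weighted average."""

    def __init__(self):
        self.quantity = 0
        self.cost = Fraction(0)

    def buy(self, quantity, total_cost):
        """Add items and everything they cost to the pool."""
        self.quantity += quantity
        self.cost += Fraction(total_cost)

    def sell(self, quantity):
        """Remove items and return the cost of sales attributed to them."""
        if quantity > self.quantity:
            raise ValueError("cannot sell more items than are on hand")
        # Each item carries an equal share of the pool's total cost.
        cost_of_sales = self.cost * quantity / self.quantity
        self.quantity -= quantity
        self.cost -= cost_of_sales
        return cost_of_sales


pool = Pool()
pool.buy(5, 100)    # 5 books for $100
pool.buy(5, 150)    # 5 more at a higher price
cost_of_sales = pool.sell(4)
print(cost_of_sales, pool.quantity, pool.cost)
```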

This brings us to the crux of this issue. You've got a complete, self-consistent set of accounts in aussie dollars which track the cost value of the books you own. They don't track the number of books, and can't tell you anything about the books themselves. What you need to do is to keep another set of records.

This second record set looks very much like a set of accounts, but it tracks a different kind of commodity. In this case your identical set of books. I see this again as a self-contained accounting system:

Inventory         5 books DR
Books received    5 books CR
Inventory purchase, 5 books

Books sold        5 books DR
Inventory         5 books CR
Inventory sale, 5 books

Notice the parallels with the "real money" transactions. So we now have two sets of accounts that track the same thing, but in different quantities. As in this example you will often have the two amounts vary at the same time. On the other hand, they may vary independently. You might be given 5 books for free. This doesn't change the cost basis of your (now) 10 books. It does change the amount you have on hand, though.
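A small sketch of the two parallel sets of accounts, with each commodity keeping its own self-balancing books. The `post` helper is hypothetical, and the "Cash" account is an assumption made for the example, since the counter-account for the purchase isn't named here. Positive amounts are debits and negative amounts are credits.

```python
from collections import defaultdict


def post(ledger, commodity, entries):
    """Post one transaction's entries within a single commodity.

    Entries are (account, amount) pairs and must sum to zero.
    """
    if sum(amount for _, amount in entries) != 0:
        raise ValueError(f"{commodity} entries do not balance")
    for account, amount in entries:
        ledger[commodity][account] += amount


# ledger[commodity][account] -> running balance
ledger = defaultdict(lambda: defaultdict(int))

# Buying 5 books for $100 moves both sets of accounts at once.
post(ledger, "AUD", [("Inventory", 100), ("Cash", -100)])
post(ledger, "books", [("Inventory", 5), ("Books received", -5)])

# Being given 5 books for free only moves the quantity accounts.
post(ledger, "books", [("Inventory", 5), ("Books received", -5)])

print(ledger["AUD"]["Inventory"], ledger["books"]["Inventory"])
```

After the gift the cost basis is still 100 dollars while the quantity on hand is 10 books, so the two ledgers have moved independently.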

I think this concept of a separate set of accounts stands up across all kinds of commodities, and even stacks up in a system of barter where you buy commodities with other commodities... or you buy US stocks with US dollars. Every commodity you carry has its own set of accounts to deal with them.

I see real deficiencies in the way gnucash deals with multiple currencies. Under its current system, a transaction involving multiple currencies has a "home" currency. All components of the transaction must have home currency equivalent value attached to them and no checking is done to ensure that the foreign currency amounts actually balance. I think what will be needed in the future accounting system is to allow a single transaction to affect accounts in several currencies, and ensure that each currency's total balances correctly.

The other impact this all has is at query time. Queries can only really occur over transaction entries in the same set of accounting records. You can't compare apples and oranges (except perhaps in terms of taste, size, tanginess, and texture). So you can see how your net worth is doing based on the cost value of your shares... or you could see how many shares you have. You might even pull in current market data to do that apples and oranges comparison and try to see how much you would be worth if you sold your shares. In the end, though, it is the accounts that I'm trying to define. They're the sacred... uhh... apple. The accounts themselves will be consistent for each currency represented.

[1] From inside the front cover:

3rd ed.
Includes index.
ISBN 0 7248 0500 1.

1. Accounting. I. Horngren, Charles T., 1926-.