Sound advice - blog

Tales from the homeworld

My current feeds

Sun, 2004-Jun-27

The objects that live in a database

Hello again. It's that time of the evening when my musings about how databases and objects relate to each other reach a new level.

Let's view the world as it is, but through the eyes of objects. Objects live in different places. Some live inside processes. Others live inside databases. Inside processes we have a wide variety of objects and object types. Inside databases there are two.

Inside a database, the first and most fundamental object is a row. Each row has assoications via foreign keys with other row objects. Each row has a set of operations that can be performed on it, and through triggers or other means a mechansim for reporting failure or causing other operations to occur when operations are performed on them.

The other kind of object is any cohesive set of tables. This set of tables and their rows may have its own rules and supports operations on its constitent row objects

Rows are simple objects of a strict type that essentially cannot be extended. This other kind of object, which I will call a database object, is something a little more fluid. Each set has its own type, but may support a set of operations that overlap with operations applicible to another database object. The overlap could be considered inheritance. A simple database object may be incorporated into a larger set of tables and operations. That could be considered composition.

As we build up the database model as objects, instead of trying to store objects in a database we see a slightly non-traditional view of how databases could be constructed. Traditionally we see a static database schema, because we traditionally see the entire database as a single database object. Perhaps another way to construct things is as follows

If we have a database object of type foo with tables bar and baz, we could create multiple instances of foo in a single database by prefixing the bar and baz table names. foo1.bar and foo1.baz could sit alongside foo2.bar and foo2.baz in the same database. Each could enforce its own rules internal to itself, the larger database object we've now constructed to contain these objects could enforce further rules.

A database object, therefore, can be defined as a set of tables with defined operations and rules plus a set of sub-objects. Lists or structures of sub-objects could be managed also.

An accounting system operates in a single currency on a single accounting entity. Multiple accounting entities could be managed in the same database as instances of a general ledger database object type. Multiple currencies for the same accounting entity, or even multiple gaaps might be managed as different objects within the greater database.

If the basic rules of the accounting model can be encapsulated inside a database object, perhaps more complicated structures can be built around the basic models. An interesting side-effect of this way of thinking about objects within databases is that you can still do SQL joins and queries between parts of different database sub-objects. They can still be made efficient.

Perhaps the biggest problem with this way of thinking about database objects is that SQL doesn't really support it. You can do the following kinds of operations:

... but you can't SELECT * FROM TABLES LIKE foo%.bar, or select these things based on type. You would certainly have to do some logic in the calling application if you were trying to perform "arbitrary" data mining across objects.

Oh, well. Something to think about, anyway.

As to how these object relate to applications...

Proxy objects can be created in an application to talk to database objects. You could begin a transaction, instantiate new database object proxies (or extract them from the database), and them modify and commit them. This would be an easy programmatic interface to work with. If a simple XML file defined the structure of the database objects it would be trivial to write an XSLT to generate code for the proxy objects

Proxy objects could be pulled from the database with select statements and the like. Foreign keys could be represented in the objects as smart pointers that, when dereferenced, would load the next object. Issues you might run into include updating proxy objects when appropriate as the result of triggers and the like (triggers would best be implimented in the object modifying the database in this case), and in dealing with relationships between database sub-objects which might not be modelled particularly well in this concept.

Another issue to consider is one that plauges the qof framework that underlies gnucash. Consistency. Consistency between the objects in memory and the database can't be guaranteed easily when multiple processes are modifying the database concurrently. The simple answer would be "start a transaction when you start pulling objects out of the database, and destroy all proxies when you commit". That would be nice, but you may be locking other users out of critical parts of the database. If you use sqlite you'll be locking them out of the whole database.

The naieve solution is not to load these objects until you are ready to make your change. The problem with this approach is that you typically need at least some data from the database to know what change you want to make in the first place. This leaves you in the uncomfortable position of making objects that are one level above the proxy objects.

These super-proxy objects would essentially check for consistency whenever they're accessed, or perhaps more often and warn the calling objects of possible inconsistency when they do change. This makes the whole system much more complicated than it otherwise might be, and adds one more complication. When you have modified your in-memory proxy objects and want to commit the changes to disk they must first run a final consistency check. Consistency failure means the commit cannot proceed and must be rolled back and possibly reapplied with the fresh data.

Oh, well. Complication always abounds when you try to exert your influcence over objects that someone else owns.

The new sqlite version (3.0) will hopefully alleviate this problem a little. When a writer begins a transaction it won't exclusively lock the database. Instead, it will prepare the transaction in-memory while it keeps a reserved lock. Readers can still read. It's only writers that can't get into the database. Still, it would be a pain in the arse to have to wait for slow-poke-allen to finish entering his transaction before you can even begin editing yours.

One more optimisation could be applied, here. New objects might be able to be created in proxy form, and then applied to the database in a short transaction. Since they essentially have nothing to check consistency against they could be constructed in a way that was essentially guaranteed to commit successfully. This is good news, because in a financial database most transactions are insertions. Only updates would need to lock the database.

The only problem with that approach is when you're relying on the database for some information about the new rows or database objects, for example if you're relying on the database to assign you a unique identifier which you use as a foreign key between objects. Bugger. That's almost always required.

Hrrm... oh well. Things are forming and I think I'm getting to a more concrete place with the philsophy behind this development. Perhaps one day I'll even see the qof "light" and move back to gnucash with a better apprecation of what's already been created there.