Sound advice - blog

Tales from the homeworld

My current feeds

Mon, 2004-Jun-14

Objects and Databases

In information technology there are the two main archetypes. A software engineer works in programs, applications, procedures, objects. The database stream works in tables, forms, reports, data. The two are worlds apart.

Objects are a means of data hiding. They behave in concrete and reliable ways because they themselves define all the operations that can be performed on them and how those operations affect their state. Their data is hidden. It may reflect what is held within the object, but there is no need for it to be so.

A schema in the software world is the set of objects and their methods. If an object is transmitted across a network or stored in a file a memento must be created in the object's place to be activated, read, or otherwise interpreted to construct the equivalent object in the target process. The memento itself can't be understood by anything but the receiving object in that target process.

Well, that's not true. A memento can be anything. It could be a hard binary representation subject to change whenever an object is modified. It may be an XML representation that can at least be partially understood... but still it can't reconstruct that object on the other side.

Databases are a weird and woolly world, where it is not the behaviour that is tightly controlled by the representation. Instead of defining a set of operations that can be performed on an object they define the object in terms of its internal data. The operations are basic, standard, and implicitly allowed. They allow modification to any part of the data. This fairly primitive model of operations is suffered because the main purpose of database is not to maintain the integrity of the objects it represents, but to allow those objects to be pulled apart, rearranged, and reassembled at will during the advanced query process.

SQL queries are a declaritive mechanism for defining how source data from tables should be transformed into target tables for reporting. These new tables are essentially new objects, so the whole database can be thought of as an engine for collecting and transforming objects into new object sets. It's an essential and well-performed service.

The problem with SQL databases is that they don't deal so well with the things that the software world has learned regarding objects over the years. Because the data is so exposed, database folk get used to the idea and it seems only small steps have been taken towards object-orientation. Stored procedures are the "standard" way of hiding data from applications that want to modify it behind standard operations. This, in effect, makes the entire database an object with metohds.

Views are an attempt to hide data from the other side. A number of "standard" object sets are created to do queries against, while the actual data remains hidden. The underlying data can thus change without harming the queries and reports that hinge of it.

Is there a way to make databases more like object models, while retaining the query and reporting capabilities that have made SQL great? I suppose another way of looking at the question is to ask: Is there a way of making object models more like databases, without sacrificing the data encapsulation capabilities we hold so dear?

It seems what we need to do is define an database that contains real objects. Those objects define a set of operations that can be performed on each, and present a number of "properties" that SET and UPDATE operations can be performed against. Further properties, or the same ones, would support SELECT statements. The objects themselves would be able to reject certain updates with an error or perform actions on other objects to achieve the SET operations.

One spanner-in-the-works of this whole way of thinking is that in databases, objects are essentially either rows or are the entire database. In software, objects are often complex beasts involving a number of sub-objects. These kinds of objects can be represented in a database, but don't match their natural form when doing so. The (understandable) obsesson with tables in database technologies means that complex objects are difficult to query over and to extract.

To get to the point, if you have a database containing accounting objects you need at least three classes. You need a transaction, a set of transaction entries, and an account for each transaction entry to reference. Certain constraints must be present on these objects, for example "at the end of each update the total credit transaction entries must equal the total debit transaction entries for each transaction instance". In essence, a transaction and its entries behave as one complicated object, but must be represented for the database as several objects. This has its advantages, making it easy to pull out transaction entries associated with individual accounts. It has its disadvantages, making it difficult to know when to apply this test. It can't be done each time an insert, erase, or update is issued to a transaction entry. Perhaps before each commit? But if so, which transactions do we run the check on? It's infeasible to check all transactions.

Perhaps I'd better just give in and start thinking relational. I suppose with triggers I could just maintain a dirty list, checked and cleared before each transaction completes.

P.S. On another note, I apologise for the untruncated nature of my blog entries. I've done some experimentation with the "seemore" plugin, but it doesn't appear to work for statically rendered blogs such as this one. Perhaps some knowledgeable person will take pity and refer me to one that does work on static pages :)