Sound advice - blog

Tales from the homeworld

My current feeds

Sat, 2004-Oct-23

I think exceptions in Java a still at least a bit evil

Again, I think it comes to the number of paths through a piece of code. Adrian points out that Java uses garbage collection (a practice I once considered at least a bit dodgy for reasons I won't go into here, but have warmed to somewhat since using it in python), and that garbage collection makes things much simpler in Java than they would be in C++. I have to agree. A consistent memory management model across all the software written in a specific language is a huge step forward over the C++ "generally rely on symmetry but where you can't do that: hack up some kind of reference-counting system...".

After his analysis I'm left feeling that exceptions are at worst half as evil than those of C++ due to that consistent memory management. Leaving that aside, though, we have the crux of my point. Consider the try {} finally {} block. C++ has a similar mechniam (that requires more coding to make work) called a monitor. You instantiate an object which has the code from your finally {} block in its destructor. It's guaranteed to be destroyed as the stack unwinds, unlike dynamically-allocated objects.

Unfortunately, in both C++ and Java when you arrive in the finally {} block you don't know exactly how you got there. Did you commit that transaction, or didn't you? Can we consider the function to have executed sucessfully? Did the exception come from a method you expected to raise the exception, or was it something that (on the surface) looked innocuous? These are all issues that you have to consider when using function return codes to convey success (or otherwise) of operations, but with an exception your thread of control jumps. The code that triggered the exception does not (may not) know where it has jumped to, and the code that catches the exception does not (may not) know where it came from. Neither piece of code may know why the exception was raised.

So what do I do, instead?

The main thing I do is to minimise error conditions by making things preconditions instead. This removes the need for both exception handling and return code handling. Instead, the calling class must either guarantee or test the precondition (using a method alongside the original one) before calling the function. Code that fails to meet this criteria effectively gets the same treatment as it would if an unhandled exception were thrown. A stack trace is dumped into the log file, and the process terminates. I work in fault-tolerant software. A backup instance of the process takes over and hopefully does not trigger the same code-path. If it does though, it's safer to fail than to continue operating with a known malfunction (I work in safety-related software).

The general pattern I use to make this a viable alternative to exception and return code handling, though is to classify my classes into two sets. One set deals with external stimulus, such as user interaction and data from other sources. It is responsible for either cleaning or rejecting the data and must have error handling built into it. Once data has passed through that set objects no longer handle errors. Any error past that point is a bug in my program, not the other guy's. A bug in my program must terminate my program.

Since most softare is internal to the program, most software is not exposed to error handling either by the exception or return code mechanisms. A small number of classes do have to have an error handling model, and for that set I continue to use return codes as the primary mechanism. I do use exceptions, particularly where there is some kind of deeply-nested factory object hierarchy and faults are detected in the bowels of it. I do so sparingly.

I'm the kind of person who likes to think in very simple terms about his software. A code path must be recognisable as a code path, and handling things by return code makes that feasible. Exceptions add more code paths, ones that don't go through the normal decision or looping structures of a language. Without that visual cue that a particular code path exists, and without a way to minimise the number of paths through a particular piece of code, I'm extremely uncomfortable. Code should be able to be determined correct by inspection, but the human mind can only deal with so many conditions and branches at once. Exceptions put a possible branch on every line of code, and that is why I consider them evil.

Benjamin

Sun, 2004-Oct-17

It's the poor code in the middle that gets hurt

Adrian Sutton argues that exceptions are not in fact harmful but helpful. I don't know about you, but I'm a stubborn bastard who needs to be right all the time. I've picked a fight, and I plan to win it ;)

Adrian is almost right in his assertion that

Checking return codes adds exactly the same amount of complexity as handling exceptions does - it's one extra branch for every different response to the return code.

but gives the game away with with this comment:

I'd move the exception logic up a little higher by throwing it from this method and catching it above somewhere - where depends on application design and what action will be taken in response to each error.

He's right that exceptions add no more complexity where they are thrown or where they are finally dealt with. It's the code in-between that gets hurt.

It's the code in-between that suddenly has code-paths that can trigger on any line of code. It's the code in-between that has to be written defensively according to an arms treaty that it did not sign and for which it is not aware of the text. It is the code in-between that suffers and pays.

This article is what got many of us so paranoid about exception handling. It is referenced in this boost article supportive of the use of exceptions under the "Myths and Superstitions" section but which doesn't address my own central point of increased number of code paths. Interestingly, in its example showing that exceptions don't make it more difficult to reason about a program's behaviour they cite a function that uses multiple return statements and replace it with exceptions. Both are smelly in my books.

Code should be simple. Branches should be symmetrical. Loops and functions should have one a single point of return. If you break these rules already then exceptions might be for you.

Personally, they form a significant part of my coding practice. I take very seriously the advice attributed to Brian W. Kernighan:

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

Adrian also picks up my final points about how to program with the most straightforward error checking, and perhaps I should have worded my final paragraphs more clearly to avoid his confusion. I don't like threads (that share data). They do cause similar code path blowouts and comprehensibility problems as do exceptions unless approached with straightforward discipline (such as a discipline of message-passing-only between threads). Dealing with the operating system can't be written off so easily, and my note towards the end of my original post was meant to convey my sentiment of "exceptions are no good elsewhere, and if the only place you can still argue their valid use is when dealing with the operating system... well... get a life" :) Most software is far more internally-complex than it is complex along its operating system boundary. If you want to use exceptions there, feel free. Just don't throw them through other classes. I personally think that exceptions offer no better alternative to return code checking in that limited environment.

Benjamin

Sat, 2004-Oct-16

Exceptions are evil

Today's entry in my series of "programming is evil" topics, we have exceptions. Yup. They're evil.

Good code minimises branches. Particularly, it minimises branches that can't be exercised easily with testing. Testability is a good indicator of clean design. If you can't get prod an object sufficiently with its public interface to get decent coverage of its internal logic, changes are you need to break up your class or do something similarly drastic. Good design is testable (despite testable design not always being good).

Enter exceptions. Exceptions would not be the complete vilest of sins ever inflicted on computer software except for the terrible error some programmers make of catching them. It isn't possible to write a class that survives every possible excpetion thrown through its member functions while maintaining consistent internal state. It just isn't. Whenever you catch an exception your data structures are (almost) by definition screwed up. This is the time when you stop executing your program and offer your sincerest apologies to your user.

Unfortunately, that's not always what happens. Some people like to put a "catch all exceptions" statement somewhere near the top of their program's stack. Perhaps it is in the main. Perhaps it is in an event loop. Even if you die after catching that exception you've thrown away the stack trace, the most useful piece of information you had available to you to debug your software.

Worse still, some people catch excpetions and try to keep executing. This guarantees errant behaviour will occur only in subtle and impossible to track ways, and ensures that any time an error becomes visible the cause of it is lost to the sands of time.

The worst thing that catching an exception does, though, is add more paths through your code. An almost infinite number of unexpected code paths sprouts in your rich soil of simplicilty, stunting all possible growth. Pain in the arse exceptions.

To minimise paths through your code, just follow a few simple guidelines:

  1. Make everything you think should be a branch in your code a precondition instead
  2. Assert all your preconditions
  3. Proivde functions that client objects can use to determine ahead of time whether they meet your preconditions

In cases where the state of preconditions can change between test and execute, return some kind of lock object (where possible) to prevent it. Of course multiple threads are evil too, and for many of the same reasons. When dealing with the operating system, just use the damn return codes ;)

Benjamin

Sat, 2004-Oct-16

The evil observer pattern

There are some design patterns that one can take a lifetime to get right. The state pattern is one of these. As you build up state machines with complicated sub-states and resources that are required in some but not all states some serious decisions have to be made about how to manage the detail of the implementation. Do you make your state objects flyweight? How do you represent common state? When and how are state objects created and destroyed? For some time I've thought that the observer pattern was one of these patterns that simply take experience to get right. Lately I've been swinging from that view. I've now formed the view that the observer pattern itself is the problem. Before I start to argue my case, I think we have to look a little closer at what observer is trying to achieve.

The Observer pattern is fairly simple in concept. You start with a client object that is able to navigate to a server object. The client subscribes to the server object using a special inherited interface class that describes the calls that can be made back upon the client. When events occur, client receives the callbacks.

Observer is often talked about in the context of the Model-View-Controller mega-pattern, where the model object tells the controller about changes to its state as they occur. It is also a pattern crucial to the handling of asynchronous communications and actions in many cases, where an object will register with a scheduler to be notified when it can next read or write to its file descriptor. I developed a system in C++ for my current place of employment very much based around this concept. Because each object is essentially an independent entity connected to external stimulus via observer patterns they can be seen to live a live on their own, decoupled nicely.

The problems in the observer pattern start to emerge when you questions like: "When do I deregister from the server object?" and "If this observation represents the state of some entity, should I get a callback immediately?" and "If I am the source of this change event, how I should I update my own state to reflect the change?".

The first question is hairier than you might think. In C++ you start to want to create reference objects to ensure that objects deregister before they are destroyed. Even in a garbage-collected system this pattern starts to cause reference leaks and memory growth. It's a pretty nasty business, really.

The second two questions are even more intriguing. I've settled on a policy of no immediatle callbacks and (where possible) no external notification of a change I've caused. The reasons for this are tied up with objects still being on the stack when they recieve notifications of changes, and also with the nature of how an object wants to update itself often chaning subtly between a purely external event (caused by another object) and one that this object is involved in.

I've begun to favour a much more wholistic approach to the scenarios that I've used observer patterns for in the past. I'm leaning towards something much more like the Erlang model. Virtual concurrency. Message queues. Delayed execution. I've built a little trial model for future work in my professional capacity that I hope to integrate into some of the work already in-place, but the general concept is as follows: We define a straightforward message source and sink pair of baseclasses. We devise a method of connecting sources and sinks. Each stateful object has some sources which provide data to other objects, and some sinks to recieve updates on.

Once connected, the sources are able to recieve updates, but instead of updating their main object, they register their object with a scheduler to ensure that it is eventually reevaluated. In the mean-time it clears an initial flag that can be used to avoid processing uninitilised input, and sets a changed flag that can be used by the object to avoid duplicate processing.

I'm working on making it a little more efficient and straightforward to use in the context of existing framework, but I'm confident that it will feature prominently in future development.

Because of its ties with HMI-related code, this will probably also make an appearance in TransactionSafe. As it is, I developed the ideas at home with TransactionSafe first, before moving the concept to code someone else owns. Since I'll be working more and more on HMI-related code at work, TransactionSafe may become a testbed for a number of concepts I'm going to use elsewhere.

I actually haven't had the chance to do much python-hacking of late due to illness and being completely worn-out. I don't know if I'll get too much more done before my holidays, but I'll try to get a few hours invested each week.

Benjamin

Sun, 2004-Oct-03

Playing with Darcs

I know that the revision control system is one of the essential tools of a programmer, behind only the compiler/interpreter and the debugger, but so far for TransactionSafe I haven't used one. I suppose it is also telling that I haven't used a debugger so far, relying solely on the old fashioned print(f) statement to get to the bottom of my code's behaviour. Anwyay, on the issue of revision control I have been roughly rolling my own. I toyed with rcs while the program was small enough to fit reasonably into one file, but since it broke that boundary I have been rolling one tar file per release, then renaming my working directory to that of my next intended revision.

That is a system that works well with one person (as I remain on this project) but doesn't scale well. The other thing it doesn't handle well is the expansion of a project release into something that encapsulates more than a few small changes. A good revision control system allows you to roll back or review the state of things at a sub-release level. To use the terminology of Telelogic CM Synergy, a good revision control system allows you to keep track of your changes at the task level.

Well, perhaps by now you've worked out why I haven't committed to using a specific tool in this area yet. I use CM Synergy professionally, and am a little set in my ways. I like task-based CM. In this model (again using the CM Synergy terminology), you have a project. The project contains objects from the central repositry, each essentially given a unique object id and a version number. A project version can either be a baseline version of an earlier release, or a moving target that incorporates changes going forward for a future release. Each developer has their own version of the project, which only picks up new version as they require it. Each project only picks up changes based on the developer's own criteria.

The mechanism of change acceptance is the reconfigure. A reconfigure accepts a baseline project with its root directory object, and a list of tasks derived by specific inclusion or by query. Tasks are beasts like projects. They each contain a list of specific object versions. Directories contain a specific list of objects, but does not specify their version. The reconfigure process is simple. Take the top-level directory, and select the latest version of it from the available tasks. Take each of its child objects and choose versions of them using the same algorithm. Recurse until done. Developer project versions can be reconfigured, and build manager project versions can be reconfigured. It's easy as pie to put tasks in and to pull them out. Create a merge task to integrate changes together, and Bob's your uncle.

This simple mechanism, combined with some useful conflicts checking algorithms to check for required merges make CM Synergy the best change management system I've used. It's a pity, really, because the interface sucks and its obvious that no serious money has gone into improving the system for a decade. Since its closed source I can't make use of it for my own work, nor can it be effectively improved without serious consulting money changing hands.

I've been out of the revision control circles for a few years, now, but Anthony Towns' series of entries regarding darcs has piqued my interest. Like CM synergy, every developer gets to work with their own branch version of the project. Like CM synergy, each unit of change can be bundled up into a single object and either applied or backed out of any individual or build manager's project. The main difference between darcs' and CM Synergy's approaches appear to be that while CM Synergy's change unit is a set of specificially selected object versions, darcs change unit is the difference between those object versions and the previous versions.

It is an interesting distinction. On the plus side for darcs, this means you don't have to have explicit well-known version numbering for objects. In fact, it is likely you'll be able to apply the patch to objects not exactly the same as those the patch was originally derived-from. That at least appears to bode well for ad-hoc distributed development. On the other side, I think this probably means that the clever conflicts checking algorithms that CM Synergy uses can't be applied to darcs. It might make it harder to be able to give such guarantees as "All of the object versions in my release have had proper review done on merges". Perhaps there are clever ways to do this that I haven't thought of, yet.

On the whole darcs looks like the revision control system most attuned to my way of thinking about revision control at the moment. I'll use it for the next few TransactionSafe versions and see how I go.

While I'm here, I might as well give an impromptu review of the telelogic products on display:

CM SynergyGood fundamental design, poor interface.
Change SynergyA poor man's bugzilla at a rich man's price. It's easy to get the bugs in, but the query system will never let you find them again. Don't go near it.
Object MakeGive me a break. I got a literal 20x performance improvement out of my build by moving to gnu make. This product has no advantages over free alternatives.
DOORSA crappy requirements tracking system in the midst of no competition. It is probably the best thing out there, but some better QA and some usability studies would go a long way to making this worth the very hefty pricetag.

Fri, 2004-Oct-01

Singletons in Python

As part of my refactoring for the upcoming 0.3 release of TransactionSafe (which I hope to include an actual register to allow actual transactions to be entered) I've made my model classes singletons. Singletons are a pattern I find makes some things simpler, and I hope this is one of those things.

As I'm quite new to python idioms I did some web searching to find the "best" way to do this in Python. My searching had me end up at this page. I took my pick of the implementations available, and ended up using the one at the end of the page from Gary Robinson. His blog entry on the subject can be found here.

Gary was kind enough to release his classes into the public domain, and as is my custom I requested explicit permission for use in my project. As it happens, he was happy to oblige.

As well as the standard singleton pattern, I like to use something I call a "map" singleton under C++. That simply means that instead of one instance existing per class you have one instance per key per class. I've renamed it to "dict" singleton for use in python and adapted Gary's original code to support it. In keeping with the terms of release of his original code I hereby release my "dict" version to the public domain.

Fri, 2004-Oct-01

Less Code Duplication != Better Design

I'm sure that Adrian Sutton didn't mean to imply in this blog entry that less code duplication implied better design, but it was a trigger for me to start writing this rant that has been coming for some time.

Code duplication has obvious problems in software. Code that is duplicated has to be written twice, and maintained twice. Cut-and-paste errors can lead to bugs being duplicated several times but only being fixed once. The same pattern tends to reduce developer checking and reading as they cut-and-paste "known working" code that may not apply as well as they thought to their own situation. Of all the maintainence issues that duplicated code can cause, though, your first thought on seeing duplication or starting to add duplication should not be "I'll put this in a common place". No! Stop! Evil! Bad!

Common code suffers from its own problems. When you share a code snippet without encapsulating in a unit of code that has a common meaning you can cause shears to occur in your code as that meaning or parts of it change. Common code that changes can have unforseen rammifications if its actual usage is not well understood throughout your software. You end up changing the code for one purpose and start needing exceptional conditions so that it also meets your other purposes for it.

Pretty soon, your common-code-for-the-sake-of-common-code starts to take on a life of its own. You can't explain exactly what it is meant to do, because you never really encapuslated that. You can only try to find out what will break if its behaviour changes. As you build and build on past versions of the software cohesion itself disappears.

To my mind, the bad design that comes about from prematurely reusing common-looking code is worse than that of duplicated code. When maintaining code with a lot of duplication you can start to pull together the common threads and concepts that have been spelt out by previous generations. If that code is all tied up in a single method, however, the code may be impossible to save.

When trying to improve your design, don't assume that reducing duplication is an improvement. Make sure your modules have a well defined purpose and role in module relationships first. Work on sharing the concept of the code, rather than the behaviour of the code. If you do this right, code duplication will evaporate by itself. Use the eradication of code duplication as a trigger to reconsider your design, not as a design goal in itself.

In the end, a small amount of code duplication probably means that you've designed things right. Spend enough design effort to keep the duplication low, and the design simplicity high. Total elimination of duplication is probably a sign that you're overdesigning anyway. Code is written to be maintained. It's a process and not a destination. Refactor, refactor, refactor.

Benjamin