Sound advice - blog

Tales from the homeworld

My current feeds

Sun, 2005-May-15

Coding for the compiler

In my last column I wrote about deliberately architecting your code in such a way that the compiler makes your code more likely to be correct. I used an example lifted from Joel Spolksy regarding "safe" and "unsafe" strings. In indicated they should be made different types so the that the compiler is aware of and can verify that the rules associated with assining one to the other or using one in place of the other are followed. I thought I'd bring up a few more examples of coding conventions I follow to use the compiler to best effect.

Don't initialise variables unless you have a real value for them

Junior programmers are often tempted to initialise every variable that they declare, and do so immediately. They do so to avoid messy compiler warnings, or because they think they are making the code cleaner and safer by providing a value where there would otherwise be no value.

That's a bad idea. Whether it be initialising an integer to zero or a pointer to null whenever you give a value to a variable you are lying to your tools. You're giving a value to the variable that no use to your program, but at the same time you are quelling the concerns of both your compiler and any memory analyser you might be using (such as purify).

Don't declare variables until you have a real value for them

This rule answers the question about how you deal with the warnings that would come from not assigning values to your variables. The simple pseudo-code you should always follow when declaring a variable is either one of:

MyType myVariable(myValue);

or

MyType myVariable;
if (condition)
{
	myVariable = myValue1;
}
else
{
	myVariable = myValue2;
}

Any other form is pretty-well flawed. In the first case, the variable is always assigned to a useful value, and it happens immediately. In the second case, every branch must make a concious decision about what to set your variable to. Sometimes variables come in pairs, or other tuples that need to be initialised together. Sometimes your conditions and branching are much more complicated than in this case. By not initalising the varaible up front and initialising in every branch you allow the compiler to find any cases you might have missed. If you set an initial value for the variable before a conditional that may assign values to it you stop the compiler complaining about branches where the variable is not assigned to, and you allow the introduction of bugs through unconsidered branches.

Always declare your else branch

Every "if" has an implicit "else". You need to consider what happens in that case. The variable assignment structure I've laid out forces you to declare your else branch(es) with the content you might otherwise have placed in the variable's declaration. This is a good thing. Considering all branches is a critial part of writing good code, and it is better to be in the habit of doing it than in the habit of not. Even if you don't have content for one side of the branch or the other, place a comment to that effect in the branch. Explain why. This point won't help the compiler find bugs in your code, but it is good to keep in mind.

Only assign to "external" variables once per loop iteration

This is a bit hard to write down as a hard and fast rule, but it is another one of those variable initialisation issues. Sometimes you'll have a loop which changes a variable declared outside of the loop. You'll be tempted to add one or subtract one at various points in the loop, but it's the wrong approach if you want help from the compiler. Instead, declare a variable which embodies the effect on the external variable. Use the same technique as I've outlined above to allow the compiler to check you've considered all cases. At the end of the conditionals that make up the body of your loop, apply the effect by assignment, addition, or whatever operation you require.

string::size_type pos;
do
{
	string::size_type comma(myString.find(',',pos));
	string::size_type newpos;
	if (comma == string::npos)
	{
		// None found
		newpos = string::npos;
	}
	else
	{
		myString[comma] = ';';
		newpos = comma+1;
	}
	pos = newpos;
}
while (pos != string::npos);

On the subject of loops...

Use a single exit point for loops and functions

When you have a single exit point, your code is more symmetrical and that tends to support the treatement of variables and cases that I'm suggesting. What you're looking for is...

  1. To make sure you clearly enumerate and consider every case your function has to deal with, and
  2. To give the compiler every opportunity to spot effects you may not have specified for each case

It is these principles at the core of my general mistrust of exceptions as a general unusual case-handling mechanism. I prefer explicit handling of those cases. I won't go over my position on exceptions again just yet... suffice as to say they're certianly evil under C++ and I'm still forming an opinion as to whether the compiler-support exceptions recieve under Java is sufficient to make them non-evil in that context :). Give me a few more years to make up my mind.

Under C++, I consider the following loop constructs to be useful:

Pre-tested loops

while (condition) { /* body */ }
for (
	iterator it=begin();
	it!=end();
	++it
	) { /* body */ }
for (
	iterator it=begin();
	it!=end() && other-condition;
	++it
	) { /* body */ }

Mid-tested loops

for (;;) { /* body */; if (condition) break; /* body*/ }

Post-testd loops

do { /* body */ } while (condition);
for (;;) { /* body */; if (loop-scoped-condition) break; }

Don't use default branches in your switch statements

This goes back to general principle of making all your cases explicit. When you change the list of conditions (ie. the enumeration behind a switch statement) the compiler will be able to warn you of cases you haven't covered. This isn't the case when you have a default. In those cases you'll have to find a way to search for each switch statement based on the enumeration and assess whether or not to create a new branch for it.

I'm happy for multiple enumeration value to fold down to a single piece of code, for example:

switch (myVariable)
{
case A:
	DoSomething1();
	break;
case B:
	DoSomething2();
	break;
case C:
case D:
case E:
case F:
	// Do nothing!
	break;
}

When you add "G" to the condition the compiler will prompt you with a warning that you need to consider that case for this switch statement.

You have a typing and classing system, so use it!

This is the point of my previous entry. Just because two things have the same bit representation doesn't mean they're compatible. Declare different types for the two and state exactly what operations you can do to each. Don't get caught assigning a 64-bit integer into a 32-bit slot without range checking and explicit code ok-ing the conversion. Wrap each up in a class that stops you doing stupid things with them.

Use class inheritence sparingly

By class inheritence I mean inheritence of class B from class A, where A has some code. In Java this is called extension. It is usually safe to say that class B inherits interface A, but not class A. When you inherit from a class instead of an interface you are aliasing two concepts:

  1. What you can do with it
  2. How it works

Sharing what you can do with an object between multiple classes is the cornerstone of Object-Oriented design. Sharing how classes work is something I don't think we have a syntactically simple solution to, even after all these years. Class inheritence is one idea that requires little typing, but it leads to finding out late in the piece that what you want to do with two classes is the same, but their implementation needs to be more different than you expected. At that point it takes a lot more typing to fix the problem.

So declare interfaces and inherit from interfaces where possible instead of from classes. Contain objects, and delegate your functions to them explicitly ahead of inheriting from ther classes. Sometimes class inheritence will be the right thing, but in my experience that's more like five cases out of every hundred.

I've strayed off the compiler-bug-detection path a little, so I'll finish up with two points.

Use the maximum useful warning settings

They're there for a reason

Treat warnings as errors

There's nothing worse than seeing 100 warnings when you compile a single piece of code, and have to try and judge whether one of them is due to your change or not. Warnings become useless if they are not addressed immediately, and they are your compiler's best tool for telling you you've missed something. Don't think you're smarter than the machine. Don't write clever code. Dumb your code down to a point the compiler really does understand what you're writing. It will help you. It'll will help other coders. Think about it.

Benjamin