Sound advice - blog

Tales from the homeworld


Sat, 2005-Feb-12

XP and Test Reports

Adrian Sutton writes about wanting to use tests as specifications for new work to be done:

You need to plan the set of features to include in a release ahead of time so you wind up with a whole heap of new unit tests which are acting as this specification that at some point in the future you will make pass.  At the moment though, those tests are okay to fail.

It isn't crazy to want to do this. It's part of the eXtreme Programming (XP) model for rapid development. In that model, documentation is thrown away at the end of a unit of work or not produced at all. The focus is on making the code tell the implementation story and making the tests tell the specification story. In XP terms, what you're trying to do is not unit testing, but acceptance testing:

Customers are responsible for verifying the correctness of the acceptance tests and reviewing test scores to decide which failed tests are of highest priority. Acceptance tests are also used as regression tests prior to a production release.

Implicit in this model is the test score. In my line of work we call this the test report, and it must be produced at least once per release but preferably once per build. A simple test report might be "all known tests pass". A more complicated one would list the available tests and their pass/fail status.
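For illustration, a line-oriented report of the more complicated kind might look like this (the test names and format are invented for the example; anything stable and diffable will do):

    test_parse_config    PASS
    test_write_schema    PASS
    test_handle_timeout  FAIL
    test_reconnect       PASS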

Adrian continues,

The solution I'm trying out is to create a list of tests that are okay to have fail for whatever reason and a custom ant task that reads that list and the JUnit reports and fails the build if any of the tests that should pass didn't but lets it pass even if some of the to do tests fail.

If you start from the assumption that you'll produce a test report, the problem of changes to the set of passing and failing tests becomes a configuration management one. If you commit the status of your last test report and diff it against the freshly built one during the make process, you can break the build on any unexpected change to the set of test passes and fails. In addition to ensuring only known changes occur to the report, it is possible to track (and review) positive and negative impacts on the report numbers. All the developer has to do is check in a new version of the report to acknowledge the effect their changes have had (they're prompted to do this by the build breakage). Reports can be produced one per module (or one per Makefile) for a fine-grained approach.

As a bonus you get to see exactly which tests were added, removed, broken, or fixed at exactly which time, by whom, and who put their name against acceptance of the changed test report and associated code. You have a complete history.
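As a minimal sketch of how the check might look in a Makefile (the file names are illustrative: expected-report.txt is the committed report, test-report.txt is the one generated by the current build):

    # Break the build if the generated report differs from the committed one.
    # (make recipe lines must be indented with a tab)
    check-report: test-report.txt
            diff -u expected-report.txt test-report.txt

    # To acknowledge an intentional change, the developer copies the new
    # report over the committed one and checks it in:
    #     cp test-report.txt expected-report.txt

diff exits with a non-zero status when the files differ, so make stops the build, and the diff output itself shows exactly which tests changed state.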

This approach can also benefit other things that are generated during the build process. Keep a controlled version of your object dump schema and forever after no unexpected or accidental changes will occur to the schema. Keep a controlled version of your last test output and you can apply an even finer grain than the usual pass/fail criteria (sometimes it's important to know that two lines in your output have swapped positions).
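As a hypothetical illustration, suppose two lines of output swap order between runs. A pass/fail count won't notice, but a diff against the controlled copy will show something like (file names invented, exact form depends on your diff):

    $ diff expected-output.txt actual-output.txt
    2d1
    < handshake: ok
    3a3
    > handshake: ok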

Sat, 2005-Feb-12

On awk for tabular data processing

My recent post on awk one liners raised a little more controversy than I had intended.

It was a response to Bradley Marshall's earlier post on how to do the same things in perl, and led him to respond:

I'm glad you can do it in awk - I never suspected you couldn't... I deal with a few operating systems at work, and they don't always have the same version of awk or sed installed... it was no easier or harder for me to read. Plus, with Perl I get the advantage of moving into more fully fledged scripts if I need to...

Byron Ellacott also piped up, with:

As shown by Brad and Fuzzy, you can do similar things with different tools that often serve similar purposes. So, here's the same one-liners using sed(1) and a bit of bash(1)... (Brad, I know you were just demonstrating an extra feature of perl for those who use it. :)

I knew that also, and perhaps should have been more explicit in describing the subtleties of why I responded in the first place. Firstly, I have a long and only semi-serious association with awk vs perl advocacy. My position was always that perl was filling a gap that didn't exist for my personal requirements. I seem to recall that several early jousts on the subject were with Brad.

To my mind, awk was desperately simple and suited most tabular data processing problems you could throw at it. My devil's advocate position was that anything too complicated to do in awk was also too complicated to do legibly in perl. Clearly the weight of actual perl users made this position shaky (if not untenable), but I stuck to my guns, and for the entire time I was at university and for several years afterwards I found no use for perl that couldn't be more appropriately implemented in another way.
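To give a flavour of what I mean by desperately simple (these one-liners are my own illustrations, not the ones from the earlier posts):

    # Print the second column of each line
    awk '{ print $2 }' data.txt

    # Sum the third column
    awk '{ sum += $3 } END { print sum }' data.txt

    # Print only lines whose first column exceeds 100
    awk '$1 > 100' data.txt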

Perl has advanced over the years, and while I still have no love for perl as a language, the existence of CPAN does make perl a real "killer app". Awk, with its lack of even basic "#include" functionality, will never stack up to the range of capabilities available to perl. On the other hand, bigger and better solutions are again appearing in other domains such as python, .NET, JVM-based language implementations and the like. I've had to learn small amounts of perl for various reasons over the years (primarily for maintaining perl programs) but I'll still work principally in awk for the things awk is good at.

So, when I saw Brad's post I couldn't resist. The one-liners he presented were absolutely fundamental awk capabilities. They were the exact use case awk was developed for. To present them in perl is like telling a lisp programmer that you need to do highly recursive list handling, so you've chosen python. It's a reasonable language choice, especially if you're already a user of that language. It's just that you have to push that language just a little harder to make it happen. It's not what the language was meant to do, it's just something the language can do.

I absolutely understand that you can do those things in other languages. I sincerely sympathise with Brad's "Awk? Which version of awk was that, again?" problem. I don't believe everyone should be using awk.

On the other hand, if you were looking for a language to do exactly those things in, I would be happy to guide you in awk's direction. Given all the alternatives I still maintain that for those exact use-cases awk is the language that is most suitable. As for Brad's "with Perl I get the advantage of moving into more fully fledged scripts" quip, awk is better for writing full-fledged scripts than most people assume. So long as your script fits within the awk use case (tabular data handling) you won't have to bend over backwards to make fairly complicated transformations fly with awk. If you step outside that use-case, for example if you want to run a bunch of programs and behave differently based on their return codes... well, awk can still do that, but it's no longer what awk is designed for.
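As a sketch of what a fuller awk script can look like while staying inside the tabular use case (the input format is invented for the example: whitespace-separated lines of module, test name, and status):

    #!/usr/bin/awk -f
    # Print a per-module pass tally from lines of: module test-name status
    { total[$1]++ }
    $3 == "PASS" { passed[$1]++ }
    END {
        for (m in total)
            printf "%-20s %d/%d passed\n", m, passed[m], total[m]
    }

Saved as tally.awk and made executable, it runs as ./tally.awk results.txt. No bending over backwards required.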

Benjamin