Sound advice - blog

Tales from the homeworld

Fri, 2005-Feb-11

On awk for tabular data processing

My recent post on awk one-liners raised a little more controversy than I had intended.

It was a response to Bradley Marshall's earlier post on how to do the same things in perl, and led him to respond:

I'm glad you can do it in awk - I never suspected you couldn't... I deal with a few operating systems at work, and they don't always have the same version of awk or sed installed... it was no easier or harder for me to read. Plus, with Perl I get the advantage of moving into more fully fledged scripts if I need to...

Byron Ellacott also piped up, with:

As shown by Brad and Fuzzy, you can do similar things with different tools that often serve similar purposes. So, here's the same one-liners using sed(1) and a bit of bash(1)... (Brad, I know you were just demonstrating an extra feature of perl for those who use it. :)

I knew that also, and perhaps should have been more explicit in describing the subtleties of why I responded in the first place. Firstly, I have a long and only semi-serious association with awk vs perl advocacy. My position was always that perl was filling a gap that didn't exist for my personal requirements. I seem to recall that several early jousts on the subject were with Brad.

To my mind, awk was desperately simple and suited most tabular data processing problems you could throw at it. My devil's advocate position was that anything too complicated to do in awk was also too complicated to do legibly in perl. Clearly the weight of actual perl users made this position shaky (if not untenable), but I stuck to my guns, and for the entire time I was at university and for several years afterwards I found no use for perl that couldn't be more appropriately implemented in another way.

Perl has advanced over the years, and while I still have no love for perl as a language, the existence of CPAN does make perl a real "killer app". Awk, with its lack of even basic "#include" functionality, will never stack up to the range of capabilities available to perl. On the other hand, bigger and better solutions are again appearing in other domains such as python, .NET, JVM-based language implementations and the like. I've had to learn small amounts of perl for various reasons over the years (primarily for maintaining perl programs), but I'll still work principally in awk for the things awk is good at.

So, when I saw Brad's post I couldn't resist. The one-liners he presented were absolutely fundamental awk capabilities. They were the exact use case awk was developed for. To present them in perl is like telling a lisp programmer that you need to do highly recursive list handling, so you've chosen python. It's a reasonable language choice, especially if you're already a user of that language. It's just that you have to push that language a little harder to make it happen. It's not what the language was meant to do, it's just something the language can do.

I absolutely understand that you can do those things in other languages. I sincerely sympathise with Brad's "Awk? Which version of awk was that, again?" problem. I don't believe everyone should be using awk.

On the other hand, if you were looking for a language to do exactly those things in, I would be happy to guide you in awk's direction. Given all the alternatives, I still maintain that for those exact use cases awk is the most suitable language. As for Brad's "with Perl I get the advantage of moving into more fully fledged scripts" quip, awk is better for writing fully fledged scripts than most people assume. So long as your script fits within the awk use case (tabular data handling), you won't have to bend over backwards to make fairly complicated transformations fly with awk. If you step outside that use case, for example if you want to run a bunch of programs and behave differently based on their return codes... well, awk can still do that, but it's no longer what awk is designed for.
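To give a feel for the kind of tabular transformation I mean, here is a small sketch rather than anything from the earlier posts. The input layout (department, name, hours, separated by whitespace) and the function name summarise_hours are made up purely for illustration. It totals and averages the third column per department, sticking to features that should be present in any POSIX awk.

```sh
# Hypothetical example: summarise_hours and the three-column input layout
# (department, name, hours) are illustrative assumptions, not from the posts.
summarise_hours() {
  awk '
    # For every row, accumulate the hours (column 3) and a row count,
    # keyed on the department (column 1).
    { total[$1] += $3; count[$1]++ }

    # After the last row, print one line per department:
    # department, total hours, average hours.
    END {
      for (dept in total)
        printf "%s %d %.1f\n", dept, total[dept], total[dept] / count[dept]
    }
  ' | sort   # awk does not guarantee the order of "for (x in array)", so sort
}

printf '%s\n' 'ops alice 10' 'dev bob 7' 'ops carol 20' 'dev dave 8' | summarise_hours
```

For the four sample rows this prints "dev 15 7.5" followed by "ops 30 15.0". The associative arrays do the grouping, and the implicit per-line loop and field splitting do the rest, which is the part other languages make you spell out.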

Benjamin