Thursday, December 22, 2005

AWK Tricks

I think my favorite go-to tool for Unix scripting work is "awk". After all, how often do you want to split a line of text into columns and then do something based on those columns? Pretty darned often.

Here's a trick I use frequently. You've just created a CSV or otherwise delimited file, and you want to validate that there are no extraneous characters in there to break your parsing:

awk -F, '{print NF}' file.csv | sort | uniq -c

If all of your rows have the same number of columns (as they likely should), then you'll just see a single line, with the row count first and the column count second, something like

1500 17

and you know you're all set. But if you see:

1498 17
2 18

you know you've got a problem. How to find it? Easily:

awk -F, '{if (NF!=17) print $0;}' file.csv

Done. Then you can go back in and edit those rows to fix them, if you like. Or if you're in a situation where you can just chuck them out, do the opposite:

awk -F, '{if (NF==17) print $0;}' file.csv > cleanfile.csv

And there you go, cleanfile.csv contains only the rows with the proper number of columns.
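If you'd rather keep both piles, the check and the cleanup can be folded into a single pass. What follows is only a sketch under a few assumptions: the file names sample.csv, clean.csv and bad.txt and the variable "want" are made up for illustration, the sample data is invented, and it assumes plain comma-delimited text with no quoted fields that contain commas.

```shell
#!/bin/sh
# Hypothetical sample data: three rows with 3 fields, one row with 4.
printf 'a,b,c\nd,e,f\ng,h,i,j\nk,l,m\n' > sample.csv

# Field-count histogram: uniq -c prints the row count first,
# then the field count.
awk -F, '{print NF}' sample.csv | sort -n | uniq -c

# One pass over the file: rows with the expected field count go to
# clean.csv; everything else goes to bad.txt, prefixed with its line
# number (NR) so it is easy to find in an editor.
# "want" is an illustrative variable name, not anything awk defines.
awk -F, -v want=3 '
    NF == want { print > "clean.csv"; next }
               { print NR ": " $0 > "bad.txt" }
' sample.csv
```

With this sample, the histogram reports three rows of 3 fields and one row of 4, clean.csv ends up with the three good rows, and bad.txt holds "3: g,h,i,j". Note that bad.txt is only created when at least one row fails the check, so a leftover copy from an earlier run can be misleading.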
