AWK Tricks
I think my favorite go-to tool for Unix scripting work is "awk". After all, how often do you want to split a line of text into columns and then do something based on those columns? Pretty darned often.
Here's a trick I use frequently. You've just created a CSV or otherwise delimited file, and you want to validate that there are no extraneous characters in there to break your parsing:
awk -F, '{print NF}' file.csv | sort | uniq -c
If all of your rows have the same number of fields (as they likely should), then you'll just see something like
1500 17
(uniq -c puts the row count first, then the number of fields) and you know you're all set: 1500 rows with 17 fields each. But if you see:
1498 17
2 18
you know you've got a problem. How to find it? Easily:
awk -F, '{if (NF!=17) print $0;}' file.csv
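In a big file it can also help to know where the bad rows live. Here's a small variation, sketched against a made-up three-column sample.csv (the file name, its contents, and the want variable are all just for illustration), that prints the line number and field count in front of each offending row:

```shell
# Build a tiny made-up sample: 3 fields expected, line 3 has a stray comma.
printf '%s\n' 'id,name,city' '1,Ann,Oslo' '2,Bob,New York, NY' '3,Cy,Rome' > sample.csv

# NR is awk's current line number; NF is the field count for that line.
# "want" is an illustrative variable holding the expected field count.
awk -F, -v want=3 'NF != want { print NR ": " NF " fields: " $0 }' sample.csv
```

On that sample it should print a single line, "3: 4 fields: 2,Bob,New York, NY", which tells you exactly which line to open in your editor.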
Done. Then you can go back in and edit those rows to fix them, if you like. Or if you're in a situation where you can just chuck them out, do the opposite:
awk -F, '{if (NF==17) print $0;}' file.csv > cleanfile.csv
And there you go: cleanfile.csv contains only the rows with the proper number of columns.
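If you'd rather not throw the bad rows away entirely, one possible variation is to split the file in a single pass, keeping the rejects around for later. This is only a sketch: sample.csv and its contents are made up, rejects.csv is a hypothetical file name, and want, good, and bad are illustrative variables.

```shell
# Same made-up sample as before: 3 fields expected, one row has a stray comma.
printf '%s\n' 'id,name,city' '1,Ann,Oslo' '2,Bob,New York, NY' '3,Cy,Rome' > sample.csv

# One pass over the input: rows with the expected field count go to the
# "good" file, everything else goes to the "bad" file for later review.
awk -F, -v want=3 -v good=cleanfile.csv -v bad=rejects.csv '
    NF == want { print > good; next }
               { print > bad }
' sample.csv
```

After it runs, cleanfile.csv should hold the three well-formed rows and rejects.csv the one with the extra comma, so nothing is lost if you decide to fix the rejects by hand.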