Using awk, it is possible to quickly create complex reports. It is much easier to perform string comparisons, build arrays on-the-fly, and take advantage of associative arrays than to code in another language (like C). Instead of having to search through an array for a match with a text key, that key can be used as the array subscript.
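To make the point about text keys concrete, here is a minimal sketch that totals amounts by department. The input lines and the names total and dept are made up for illustration; the sort at the end is there only because awk does not promise any particular order when looping over an array.

```shell
# Hypothetical sample data: "department amount" pairs.
totals=$(printf '%s\n' 'sales 100' 'ops 40' 'sales 25' 'ops 10' |
awk '{ total[$1] += $2 }    # the text key itself is the array subscript
     END { for (dept in total) print dept, total[dept] }' |
sort)
echo "$totals"
```

No search loop is needed: total["sales"] is found directly from the key.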
I have produced reports using awk with three levels of control breaks, multiple sections of reports in the same control break, and multiple totaling pages. The totaling pages were for each level of control break plus a final page; if the control break didn't have a particular type of data, then the total page didn't have it either. If there was only one member of a control break, then the total page for that level wasn't created. (This saved a lot of paper when there was really only one level of control break: the highest.)
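The report described here is not reproduced in this chapter, but the basic idea of a control break can be sketched with a single level. The sample data, the function name flush_group, and the variable names are assumptions for illustration only. The input must already be sorted on the break key (the first field), and the total line is skipped when a group has only one member.

```shell
# Hypothetical input: "region item amount", sorted by region.
report=$(printf '%s\n' 'east widget 10' 'east gadget 5' 'west widget 7' |
awk '
function flush_group() {
    # Skip the total line when the group had a single member.
    if (count > 1) printf("  total %s: %d\n", prev, sum)
}
$1 != prev {                      # control break: the key changed
    if (NR > 1) flush_group()
    prev = $1; sum = 0; count = 0
    print $1
}
{ printf("  %s %d\n", $2, $3); sum += $3; count++; grand += $3 }
END { if (NR > 0) flush_group(); printf("grand total: %d\n", grand) }
')
echo "$report"
```

The east group gets a total line because it has two members; the west group, with one member, does not.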
This report ended up being more than 1,000 lines of awk (nawk to be specific) code. It takes a little longer to run than the equivalent C program, but it took a lot less programmer time to create. Because it was easy to create and modify, it was developed using prototypes. The users briefly described what they wanted, and I produced a report. They decided they needed more control breaks, and I added them; then they realized a lot of paper was wasted on total pages, so the report was modified as described.
Being able to develop the report incrementally, without knowing the final result in advance, made the work easier and more fun for me. Because I could respond quickly to their changes, the users were happy!
As mentioned earlier in this chapter, many systems don't produce data in the desired format. When working with data stored in relational databases, there are two main ways to get data out: use a query tool with SQL, or write a program to get the data from the database and output it in the desired form. SQL query tools have limited formatting ability but can provide quick and easy access to the data.
One technique I have found very useful is to extract the data from the database into a file that is then manipulated by an awk script to produce the exact format required. When required, an awk script can even create the SQL statements used to query the database (specifying the key values for the rows to select).
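As a rough sketch of generating SQL from a list of key values, the following turns each input line into one SELECT statement. The table name orders and the column order_id are hypothetical; substitute whatever your database uses.

```shell
# Hypothetical key values, one per line; "orders" and "order_id" are made-up names.
sql=$(printf '%s\n' 1001 1002 |
awk '{ printf("SELECT * FROM orders WHERE order_id = %d;\n", $1) }')
echo "$sql"
```

The generated statements can then be saved to a file or piped into the query tool.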
The following example is used when the query tool places a space before a numeric field, and that space must be removed for the program that will use the data in another system (mainframe COBOL):
{ printf("%s%s%-25.25s\n", $1, $2, $3); }
awk automatically removes the field separator (the space character) when splitting the input record into individual fields, and the %s format specifiers in the printf format string are contiguous (they do not have any spaces between them), so the fields are written back out with no gaps.
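To see the effect, the same printf can be run on one made-up record (the values ACCT, 00123, and Widget are illustrative only). The square brackets printed around the result make the padding visible: the third field is left-justified and padded or truncated to exactly 25 characters.

```shell
# Hypothetical record as the query tool might emit it, with spaces between fields.
line=$(echo 'ACCT 00123 Widget' |
       awk '{ printf("%s%s%-25.25s\n", $1, $2, $3); }')
printf '[%s]\n' "$line"
echo "record length: ${#line}"
```

The first two fields run together with no space, and the record is always 4 + 5 + 25 = 34 characters wide for this input.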
The ability to pipe the output of a command into another is very powerful because the output from the first becomes the input that the second can manipulate. A frequent use of one-line awk programs is the creation of commands based on a list.
The find command can be used to produce a list of files that match its conditions, or it can execute a single command that takes a single command-line argument. You can see files in a directory (and subdirectories) that match specific conditions with the following:
$ find . -name "*.prn" -print
This outputs
./exam2.prn
./exam1.prn
./exam3.prn
Or you can print the contents of those files with the following:
$ find . -name "*.prn" -exec lp {} \;
The find command inserts the individual filenames that it locates in place of the {} and executes the lp command. But if you wanted to execute a command that required two arguments (to copy files to a new name) or execute multiple commands at once, you couldn't do it with find alone. You could create a shell script that would accept the single argument and use it in multiple places, or you could create an awk single-line program:
$ find . -name "*.prn" -print | awk '{print "echo bak" $1; print "cp " $1 " " $1".bak";}'
This outputs
echo bak./exam2.prn
cp ./exam2.prn ./exam2.prn.bak
echo bak./exam1.prn
cp ./exam1.prn ./exam1.prn.bak
echo bak./exam3.prn
cp ./exam3.prn ./exam3.prn.bak
To get the commands to actually execute, you need to pipe the commands into one of the shells. The following example uses the Korn shell; you can use the one you prefer:
$ find . -name "*.prn" -print | awk '{print "echo bak" $1; print "cp " $1 " " $1".bak";}' | ksh
This outputs
bak./exam2.prn
bak./exam1.prn
bak./exam3.prn
Before each copy takes place, the message is shown. This is also handy if you want to search for a string (using the grep command) in the files of multiple subdirectories. Many versions of the grep command don't show the name of the file searched unless you use wildcards (or specify multiple filenames on the command line). The following uses find to search for C source files, awk to create grep commands to look for an error message, and the shell echo command to show the file being searched:
$ find . -name "*.c" -print | awk '{print "echo " $1; print "grep error-message " $1;}' | ksh
The same technique can be used to perform lint checks on source code in a series of subdirectories. I execute the following in a shell script periodically to check all C code:
$ find . -name "*.c" -print | awk '{print "lint " $1 " > " $1".lint"}' | ksh
The lint version on one system prints the code error as a heading line and then the parts of code in question as a list below. grep shows the heading but not the detail lines. The awk script prints all lines from the heading until the first blank line (end of the lint section).
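The awk script referred to here is not shown, so the following is only a guess at its shape, not the original. It uses a flag (the name show is arbitrary) that is switched on by the heading line and off by the first blank line. The heading text and the sample lint output are invented for the example; real lint output varies from system to system.

```shell
# Invented lint-style output: a heading, indented detail lines, then a blank line.
section=$(printf '%s\n' \
    'warning: argument unused in function' \
    '    (12) main.c' \
    '    (30) util.c' \
    '' \
    'warning: something else' \
    '    (5) other.c' |
awk '/argument unused/ { show = 1 }   # heading found: start printing
     /^$/              { show = 0 }   # blank line: end of this section
     show')
echo "$section"
```

Only the heading and its two detail lines are printed; the second section is ignored because its heading does not match.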
When in doubt, pipe the output into more or pg to view the created commands before you pipe them into a shell for execution.
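The check-then-run habit can be sketched as two steps. This example works in a scratch directory created with mktemp so that nothing real is touched; the filenames are made up, and sh stands in for whichever shell you prefer. Note that generated commands like these assume filenames without spaces or shell metacharacters.

```shell
# Scratch directory with two made-up files, so the example is safe to run.
workdir=$(mktemp -d)
touch "$workdir/exam1.prn" "$workdir/exam2.prn"

# Step 1: build the commands and look at them before anything is executed.
cmds=$(cd "$workdir" && find . -name "*.prn" -print | sort |
       awk '{print "cp " $1 " " $1 ".bak"}')
echo "$cmds"

# Step 2: once the commands look right, feed the same text to a shell.
(cd "$workdir" && echo "$cmds" | sh)
ls "$workdir"
```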
There is one more built-in function that doesn't fit in the character or numeric categories: system. The system function executes the string passed to it as an argument. This allows you to execute commands or scripts on-the-fly when your awk code has the need.
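A minimal sketch of system on its own: the command string is handed to the shell, and the value returned by system is the command's exit status, which the awk code can test. The echo command is just a harmless stand-in for a real command or script.

```shell
out=$(awk 'BEGIN {
    # Run a shell command from inside awk and capture its exit status.
    status = system("echo hello from the shell")
    printf("exit status: %d\n", status)
}')
echo "$out"
```

A status of 0 conventionally means the command succeeded, so a script could check it before going on to the next step.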
You can code a report to automatically print to paper when it is complete. The code looks something like Listing 27.8.
Listing 27.8. Using the system function.
BEGIN {
    pageno = 0;
    pageno = print_header(pageno);
    printf("the page number is now %d\n", pageno);
}

# The production of the report would be coded here

END {
    close ("report.txt");
    system ("lpr -Pmyprinter report.txt");
}

function print_header(page ) {
    page++;
    printf("This is the header for page %d\n", page) > "report.txt";