The gawk language has a few built-in variables that are used to represent things such as the total number of records processed. These are useful when you want to get totals. Table 25.7 shows the important built-in variables.
Variable | Description |
---|---|
NR | The number of records read so far |
FNR | The number of records read from the current file |
FILENAME | The name of the input file |
FS | Field separator (default is whitespace) |
RS | Record separator (default is newline) |
OFMT | Output format for numbers (default is %.6g) |
OFS | Output field separator (default is a single space) |
ORS | Output record separator (default is newline) |
NF | The number of fields in the current record |
The NR and FNR values are the same if you are processing only one file. If you are processing more than one file, NR is a running total across all files, while FNR restarts for each file and counts the records in the current file only.
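The difference between NR and FNR is easiest to see with two input files. The following is a minimal sketch, assuming that awk on your system is gawk or another POSIX-compatible awk; the file names first.txt and second.txt and the shell variable nr_fnr are made up for this demonstration.

```sh
# Hypothetical sample files, created only for this demonstration.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first.txt"
printf 'c\nd\ne\n' > "$dir/second.txt"

# Print the running total (NR) next to the per-file count (FNR).
nr_fnr=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' first.txt second.txt)
echo "$nr_fnr"

# Clean up the temporary files.
rm -r "$dir"
```

In the output, NR keeps climbing from 1 to 5, while FNR drops back to 1 on the first record of second.txt.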
The FS variable is useful because it controls the input file's field separator. To use the colon for the /etc/passwd file, for example, use the following command in the script, usually as part of the BEGIN pattern:
FS=":"
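As a sketch of how the assignment fits into a complete script, the example below sets FS inside a BEGIN block and prints the login name and shell from two lines in /etc/passwd format. The sample lines are invented (they are not read from a real system), the variable names passwd_sample and logins are illustrative only, and awk is assumed to be gawk or another POSIX-compatible awk.

```sh
# Two made-up lines in /etc/passwd format (not real accounts).
passwd_sample='root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Set FS in BEGIN so it takes effect before the first record is read,
# then print field 1 (login name) and field 7 (shell).
logins=$(printf '%s\n' "$passwd_sample" |
    awk 'BEGIN { FS = ":" } { print $1, $7 }')
echo "$logins"
```

Setting FS in the BEGIN block matters because the first record is split into fields as soon as it is read; an assignment made in an ordinary action would only affect the records that follow.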
You can use these built-in variables as you would any other. For example, the following command gives you a way to check the number of fields in the file you are processing and generates an error message if the values are incorrect:
NF <= 5 {print "Not enough fields in the record"}
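The field-count check can be tried on a couple of test records. This is only a sketch: the input lines are invented, the message wording is extended with NR so you can see which record failed, and awk is assumed to be gawk or another POSIX-compatible awk.

```sh
# Hypothetical input: the first line has 6 fields, the second only 3.
short_records=$(printf 'a b c d e f\ng h i\n' |
    awk 'NF <= 5 { print "Record " NR ": not enough fields (" NF ")" }')
echo "$short_records"
```

Only the second record triggers the message, because the first has more than five fields.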
Enough of the details have been covered to allow us to start doing some real gawk programming. Although we have not covered all of gawk's pattern and action considerations, we have seen all the important material. Now we can look at writing control structures.
If you have any programming experience at all or have tried some shell script writing, many of these control structures will appear familiar. If you haven't done any programming, common sense should help, as gawk is cleanly laid out without weird syntax. Follow the examples and try a few test programs of your own.
Incidentally, gawk enables you to place comments anywhere in your scripts, as long as the comment starts with a # sign. You should use comments to indicate what is going on in your scripts if it is not immediately obvious.
The if statement is used to allow gawk to test some condition and, if it is true, execute a set of commands. The general syntax for the if statement is as follows:
if (expression) {commands} else {commands}
The expression is always evaluated to see if it is true or false. No other value is calculated for the if expression. Here's a simple if script:
    # a simple if statement
    {
        if ($1 == 0) {
            print "This cell has a value of zero"
        } else {
            printf "The value is %d\n", $1
        }
    }
Notice that the curly braces were used to lay out the program in a readable manner. Of course, this could all have been entered on one line and gawk would have understood it, but writing in a nicely formatted manner makes it easier to understand what is going on and to debug the program if the need arises.
In this simple script, we test the first column to see if the value is zero. If it is, a message to that effect is printed. If not, the printf statement prints the value of the column.
The flow of the if statement is quite simple to follow. There can be several commands in each part, as long as the curly braces mark the start and end. There is no need to have an else section. It can be left out entirely, if desired. For example, this is a complete and valid gawk script:
{ if ($1 == 0) { print "This cell has a value of zero" } }
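To see the if statement at work, you can feed a script a few test values from the shell. The sketch below uses the if/else form described earlier, with the statement placed inside the braces of an action; the two input values and the variable name if_output are made up, and awk is assumed to be gawk or another POSIX-compatible awk.

```sh
# Hypothetical input: one record holding 0 and one holding 7.
if_output=$(printf '0\n7\n' | awk '{
    if ($1 == 0) {
        print "This cell has a value of zero"
    } else {
        printf "The value is %d\n", $1
    }
}')
echo "$if_output"
```

The first record takes the true branch and the second takes the else branch.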
The gawk language also allows a shorthand in place of the if statement when a simple comparison is being conducted. This quick-and-dirty structure is harder for novices to read, and I don't recommend it if you are new to the language. For example, here's the if statement written the proper way:
    # a nicely formatted if statement
    {
        if ($1 > $2) {
            print "The first column is larger"
        } else {
            print "The second column is larger"
        }
    }
Here's the quick-and-dirty method:
    # if syntax from hell
    $1 > $2  { print "The first column is larger" }
    $1 <= $2 { print "The second column is larger" }
You may notice that the keywords if and else are left off. Each line is a pattern followed by an action, and the action runs only for records where the pattern is true. The second line needs its own pattern, because an action with no pattern runs for every record, including those where the first comparison was true. This is much less readable if you don't know that it is standing in for an if statement! Besides, you should be using the more verbose method of writing if statements for readability's sake.
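If you want to convince yourself that the two styles behave the same, run both over identical input. The following sketch does that with two invented records; the variable names sample, with_if, and with_patterns are illustrative, and awk is assumed to be gawk or another POSIX-compatible awk.

```sh
# Hypothetical input: first column larger, then second column larger.
sample='5 3
2 9'

# Explicit if/else inside a single action.
with_if=$(printf '%s\n' "$sample" | awk '{
    if ($1 > $2) {
        print "The first column is larger"
    } else {
        print "The second column is larger"
    }
}')

# Two pattern-action rules, each guarded by its own comparison.
with_patterns=$(printf '%s\n' "$sample" | awk '
    $1 > $2  { print "The first column is larger" }
    $1 <= $2 { print "The second column is larger" }
')

echo "$with_if"
echo "$with_patterns"
```

Both commands print the same two lines, one message per record.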
The while statement allows a set of commands to be repeated as long as some condition is true. The condition is evaluated each time the program loops. The general format of the gawk while loop is as follows:
while (expression){ commands }
For example, the while loop can be used in a program that calculates the value of an investment over several years (the formula for the calculation is value = amount * (1 + interest_rate)^years):
    # interest calculation computes compound interest
    # inputs from a file are the amount, interest_rate, and years
    {
        var = 1
        while (var <= $3) {
            printf("%f\n", $1 * (1 + $2)^var)
            var++
        }
    }
You can see in this script that we initialize the variable var to 1 before entering the while loop. If we don't do this, gawk treats the uninitialized variable as zero, and on later records var would keep the value left over from the previous record. The values for the three variables we use are read from the input file. The autoincrement operator (++) adds one to var each time the line is executed.
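Here is a sketch of the interest calculation run from the shell on a single test record. The input values (1000 at 10 percent for 3 years) are invented, the output is trimmed to two decimal places with %.2f rather than %f, the variable name growth is illustrative, and awk is assumed to be gawk or another POSIX-compatible awk.

```sh
# Hypothetical input record: amount, interest_rate, years.
growth=$(printf '1000 0.10 3\n' | awk '{
    var = 1                      # reset the year counter for each record
    while (var <= $3) {
        # value after var years: amount * (1 + rate)^var
        printf("%.2f\n", $1 * (1 + $2) ^ var)
        var++
    }
}')
echo "$growth"
```

The loop prints one line per year: 1100.00, 1210.00, and 1331.00.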