

Built-In Variables

The gawk language has a few built-in variables that are used to represent things such as the total number of records processed. These are useful when you want to get totals. Table 25.7 shows the important built-in variables.

Table 25.7. The important built-in variables.

Variable    Description
--------    -----------
NR          The number of records read so far
FNR         The number of records read from the current file
FILENAME    The name of the current input file
FS          Input field separator (default is whitespace)
RS          Input record separator (default is newline)
OFMT        Output format for numbers (default is %.6g)
OFS         Output field separator (default is a single space)
ORS         Output record separator (default is newline)
NF          The number of fields in the current record
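
As a brief illustration of the output variables in the table, the following sketch sets OFS and OFMT in a BEGIN block. The input line and the chosen values are made up for the example, and awk is run from a shell; substitute gawk if that is the command name on your system.

```shell
# Assumed sample input: one record with two whitespace-separated fields.
# OFS replaces the commas in print; OFMT formats the non-integer number.
formatted=$(printf 'a b\n' | awk 'BEGIN { OFS = "-"; OFMT = "%.2f" } { print $1, $2, 3.14159 }')
echo "$formatted"
```

With these settings the record should come out as a-b-3.14 rather than the default "a b 3.14159".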

The NR and FNR values are the same if you are processing only one file. If you are processing more than one file, NR is a running total across all files, while FNR restarts at 1 for each new file and counts the records in the current file only.
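
The difference between NR and FNR is easiest to see with two input files. The sketch below is only an illustration: the file names one.txt and two.txt and their contents are invented, and the files are created in a temporary directory.

```shell
# Create two small, made-up input files in a scratch directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\nd\ne\n' > "$dir/two.txt"

# Print the file name, the running record count, and the per-file count.
counts=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' one.txt two.txt)
echo "$counts"

rm -rf "$dir"
```

NR keeps climbing from 1 to 5 across both files, while FNR drops back to 1 on the first record of two.txt.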

The FS variable is useful because it controls the input file’s field separator. To use the colon for the /etc/passwd file, for example, use the following command in the script, usually as part of the BEGIN pattern:


FS=":"
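
A minimal sketch of this assignment inside a BEGIN block follows. To keep the result predictable it reads two made-up lines in the /etc/passwd layout instead of the real file; the sample entries are assumptions for the example only.

```shell
# Two invented records in /etc/passwd format (seven colon-separated fields).
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# Set the field separator before any input is read, then print
# the login name (field 1) and the shell (field 7).
logins=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = ":" } { print $1, $7 }')
echo "$logins"
```

Setting FS in BEGIN matters because the assignment then takes effect before the first record is split into fields.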

You can use these built-in variables as you would any other. For example, the following command checks the number of fields in each record of the file you are processing and prints an error message for any record that has five or fewer fields:


NF <= 5 {print "Not enough fields in the record"}
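
The same test can be tried on a couple of sample records. In this sketch the input lines and the wording of the message are invented; NR is added to the message so the offending record can be found.

```shell
# First record has six fields, second has only three (made-up data).
report=$(printf 'a b c d e f\na b c\n' | awk 'NF <= 5 { print "Record " NR ": only " NF " fields" }')
echo "$report"
```

Only the second record triggers the message, because the first has more than five fields.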

Control Structures

Enough of the details have been covered to allow us to start doing some real gawk programming. Although we have not covered all of gawk’s pattern and action considerations, we have seen all the important material. Now we can look at writing control structures.

If you have any programming experience at all or have tried some shell script writing, many of these control structures will appear familiar. If you haven’t done any programming, common sense should help, as gawk is cleanly laid out without weird syntax. Follow the examples and try a few test programs of your own.

Incidentally, gawk enables you to place comments anywhere in your scripts. A comment starts with a # sign and runs to the end of the line. You should use comments to indicate what is going on in your scripts if it is not immediately obvious.

The if Statement

The if statement is used to allow gawk to test some condition and, if it is true, execute a set of commands. The general syntax for the if statement is as follows:


if (expression) {commands} else {commands}

The expression is always evaluated to see if it is true or false. No other value is calculated for the if expression. Here’s a simple if script:


# a simple if statement
{
   if ($1 == 0) {
      print "This cell has a value of zero"
   } else {
      printf "The value is %d\n", $1
   }
}

Notice that the curly braces were used to lay out the program in a readable manner. Of course, this could all have been entered on one line and gawk would have understood it, but writing in a nicely formatted manner makes it easier to understand what is going on and to debug the program if the need arises.

In this simple script, we test the first column to see if the value is zero. If it is, a message to that effect is printed. If not, the printf statement prints the value of the column.
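
To try the script without a data file, it can be fed two sample values from a shell. This is a sketch only: the input values 0 and 7 are made up, and the if statement is placed inside an action block so that it runs once per record.

```shell
# Two one-column records: the first is zero, the second is not.
values=$(printf '0\n7\n' | awk '{
   if ($1 == 0) {
      print "This cell has a value of zero"
   } else {
      printf "The value is %d\n", $1
   }
}')
echo "$values"
```

The first record takes the true branch and the second takes the else branch.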

The flow of the if statement is quite simple to follow. There can be several commands in each part, as long as the curly braces mark the start and end. There is no need to have an else section. It can be left out entirely, if desired. For example, this is a complete and valid gawk script:


{
   if ($1 == 0) {
      print "This cell has a value of zero"
   }
}

The gawk language also allows a shorter form when a simple comparison is being conducted: the test can be written as a pattern in front of an action instead of as an if statement inside one. This quick-and-dirty structure is harder to read for novices, and I don’t recommend it if you are new to the language. For example, here’s the if statement written the proper way:


# a nicely formatted if statement
{
   if ($1 > $2) {
      print "The first column is larger"
   } else {
      print "The second column is larger"
   }
}

Here’s the quick-and-dirty method:


# if syntax from hell
$1 > $2 {
   print "The first column is larger"
}
!($1 > $2) {
   print "The second column is larger"
}

You may notice that the keywords if and else are left off. Each rule is a pattern followed by an action, and the action runs only for records where the pattern is true, so the pattern does the job of the if test. There is no implicit else, however: an action with no pattern in front of it runs for every record, so the false case needs a pattern of its own, here the negated comparison. This form is much less readable if you don’t know that it is acting as an if statement! Pattern-and-action rules work in every version of gawk, but you should be using the more verbose method of writing if statements for readability’s sake.
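
How gawk treats a pattern in front of an action can be checked with a short experiment. The sketch below uses invented two-column input and shortened messages. The first command pairs a comparison with its negation; the second shows that an action with no pattern runs for every record, including those that matched the first rule.

```shell
# Paired patterns: exactly one of the two actions runs for each record.
paired=$(printf '5 3\n2 9\n' | awk '$1 > $2 { print "first" } !($1 > $2) { print "second" }')
echo "$paired"

# A bare action has no condition, so it also runs when $1 > $2 is true.
bare=$(printf '5 3\n' | awk '$1 > $2 { print "first" } { print "always" }')
echo "$bare"
```

In the second command the single record 5 3 prints both lines, which is why a bare block cannot stand in for an else branch.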

The while Loop

The while statement allows a set of commands to be repeated as long as some condition is true. The condition is evaluated each time the program loops. The general format of the gawk while loop is as follows:


while (expression) {
   commands
}

For example, the while loop can be used in a program that calculates the value of an investment over several years (the formula for the calculation is value = amount * (1 + interest_rate)^years):


# interest calculation computes compound interest
# inputs from a file are the amount, interest_rate, and years
{
   var = 1
   while (var <= $3) {
      printf("%f\n", $1*(1+$2)^var)
      var++
   }
}

You can see in this script that we set the variable var to 1 before entering the while loop. This has to happen for every record: gawk treats a variable that has never been assigned as zero, and var would otherwise keep the value left over from the previous record. The amount, interest rate, and number of years are read from the three fields of each input record. The autoincrement operator (++) adds one to var each time through the loop.
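
As a worked run of the interest script, the sketch below feeds it a single made-up record: an amount of 1000, an interest rate of 0.05, and 3 years. The values are assumptions chosen only to make the arithmetic easy to follow.

```shell
# One input record: amount, interest_rate, years (invented values).
growth=$(printf '1000 0.05 3\n' | awk '{
   var = 1
   while (var <= $3) {
      # value after var years of compounding
      printf("%f\n", $1 * (1 + $2) ^ var)
      var++
   }
}')
echo "$growth"
```

The loop body runs three times, printing the value at the end of each year: 1050.000000, 1102.500000, and 1157.625000.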

