
Actions

The action is the part of a rule that tells awk what to do when its pattern matches. If an action has no pattern, the pattern defaults to true, so the action runs for every input record. A pattern without an action defaults to the action {print $0}, which prints the whole matching record.

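All actions are enclosed within curly braces ({ }). The opening brace must appear on the same line as the pattern; otherwise awk treats the pattern and the braced action as two separate rules. Other than that, there are no layout restrictions. An action consists of one or more statements, separated by semicolons or newlines.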
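You can try these rules from the shell. The following is a minimal sketch using a small made-up input (the fruit names and counts are only illustrative); it should behave the same under gawk or any POSIX awk. It shows a pattern with no action, an action with no pattern, and a multi-line action whose opening brace sits on the pattern's line.

```sh
# Illustrative input: two records with two fields each.
input='apple 3
banana 5'

# Pattern with no action: the default action {print $0} prints each matching record.
matched=$(printf '%s\n' "$input" | awk '/apple/')

# Action with no pattern: runs for every record.
# Two statements in one action, separated by a semicolon.
doubled=$(printf '%s\n' "$input" | awk '{ n = $2 * 2; print $1, n }')

# Multi-line action: the opening brace stays on the same line as the pattern.
large=$(printf '%s\n' "$input" | awk '$2 > 4 {
    print $1 " is large"
}')

printf '%s\n' "$matched" "$doubled" "$large"
```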

Variables

Except for simple find-and-print types of programs, you are going to need to save data. That is done through the use of variables. Within awk, there are three types of variables: field, predefined, and user-defined. You have already seen examples of the first two—$1 is the field variable that contains the first field in the input record, and FS is the predefined variable that contains the field separator.

User-defined variables are the ones you create. Unlike many other languages, awk doesn't require you to declare your variables before using them. In C, you must declare the type of data contained in a variable (such as int for integers, float for floating-point numbers, char for character data, and so on). In awk, you just use the variable, and awk decides how to treat its value from the way it is used. If you put character data in the variable, it is treated as a string; if you put a number in, it is treated as numeric.

awk also converts between the data types. If you put the string "123" in a variable and later perform a calculation on it, it is treated as the number 123. The danger is what happens when you perform a calculation on a string such as "abc". awk uses only the leading numeric part of a string, and "abc" has none, so the value silently becomes numeric zero. No error is reported, which makes this type of logic error difficult to debug.
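Here is a minimal sketch of these conversions, run from the shell inside a BEGIN action so that no input is needed. The "12abc" case shows that awk keeps the leading numeric part of a string, which is exactly why "abc", with no leading digits, turns into zero without any warning.

```sh
# Each print shows how awk converted the value in that context.
conversions=$(awk 'BEGIN {
    s = "123"
    print s + 1        # numeric context: "123" becomes 123, so 124
    print "abc" + 0    # no leading number: silently becomes 0
    print "12abc" + 0  # only the leading numeric part is used: 12
    print 7 "" 8       # string context: concatenation gives 78
}')
printf '%s\n' "$conversions"
```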

TIP
Initialize all your variables in a BEGIN action like this:
BEGIN {total = 0.0; loop = 0; first_time = "yes"; }

Like the C language, awk requires that variables begin with an alphabetic character or an underscore. The alphabetic character can be upper- or lowercase. The remainder of the variable name can consist of letters, numbers, or underscores. It would be nice (to yourself and anyone else who has to maintain your code once you are gone) to make the variable names meaningful. Make them descriptive.

Although you can make your variable names all uppercase letters, that is a bad practice because the predefined variables (like NF or FS) are in uppercase. It is a common error to type the predefined variables in lowercase (like nf or fs). You will not get any errors from awk, because nf and fs are simply new user-defined variables, so this mistake can be difficult to debug. The variables won't behave like the properly spelled uppercase ones, and you won't get the results you expect.
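A short sketch of the mistake: NF holds the field count, while nf is just a never-assigned user-defined variable, so awk quietly prints it as an empty string. The brackets are only there to make the empty value visible.

```sh
# NF is the predefined field count; nf is an unrelated, never-assigned
# user-defined variable, so it prints as an empty string.
case_check=$(printf 'one two three\n' | awk '{ print NF, "[" nf "]" }')
printf '%s\n' "$case_check"
```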

Predefined Variables

gawk provides you with a number of predefined (also known as built-in) variables. Some supply useful data to your program; others change the default behavior of gawk when you set them to a specific value.

Table 27.4 summarizes the predefined variables in gawk. Earlier versions of awk don't support all of these variables; in particular, ARGIND, ERRNO, FIELDWIDTHS, and IGNORECASE are gawk extensions.

Table 27.4. gawk predefined variables.

Variable      Meaning                                              Default value (if any)
ARGC          Number of command-line arguments
ARGIND        Index in ARGV of the file currently being processed
ARGV          Array of command-line arguments
CONVFMT       Conversion format for numbers                        %.6g
ENVIRON       Array of UNIX environment variables
ERRNO         UNIX system error message
FIELDWIDTHS   Whitespace-separated list of input field widths
FILENAME      Name of the current input file
FNR           Record number within the current input file
FS            Input field separator                                Space
IGNORECASE    Controls the case sensitivity                        0 (case-sensitive)
NF            Number of fields in the current record
NR            Number of input records read so far
OFMT          Output format for numbers                            %.6g
OFS           Output field separator                               Space
ORS           Output record separator                              Newline
RS            Input record separator                               Newline
RSTART        Start of the string matched by match()
RLENGTH       Length of the string matched by match()
SUBSEP        Subscript separator                                  "\034"

The ARGC variable contains the number of command-line arguments passed to your program. ARGV is an array of ARGC elements that contains the command-line arguments themselves: the first one is ARGV[0], and the last one is ARGV[ARGC-1]. ARGV[0] contains the name of the command being executed (such as gawk). Neither the gawk command-line options nor the program text appears in ARGV; gawk interprets them itself. ARGIND (gawk only) is the index within ARGV of the file currently being processed.
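The sketch below prints ARGC and the user-supplied arguments. The words alpha and beta are arbitrary; because the program has only a BEGIN action, awk never tries to open them as files. ARGV[0] is skipped because its exact value depends on how awk was invoked. Note that the -F: option is not counted.

```sh
# The -F: option and the program text itself are not stored in ARGV.
arg_list=$(awk -F: 'BEGIN {
    print ARGC                   # ARGV[0] plus two arguments: 3
    for (i = 1; i < ARGC; i++)   # skip ARGV[0], the command name
        print i, ARGV[i]
}' alpha beta)
printf '%s\n' "$arg_list"
```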

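CONVFMT (conversion format) holds the format awk uses when it converts a number to a string, for example when a number is concatenated with a string. It defaults to the format string "%.6g". Integer values are always converted as integers, regardless of CONVFMT. See the section "printf" for more information on the meaning of the format string.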
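A minimal sketch contrasting CONVFMT with OFMT, the format print uses when it outputs a non-integer number. The formats "%.2f" and "%.1f" are arbitrary choices for the demonstration; separate variables are used for each case to keep the two conversions independent.

```sh
formats=$(awk 'BEGIN {
    CONVFMT = "%.2f"   # used for number-to-string conversion
    OFMT = "%.1f"      # used when print outputs a number
    pi = 3.14159
    e = 2.71828
    s = pi ""          # concatenation converts with CONVFMT: 3.14
    print s
    print 42 ""        # integers are converted as integers: 42
    print e            # print formats a non-integer with OFMT: 2.7
}')
printf '%s\n' "$formats"
```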

The ENVIRON variable is an array that contains the environment variables defined in your UNIX session. The subscript is the name of the environment variable whose value you want.

If you want your program to run specific code depending on the value of an environment variable, you can use the following:

ENVIRON["TERM"] == "vt100"  {print "Working on a Video Tube!"}

If you are using a VT100 terminal, you will get the message Working on a Video Tube! once for each input record. Put quotes around the subscript only when it is a literal string. If you have a variable (named TERM, for example) that contains the string "TERM", leave the double quotes off and write ENVIRON[TERM].
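The difference between a literal and a variable subscript can be sketched as follows. GREETING is a made-up environment variable set only for this one awk command, and name is a hypothetical awk variable holding the subscript.

```sh
# GREETING is a made-up variable set only for this one awk command.
env_values=$(GREETING=hello awk 'BEGIN {
    name = "GREETING"            # a variable holding the subscript
    print ENVIRON["GREETING"]    # literal subscript: quoted
    print ENVIRON[name]          # variable subscript: no quotes
}')
printf '%s\n' "$env_values"
```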

The ERRNO variable (gawk only) contains the UNIX system error message if a system error occurs during a redirection, a read with getline, or a close.
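ERRNO is a gawk extension, so this sketch runs its demonstration only when a gawk command is installed. It assumes the made-up path /no/such/file does not exist; getline then returns -1 and ERRNO holds the system's message (typically something like "No such file or directory", though the exact text comes from the system).

```sh
# ERRNO is a gawk extension, so this sketch runs only if gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    read_error=$(gawk 'BEGIN {
        # getline returns -1 when the file cannot be opened
        if ((getline line < "/no/such/file") < 0)
            print "read failed:", ERRNO
    }')
    printf '%s\n' "$read_error"
fi
```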

The FIELDWIDTHS variable (gawk only) lets you process fixed-length fields instead of using field separators. To specify the size of the fields, set FIELDWIDTHS to a string that contains the width of each field, separated by spaces or tabs. After this variable is set, gawk splits each input record based on the specified widths. To revert to using a field separator, assign a new value to FS.
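A minimal sketch, again guarded because FIELDWIDTHS exists only in gawk. The widths "3 5 2" and the sample records are made up. The first record is split by width; assigning FS while processing it switches gawk back to separator-based splitting for the next record.

```sh
# FIELDWIDTHS is a gawk extension, so this sketch runs only if gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    split_result=$(printf 'ABC12345XY\nfoo bar baz\n' | gawk '
        BEGIN { FIELDWIDTHS = "3 5 2"; OFS = "|" }
        NR == 1 { print $1, $2, $3; FS = " " }   # assigning FS reverts to separators
        NR == 2 { print $1, $2, $3 }')
    printf '%s\n' "$split_result"
fi
```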

The variable FILENAME contains the name of the current input file. Because one or more files can be specified on the command line, this gives you a way to determine which input file is being processed.

The FNR variable contains the number of the current record within the current input file. It is reset for each file that is specified on the command line. It always contains a value that is less than or equal to the variable NR.
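To see FILENAME, FNR, and NR side by side, the sketch below creates two tiny made-up files (first.txt and second.txt) in a temporary directory and prints all three variables for every record. FNR restarts at 1 for the second file, while NR keeps counting.

```sh
# Two small, made-up input files in a scratch directory.
scratch=$(mktemp -d)
printf 'a\nb\n' > "$scratch/first.txt"
printf 'c\n' > "$scratch/second.txt"

# FNR restarts at 1 for each file; NR keeps counting across files.
record_counts=$(cd "$scratch" && awk '{ print FILENAME, FNR, NR }' first.txt second.txt)
rm -r "$scratch"
printf '%s\n' "$record_counts"
```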

The field separator is stored in the variable FS. Its default value is a single space, which awk treats specially: fields are separated by runs of whitespace, and leading and trailing blanks are ignored. You can change FS with the -F command-line option or within your program. If you know that your file will use some character other than a space as the field separator (like the /etc/passwd file in earlier examples, which uses the colon), you can set it in your program with the BEGIN pattern.
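The sketch below uses two made-up records in /etc/passwd format rather than your real password file, so the output is predictable. Setting FS in a BEGIN action and using the -F option give the same result. The last command shows the default single-space FS ignoring leading blanks and treating a run of blanks as one separator.

```sh
# Two made-up records in /etc/passwd format (colon-separated fields).
passwd_lines='root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000::/home/alice:/bin/bash'

# Set FS in a BEGIN action, before any record is read.
via_begin=$(printf '%s\n' "$passwd_lines" | awk 'BEGIN { FS = ":" } { print $1, $7 }')

# The -F command-line option has the same effect.
via_option=$(printf '%s\n' "$passwd_lines" | awk -F: '{ print $1, $7 }')

# Default FS: leading blanks are ignored and runs of blanks form one
# separator, so this line has 2 fields.
default_split=$(printf '   a    b\n' | awk '{ print NF, $1 }')

printf '%s\n' "$via_begin" "$default_split"
```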
