Running pattern-action pairs one or two at a time from the command line would be pretty difficult (and time-consuming), so gawk allows you to store pattern-action pairs in a file. A gawk program (called a script) is a set of pattern-action pairs stored in an ASCII file. For example, this could be the contents of a valid gawk script:
/tparker/ {print $6}
$2 != "foo" {print}
The first line looks for tparker and prints the sixth column. The second line looks for second columns that don't match the string "foo" and displays the entire line. gawk reads the input only once: each line is tested against every pattern-action pair in turn before the next line is read. When you are writing a script, you don't need to worry about the quotation marks around the pattern-action pairs as you did on the command line, because the command that executes the script makes it obvious where the pattern-action pairs start and end. String constants such as "foo" still need their own double quotes.
After you have saved all of the pattern-action pairs in a program, they are called by gawk with the -f option on the command line:
gawk -f script filename
This command causes gawk to read all of the pattern-action pairs from the file script and process them against the file called filename. This is how most gawk programs are written. Don't confuse the -f and -F options: -f names the script file, while -F sets the field separator.
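As a minimal sketch of the -f workflow, the following shell session writes the two pattern-action pairs shown earlier to a script file and runs them against a small sample file. The scratch directory, the file names, and the sample data are assumptions made up for illustration, and the AWK variable simply falls back to the system awk if gawk is not installed.

```shell
# Prefer gawk; fall back to the system awk if gawk is not installed.
AWK=$(command -v gawk || command -v awk)

# Scratch directory and file names are hypothetical.
workdir=$(mktemp -d)

# Sample input with whitespace-separated columns.
cat > "$workdir/users.txt" <<'EOF'
tparker 10 a b c /home/tparker
rmaclean foo a b c /home/rmaclean
EOF

# The script file: one pattern-action pair per line, with no shell quoting.
cat > "$workdir/script" <<'EOF'
/tparker/   { print $6 }
$2 != "foo" { print }
EOF

# Run every pair in the script against each line of the input file.
out=$("$AWK" -f "$workdir/script" "$workdir/users.txt")
printf '%s\n' "$out"

rm -r "$workdir"
```

The first input line matches both patterns, so it produces two lines of output: its sixth column, then the whole line. The second input line matches neither pattern and prints nothing.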
If you want to specify a different field separator on the command line (one can also be set in the script, using a special format you'll see later), add the -F option before the filename. Options can appear in any order, so -F can follow -f:
gawk -f script -F: filename
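The combination of -f and -F might be used as in the sketch below, which reads a colon-separated file in the style of /etc/passwd. The file names and the two sample records are invented for illustration.

```shell
# Prefer gawk; fall back to the system awk if gawk is not installed.
AWK=$(command -v gawk || command -v awk)

# Scratch directory and file names are hypothetical.
workdir=$(mktemp -d)

# Colon-separated sample records (made up for this example).
cat > "$workdir/passwd.txt" <<'EOF'
tparker:x:1001:100:Tim Parker:/home/tparker:/bin/sh
guest:x:1002:100:Guest User:/home/guest:/bin/sh
EOF

cat > "$workdir/script" <<'EOF'
/tparker/ { print $6 }
EOF

# -f names the script file; -F: makes the colon the field separator.
out=$("$AWK" -f "$workdir/script" -F: "$workdir/passwd.txt")
printf '%s\n' "$out"

rm -r "$workdir"
```

With the colon as the separator, the sixth field of the tparker record is the home directory. Without -F:, the fields would be split on whitespace instead, and $6 would be empty for that record.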
If you want to process more than one file using the script, just append the names of the files:
gawk -f script filename1 filename2 filename3
By default, all output from the gawk command is displayed on the screen. You can redirect it to a file with the usual shell redirection operators:
gawk -f script filename > save_file
There is another way of specifying the output file from within the script, but we'll come back to that in a moment.
Two special patterns supported by gawk are useful when writing scripts. The BEGIN pattern is used to indicate any actions that should take place before gawk starts processing a file. This is typically used to initialize values, set parameters such as field separators, and so on. The END pattern is used to execute any instructions after the file has been completely processed. Typically, this can be for summaries or completion notices.
Any instructions following the BEGIN and END patterns are enclosed in curly braces to identify which instructions belong to each pattern. Both BEGIN and END must appear in capitals. Here's a simple example of a gawk script that uses BEGIN and END, albeit only for sending a message to the terminal:
BEGIN { print "Starting to process the file" }
$1 == "UNIX" {print}
$2 > 10 {printf "This line has a value of %d\n", $2}
END { print "Finished processing the file. Bye!" }
In this script, a message is printed before any input is read. Then, for each line of the file, gawk checks both patterns in order: a line with the word UNIX in the first column is echoed to the screen, and a line whose second column is greater than 10 generates a message showing that value. A line that satisfies both conditions produces both outputs. Finally, the END pattern prints out a message that the program is finished.
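To see the order in which BEGIN, the ordinary pattern-action pairs, and END run, a script along these lines can be tried on a three-line sample file. This is only a sketch: the file names and the data are hypothetical.

```shell
# Prefer gawk; fall back to the system awk if gawk is not installed.
AWK=$(command -v gawk || command -v awk)

# Scratch directory and file names are hypothetical.
workdir=$(mktemp -d)

# Two-column sample data (made up for this example).
cat > "$workdir/systems.txt" <<'EOF'
UNIX 5
Linux 12
UNIX 20
EOF

cat > "$workdir/script" <<'EOF'
BEGIN        { print "Starting to process the file" }
$1 == "UNIX" { print }
$2 > 10      { printf "This line has a value of %d\n", $2 }
END          { print "Finished processing the file. Bye!" }
EOF

out=$("$AWK" -f "$workdir/script" "$workdir/systems.txt")
printf '%s\n' "$out"

rm -r "$workdir"
```

The output interleaves the two actions line by line: the third input line satisfies both patterns, so it is echoed and then reported, all in a single pass over the file.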
If you have used any programming language before, you know that a variable is a storage location for a value. Each variable has a name and an associated value, which may change.
With gawk, you assign a variable a value using the assignment operator (=):
var1 = 10
This assigns the value 10 (numeric, not string) to the variable var1. With gawk, you don't have to declare variable types before you use them, as you must in many other languages. This makes it easy to work with variables in gawk.
Note:
Don't confuse the assignment operator, =, which assigns a value, with the comparison operator, ==, which compares two values. This is a common error that takes a little practice to overcome.
The gawk language lets you use variables within actions:
$1 == "Plastic" { count = count + 1 }
This pattern-action pair checks whether the first column is equal to the string "Plastic" and, if it is, increments the value of count by one. It is clearer to set a starting value for count somewhere above this line (usually in the BEGIN section), so that a reader can see where the count starts.
Note:
Strictly speaking, gawk gives every variable an empty value, treated as zero in arithmetic, when it is first used, so you don't have to define the value before you use it. It is, however, good programming practice to initialize the variable anyway.
Here's a more complete example:
BEGIN { count = 0 }
$5 == "UNIX" { count = count + 1 }
END { printf "%d occurrences of UNIX were found\n", count }
In the BEGIN section, the variable count is set to zero. Then, the gawk pattern-action pair is processed, with every occurrence of UNIX adding one to the value of count. After the entire file has been processed, the END statement displays the total number.
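The counting script can be tried as in the sketch below, which runs it twice on the same made-up data: once with the BEGIN initialization and once without. The sample lines and the shell variable names are assumptions for illustration.

```shell
# Prefer gawk; fall back to the system awk if gawk is not installed.
AWK=$(command -v gawk || command -v awk)

# Hypothetical five-column input; the fifth column names an operating system.
data='srv1 a b c UNIX
srv2 a b c Linux
srv3 a b c UNIX'

# Version with an explicit starting value for count.
with_init=$(printf '%s\n' "$data" | "$AWK" '
BEGIN        { count = 0 }
$5 == "UNIX" { count = count + 1 }
END          { printf "%d occurrences of UNIX were found\n", count }')

# Version that relies on an unset variable counting as zero in arithmetic.
without_init=$(printf '%s\n' "$data" | "$AWK" '
$5 == "UNIX" { count = count + 1 }
END          { printf "%d occurrences of UNIX were found\n", count }')

printf '%s\n%s\n' "$with_init" "$without_init"
```

Both versions report two occurrences, which shows that the BEGIN line is there for readability and not because the arithmetic needs it.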
Variables can be used in combination with columns and values, so all of the following statements are legal:
count = count + $6
count = $5 - 8
count = $5 + var1
Variables can also be part of a pattern. The following are both valid as pattern-action pairs:
$2 > max_value {print "Max value exceeded by ", $2 - max_value}
$4 - var1 < min_value {print "Illegal value of ", $4}
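A sketch of variables used inside patterns follows. The limits assigned in the BEGIN section (max_value, min_value, var1) and the two sample lines are hypothetical values chosen so that one line trips both checks and the other trips neither.

```shell
# Prefer gawk; fall back to the system awk if gawk is not installed.
AWK=$(command -v gawk || command -v awk)

# Hypothetical four-column input.
data='a 15 c 3
b 8 c 50'

out=$(printf '%s\n' "$data" | "$AWK" '
# The limits below are made-up values for illustration.
BEGIN { max_value = 10; min_value = 0; var1 = 5 }

# print puts a single space between comma-separated items.
$2 > max_value        { print "Max value exceeded by", $2 - max_value }
$4 - var1 < min_value { print "Illegal value of", $4 }')

printf '%s\n' "$out"
```

For the first line, the second column (15) exceeds max_value by 5, and the fourth column minus var1 is negative, so both messages appear. The second line passes both checks and prints nothing.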
Two special operators are used with variables to increment and decrement by one, because these are common operations. Both of these special operators are borrowed from the C language:
count++ | Increments count by one |
count-- | Decrements count by one |
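The two operators can be seen side by side in this small sketch, which counts lines up with ++ while counting an expected total down with --. The variable remaining and its starting value of 3 are assumptions for the example.

```shell
# Prefer gawk; fall back to the system awk if gawk is not installed.
AWK=$(command -v gawk || command -v awk)

# Feed three lines of input to a script with no pattern on its main action.
out=$(printf 'a\nb\nc\n' | "$AWK" '
BEGIN { remaining = 3 }            # hypothetical: three lines are expected
      { count++; remaining-- }     # runs once for every input line
END   { print count, remaining }')

printf '%s\n' "$out"
```

After three lines, count has risen from zero to 3 and remaining has fallen from 3 to 0.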