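Typical patterns compare a field with a string, or test it against a regular expression with the match (~) and no-match (!~) operators. In the following examples, the first pattern is true when the first field is exactly Bob, the second when the second field contains may, MAY, or May, and the third when the third field does not contain May in any mix of upper- and lowercase: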

$1 == "Bob"   { print "Bob stuff" }

$2 ~ /(may)|(MAY)|(May)/ { print "May stuff" }

$3 !~ /[Mm][Aa][Yy]/ { print "non-May stuff" }
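The three operators can be tried on a few records. This is a minimal sketch, assuming made-up input with a name in the first field and a month in the second; the sample records and messages are invented for illustration, and both regular expressions are applied to the second field to keep the input small.

```shell
# Hypothetical sample records: name, then month.
data='Bob May
Sue june
Bobby mAY'

out=$(printf '%s\n' "$data" | awk '
$1 == "Bob"               { print NR ": exact match on Bob" }
$2 ~ /(may)|(MAY)|(May)/  { print NR ": month matches" }
$2 !~ /[Mm][Aa][Yy]/      { print NR ": month does not match" }
')

printf '%s\n' "$out"
```

The third record prints nothing: Bobby is not exactly Bob, mAY is not one of the three spellings listed in the alternation, and it does match the bracketed form, so the !~ test is false.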

Compound Pattern Operators

The compound pattern operators used by awk are similar to those used by C and the UNIX shells. They are the notation used to combine other patterns (expressions or regular expressions) into a complex form of logic.

Table 27.3 shows the compound pattern operators and their behavior.

Table 27.3. Compound pattern operators in awk.

Operator    Meaning
&&          Logical AND
||          Logical OR
!           Logical NOT
()          Parentheses, used to group compound patterns

If I wanted to execute some action (print a special message, for instance) only when the first field contained the value "Bob" and the fourth field contained the value "Street", I could use a compound pattern that looks like this:

$1 == "Bob" && $4 == "Street" {print"some message"}
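The operators in Table 27.3 can be combined freely. The following is a small sketch rather than a definitive recipe; it assumes invented address records whose fourth field is the street type, and it uses awk, which may be gawk or another implementation.

```shell
# Hypothetical sample data: name, house number, street name, street type.
records='Bob 12 Elm Street
Bob 9 Oak Avenue
Sue 40 Main Street'

# Only the record where both comparisons are true triggers the action.
both=$(printf '%s\n' "$records" |
    awk '$1 == "Bob" && $4 == "Street" { print $0 }')

# Parentheses group and ! negates: Bob or Street, but not both together.
one=$(printf '%s\n' "$records" |
    awk '($1 == "Bob" || $4 == "Street") && !($1 == "Bob" && $4 == "Street") { print $1, $4 }')

printf '%s\n' "$both" "$one"
```

The first command selects only the Elm Street record; the second selects the other two.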

Range Pattern Operators

The range pattern is slightly more complex than the other types: it becomes true when the first pattern is matched and remains true through the record that matches the second pattern. The catch is that the file needs to be sorted on the fields that the range pattern matches. Otherwise, the range might start prematurely or end early.

The individual patterns in a range pattern are separated by a comma (,). If you have twenty-six files in your directory with the names A to Z, you can show a range of the files as shown in Listing 27.2.

Listing 27.2. Range pattern example.

$ ls | gawk '$1 == "B", $1 == "D"'
B
C
D
$ ls | gawk '$1 == "B", $1 <= "D"'
B
$ ls | gawk '$1 == "B", $1 > "D"'
B
C
D
E
$ _

The first example is obvious: all the records between B and D are shown. The other examples are less intuitive, but the key to remember is that the range is done as soon as the second condition is true, and that the second condition is tested against the very record that starts the range. The second gawk command shows only B because B itself is less than or equal to D, so the range starts and ends on the same record. The third gawk shows B through E because E is the first record that is greater than D (making the second condition true).
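The three ranges can be reproduced without creating any files. This sketch assumes a list of single-letter names fed in with printf as a stand-in for the ls output; the variable names are illustrative only.

```shell
# Stand-in for the ls output: one letter per line, already sorted.
names=$(printf '%s\n' A B C D E F)

# The range ends on the first record equal to D (inclusive).
r1=$(printf '%s\n' "$names" | awk '$1 == "B", $1 == "D"' | paste -sd ' ' -)

# B starts the range and also satisfies the end test, so only B prints.
r2=$(printf '%s\n' "$names" | awk '$1 == "B", $1 <= "D"' | paste -sd ' ' -)

# E is the first record greater than D, so the range runs B through E.
r3=$(printf '%s\n' "$names" | awk '$1 == "B", $1 > "D"' | paste -sd ' ' -)

printf '%s\n' "$r1" "$r2" "$r3"
```

Note that a range pattern is written outside the curly braces; with no action given, awk prints each matching record.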

Handling Input

As awk reads each record, it breaks the record into fields and then searches for matching patterns and the related actions to perform. It assumes that each record occupies a single line (the newline character, by default, ends a record). Lines that are empty (just the newline) or contain only blanks still count as records; with the default field separator, they have zero fields.

You can force awk to read the next record in a file (and cease searching for pattern matches on the current one) by using the next statement. next is similar to the C continue statement: control returns to the outermost loop. In awk, the outermost loop is the automatic read of the file. If you decide you need to break out of your program completely, you can use the exit statement. exit acts as if the end-of-file had been reached and passes control to the END block (if one exists). If exit is in the END block, the program exits immediately.
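A short sketch of next and exit working together, assuming invented input in which lines starting with # are comments and a line reading STOP marks the point where processing should end; both conventions are made up for this example.

```shell
# Hypothetical input: comment lines start with #, and STOP marks the cutoff.
input='# header
alpha
# note
beta
STOP
gamma'

out=$(printf '%s\n' "$input" | awk '
/^#/         { next }    # skip this record; later rules never see it
$1 == "STOP" { exit }    # stop reading; control passes to END
             { count++; print "data:", $1 }
END          { print "records kept:", count }
')

printf '%s\n' "$out"
```

The record gamma is never read, yet the END block still runs and reports the two records that were kept.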

By default, fields are separated by whitespace (spaces or tabs). It doesn't matter to awk whether there is one separator or many: the next field begins at the first nonblank character. You can change the field separator by setting the variable FS to the character you want. To set your field separator to the colon (:), which is the separator in /etc/passwd, code the following:

BEGIN { FS = ":" }

The general format of the file looks something like the following:

david:!:207:1017:David B Horvath,CCP:/u/david:/bin/ksh

If you want to list the names of everyone on the system, use the following:

gawk --field-separator=: '{ print $5 }' /etc/passwd

You will then see a list of everyone's name. In this example, I set the field separator variable (FS) from the command line using the gawk-format command-line option (--field-separator=:). I could also use -F :, which is supported by all versions of awk.
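The two portable ways of setting the separator can be compared side by side. This sketch avoids reading the real /etc/passwd; it assumes a two-line sample in the same format, where the second entry is made up.

```shell
# One line from the text plus a made-up second entry in the same format.
passwd_sample='david:!:207:1017:David B Horvath,CCP:/u/david:/bin/ksh
guest:!:300:1017:Guest Account:/u/guest:/bin/sh'

# Setting FS in a BEGIN block, before the first record is read.
via_begin=$(printf '%s\n' "$passwd_sample" |
    awk 'BEGIN { FS = ":" } { print $5 }')

# The same separator given on the command line with -F.
via_flag=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print $5 }')

printf '%s\n' "$via_begin"
```

Both commands produce the same list of names, and the spaces inside the fifth field are left alone because the colon is now the only separator.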

The first field is $1, the second is $2, and so on. The entire record is contained in $0. You can get the last field (if you are lazy like me and don't want to count) by referencing $NF. NF is the number of fields in a record.
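As a quick illustration, the following sketch prints NF, the first field, and the last field for the sample /etc/passwd line shown earlier, and then checks NF for a line containing only blanks. The variable names are illustrative.

```shell
line='david:!:207:1017:David B Horvath,CCP:/u/david:/bin/ksh'

# NF is the field count; $NF is therefore the last field.
summary=$(printf '%s\n' "$line" | awk -F: '{ print NF; print $1; print $NF }')

# With the default separator, a line holding only blanks has no fields.
blank_nf=$(printf '   \n' | awk '{ print NF }')

printf '%s\n' "$summary" "$blank_nf"
```

Because $NF always names the last field, the same program keeps working if records have different numbers of fields.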

Coding Your Program

The nice thing about awk is that, with a few exceptions, it is free format, like the C language. Blank lines are ignored. Statements can be placed on the same line or split up in almost any form you like. awk ignores extra spaces and tabs between tokens, much like C does. The following two lines are essentially the same:

$1=="Bob"{print"Bob stuff"}

$1    ==    "Bob"       {     print    "Bob stuff"     }

Spaces within quotes are significant because they will appear in the output or are used in a comparison for matching. The other spaces are not. You can also split up the action (but you have to have the opening curly brace on the same line as the pattern):

$1    ==    "Bob"       {
                           print    "Bob stuff"
                        }

You can have multiple statements within an action. If you place them on the same line, you need to use semicolons (;) to separate them (so awk can tell when one ends and the next begins). Printing multiple lines looks like the following:

$1    ==    "Bob"       {
                           print    "Bob stuff"; print    "more stuff"; print    "last stuff";
                        }

You can also put the statements on separate lines. When you do that, you don't need to code the semicolons, and the code looks like the following:

$1    ==    "Bob"       {
                           print    "Bob stuff"
                           print    "more stuff"
                           print    "last stuff"
                        }

Personally, I am in the habit of coding the semicolon after each statement because that is the way I have to do it in C. To awk, the following example is just like the previous (but you can see the semicolons):

$1    ==    "Bob"       {
                           print    "Bob stuff";
                           print    "more stuff";
                           print    "last stuff";
                        }

Another thing you should make use of is comments. Anything on a line after the pound sign or octothorpe (#) is ignored by awk. These are notes designed for the programmer to read and aid in the understanding of the program code. In general, the more comments you place in a program, the easier it is to maintain.
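A brief sketch of where comments can go, assuming two invented input records. It also shows one case worth knowing: a # inside a quoted string is part of the string, not the start of a comment.

```shell
out=$(printf '%s\n' 'Bob 1' 'Sue 2' | awk '
# Whole-line comment: describe what the rule below is for.
$1 == "Bob" {
    print "Bob stuff"    # trailing comment after a statement
    print "# inside a string, so this is printed"
}
')

printf '%s\n' "$out"
```

When the program is passed on the command line inside single quotes, as here, the comments themselves must not contain a single quote, because the shell would treat it as the end of the program text.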