One of the ways I use awk most commonly is to process the output of another command by piping that output into awk. If I wanted a custom listing of files that showed only the filename followed by the permissions, I would execute a command like:

ls -l | gawk '{print $NF, " ", $1}'

$NF is the last field (which is the filename; I am lazy—I didn't want to count the fields to figure out its number). $1 is the first field. The output of ls -l is piped into awk, which processes it for me.
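To see what that one-liner produces without depending on a real directory, here is a small sketch that feeds awk three canned lines shaped like ls -l output. The sample lines and the variable name listing are made up for illustration, and plain awk is assumed to behave like gawk for this program.

```shell
# Canned lines shaped like `ls -l` output (hypothetical sample data).
listing=$(printf '%s\n' \
  'total 8' \
  '-rw-r--r-- 1 pat staff 120 Jan 5 10:00 notes.txt' \
  'drwxr-xr-x 2 pat staff 512 Jan 6 11:30 src' |
  awk '{ print $NF, " ", $1 }')   # same program as the one-liner above

echo "$listing"
```

Note that the "total" line that ls -l prints first is processed like any other record, and that a filename containing spaces would show only its last word, because $NF is the last whitespace-separated field.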

If I put the awk script into a file (named lser.awk) and redirected the output to the printer, I would have a command that looks like:

ls -l | gawk -f lser.awk | lp

I tend to save my awk scripts with the file type (suffix) of .awk just to make it obvious when I am looking through a directory listing. If the program is longer than about 30 characters, I make a point of saving it because there is no such thing as a "one-time only" program, user request, or personal need.
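As a sketch of the script-file approach, the following saves the same one-line program as lser.awk in a temporary directory and runs it with -f. The temporary directory, the sample input line, and the variable names are assumptions for illustration; the output goes to the terminal rather than to lp so that nothing is actually printed.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Save the program from the earlier one-liner as lser.awk.
cat > "$workdir/lser.awk" <<'EOF'
# lser.awk: print the file name, then the permissions
{ print $NF, " ", $1 }
EOF

# -f tells awk to read the program from the file instead of the command line.
report=$(printf '%s\n' '-rw-r--r-- 1 pat staff 120 Jan 5 10:00 notes.txt' |
  awk -f "$workdir/lser.awk")
echo "$report"

rm -r "$workdir"
```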

CAUTION
If you forget the -f option before a program filename, your program will be treated as if it were data.
If you code your awk program on the command line but place it after the name of your data file, it will also be treated as if it were data.
In both cases, you will get odd results.

See the section "Commands On-the-Fly" later in this chapter for more examples of using awk scripts to process piped data.

Patterns and Actions

Each awk statement consists of two parts: the pattern and the action. The pattern decides when the action is executed and, of course, the action is what the programmer wants to occur. Without a pattern, the action is always executed (the pattern can be said to "default to true").

There are two special patterns (also known as blocks): BEGIN and END. The BEGIN code is executed before the first record is read from the file and is used to initialize variables and set up things like control breaks. The END code is executed after end-of-file is reached and is used for any cleanup required (like printing final totals on a report). The other patterns are tested for each record read from the file.

The general program format is to put the BEGIN block at the top, followed by any pattern/action pairs, and finally the END block at the end. This is not a language requirement; it is just the way most people do it (mostly for readability reasons).

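BEGIN and END blocks are optional; if you use them, you should have a maximum of one each. Newer implementations such as gawk accept several BEGIN or END blocks and run them in the order in which they appear, but a single BEGIN block and a single END block keep the program easier to follow.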
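The usual layout can be sketched as a small report. The input columns (item, quantity, unit price) and the variable names are hypothetical; the point is only to show when each block runs.

```shell
totals=$(printf '%s\n' 'Desk 2 150' 'Chair 4 40' 'Lamp 1 25' |
  awk '
    BEGIN { print "Item Cost"; total = 0 }   # runs once, before any input
          { cost = $2 * $3                    # no pattern: runs for every record
            print $1, cost
            total += cost }
    END   { print "Total", total }            # runs once, after end-of-file
  ')
echo "$totals"
```

The BEGIN block prints the heading, the unnamed action prints one line per record, and the END block prints the final total (485 for this sample).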

The action is contained within curly braces ({ }) and can consist of one or many statements. If you omit the pattern portion, it defaults to true, which causes the action to be executed for every record in the file. If you omit the action, it defaults to print $0 (print the entire record).
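Both defaults can be seen side by side in a short sketch. The sample data and variable names are invented for illustration.

```shell
sample='Desk 150
Chair 40
Desk lamp 25'

# Action only: the pattern defaults to true, so every record is processed.
first_fields=$(printf '%s\n' "$sample" | awk '{ print $1 }')

# Pattern only: the action defaults to print $0, much like a simple grep.
desk_lines=$(printf '%s\n' "$sample" | awk '/Desk/')

echo "$first_fields"
echo "$desk_lines"
```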

The pattern is specified before the action. It can be a regular expression (contained within a pair of slashes [/ /]) that matches part of the input record, or an expression that contains comparison operators. A pattern can also be compound, combining expressions and regular expressions, or it can be a range defined by two patterns.
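A compound pattern and a range pattern might look like the following sketch. The START and STOP marker lines and the rest of the sample data are assumptions made up for this example.

```shell
records='header
START
Desk 150
Chair 40
Desk 20
STOP
footer'

# Compound pattern: a regular expression AND a comparison on field 2.
pricey_desks=$(printf '%s\n' "$records" | awk '/Desk/ && $2 > 100')

# Range pattern: every record from the first match through the second.
report_body=$(printf '%s\n' "$records" | awk '/^START$/,/^STOP$/')

echo "$pricey_desks"
echo "$report_body"
```

The range includes both of the records that delimit it, so the START and STOP lines are printed along with everything between them.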

Regular Expression Patterns

The regular expressions used by awk are similar to those used by grep, egrep, and the UNIX editors ed, ex, and vi. They are the notation used to specify and match strings. A regular expression consists of ordinary characters (like the letters A, B, and c, which match themselves in the input) and metacharacters. Metacharacters are characters that have special (meta) meaning; they do not match themselves but perform some special function.

Table 27.1 shows the metacharacters and their behavior.

Table 27.1. Regular expression metacharacters in awk.

Metacharacter   Meaning
\               Escape sequence (next character has special meaning, \n is the
                newline character and \t is the tab). Any escaped metacharacter
                will match to that character (as if it were not a metacharacter).
^               Starts match at beginning of string.
$               Matches at end of string.
.               Matches any single character.
[ABC]           Matches any one of A, B, or C.
[A-Ca-c]        Matches any one of A, B, C, a, b, or c (ranges).
[^ABC]          Matches any character other than A, B, and C.
Desk|Chair      Matches any one of Desk or Chair.
[ABC][DEF]      Concatenation. Matches any one of A, B, or C that is followed
                by any one of D, E, or F.
*               [ABC]* matches zero or more occurrences of A, B, or C.
+               [ABC]+ matches one or more occurrences of A, B, or C.
?               [ABC]? matches to an empty string or any one of A, B, or C.
()              Combines regular expressions. For example, (Blue|Black)berry
                matches to Blueberry or Blackberry.
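A few of these metacharacters can be tried against a handful of test strings. This is only a sketch: the word list and variable names are invented, and each pattern relies on the default action of printing the matching record.

```shell
words='Blueberry
Blackberry
Cranberry
berry
B2
a.b
axb'

# () with |: alternatives that share a common ending.
berries=$(printf '%s\n' "$words" | awk '/^(Blue|Black)berry$/')

# ^, $, character classes, and +: a capital letter, then one or more digits.
codes=$(printf '%s\n' "$words" | awk '/^[A-Z][0-9]+$/')

# An escaped dot matches only a literal dot; a bare dot matches any character.
literal_dot=$(printf '%s\n' "$words" | awk '/^a\.b$/')
any_char=$(printf '%s\n' "$words" | awk '/^a.b$/')

echo "$berries"; echo "$codes"; echo "$literal_dot"; echo "$any_char"
```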

All of these can be combined to form complex search strings. Typical patterns search for a specific string (Report Date), for a string in several fixed formats (may, MAY, May), or for a group of characters (any combination of upper- and lowercase characters that spells out the month of May). These look like the following:

/Report Date/  { print "do something" }

/(may)|(MAY)|(May)/ { print "do something else" }

/[Mm][Aa][Yy]/ { print "do something completely different" }
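The difference between the last two patterns shows up with a mixed-case spelling such as mAy. The sketch below runs them, with the default print action, over a few sample lines that were invented for this purpose.

```shell
months='may
MAY
May
mAy
Report Date: 1 June'

# A specific string.
report=$(printf '%s\n' "$months" | awk '/Report Date/ { print "found:", $0 }')

# Alternation: matches only the three spellings that are listed.
listed=$(printf '%s\n' "$months" | awk '/(may)|(MAY)|(May)/')

# Character classes: matches any mix of upper- and lowercase letters.
any_case=$(printf '%s\n' "$months" | awk '/[Mm][Aa][Yy]/')

echo "$report"; echo "$listed"; echo "$any_case"
```

Only the character-class version picks up the mAy line.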

Comparison Operators and Patterns

The comparison operators used by awk are similar to those used by C and the UNIX shells. They are the notation used to specify and compare values (including strings). A regular expression alone matches against any portion of the input record. Combining a comparison operator with a regular expression lets you test specific fields.

Table 27.2 shows the comparison operators and their behavior.

Table 27.2. Comparison operators in awk.

Operator   Meaning
==         Is equal to
<          Less than
>          Greater than
<=         Less than or equal to
>=         Greater than or equal to
!=         Not equal to
~          Matched by regular expression
!~         Not matched by regular expression

This enables you to perform specific comparisons on fields instead of the entire record. Remember that you can also perform them on the entire record by using $0 instead of a specific field.
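Field-level tests might look like the following sketch. The input is a simplified, hypothetical listing (permissions, links, owner, group, size, name) with the date columns left out, and the variable names are made up.

```shell
files='-rw-r--r-- 1 pat staff 120 notes.txt
drwxr-xr-x 2 pat staff 512 src
-rwxr-xr-x 1 root staff 9000 tool'

# ~ tests one field against a regular expression: directories only.
dirs=$(printf '%s\n' "$files" | awk '$1 ~ /^d/ { print $NF }')

# A numeric comparison on field 5 combined with a string comparison on field 3.
big_not_root=$(printf '%s\n' "$files" |
  awk '$5 >= 512 && $3 != "root" { print $NF }')

# !~ negates the match; $0 applies it to the entire record.
no_root=$(printf '%s\n' "$files" | awk '$0 !~ /root/ { print $NF }')

echo "$dirs"; echo "$big_not_root"; echo "$no_root"
```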
