developer.com - Reference

- 26 -

gawk

by Tim Parker

IN THIS CHAPTER

What Is the awk Language?

Files, Records, and Fields

Pattern-Action Pairs

Calling gawk Programs

Control Structures

The awk programming language was created by the three people who gave their last-name initials to the language: Alfred Aho, Peter Weinberger, and Brian Kernighan. The gawk program included with Linux is the GNU implementation of that programming language.

The awk language is more than just a programming language; it is an almost indispensable tool for many system administrators and UNIX programmers. The language itself is easy to learn, easy to master, and amazingly flexible. Once you get the hang of using awk, you'll be surprised how often you can use it for routine tasks on your system.

To help you understand gawk, I will introduce the elements of the programming language in a simple order and illustrate each with examples. You are encouraged, of course, to experiment as the chapter progresses.

I can't cover every aspect and feature of gawk in this chapter, but we will look at the basics of the language and show you enough, I hope, to pique your curiosity.

What Is the awk Language?

awk is designed to be an easy-to-use programming language that lets you work with information either stored in files or piped to it. The main strengths of awk are its capabilities to do the following:

Display some or all the contents of a file, selecting rows, columns, or fields as necessary.

Analyze text for frequency of words, occurrences, and so on.

Prepare formatted output reports based on information in a file.

Filter text in a very powerful manner.

Perform calculations with numeric information from a file.

awk isn't difficult to learn. In many ways, awk is the ideal first programming language because of its simple rules, basic formatting, and standard usage. Experienced programmers will find awk refreshingly easy to use.

Files, Records, and Fields

Usually, gawk works with data stored in files. Often this is numeric data, but gawk can work with character information, too. If data is not stored in a file, it is supplied to gawk through a pipe or other form of redirection. Only ASCII files (text files) can be properly handled with gawk. Although it does have the ability to work with binary files, the results are often unpredictable. Since most information on a Linux system is stored in ASCII, this isn't a problem.

As a simple example of a file that gawk works with, consider a telephone directory. It is composed of many entries, all with the same format: last name, first name, address, telephone number. The entire telephone directory is a database of sorts, although without a sophisticated search routine. Indeed, the telephone directory relies on a pure alphabetical order to enable users to search for the data they need.

Each line in the telephone directory is a complete set of data on its own and is called a record. For example, the entry in the telephone directory for "Smith, John," which includes his address and telephone number, is a record.

Each piece of information in the record--the last name, the first name, the address, and the telephone number--is called a field. For the gawk language, the field is a single piece of information. A record, then, is a number of fields that pertain to a single item. A set of records makes up a file.

In most cases, fields are separated by a character that is used only to separate fields, such as a space, a tab, a colon, or some other special symbol. This character is called a field separator. A good example is the file /etc/passwd, which looks like this:
tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash
etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh
ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash

If you look carefully at the file, you will see that it uses a colon as the field separator. Each line in the /etc/passwd file has seven fields: the user name, the password, the user ID, the group ID, a comment field, the home directory, and the startup shell. Each field is separated by a colon. Colons exist only to separate fields. A program looking for the sixth field in any line needs only count five colons across (because the first field doesn't have a colon before it).

That's where we find a problem with the gawk definition of fields as they pertain to the telephone directory example. Consider the following lines from a telephone directory:
Smith, John 13 Wilson St. 555-1283
Smith, John 2736 Artside Dr, Apt 123 555-2736
Smith, John 125 Westmount Cr 555-1726

We "know" there are four fields here: the last name, the first name, the address, and the telephone number. But gawk doesn't see it that way. The telephone book uses the space character as a field separator, so on the first line gawk sees "Smith," (comma included) as the first field, "John" as the second, "13" as the third, "Wilson" as the fourth, and so on. As far as gawk is concerned, the first line has six fields when a space is the field separator, and the second line has eight.

NOTE: When working with a programming language, you must consider data the way the language will see it. Remember that programming languages take things literally.

To make sense of the telephone directory the way we want to handle it, we have to find another way of structuring the data so that there is a field separator between the sections. For example, the following uses the slash character as the field separator:
Smith/John/13 Wilson St./555-1283
Smith/John/2736 Artside Dr, Apt 123/555-2736
Smith/John/125 Westmount Cr/555-1726

By default, gawk uses blank characters (spaces or tabs) as field separators unless instructed to use another character. If gawk is using spaces, it doesn't matter how many are in a row; they are treated as a single block for purposes of finding fields. Naturally, there is a way to override this behavior, too.
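To check the field counts for yourself, here is a minimal sketch that feeds the sample directory entries to gawk. It leans on two things covered later in the chapter: the built-in variable NF (the number of fields in the current record) and the -F option for choosing a different separator. The first line is only a convenience, and an assumption about your system: it falls back to the system awk if no program named gawk is installed.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Two directory entries, separated by blanks as in the first listing.
spaced='Smith, John 13 Wilson St. 555-1283
Smith, John 2736 Artside Dr, Apt 123 555-2736'
# Print the number of fields gawk finds on each line.
spaced_counts=$(printf '%s\n' "$spaced" | gawk '{ print NF }')

# The second entry again, restructured with slashes between the four fields.
slashed='Smith/John/2736 Artside Dr, Apt 123/555-2736'
slashed_count=$(printf '%s\n' "$slashed" | gawk -F/ '{ print NF }')
address=$(printf '%s\n' "$slashed" | gawk -F/ '{ print $3 }')

printf '%s\n' "$spaced_counts" "$slashed_count" "$address"
```

With blanks as the separator the two entries split into six and eight fields; with slashes the entry has the four fields we intended, and the whole address arrives intact as the third field.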
Pattern-Action Pairs

The gawk language has a particular format for almost all instructions. Each command is composed of two parts: a pattern and a corresponding action. Whenever the pattern is matched, gawk executes the action that matches that pattern.

Pattern-action pairs can be thought of in more common terms to show how they work. Consider instructing someone how to get to the post office. You might say, "Go to the end of the street and turn right. At the stop sign, turn left. At the end of the street, go right." You have created three pattern-action pairs with these instructions:
end of street: turn right
stop sign: turn left
end of street: turn right

When these patterns are met, the corresponding action is taken. You wouldn't turn right before you reached the end of the street, and you wouldn't turn left until you reached the stop sign, so the pattern must be matched precisely for the action to be performed. This is a bit simplistic, but it gives you the basic idea.

With gawk, the patterns to be matched are enclosed in a pair of slashes, and the actions are in a pair of curly braces:
/pattern1/{action1}
/pattern2/{action2}
/pattern3/{action3}

This format makes it quite easy to tell where the pattern starts and ends, and when the action starts and ends. All gawk programs are sets of these pattern-action pairs, one after the other. Remember these pattern-action pairs are working on text files, so a typical set of patterns might be matching a set of strings, and the actions might be to print out parts of the line that matched.

Suppose there isn't a pattern? In that case, the pattern matches every time and the action is executed every time. If there is no action, gawk copies the entire line that matched without change.

Here are some simple examples. The gawk command
gawk '/tparker/' /etc/passwd

will look for each line in the /etc/passwd file that contains the pattern tparker and display it (there is no action, only a pattern). The output from the command will be the one line in the /etc/passwd file that contains the string tparker. If there is more than one line in the file with that pattern, they all will be displayed. In this case, gawk is acting exactly like the grep utility!

This example shows you two important things about gawk: It can be invoked from the command line by giving it the pattern-action pair to work with and a filename, and it likes to have single quotes around the pattern-action pair in order to differentiate them from the filename.

The gawk language is literal in its matching. The string cat will match any line with cat in it, whether as the word "cat" by itself or as part of another word such as "concatenate." To be more exact, put spaces on either side of the word (bearing in mind that this misses the word at the very start or end of a line). Also, case is important. We'll see how to expand the matching in the section "Metacharacters" a little later in the chapter.

Jumping ahead slightly, we can introduce a gawk command. The command
gawk '{print $3}' file2.data

has only one action, so it performs that action on every line in the file file2.data. The action is print $3, which tells gawk to print the third field of every line. The default field separator, whitespace, is used to tell where fields begin and end. If we had tried the same command on the /etc/passwd file, nothing useful would have been displayed, because that file separates its fields with colons rather than whitespace, so most lines have no third field as far as gawk is concerned.

We can combine the two commands to show a complete pattern-action pair:
gawk '/UNIX/{print $2}' file2.data

This command will search file2.data line by line, looking for the string UNIX. If it finds UNIX, it prints the second field of that line (record).

NOTE: The quotes around the entire pattern-action pair are very important and should not be left off. Without them, the command might not execute properly. Make sure the quotes match (don't use a single quote at the beginning and a double quote at the end).

You can combine more than one pattern-action pair in a command. For example,
gawk '/scandal/{print $1} /rumor/{print $2}' gossip_file

scans each line of gossip_file for the patterns "scandal" and "rumor." When a match is found, gawk prints the first or second field, respectively.
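As a quick illustration, the sketch below runs the same two pattern-action pairs against a few made-up lines standing in for gossip_file (the sample text is purely hypothetical). The first line is a convenience that falls back to the system awk if gawk is not installed under that name.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical stand-in for the contents of gossip_file.
gossip='mayor scandal grows
new rumor spreads
quiet day
scandal rumor mix'

# Each line is tested against both patterns, in order.
out=$(printf '%s\n' "$gossip" | gawk '/scandal/{print $1} /rumor/{print $2}')
printf '%s\n' "$out"
```

Note that the last sample line matches both patterns, so both actions run for it: every pattern-action pair is tried against every line, not just the first pair that matches.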
Simple Patterns

As you might have figured out, gawk numbers all of the fields in a record. The first field is $1, the second is $2, and so on. The entire record is called $0. As a short form, gawk allows you to ignore the $0 in simple commands, so the instructions
gawk '/tparker/{print $0}' /etc/passwd
gawk '/tparker/{print}' /etc/passwd
gawk '/tparker/' /etc/passwd

result in the same output (the last one because a missing action causes the entire line to be printed).

Sometimes you want to do more than match a simple character string. The gawk language has many powerful features, but I'll just introduce a few at the moment. We can, for example, make a comparison of a field with a value. The command
gawk '$2 == "foo" {print $3}' testfile

instructs gawk to compare the second field ($2) of each record in testfile and check to see whether it is equal to the string foo. If it is, gawk prints the third field ($3).

This command demonstrates a few important points. First, there are no slashes around the pattern because we are not matching a pattern but are evaluating something. Slashes are used only for character matches. Second, the == sign means "is equal to." We must use two equal signs, because the single equal sign is used for assignment of values, as you will see shortly. Finally, we put double quotations around foo because we want gawk to interpret it literally. Only strings of characters that are to be literally interpreted must be quoted in this manner.

NOTE: Don't confuse the quotes used for literal characters with those used to surround the pattern-action pair on the command line. If you use the same quote marks for both, gawk will be unable to process the command properly.

Comparisons and Arithmetic

An essential component of any programming language is the ability to compare two strings or numbers and evaluate whether they are equal or different. The gawk program has several comparisons, including ==, which you just saw in an example. Table 26.1 shows the important comparisons.

Table 26.1. The important comparisons.

Comparison   Description
==           Equal to
!=           Not equal to
>            Greater than
<            Less than
>=           Greater than or equal to
<=           Less than or equal to

These are probably familiar to you from arithmetic and other programming languages you may have seen. From this, you can surmise that the command
gawk '$4 > 100' testfile

will display every line in testfile in which the value in the fourth field is greater than 100.
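Here is a small sketch of that comparison at work, using three invented lines in place of testfile. The data and the variable names are assumptions made for illustration; the first line simply falls back to the system awk if gawk is not installed under that name.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical stand-in for testfile; the fourth field is numeric.
data='widget red small 250
gadget blue large 99
gizmo green medium 100'

# No action given, so matching lines are printed whole.
over=$(printf '%s\n' "$data" | gawk '$4 > 100')
# Same idea with >=, printing only the first field.
at_least=$(printf '%s\n' "$data" | gawk '$4 >= 100 { print $1 }')

printf '%s\n' "$over" "$at_least"
```

The line whose fourth field is exactly 100 is rejected by > but accepted by >=, which is usually the first thing to check when a comparison seems to skip a record.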

All of the normal arithmetic commands are available, including add, subtract, multiply, and divide. There are also more advanced functions such as exponentials and remainders (also called modulus). Table 26.2 shows the basic arithmetic operations that gawk supports.

Table 26.2. Basic arithmetic operators.

Operator   Description      Example
+          Addition         2+6
-          Subtraction      6-3
*          Multiplication   2*5
/          Division         8/4
^          Exponentiation   3^2 (=9)
%          Remainder        9%4 (=1)

You can combine fields and math, too. For example, the action
{print $3/2}

divides the number in the third field by 2.

There is also a set of arithmetic functions for trigonometry and generating random numbers. See Table 26.3.

Table 26.3. Random-number and trigonometric functions.

Function     Description
sqrt(x)      Square root of x
sin(x)       Sine of x (in radians)
cos(x)       Cosine of x (in radians)
atan2(x,y)   Arctangent of x/y
log(x)       Natural logarithm of x
exp(x)       The constant e to the power x
int(x)       Integer part of x
rand()       Random number between 0 and 1
srand(x)     Set x as seed for rand()

The order of operations is important to gawk, as it is to regular arithmetic. The rules gawk follows are the same as with arithmetic: all multiplications, divisions, and remainders are performed before additions and subtractions. For example, the command
{print $1+$2*$3}

multiplies field two by field three and then adds the result to field one. If you wanted to force the addition first, you would have to use parentheses:
{print ($1+$2)*$3}

Because these are the same rules you used in algebra, they shouldn't cause you any confusion. Remember, if in doubt, put parentheses in the proper places to force the operations.
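The two actions can be compared directly with a one-line input. This is only a sketch: the input values 2, 3, and 4 are arbitrary, and the first line falls back to the system awk if gawk is not installed under that name.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Fields are 2, 3 and 4; each print statement produces one output line.
sums=$(echo '2 3 4' | gawk '{
    print $1+$2*$3      # multiplication first: 2 + 12
    print ($1+$2)*$3    # parentheses force the addition first: 5 * 4
    print 3^2, 9%4      # exponentiation and remainder from Table 26.2
}')
printf '%s\n' "$sums"
```

The first two lines of output are 14 and 20, which shows how much the parentheses matter for the same three fields.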

Strings and Numbers

If you've used any other programming language, these concepts will be familiar to you. If you are new to programming, you will probably find them obvious, but you'd be surprised how many people get things hopelessly muddled by using strings when they should have used numbers.

A string is a set of characters to be interpreted literally by gawk. Strings are surrounded by quotation marks. Numbers are not surrounded by quotation marks and are treated as real values.

For example, the command
gawk '$1 != "Tim" {print}' testfile

will print any line in testfile that doesn't have the word Tim in the first field. If we had left out the quotation marks around Tim, gawk would have treated Tim as the name of a variable instead of a string, and the command wouldn't have worked as intended. The command

gawk '$1 == "50" {print}' testfile

will display any line whose first field is exactly the string 50. It does not check whether the value stored in the first field is numerically equal to 50; it just does a character-by-character comparison, so a first field of 50.0 would not match. The string 50 is not the same thing as the number 50 as far as gawk is concerned.
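The difference between comparing against a string and comparing against a number is easiest to see by counting matches. In this sketch the three input lines are invented for the purpose, and the first line falls back to the system awk if gawk is not installed under that name.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical data: two ways of writing fifty as a number, and one word.
data='50 exact
50.0 decimal
fifty words'

# Quoted "50": a character comparison, so only the field 50 itself matches.
as_string=$(printf '%s\n' "$data" | gawk '$1 == "50" { n++ } END { print n + 0 }')
# Unquoted 50: a numeric comparison, so 50 and 50.0 both match.
as_number=$(printf '%s\n' "$data" | gawk '$1 == 50 { n++ } END { print n + 0 }')

echo "string matches: $as_string, numeric matches: $as_number"
```

The END pattern and the n++ shorthand used for counting are explained later in the chapter; the n + 0 simply makes sure a count of zero is printed as 0.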

Formatting Output

We've seen how to do simple actions in the commands we've already discussed, but you can do several things in an action. For example, the command
gawk '$1 != "Tim" {print $1, $5, $6, $2}' testfile

will print the first, fifth, sixth, and second fields of every line in testfile whose first field is not equal to "Tim". You can place as many of these fields as you want in a print command.

Indeed, you can place strings in a print command, too, such as in the command
gawk '$1 != "Tim" {print "The entry for ", $1, "is not Tim. ", $2}' testfile

which will print the strings and the fields as shown. Each section of the print command is separated by a comma, and gawk puts a space between the sections when it prints them. The spaces at the end of the strings add a little extra room between each string and the value of the field that follows it.

You can use additional formatting instructions to make gawk format the output properly. These instructions are borrowed from the C language, and they use the command printf (print formatted) instead of print.

The printf command uses a placeholder scheme: each placeholder in the quoted string tells gawk how to format one entry, and gawk looks later in the command to find out what value to put there. An example will help make this obvious:
{printf "%5s likes this language\n", $2}

The %5s part of the line instructs gawk how to format the string, in this case as a string at least five characters wide. The value to place in this position is given at the end of the line: the second field. The \n at the end of the quoted section is a newline character. If the second field of a four-line file held names, printf would format the output like this:
  Tim likes this language
Geoff likes this language
 Mike likes this language
  Joe likes this language

You will notice that the %5s format right-justifies the entry within its five-character column, padding shorter names with leading spaces. This prevents awkward spacing.

The gawk language supports several format placeholders. They are shown in Table 26.4.

Table 26.4. Format placeholders.

Placeholder   Description
c             If a string, the first character of the string; if an integer, the character whose character code is that value
d             An integer
e             A floating-point number in scientific notation
f             A floating-point number in conventional notation
g             A floating-point number in either scientific or conventional notation, whichever is shorter
o             An unsigned integer in octal format
s             A string
x             An unsigned integer in hexadecimal format

Whenever you use one of the format characters, you can place a number before the character to set the minimum number of digits or characters to be used. Therefore, the format %6d prints an integer in a column at least six characters wide. Many formats can be on a line, but each must have a corresponding value at the end of the line, as in this example:
{printf "%5s works for %5s and earns %2d an hour\n", $1, $2, $3}

Here, the first string is the first field, the second string is the second field, and the third set of digits is from the third field in a file. The output would be something like this:
  Joe works for  Mike and earns 12 an hour

A few little tricks are useful. As you saw in an earlier example, strings are right-justified, so the command
{printf "%5s likes this language\n", $2}

results in the output
  Tim likes this language
Geoff likes this language
 Mike likes this language
  Joe likes this language

To left-justify the names, place a minus sign in the format statement:
{printf "%-5s likes this language\n", $2}

This will result in the output
Tim   likes this language
Geoff likes this language
Mike  likes this language
Joe   likes this language

Notice that the name is justified on the left instead of on the right.

When dealing with numbers, you can specify the precision to be used, so that the command
{printf "%5s earns $%.2f an hour\n", $3, $6}

will take the third field and place it in the first placeholder, padded to at least five characters, and then take the value in the sixth field and place it in the second placeholder with two digits after the decimal point. The output of the command would be like this:
  Joe earns $12.17 an hour

The dollar sign was inside the quotation marks in the printf command, and was not generated by the system. It has no special meaning inside the quotation marks. If you want to set the overall width of the number as well, you can do that too. The command
{printf "%5s earns $%6.2f an hour\n", $3, $6}

will print the number in a column at least six characters wide (counting the period), with two digits after the period.
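To see the justification, width, and precision settings side by side, here is a small sketch. The names and hourly rates are invented, and the square brackets are there only to make the padding visible. The first line falls back to the system awk if gawk is not installed under that name.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical data: a name and an hourly rate on each line.
out=$(printf 'Joe 12.1\nGeoff 9.756\n' |
    gawk '{ printf "[%5s] [%-5s] [%6.2f]\n", $1, $1, $2 }')
printf '%s\n' "$out"
```

This prints

```
[  Joe] [Joe  ] [ 12.10]
[Geoff] [Geoff] [  9.76]
```

so %5s pads on the left, %-5s pads on the right, and %6.2f rounds to two decimal places and then pads the whole number, period included, to six characters.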

Finally, we can impose some formatting on the output lines themselves. In an earlier example, you saw the use of \n to add a newline character. These are called escape codes, because the backslash is interpreted by gawk to mean something different than a backslash. Table 26.5 shows the important escape codes that gawk supports.

Table 26.5. Escape codes.

Code   Description
\a     Bell
\b     Backspace
\f     Formfeed
\n     Newline
\r     Carriage return
\t     Tab
\v     Vertical tab
\ooo   Octal character ooo
\xdd   Hexadecimal character dd
\c     Any character c

You can, for example, escape a quotation mark by using the sequence \", which will place a quotation mark in the string without interpreting it to mean something special. For example:
{printf "I said \"Hello\" and he said \"Hello\".\n"}

Awkward-looking, perhaps, but necessary to avoid problems. You'll see lots more escape characters used in examples later in this chapter. To use a literal backslash, use \\ in your program.

Changing Field Separators

As I mentioned earlier, the default field separator is whitespace (spaces or tabs). This is often inconvenient, as we found with the /etc/passwd file. You can change the field separator on the gawk command line by using the -F option followed by the separator you want to use:
gawk -F":" '/tparker/{print}' /etc/passwd

This command changes the field separator to a colon and searches the /etc/passwd file for the lines containing the string tparker. The new field separator is put in quotation marks to avoid any confusion. Also, the -F option (it must be a capital F) is before the first quote character enclosing the pattern-action pair. If it came after, it wouldn't be applied.

Metacharacters

Earlier I mentioned that gawk is particular about its pattern-matching habits. The string cat will match anything with the three letters on the line. Sometimes you want to be more exact in the matching. If you only want to match the word "cat" but not "concatenate," you should put spaces on either side of the pattern:
/ cat / {print}

What about matching different cases? That's where the or instruction, represented by a vertical bar, comes in. For example,
/ cat | CAT / {print}

will match "cat" or "CAT" on a line. However, what about "Cat?" That's where we also need to specify options within a pattern. With gawk, we use square brackets for this. To match any combination of "cat" in upper- or lowercase, we must write the pattern like this:
/ [Cc][Aa][Tt] / {print}

This can get pretty awkward, but it's seldom necessary. To match just "Cat" and "cat," for example, we would use the pattern
/ [Cc]at / {print}

A useful matching operator is the tilde (~). This is used when you want to look for a match in a particular field in a record. For example, the pattern
$5 ~ /tparker/

will match any record whose fifth field contains the string tparker. Unlike the == operator, which requires the whole field to be equal, the tilde only requires a match somewhere in the field. The matching operator can be negated, so
$5 !~ /tparker/

will find any record whose fifth field does not contain tparker.
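The sketch below tries the tilde, its negation, and == on the sample /etc/passwd lines shown earlier in the chapter, using -F: so that the fields split on colons. The first line falls back to the system awk if gawk is not installed under that name.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# The sample /etc/passwd lines from earlier in the chapter.
passwd='tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash
etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh
ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash'

# ~ succeeds when the pattern occurs anywhere inside the fifth field.
contains=$(printf '%s\n' "$passwd" | gawk -F: '$5 ~ /Parker/ { print $1 }')
# == needs the whole field to be equal, so "Tim Parker" does not qualify.
exact=$(printf '%s\n' "$passwd" | gawk -F: '$5 == "Parker" { print $1 }')
# !~ selects the records whose seventh field does not mention bash.
not_bash=$(printf '%s\n' "$passwd" | gawk -F: '$7 !~ /bash/ { print $1 }')

echo "contains: $contains / exact: $exact / not bash: $not_bash"
```

Only the tilde finds tparker here, because the comment field holds the full name rather than the surname alone.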

A few characters (called metacharacters) have special meaning to gawk. Many of these metacharacters will be familiar to shell users, because they are carried over from UNIX shells. The metacharacters shown in Table 26.6 can be used in gawk patterns.

Table 26.6. Metacharacters.

Metacharacter  Meaning                                       Example        Meaning of Example
^              The beginning of the field                    $3 ~ /^b/      Matches if the third field starts with b
$              The end of the field                          $3 ~ /b$/      Matches if the third field ends with b
.              Matches any single character                  $3 ~ /i.m/     Matches any record that has a third field value of i, another character, and then m
|              Or                                            /cat|CAT/      Matches cat or CAT
*              Zero or more repetitions of a character       /UNI*X/        Matches UNX, UNIX, UNIIX, UNIIIX, and so on
+              One or more repetitions of a character        /UNI+X/        Matches UNIX, UNIIX, and so on, but not UNX
{a,b}          The number of repetitions between a and b     /UNI{1,3}X/    Matches only UNIX, UNIIX, and UNIIIX (older gawk versions need the --re-interval option)
               (both integers)
?              Zero or one repetition of a character         /UNI?X/        Matches UNX and UNIX only
[]             Range of characters                           /I[BDG]M/      Matches IBM, IDM, and IGM
[^]            Not in the set                                /I[^DE]M/      Matches all three-character strings starting with I and ending in M, except IDM and IEM

Some of these metacharacters are used frequently. You will see some examples later in this chapter.
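As a quick check of the repetition and bracket metacharacters, the sketch below counts how many of five sample words each pattern accepts. The word list and the counter names are assumptions made for the illustration; the patterns are anchored with ^ and $ so that each one must match the whole line. The first line falls back to the system awk if gawk is not installed under that name.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# One hypothetical word per line.
words='UNX
UNIX
UNIIX
IBM
IDM'

counts=$(printf '%s\n' "$words" | gawk '
    /^UNI*X$/    { star++ }     # zero or more I: UNX, UNIX, UNIIX
    /^UNI+X$/    { plus++ }     # one or more I: UNIX, UNIIX
    /^UNI?X$/    { opt++ }      # zero or one I: UNX, UNIX
    /^I[^DE]M$/  { notde++ }    # middle letter not D or E: IBM
    END { print star, plus, opt, notde }')
echo "$counts"
```

The result is 3 2 2 1, which agrees with the meanings listed in Table 26.6.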

Calling gawk Programs

Running pattern-action pairs one or two at a time from the command line would be pretty difficult (and time consuming), so gawk allows you to store pattern-action pairs in a file. A gawk program (called a script) is a set of pattern-action pairs stored in an ASCII file. For example, this could be the contents of a valid gawk script:
/tparker/{print $6}
$2 != "foo" {print}

The first line would look for tparker and print the sixth field, and the second line would look for second fields that don't match the string "foo", then display the entire line. When you are writing a script, you don't need to worry about the quotation marks around the pattern-action pairs as you did on the command line, because the new command to execute this script makes it obvious where the pattern-action pairs start and end. After you have saved all of the pattern-action pairs in a program, they are called by gawk with the -f option on the command line:

gawk -f script filename

This command causes gawk to read all of the pattern-action pairs from the file script and process them against the file called filename. This is how most gawk programs are written. Don't confuse the -f and -F options!

If you want to specify a different field separator on the command line (it can also be set in the script, using a special format you'll see later), give the -F option along with the -f option, before the name of the data file:

gawk -f script -F":" filename

If you want to process more than one file using the script, just append the names of the files:
gawk -f script filename1 filename2 filename3 ...

By default, all output from the gawk command is displayed on the screen. You could redirect it to a file with the usual UNIX redirection commands:
gawk -f script filename > save_file

There is another way of specifying the output file from within the script, but we'll come back to that in a moment.
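Putting these pieces together, the following sketch writes a one-line script to a file, runs it with -f and -F against a small sample, and redirects the output to save_file. The temporary directory, the file name passwd.sample, and its contents are all hypothetical; the first line falls back to the system awk if gawk is not installed under that name.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

workdir=$(mktemp -d)

# The script file holds the pattern-action pair; no shell quotes are needed.
cat > "$workdir/script" <<'EOF'
# Print the home directory (sixth field) of matching records.
/tparker/ { print $6 }
EOF

# Hypothetical sample data in /etc/passwd format.
printf '%s\n' \
    'tparker:x:501:101:Tim Parker:/home/tparker:/bin/bash' \
    'ychow:x:503:101:Yvonne Chow:/home/ychow:/bin/bash' > "$workdir/passwd.sample"

# Options first, then the data file; output goes to save_file.
gawk -F":" -f "$workdir/script" "$workdir/passwd.sample" > "$workdir/save_file"

result=$(cat "$workdir/save_file")
rm -rf "$workdir"
echo "$result"
```

Keeping the program in its own file also removes the quoting worries of the command line, since the shell never sees the pattern-action pairs.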

BEGIN and END

Two special patterns supported by gawk are useful when writing scripts. The BEGIN pattern is used to indicate any actions that should take place before gawk starts processing a file. This is usually used to initialize values, set parameters such as field separators, and so on. The END pattern is used to execute any instructions after the file has been completely processed. Typically, this can be for summaries or completion notices.

Any instructions following the BEGIN and END patterns are enclosed in curly braces to identify which instructions belong to each pattern. Both BEGIN and END must appear in capitals. Here's a simple example of a gawk script that uses BEGIN and END, albeit only for sending a message to the terminal:
BEGIN { print "Starting to process the file" }
$1 == "UNIX" {print}
$2 > 10 {printf "This line has a value of %d\n", $2}
END { print "Finished processing the file. Bye!"}

In this script, a message is initially printed, and each line that has the word UNIX in the first field is echoed to the screen. Next, any line with the second field greater than 10 is found, and the message is generated with its current value. Finally, the END pattern prints a message that the program is finished.
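The order of events is easy to confirm with a shortened version of such a script run on two invented input lines. The messages and data here are placeholders, and the first line falls back to the system awk if gawk is not installed under that name.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical two-line input: a word and a number on each line.
out=$(printf 'UNIX 4\nLinux 12\n' | gawk '
    BEGIN       { print "start" }            # runs once, before any input
    $1 == "UNIX" { print }                   # echoes matching records
    $2 > 10     { printf "value %d\n", $2 }  # reports large second fields
    END         { print "done" }             # runs once, after all input
')
printf '%s\n' "$out"
```

The output is start, then UNIX 4, then value 12, then done: the BEGIN action comes before the first record is read, and the END action comes after the last.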

Variables

If you have used any programming language before, you know that a variable is a storage location for a value. Each variable has a name and an associated value, which may change.

With gawk, you assign a variable a value by using =, the assignment operator:
var1 = 10

This assigns the value 10 (numeric, not string) to the variable var1. With gawk, you don't have to declare variable types before you use them as you must with most other languages. This makes it easy to work with variables in gawk.

NOTE: Don't confuse the assignment operator, =, which assigns a value, with the comparison operator, ==, which compares two values. This is a common error that takes a little practice to overcome.

The gawk language lets you use variables within actions, so the pattern-action pair
$1 == "Plastic" { count = count + 1 }

checks to see if the first field is equal to the string "Plastic", and if it is, increments the value of count by one. Somewhere above this line we should set a preliminary value for the variable count (usually in the BEGIN section), or we will be adding one to an unknown value.

NOTE: Actually, gawk assigns all variables a value of zero when they are first used, so you don't really have to define the value before you use it. It is, however, good programming practice to initialize the variable anyway.

Here's a more complete example:
BEGIN { count = 0 }
$5 == "UNIX" { count = count + 1 }
END { printf "%d occurrences of UNIX were found\n", count }

In the BEGIN section, the variable count is set to zero. Then, the gawk pattern-action pair is processed, with every occurrence of "UNIX" adding one to the value of count. After the entire file has been processed, the END statement displays the total number.

Variables can be used in combination with fields and values, so all of the following statements are legal:
count = count + $6
count = $5 - 8
count = $5 + var1

Variables can also be part of a pattern. The following are all valid as pattern-action pairs:
$2 > max_value {print "Max value exceeded by ", $2 - max_value}
$4 - var1 < min_value {print "Illegal value of ", $4}

Two special operators are used with variables to increment and decrement by one, because these are common operations. Both of these special operators are borrowed from the C language:

count++   Increments count by one
count--   Decrements count by one
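Here is a short sketch that combines a counter with a running total. The material names and quantities are invented, and total is a hypothetical variable introduced only for this example. The first line falls back to the system awk if gawk is not installed under that name.

```shell
# Convenience only: use the system awk when no gawk binary is installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical inventory: a material and a quantity on each line.
data='Plastic 5
Metal 7
Plastic 3'

summary=$(printf '%s\n' "$data" | gawk '
    BEGIN { count = 0; total = 0 }           # explicit initial values
    $1 == "Plastic" { count++; total = total + $2 }
    END { printf "%d Plastic records, total %d\n", count, total }
')
echo "$summary"
```

For this data the script reports 2 Plastic records with a total of 8. Two statements share one action here, separated by a semicolon.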

Built-In Variables

The gawk language has a few built-in variables that are used to represent things such as the total number of records processed. These are useful when you want to get totals. Table 26.7 shows the important built-in variables.

Table 26.7. The important built-in variables.

Variable   Description
NR         The number of records read so far
FNR        The number of records read from the current file
FILENAME   The name of the input file
FS         Field separator (default is whitespace)
RS         Record separator (default is newline)
OFMT       Output format for numbers (default is %.6g)
OFS        Output field separator (default is a space)
ORS        Output record separator (default is newline)
NF         The number of fields in the current record

The NR and FNR values are the same if you are processing only one file, but if you are doing more than one file, NR is a running total of all files, while FNR is the total for the current file only.

The FS variable is useful, because it controls the input file's field separator. To use the colon for the /etc/passwd file, for example, you would use the command
FS=":"

in the script, usually as part of the BEGIN pattern.

You can use these built-in variables as you would any other. For example, the command

NF <= 5 {print "Not enough fields in the record"}

gives you a way to check the number of fields in each record of the file you are processing and to generate an error message when a record has too few. As written, the pattern matches any record with five or fewer fields.
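Combining NF with NR makes the message more useful, because it can say which record failed the check. This is a sketch on two invented records; the wording of the message is an assumption. It is written with awk; gawk accepts the same program.

```shell
# The first record has six fields, the second only three.
out=$(printf 'a b c d e f\na b c\n' | awk '
    NF <= 5 { print "Not enough fields in record " NR ": found " NF }
')
echo "$out"
```

Only the second record matches, so a single line is printed.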

Control Structures

Enough of the details have been covered to allow us to start doing some real gawk programming. Although we have not covered all of gawk's pattern and action considerations, we have seen all the important material. Now we can look at writing control structures.

If you have any programming experience at all, or have tried some shell script writing, many of these control structures will appear familiar. Follow the examples and try a few test programs of your own.

Incidentally, gawk enables you to place comments anywhere in your scripts. A comment starts with a # sign and runs to the end of the line, so it can sit on a line of its own or follow a statement. You should use comments to indicate what is going on in your scripts if it is not immediately obvious.
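Both comment placements can be seen in a short commented program. The input lines are invented for the example. It is written with awk; gawk accepts the same program.

```shell
# Two invented lines, one of which contains "UNIX".
out=$(printf 'UNIX is here\nnothing to see\n' | awk '
    # A comment on a line of its own: count the lines that mention UNIX.
    /UNIX/ { count++ }          # a comment can also follow a statement
    END { print count + 0 }     # adding 0 prints 0 rather than an empty line when nothing matched
')
echo "$out"
```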
