To make sense of the telephone directory the way we want to handle it, we have to find another way of structuring the data so that there is a field separator between the sections. For example, the following uses the slash character as the field separator:
Smith/John/13 Wilson St./555-1283
Smith/John/2736 Artside Dr, Apt 123/555-2736
Smith/John/125 Westmount Cr/555-1728
By default, gawk uses blank characters (spaces or tabs) as field separators unless instructed to use another character. If gawk is using spaces, it doesn't matter how many are in a row; they are treated as a single block for purposes of finding fields. Naturally, there is a way to override this behavior, too.
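As a minimal sketch of both behaviors, the commands below feed one sample line at a time to gawk. The -F option is the standard way to name a different field separator on the command line; the input lines are sample data, and the first line of the block is only a fallback for systems where gawk is not installed.

```shell
# Fall back to plain awk if gawk is missing (assumption: same behavior for this example).
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Default separator: a run of spaces and tabs counts as a single separator.
blank_result=$(printf 'Smith   John \t 555-1283\n' | gawk '{print $3}')
echo "$blank_result"

# -F/ makes the slash the separator, so the address stays together as one field.
slash_result=$(printf 'Smith/John/13 Wilson St./555-1283\n' | gawk -F/ '{print $3}')
echo "$slash_result"
```

With the slash as the separator, the spaces inside the address no longer split it into separate fields.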
The gawk language has a particular format for almost all instructions. Each command is composed of two parts: a pattern and a corresponding action. Whenever the pattern is matched, gawk executes the action that matches that pattern.
Pattern-action pairs can be thought of in more common terms to show how they work. Consider instructing someone how to get to the post office. You might say, "Go to the end of the street and turn right. At the stop sign, turn left. At the end of the street, go right." You have created three pattern-action pairs with these instructions:
end of street: turn right
stop sign: turn left
end of street: turn right
When these patterns are met, the corresponding action is taken. You wouldn't turn right before you reached the end of the street, and you wouldn't keep going after you reached it, so the pattern must be matched precisely for the action to be performed. This is a bit simplistic, but it gives you the basic idea.
With gawk, the patterns to be matched are enclosed in a pair of slashes, and the actions are in a pair of braces:
/pattern1/{action1}
/pattern2/{action2}
/pattern3/{action3}
This format makes it quite easy to tell where the pattern starts and ends, and where the action starts and ends. All gawk programs are sets of these pattern-action pairs, one after the other. Remember that these pattern-action pairs work on text files, so a typical set of patterns might match a set of strings, and the actions might print parts of the lines that matched.
Suppose there isn't a pattern? In that case, the pattern matches every time and the action is executed every time. If there is no action, gawk copies the entire line that matched without change.
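The two defaults can be seen side by side in a short sketch. The two input lines are made-up sample data, and the first line of the block is only a fallback for systems where gawk is not installed.

```shell
# Fall back to plain awk if gawk is missing (assumption: same behavior for this example).
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

sample=$(printf 'alpha one\nbeta two\n')

# Pattern only: each matching line is copied unchanged.
pattern_only=$(printf '%s\n' "$sample" | gawk '/beta/')
echo "$pattern_only"

# Action only: the action runs for every line.
action_only=$(printf '%s\n' "$sample" | gawk '{print $2}')
echo "$action_only"
```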
Consider the following example:
gawk '/tparker/' /etc/passwd
The gawk command looks for each line in the /etc/passwd file that contains the pattern tparker and displays it (there is no action, only a pattern). The output from the command is the one line in the /etc/passwd file that contains the string tparker. If there is more than one line in the file with that pattern, they all are displayed. In this case, gawk is acting exactly like the grep utility!
This example shows you two important things about gawk: It can be invoked from the command line by giving it the pattern-action pair to work with and a filename, and it likes to have single quotes around the pattern-action pair in order to differentiate them from the filename.
The gawk language is literal in its matching. The string cat will match any line with cat in it, whether the word cat is by itself or part of another word such as concatenate. To be exact, insert spaces on each side of the word. Also, case is important. We'll see how to expand the matching in the section "Metacharacters" a little later in the chapter.
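A short sketch of literal, case-sensitive matching follows, using three made-up input lines. The first line of the block is only a fallback for systems where gawk is not installed.

```shell
# Fall back to plain awk if gawk is missing (assumption: same behavior for this example).
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

words=$(printf 'the cat sat\nconcatenate files\nCat food\n')

# /cat/ matches cat anywhere, including inside concatenate, but not Cat.
loose=$(printf '%s\n' "$words" | gawk '/cat/')
echo "$loose"

# / cat / requires a space on each side of the word.
strict=$(printf '%s\n' "$words" | gawk '/ cat /')
echo "$strict"
```

Note that the spaced pattern also misses the word when it is the first or last thing on a line, because there is no space on that side.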
Jumping ahead slightly, we can introduce a gawk command:
gawk '{print $3}' file2.data
The preceding command has only an action and no pattern, so it performs that action on every line in the file file2.data. The action is print $3, which tells gawk to print the third field of every line. The default field separator, a space, is used to tell where fields begin and end. If we try the same command on the /etc/passwd file, nothing useful displays because the field separator used in that file is the colon, not a space.
We can combine the two commands to show a complete pattern-action pair:
gawk '/UNIX/{print $2}' file2.data
Tip:
The quotation marks around the entire pattern-action pair are very important and should not be left off. Without them, the command might not execute properly. Make sure the quotation marks match (don't use a single quotation mark at the beginning and a double quotation mark at the end).
This command searches file2.data line by line, looking for the string UNIX. If it finds UNIX, it prints the second column of that line (record).
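To try a complete pattern-action pair without an existing data file, the sketch below writes a few made-up lines to a temporary file that stands in for file2.data. The file contents are hypothetical, and the first line of the block is only a fallback for systems where gawk is not installed.

```shell
# Fall back to plain awk if gawk is missing (assumption: same behavior for this example).
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical stand-in for file2.data.
datafile=$(mktemp)
printf 'UNIX kernel 10\nDOS shell 20\nUNIX editor 30\n' > "$datafile"

# Print the second column of every line that contains UNIX.
unix_result=$(gawk '/UNIX/{print $2}' "$datafile")
echo "$unix_result"

rm -f "$datafile"
```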
You can combine more than one pattern-action pair in a command. For example, the command
gawk '/scandal/{print $1} /rumor/{print $2}' gossip_file
reads gossip_file once, line by line. Each line is tested against the pattern scandal first, and the first column is printed if it matches; the same line is then tested against the pattern rumor, and the second column is printed if it matches. A line that contains both strings triggers both actions, and the output appears in the order of the lines in the file.
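A small experiment makes the processing order visible. The sketch below uses a made-up three-line file in place of gossip_file, in which the last line contains both strings. The first line of the block is only a fallback for systems where gawk is not installed.

```shell
# Fall back to plain awk if gawk is missing (assumption: same behavior for this example).
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical stand-in for gossip_file.
gossip=$(mktemp)
printf 'alice heard rumor\nbob scandal news\ncarol said rumor scandal\n' > "$gossip"

# Both pairs are applied to each line in turn during a single pass over the file.
gossip_result=$(gawk '/scandal/{print $1} /rumor/{print $2}' "$gossip")
echo "$gossip_result"

rm -f "$gossip"
```

The output is heard, bob, carol, said, one per line. The output for the rumor line comes first because that line comes first in the file, which shows that gawk does not finish one pattern before starting the next.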