

Finally, we can impose some formatting on the output lines themselves. In an earlier example, you saw the use of “\n” to add a newline character. Sequences like this are called escape codes, because the backslash tells gawk to treat the character that follows as something other than its literal self. Table 25.5 shows the important escape codes that gawk supports.

Table 25.5. Escape codes.

Code    Description

\a      Bell
\b      Backspace
\f      Formfeed
\n      Newline
\r      Carriage return
\t      Tab
\v      Vertical tab
\ooo    Octal character ooo
\xdd    Hexadecimal character dd
\c      Any character c

You can, for example, escape a quotation mark by using the sequence \", which places a quotation mark in the string without ending the string:


{printf "I said \"Hello\" and he said \"Hello\"."}

Awkward-looking, perhaps, but necessary to avoid problems. You’ll see lots more escape characters used in examples later in this chapter.
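To see a few of the escape codes from Table 25.5 in action, here is a minimal sketch. It makes two assumptions that are not in the text above: it uses the generic awk command name (substitute gawk if your system keeps them separate), and it puts each printf in a BEGIN block so that no input file is needed.

```shell
# BEGIN runs before any input is read, so these commands need no input file.
# The awk command name is an assumption; use gawk if that is what you have.

# \" places a quotation mark inside the string without ending it.
quoted=$(awk 'BEGIN { printf "I said \"Hello\" and he said \"Hello\".\n" }')

# \t inserts a tab between the two words.
columns=$(awk 'BEGIN { printf "name\tshell\n" }')

# \ooo gives a character by its octal code; octal 101 is the ASCII letter A.
octal=$(awk 'BEGIN { printf "\101\n" }')

printf '%s\n' "$quoted" "$columns" "$octal"
```

The first line printed is I said "Hello" and he said "Hello". with the backslashes gone, because gawk consumes them while reading the string.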

Changing Field Separators

As I mentioned earlier, the default field separator is whitespace (one or more spaces or tabs). This is often not convenient, as we found with the /etc/passwd file. You can change the field separator on the gawk command line by using the -F option followed by the separator you want to use:


gawk -F":" '/tparker/{print}' /etc/passwd

This command changes the field separator to a colon and searches the /etc/passwd file for lines containing the string tparker. The new field separator is put in quotation marks to avoid any confusion. Also, the -F option (it must be a capital F) must come before the first quotation mark enclosing the pattern-action pair. If it comes after, it won’t be applied.
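The effect of -F is easiest to see by counting fields with and without it. The following is a sketch only: the three passwd-style lines are made up for illustration (so the result does not depend on your real /etc/passwd), the awk command name stands in for gawk, and NF is the built-in variable holding the number of fields in the current record.

```shell
# Hypothetical passwd-style sample data, not taken from a real system.
sample='root:x:0:0:root:/root:/bin/sh
tparker:x:1001:100:Sample User:/home/tparker:/bin/ksh
ychow:x:1002:100:Other User:/home/ychow:/bin/sh'

# -F":" appears before the quoted program; $1 is the login name, $7 the shell.
shell_for_user=$(printf '%s\n' "$sample" | awk -F":" '/tparker/ {print $1, $7}')

# With the default separator the tparker line splits only at its single space,
# giving 2 fields; with a colon separator it has 7.
default_count=$(printf '%s\n' "$sample" | awk '/tparker/ {print NF}')
colon_count=$(printf '%s\n' "$sample" | awk -F":" '/tparker/ {print NF}')

echo "$shell_for_user"                 # tparker /bin/ksh
echo "$default_count $colon_count"     # 2 7
```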

Metacharacters

Earlier I mentioned that gawk is particular about its pattern-matching habits. The pattern cat matches any line that contains those three letters in sequence. Sometimes you want to be more exact in the matching. If you only want to match the word “cat” but not “concatenate,” put spaces on each side of the pattern. (Bear in mind that the spaces are part of the pattern, so this does not match “cat” at the very start or end of a line.)


/ cat / {print}

What about matching different cases? That’s where the or instruction, represented by a vertical bar, comes in.


/ cat | CAT / {print}

The preceding pattern will match “cat” or “CAT” on a line. However, what about “Cat”? That’s where we also need to specify options within a pattern. With gawk, we use square brackets for this. To match any combination of “cat” in upper- or lowercase, write the pattern like this:


/ [Cc][Aa][Tt] / {print}

This can get pretty awkward, but it’s seldom necessary. To match just “Cat” and “cat,” for example, use the following pattern:


/ [Cc]at / {print}
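The patterns above can be compared side by side on a handful of test lines. This is a sketch under assumptions: the five sample lines are invented, and the awk command name stands in for gawk. The last sample line shows a limit of the space trick, since a “cat” at the start of a line has no space in front of it.

```shell
# Invented sample lines, one per line.
lines='the cat sat
please concatenate these
a CAT nap
one Cat more
cat at the start'

# Spaces on both sides: only a lowercase cat surrounded by spaces.
exact=$(printf '%s\n' "$lines" | awk '/ cat / {print}')

# Vertical bar: either the lowercase or the all-uppercase form.
either=$(printf '%s\n' "$lines" | awk '/ cat | CAT / {print}')

# Square brackets: any mix of upper- and lowercase letters.
anycase=$(printf '%s\n' "$lines" | awk '/ [Cc][Aa][Tt] / {print}')

printf 'exact:\n%s\n\neither:\n%s\n\nanycase:\n%s\n' "$exact" "$either" "$anycase"
```

Neither “please concatenate these” nor “cat at the start” is matched by any of the three patterns.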

A useful matching operator is the tilde (~). This is used when you want to look for a match in a particular field in a record. Consider the following example:


$5 ~ /tparker/

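This pattern matches any record whose fifth field contains tparker. It is similar to the == operator, but == requires the entire field to equal the string, whereas ~ succeeds if the pattern matches anywhere in the field. The matching operator can be negated: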


$5 !~ /tparker/

This pattern finds any record whose fifth field does not contain tparker.
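The difference between ~, ==, and !~ shows up as soon as a field contains more than the string you are looking for. Here is a small sketch, assuming three invented records whose fifth fields are tparker, tparker2, and ychow (the awk command name again stands in for gawk):

```shell
# Invented records; only the fifth field matters here.
records='a b c d tparker
a b c d tparker2
a b c d ychow'

# ~ succeeds when the pattern matches anywhere in the field.
match=$(printf '%s\n' "$records" | awk '$5 ~ /tparker/ {print $5}')

# == succeeds only when the whole field equals the string.
equal=$(printf '%s\n' "$records" | awk '$5 == "tparker" {print $5}')

# !~ succeeds when the pattern matches nowhere in the field.
nomatch=$(printf '%s\n' "$records" | awk '$5 !~ /tparker/ {print $5}')

printf '%s\n' "$match"     # tparker and tparker2
printf '%s\n' "$equal"     # tparker
printf '%s\n' "$nomatch"   # ychow
```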

A few characters (called metacharacters) have special meaning to gawk. Many of these metacharacters are familiar to shell users because they are carried over from UNIX shells. The metacharacters shown in Table 25.6 can be used in gawk patterns.

Table 25.6. Metacharacters.

Metacharacter  Meaning                                      Example      Meaning of Example

^              The beginning of the field                   $3 ~ /^b/    Matches if the third field starts with b
$              The end of the field                         $3 ~ /b$/    Matches if the third field ends with b
.              Matches any single character                 $3 ~ /i.m/   Matches any record whose third field contains i, any one character, and then m
|              Or                                           /cat|CAT/    Matches cat or CAT
*              Zero or more repetitions of a character      /UNI*X/      Matches UNX, UNIX, UNIIX, UNIIIX, and so on
+              One or more repetitions of a character       /UNI+X/      Matches UNIX, UNIIX, and so on, but not UNX
{a,b}          Between a and b repetitions (both integers)  /UNI{1,3}X/  Matches only UNIX, UNIIX, and UNIIIX
?              Zero or one repetition of a character        /UNI?X/      Matches UNX and UNIX only
[]             Set of characters                            /I[BDG]M/    Matches IBM, IDM, and IGM
[^]            Not in the set                               /I[^DE]M/    Matches any three-character string starting with I and ending in M, except IDM and IEM

Some of these metacharacters are used frequently. You will see some examples later in this chapter.
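Several rows of Table 25.6 can be checked against a short word list. The following is a sketch, not a complete test of every metacharacter: the word list is invented, the awk command name stands in for gawk, and xargs is used only to join the matching lines onto one line for easier reading.

```shell
# Invented word list, one word per line.
words='UNX
UNIX
UNIIX
IBM
IDM
IGM
ICM'

# Each command prints the matching words; xargs joins them with spaces.
star=$(printf '%s\n' "$words" | awk '/UNI*X/ {print}' | xargs)      # zero or more I
plus=$(printf '%s\n' "$words" | awk '/UNI+X/ {print}' | xargs)      # one or more I
opt=$(printf '%s\n' "$words" | awk '/UNI?X/ {print}' | xargs)       # zero or one I
set=$(printf '%s\n' "$words" | awk '/I[BDG]M/ {print}' | xargs)     # B, D, or G in the middle
negset=$(printf '%s\n' "$words" | awk '/I[^DE]M/ {print}' | xargs)  # anything but D or E
anchored=$(printf '%s\n' "$words" | awk '/^I.M$/ {print}' | xargs)  # I, any character, M, nothing else

echo "star:     $star"       # UNX UNIX UNIIX
echo "plus:     $plus"       # UNIX UNIIX
echo "opt:      $opt"        # UNX UNIX
echo "set:      $set"        # IBM IDM IGM
echo "negset:   $negset"     # IBM IGM ICM
echo "anchored: $anchored"   # IBM IDM IGM ICM
```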

