

Strings and Numbers

If you’ve used any other programming language, these concepts will be familiar to you. If you are new to programming, you will probably find them obvious, but it’s surprising how many people get things hopelessly muddled by using strings when they should have used numbers.

A string is a sequence of characters that gawk interprets literally. Strings are surrounded by quotation marks. Numbers are not surrounded by quotation marks and are treated as numeric values:


gawk '$1 != "Tim" {print}' testfile

This command prints any line in testfile whose first column is not Tim. If we had left out the quotation marks around Tim, gawk would have treated Tim as the name of a variable (empty unless it has been assigned a value) rather than as text, and the comparison would not have done what we wanted. The following command displays any line that has the string 50 in its first column:


gawk '$1 == "50" {print}' testfile

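Because 50 is in quotation marks, gawk does not compare the numeric value stored in the first column with 50; it just does a character-by-character check. A first column containing 50.0 therefore does not match the string “50”, although it would match the number 50 written without quotation marks.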
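The difference shows up when a field holds a number that is spelled differently from the constant. The following is a minimal sketch using made-up two-line input rather than the chapter's testfile; the variable names are illustrative, and the script falls back to plain awk if gawk is not installed.

```shell
# Use gawk if present, otherwise plain awk (the behavior shown is standard awk).
AWK=$(command -v gawk || command -v awk)

# Hypothetical input: the first column is 50 on one line and 50.0 on the other.
data='50 apples
50.0 pears'

# String comparison: only the field spelled exactly "50" matches.
string_match=$(printf '%s\n' "$data" | "$AWK" '$1 == "50" {print $2}')

# Numeric comparison: 50 and 50.0 have the same value, so both lines match.
number_match=$(printf '%s\n' "$data" | "$AWK" '$1 == 50 {print $2}')

echo "$string_match"   # apples
echo "$number_match"   # apples, then pears on the next line
```

If a column is meant to hold numbers, comparing it against an unquoted number is usually the safer choice.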

Formatting Output

We’ve seen how to do simple actions in the commands we’ve already discussed, but you can do several things in an action:


gawk '$1 != "Tim" {print $1, $5, $6, $2}' testfile

The preceding command prints the first, fifth, sixth, and second columns of testfile for every line that doesn’t have the first column equal to “Tim”. You can place as many of these columns as you want in a print command.

Indeed, you can place strings in a print command, too:


gawk '$1 != "Tim" {print "The entry for ", $1, "is not Tim. ", $2}' testfile

This command prints the strings and the columns in the order shown. Each item in the print command is separated by a comma, and gawk puts a space between the items when it prints them. The spaces at the ends of the strings are not required; they add a second space between the string and the value of the column that follows.
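To see what the commas in a print command do, here is a small sketch with a hypothetical one-line input (a name and a city, not taken from the chapter's testfile). It assumes the default output separator, which is a single space, and falls back to plain awk if gawk is not installed.

```shell
# Use gawk if present, otherwise plain awk.
AWK=$(command -v gawk || command -v awk)

# Hypothetical input: a name in column 1 and a city in column 2.
line='Joe Boston'

# Items separated by commas are printed with a space between them.
with_comma=$(printf '%s\n' "$line" | "$AWK" '{print "Name:", $1, "City:", $2}')

# Items written side by side, without commas, are joined with nothing between them.
no_comma=$(printf '%s\n' "$line" | "$AWK" '{print "Name:" $1 "City:" $2}')

echo "$with_comma"   # Name: Joe City: Boston
echo "$no_comma"     # Name:JoeCity:Boston
```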

You can use additional formatting instructions to make gawk format the output properly. These instructions are borrowed from the C language, and they use the command printf (print formatted) instead of print.

The printf command uses a placeholder scheme: the format string contains placeholders that tell gawk how to format each entry, and gawk looks later in the command to find the values to insert there. An example helps clarify this:


{printf "%5s likes this language\n", $2}

The %5s part of the line tells gawk to format the value as a string at least five characters wide. The value to place in this position is given at the end of the line as the second column. The \n at the end of the quoted section is a newline character. If the second column of a four-line file holds names, printf formats the output like this:


  Tim likes this language
Geoff likes this language
 Mike likes this language
  Joe likes this language

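Notice that the “%5s” format right-justifies the column entry by padding shorter names with spaces on the left, so the text that follows lines up. A name longer than five characters is printed in full.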
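The width in a placeholder is a minimum, not a limit. The sketch below uses two made-up names and adds square brackets around the placeholder so the padding is visible; it falls back to plain awk if gawk is not installed.

```shell
# Use gawk if present, otherwise plain awk.
AWK=$(command -v gawk || command -v awk)

# Hypothetical names: one shorter than five characters, one longer.
padded=$(printf 'Tim\nChristopher\n' | "$AWK" '{printf "[%5s]\n", $1}')

echo "$padded"
# [  Tim]
# [Christopher]
```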

The gawk language supports several format placeholders. They are shown in Table 25.4.

Table 25.4. Format placeholders.

Placeholder  Description

c            If a string, the first character of the string; if a number, the character with that character code
d            An integer
e            A floating-point number in scientific notation
f            A floating-point number in conventional notation
g            A floating-point number in either scientific or conventional notation, whichever is shorter
o            An unsigned integer in octal format
s            A string
x            An unsigned integer in hexadecimal format
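As a quick illustration of the table, the following sketch formats a few made-up values with each placeholder. The values are chosen only for the demonstration, and the script falls back to plain awk if gawk is not installed.

```shell
# Use gawk if present, otherwise plain awk.
AWK=$(command -v gawk || command -v awk)

# A BEGIN block runs without reading any input file.
formats=$("$AWK" 'BEGIN {
    printf "%d\n", 255.9      # integer part only: 255
    printf "%o\n", 255        # octal: 377
    printf "%x\n", 255        # hexadecimal: ff
    printf "%f\n", 3.5        # conventional notation: 3.500000
    printf "%e\n", 1234.5     # scientific notation: 1.234500e+03
    printf "%g\n", 1234.5     # the shorter form: 1234.5
    printf "%c\n", "hello"    # first character of a string: h
    printf "%c\n", 65         # character with code 65: A
    printf "%s\n", "text"     # a string: text
}')

echo "$formats"
```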

Whenever you use one of the format characters, you can place a number before the character to set the minimum number of digits or characters to be used. Therefore, the format “%6d” prints an integer in a field at least six characters wide. Many formats can be on a line, but each must have a matching value at the end of the line, as in this example:


{printf "%5s works for %5s and earns %2d an hour", $1, $2, $3}

Here, the first string is the first column, the second string is the second column, and the third set of digits is from the third column in a file. The output looks something like this:


  Joe works for  Mike and earns 12 an hour

A few little tricks are useful. Consider the following command:


{printf "%5s likes this language\n", $2}

As shown in an earlier example, strings are right-justified, so this command results in the following output:


  Tim likes this language
Geoff likes this language
 Mike likes this language
  Joe likes this language

To left-justify the names, place a minus sign in the format statement:


{printf "%-5s likes this language\n", $2}

This results in the following output:


Tim   likes this language
Geoff likes this language
Mike  likes this language
Joe   likes this language

Notice that the name is justified on the left instead of on the right.
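Left and right justification can be combined in one format to line up a small report. This is a sketch with hypothetical input (a name and an hourly rate); the brackets are there only to make the padding visible, and the script falls back to plain awk if gawk is not installed.

```shell
# Use gawk if present, otherwise plain awk.
AWK=$(command -v gawk || command -v awk)

# Hypothetical input: a name in column 1 and an hourly rate in column 2.
# %-5s pads the name on the right; %3d pads the number on the left.
report=$(printf 'Tim 9\nGeoff 12\n' | "$AWK" '{printf "[%-5s][%3d]\n", $1, $2}')

echo "$report"
# [Tim  ][  9]
# [Geoff][ 12]
```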

When dealing with numbers, you can specify the precision to be used:


{printf "%5s earns $%.2f an hour", $3, $6}

The preceding command places the string in column three in the first placeholder, padded on the left to at least five characters, and then takes the value in the sixth column and places it in the second placeholder with two digits after the decimal point. The output of the command looks like this:


  Joe earns $12.17 an hour

The dollar sign is inside the quotation marks in the printf command and is not generated by the system. It has no special meaning inside the quotation marks. If you want to set the overall width of the number as well as the number of digits to the right of the period, you can do that, too:


{printf "%5s earns $%6.2f an hour", $3, $6}

This command prints the number in a field at least six characters wide in total, counting the period and the two digits after it.
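The number before the period in a format such as %6.2f is the minimum width of the whole field, counting the period and the decimals. The following sketch shows this with a few made-up values in brackets; it falls back to plain awk if gawk is not installed.

```shell
# Use gawk if present, otherwise plain awk.
AWK=$(command -v gawk || command -v awk)

widths=$("$AWK" 'BEGIN {
    printf "[%.2f]\n", 3.14159      # precision only, no padding: [3.14]
    printf "[%6.2f]\n", 3.14159     # four characters padded to six: [  3.14]
    printf "[%6.2f]\n", 12.17       # five characters padded to six: [ 12.17]
    printf "[%6.2f]\n", 12345.678   # wider than six, printed in full: [12345.68]
}')

echo "$widths"
```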

