Multidimension Arrays

Although awk doesn't directly support multidimension arrays, it does provide a facility to simulate them. The distinction is fairly trivial to you as a programmer. You can specify multiple dimensions in the subscript (within the square brackets) in a form familiar to C programmers:

array[5, 3] = "Mary"

This is stored in a single-dimension array with the subscript actually stored in the form 5 SUBSEP 3. The predefined variable SUBSEP contains the separator that is placed between the subscript components. It defaults to "\034" (the nonprinting character with octal code 034) because that character is unlikely to appear in the subscript itself. Remember that the double quotes around "\034" only delimit the string; they are not stored as part of it. You can always change SUBSEP if your data might contain the \034 character in a multidimension array subscript.
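The storage scheme is easier to see with a visible separator. The following is a minimal sketch, assuming the awk command resolves to gawk or another POSIX awk; the wrapper function name subsep_demo and the choice of ":" as the separator are illustrative only.

```shell
# Hypothetical wrapper function so the example can be run as one unit.
subsep_demo() {
    awk 'BEGIN {
        SUBSEP = ":"             # replace the default "\034" with a visible character
        array[5, 3] = "Mary"
        for (key in array)       # the array holds exactly one element
            print key            # the stored subscript is the single string 5:3
        print array["5:3"]       # the same element, addressed by that string
    }'
}
subsep_demo
```

The first line printed is the combined subscript and the second is the stored value, which suggests that the two-part subscript is only a convenient spelling of one string key.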

If you want to calculate total sales by city and state (or country), you will use a two-dimension array:

total_sales["Philadelphia", "Pennsylvania"] = 10.15

You can use the in operator within a conditional to test whether a subscript exists:

("Wilmington", "Delaware") in total_sales

You can also use the in operator within a for loop to step through the stored subscripts, such as the various cities.
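Both uses can be sketched together. In this sketch the second sales figure, the variable names key, part, and grand_total, and the wrapper function name are assumptions made for illustration. Because each subscript is stored as one string, the loop uses split() with SUBSEP to recover the city and state.

```shell
# Hypothetical wrapper function so the example can be run as one unit.
city_loop_demo() {
    awk 'BEGIN {
        total_sales["Philadelphia", "Pennsylvania"] = 10.15
        total_sales["Wilmington", "Delaware"] = 4.50    # made-up figure

        # Membership test with a two-part subscript
        if (("Wilmington", "Delaware") in total_sales)
            print "Found Wilmington, Delaware"

        # Each key is one string: city SUBSEP state
        for (key in total_sales) {
            split(key, part, SUBSEP)
            printf "%s (%s): %.2f\n", part[1], part[2], total_sales[key]
            grand_total += total_sales[key]
        }
        printf "Grand total: %.2f\n", grand_total
    }'
}
city_loop_demo
```

The order in which the for loop visits the elements is not specified, so the two city lines can appear in either order.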

Built-in Numeric Functions

gawk provides a number of numeric functions to calculate special values.

Table 27.7 summarizes the built-in numeric functions in gawk. Earlier versions of awk don't support all these functions.

Table 27.7. gawk built-in numeric functions.

Function      Purpose
atan2(y, x)   Returns the arctangent of y/x in radians
cos(x)        Returns the cosine of x, where x is in radians
exp(x)        Returns e raised to the x power
int(x)        Returns the value of x truncated to an integer
log(x)        Returns the natural log of x
rand()        Returns a random number between 0 and 1
sin(x)        Returns the sine of x, where x is in radians
sqrt(x)       Returns the square root of x
srand(x)      Initializes (seeds) the random number generator; systime() is used if x is omitted
systime()     Returns the current time in seconds since midnight UTC, January 1, 1970
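A few of these functions can be tried directly from a BEGIN action. This is a rough sketch that assumes a POSIX awk; the seed value 42 and the wrapper function name are arbitrary choices. systime() is left out because it is not available in every awk.

```shell
# Hypothetical wrapper function so the example can be run as one unit.
numeric_demo() {
    awk 'BEGIN {
        print int(3.9), int(-3.9)        # truncation toward zero
        print sqrt(16), exp(0), log(1)
        printf "%.4f\n", atan2(0, -1)    # y = 0, x = -1 gives pi
        srand(42)                        # a fixed seed makes the sequence repeatable
        r = rand()
        if (r >= 0 && r < 1)
            print "rand() is in range"
    }'
}
numeric_demo
```

The actual value returned by rand() depends on the awk implementation, so the sketch only checks that it falls between 0 and 1.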

Arithmetic Operators

gawk supports a wide variety of math operations. Table 27.8 summarizes these operators.

Table 27.8. gawk arithmetic operators.

Operator   Purpose
x^y        Raises x to the y power
x**y       Raises x to the y power (same as x^y)
x%y        Calculates the remainder of x/y
x+y        Adds x to y
x-y        Subtracts y from x
x*y        Multiplies x times y
x/y        Divides x by y
-y         Negates y (switches the sign of y); also known as the unary minus
++y        Increments y by 1 and uses value (prefix increment)
y++        Uses value of y and then increments by 1 (postfix increment)
--y        Decrements y by 1 and uses value (prefix decrement)
y--        Uses value of y and then decrements by 1 (postfix decrement)
x=y        Assigns value of y to x. gawk also supports operator-assignment operators (+=, -=, *=, /=, %=, ^=, and **=)
NOTE
All math in gawk uses floating point (even if you treat the number as an integer).
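The following sketch exercises several of these operators, assuming a POSIX awk. The ** form is omitted because some awk implementations do not accept it; the wrapper function name is illustrative.

```shell
# Hypothetical wrapper function so the example can be run as one unit.
arith_demo() {
    awk 'BEGIN {
        print 7 / 2      # 3.5: division is not truncated to an integer
        print 7 % 3      # 1
        print 2 ^ 10     # 1024
        y = 5
        print ++y        # 6: increment first, then use the value
        print y++        # 6: use the value, then increment
        print y          # 7
        x = 10
        x += 5           # x is now 15
        x *= 2           # x is now 30
        print x
    }'
}
arith_demo
```

The first result shows the floating-point behavior mentioned in the note: dividing two whole numbers does not discard the fraction.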

Conditional Flow

By its very nature, an action within a gawk program is conditional. It is executed if its pattern is true. You can also have conditional program flow within the action through the use of an if statement.

The general flow of an if statement is as follows:

if (condition)
   statement to execute when true
else
   statement to execute when false

condition can be any valid combination of patterns shown in Tables 27.2 and 27.3. else is optional. If you have more than one statement to execute, you need to enclose the statements within curly braces ({ }), just as in the C syntax.

You can also stack if and else statements as necessary:

if ("Pennsylvania" in total_sales)
   print "We have Pennsylvania data"
else if ("Delaware" in total_sales)
   print "We have Delaware data"
else if (current_year < 2010)
   print "Uranus is still a planet"
else
   print "none of the conditions were met."
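To see stacked tests and curly braces working on real input, here is a small sketch. The three input records, the score thresholds, and the names grade, honors, and if_demo are all made up for illustration; a POSIX awk is assumed.

```shell
# Hypothetical wrapper function so the example can be run as one unit.
if_demo() {
    printf '%s\n' 'Alice 95' 'Bob 72' 'Carol 40' |
    awk '{
        if ($2 >= 90) {          # two statements, so braces are required
            grade = "A"
            honors++
        } else if ($2 >= 60)
            grade = "pass"
        else
            grade = "fail"
        print $1, grade
    }
    END { print "honors:", honors }'
}
if_demo
```

Only the first branch needs braces, because it is the only one with more than one statement.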

The Null Statement

By definition, if requires at least one statement to execute. In some cases, the logic is most straightforward when the code you want executed runs only when the condition is false. I have used this approach when it would be difficult or ugly to reverse the logic so that the code executes when the condition is true.

The solution to this problem is easy: Just use the null statement, the semicolon (;). The null statement satisfies the syntax requirement that if requires statements to execute; it just does nothing.

Your code will look something like the following:

if (($1 <= 5 && $2 > 3) || ($1 > 7 && $2 < 2))
   ;        # The Null Statement
else
   the code I really want to execute
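A runnable version of this pattern might look like the following sketch, which assumes a POSIX awk. The three input records and the wrapper function name are invented for the example; only the middle record fails the condition.

```shell
# Hypothetical wrapper function so the example can be run as one unit.
null_demo() {
    printf '%s\n' '4 5' '6 1' '8 1' |
    awk '{
        if (($1 <= 5 && $2 > 3) || ($1 > 7 && $2 < 2))
            ;                    # null statement: nothing to do for these records
        else
            print "process:", $0
    }'
}
null_demo
```

The first and third records satisfy the condition and are skipped, so only the second record reaches the else branch.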

The Conditional Operator

gawk has one operator that actually takes three operands: the conditional operator. This operator allows you to apply an if-test anywhere an expression is allowed in your code.

The general format of the conditional operator is as follows:

condition ? true-result : false-result

While this might seem like duplication of the if statement, it can make your code easier to read. If you have a data file that consists of an employee name and the number of sick days taken, you can use the following:

{ print $1, "has taken", $2, "day" ($2 != 1 ? "s" : ""), "of sick time" }

This prints day if the employee only took one day of sick time and prints days if the employee took zero or more than one day of sick time. The resulting sentence is more readable. To code the same example using an if statement would be more complex and look like the following:

if ($2 != 1)
   print $1, "has taken", $2, "days of sick time"
else
   print $1, "has taken", $2, "day of sick time"
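The conditional-operator version can be tried on a few sample records. In this sketch the names and day counts are invented, the wrapper function name is illustrative, and a POSIX awk is assumed. The parentheses around the conditional expression matter: string concatenation binds more tightly than != and ?:, so without them "day" would be joined to $2 before the comparison is made.

```shell
# Hypothetical wrapper function so the example can be run as one unit.
sick_demo() {
    printf '%s\n' 'Ann 1' 'Raj 3' 'Lee 0' |
    awk '{ print $1, "has taken", $2, "day" ($2 != 1 ? "s" : ""), "of sick time" }'
}
sick_demo
```

Only the record with exactly one day gets the singular form.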

Looping

By their very nature, awk programs are one big loop: they read each record in the input file and process the appropriate patterns and actions. Within an action, the need for repetition often occurs. awk supports loops through the do, for, and while statements, which are similar to those found in C.

As with the if statement, if you want to execute multiple statements within a loop, you must contain them in curly braces.

TIP
Forgetting the curly braces around multiple statements is a common programming error with conditional and looping statements.

The do Statement

The do statement (sometimes referred to as the do while statement) provides a looping construct that will be executed at least once. The condition or test occurs after the contents of the loop have been executed.

The do statement takes the following form:

do
   statement
while (condition)

statement can be one statement or multiple statements enclosed in curly braces. condition is any valid test like those used with the if statement or the pattern used to trigger actions.

In general, you must change the value of the variable in the condition within the loop. If you don't, you will have an infinite loop, because the test result (condition) never changes and so never becomes false.
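Both points, that the body runs at least once and that the loop variable must change, can be seen in a short sketch. The variable names, starting values, and wrapper function name are arbitrary, and a POSIX awk is assumed.

```shell
# Hypothetical wrapper function so the example can be run as one unit.
do_demo() {
    awk 'BEGIN {
        n = 10
        do {
            print "n is", n      # runs once even though n < 3 is already false
            n++
        } while (n < 3)

        i = 1
        do {
            print "i is", i
            i++                  # changing i is what lets the condition become false
        } while (i <= 3)
    }'
}
do_demo
```

The first loop prints a single line because its condition is tested only after the body has run; the second loop runs three times and then stops.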

Loop Control

You can exit a loop early if you need to (without assigning some bogus value to the variable in the condition). awk provides two facilities to do this: break and continue.
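As a rough sketch of the two statements: break leaves the enclosing loop immediately, while continue skips the rest of the loop body and moves on to the next test of the condition. The limits used here and the wrapper function name are arbitrary, and a POSIX awk is assumed.

```shell
# Hypothetical wrapper function so the example can be run as one unit.
loop_control_demo() {
    awk 'BEGIN {
        i = 0
        do {
            i++
            if (i % 2 == 0)
                continue         # even value: skip the rest of the body
            if (i > 7)
                break            # leave the loop entirely
            print i
        } while (i < 100)
        print "stopped at", i
    }'
}
loop_control_demo
```

The loop prints the odd numbers 1 through 7 and exits when i reaches 9, long before the condition i < 100 would have ended it.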
