When a string must be converted to a number, the conversion is accomplished using atof(3). A number is converted to a string by using the value of CONVFMT as a format string for sprintf(3), with the numeric value of the variable as the argument. However, even though all numbers in awk are floating-point, integral values are always converted as integers. Thus, given this:


CONVFMT = "%2.2f"

a = 12

b = a ""

the variable b has a string value of 12 and not 12.00.
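The conversion rule can be checked from the shell with a short sketch like the one below. It assumes a POSIX-compatible awk on the PATH; the variables x and c, the value 3.14159, and the shell variable out are illustrative additions, not part of the original example.

```shell
# Integral values ignore CONVFMT; non-integral values are formatted with it.
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12
    b = a ""          # integral value: converted as an integer
    x = 3.14159
    c = x ""          # non-integral value: converted using CONVFMT
    print b, c
}')
echo "$out"
```

Here b should hold the string 12, while c is formatted through CONVFMT.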

gawk performs comparisons as follows: If two variables are numeric, they are compared numerically. If one value is numeric and the other has a string value that is a "numeric string," then comparisons are also done numerically. Otherwise, the numeric value is converted to a string and a string comparison is performed. Two strings are compared, of course, as strings. According to the standard, even if two strings are numeric strings, a numeric comparison is performed. However, this is clearly incorrect, and gawk does not do this.
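A small sketch of two of these cases follows, assuming a POSIX-compatible awk. The input line and the variable names mixed and strings are hypothetical. The first comparison pairs a field holding a numeric string with a numeric constant; the second pairs two string constants.

```shell
out=$(echo '10 9' | awk '{
    mixed   = ($1 > 9)          # numeric string vs. number: compared numerically
    strings = ("10" > "9")      # two string constants: compared as strings
    print mixed, strings
}')
echo "$out"
```

Numerically, 10 is greater than 9, but as strings "10" sorts before "9", so the two results differ.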

Uninitialized variables have the numeric value 0 and the string value "" (the null, or empty, string).
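As a minimal illustration, assuming a POSIX-compatible awk, the following prints both values of a variable x that is never assigned (the name x is arbitrary):

```shell
# x is never assigned: it is 0 in numeric context and "" in string context.
out=$(awk 'BEGIN { printf "%d|[%s]|%d\n", x + 0, x, length(x) }')
echo "$out"
```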

PATTERNS AND ACTIONS

awk is a line-oriented language. The pattern comes first, and then the action. Action statements are enclosed in { and }. Either the pattern or the action may be missing, but not both. If the pattern is missing, the action is executed for every line of input. A missing action is equivalent to


{ print }

which prints the entire line.
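Both omissions can be tried from the shell, as in this sketch. The two-line sample input and the shell variable names are assumptions made for illustration.

```shell
input=$(printf 'alpha\nbeta\n')

# Missing pattern: the action runs for every input line.
no_pattern=$(printf '%s\n' "$input" | awk '{ print NR ": " $0 }')

# Missing action: matching lines are printed, as if { print } had been written.
no_action=$(printf '%s\n' "$input" | awk '/beta/')

echo "$no_pattern"
echo "$no_action"
```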

Comments begin with the # character and continue until the end of the line. Blank lines may be used to separate statements. Normally, a statement ends with a newline; however, this is not the case for lines ending in a ,, {, ?, :, &&, or ||. Lines ending in do or else also have their statements automatically continued on the following line. In other cases, a line can be continued by ending it with a \, in which case the newline is ignored.

Multiple statements may be put on one line by separating them with a semicolon. This applies to both the statements within the action part of a pattern-action pair (the usual case), and to the pattern-action statements themselves.
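The layout rules above might be exercised as follows. This is only a sketch: the input values and the variable names total, n, out, and rules are hypothetical, and a POSIX-compatible awk is assumed.

```shell
out=$(echo '3 4' | awk '
# A comment runs to the end of the line.
{
    total = $1 + \
            $2                  # the backslash above continues the statement
    if ($1 > 0 &&
        $2 > 0)                 # a line ending in && continues automatically
        print "total", total
    n = 1; print "done"         # a semicolon separates two statements
}')
echo "$out"

# Semicolons may also separate whole pattern-action statements.
rules=$(printf 'a\nb\n' | awk '/a/ { print "A" }; /b/ { print "B" }')
echo "$rules"
```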

PATTERNS

awk patterns may be one of the following:


BEGIN

END

/regular expression/

relational expression

pattern && pattern

pattern || pattern

pattern ? pattern : pattern

(pattern)

! pattern

pattern1, pattern2

BEGIN and END are two special kinds of patterns that are not tested against the input. The action parts of all BEGIN patterns are merged as if all the statements had been written in a single BEGIN block. They are executed before any of the input is read. Similarly, all the END blocks are merged, and executed when all the input is exhausted (or when an exit statement is executed). BEGIN and END patterns cannot be combined with other patterns in pattern expressions. BEGIN and END patterns cannot have missing action parts.
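The merging of BEGIN and END blocks can be seen in a sketch such as this one. The two-line input and the counter n are illustrative assumptions.

```shell
out=$(printf 'x\ny\n' | awk '
BEGIN { printf "a" }
      { n++ }                   # runs once per input line
BEGIN { printf "b" }            # merged with the first BEGIN block
END   { printf "%d", n }
END   { print "!" }             # merged with the first END block
')
echo "$out"
```

Both BEGIN actions run before any input is read, in the order written, and both END actions run after the last line.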

For /regular expression/ patterns, the associated statement is executed for each input line that matches the regular expression. Regular expressions are the same as those in egrep(1), and are summarized below.

A relational expression may use any of the operators defined later in the section on actions. These generally test whether certain fields match certain regular expressions.

The &&, ||, and ! operators are logical AND, logical OR, and logical NOT, respectively, as in C. They do short-circuit evaluation, also as in C, and are used for combining more primitive pattern expressions. As in most languages, parentheses may be used to change the order of evaluation.
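As a hedged example, the pattern below combines a regular expression match on the first field with a negated relational test on the second. The sample data is invented for illustration.

```shell
# Keep lines whose first field starts with "a" and whose second field is not below 10.
out=$(printf 'apple 5\navocado 20\nbanana 30\n' | awk '$1 ~ /^a/ && !($2 < 10)')
echo "$out"
```

Because && short-circuits, the test on $2 is skipped for lines whose first field does not start with "a".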

The ?: operator is like the same operator in C. If the first pattern is true, then the pattern used for testing is the second pattern; otherwise, it is the third. Only one of the second and third patterns is evaluated.
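A sketch of a conditional pattern, using made-up input: the first line is tested against /name/ and every later line against /xyz/.

```shell
out=$(printf 'name qty\nabc 1\nxyz 2\n' | awk 'NR == 1 ? /name/ : /xyz/')
echo "$out"
```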

The pattern1, pattern2 form of an expression is called a range pattern. It matches all input records starting with a record that matches pattern1, and continuing until a record that matches pattern2, inclusive. It does not combine with any other sort of pattern expression.
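For instance, with hypothetical marker lines "start" and "stop" in the input, a range pattern selects both markers and everything between them:

```shell
out=$(printf 'one\nstart\ntwo\nstop\nthree\n' | awk '/start/, /stop/')
echo "$out"
```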

REGULAR EXPRESSIONS

Regular expressions are the extended kind found in egrep. They are composed of characters as follows:

c Matches the non-meta-character c.
\c Matches the literal character c.
. Matches any character except newline.
^ Matches the beginning of a line or a string.
$ Matches the end of a line or a string.
[abc...] Character class, matches any of the characters abc....
[^abc...] Negated character class, matches any character except abc... and newline.
r1|r2 Alternation: matches either r1 or r2.
r1r2 Concatenation: matches r1, and then r2.
r+ Matches one or more rs.
r* Matches zero or more rs.
r? Matches zero or one rs.
(r) Grouping: matches r.

The escape sequences that are valid in string constants are also legal in regular expressions.
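Several of these constructs are combined in the sketch below. The word list and shell variable names are assumptions chosen for illustration.

```shell
words=$(printf 'cat\ncoat\nct\ncart\na.b\naxb\n')

# Anchors, a character class, and +: "c", one or more of a/o, then "t".
plus=$(printf '%s\n' "$words" | awk '/^c[ao]+t$/')

# Grouping, alternation, and ?: either "ct", or "cat" with an optional "r".
alt=$(printf '%s\n' "$words" | awk '/^(ct|car?t)$/')

# An escaped dot matches only a literal ".".
dot=$(printf '%s\n' "$words" | awk '/a\.b/')

echo "$plus"; echo "$alt"; echo "$dot"
```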

ACTIONS

Action statements are enclosed in braces, { and }. Action statements consist of the usual assignment, conditional, and looping statements found in most languages. The operators, control statements, and input/output statements available are patterned after those in C.

OPERATORS

The operators in awk, in order of increasing precedence, are

= += -= *= /= %= ^= Assignment. Both absolute assignment (var = value) and operator-assignment (the other forms) are supported.
?: The C conditional expression. This has the form expr1 ? expr2 : expr3. If expr1 is true, the value of the expression is expr2; otherwise, it is expr3. Only one of expr2 and expr3 is evaluated.
|| Logical OR.
&& Logical AND.
~ !~ Regular expression match, negated match. NOTE: Do not use a constant regular expression (/foo/) to the left of a ~ or !~. Only use one on the right side. The expression /foo/ ~ exp has the same meaning as (($0 ~ /foo/) ~ exp). This is usually not what was intended.
< > <= >= The regular relational operators.
blank String concatenation.
+ - Addition and subtraction.
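
The operators listed so far can be tried together in a sketch like the following. The variable names and values are hypothetical, and a POSIX-compatible awk is assumed.

```shell
out=$(awk 'BEGIN {
    n = 2                       # absolute assignment
    n += 3                      # n is now 5
    n *= 4                      # n is now 20
    n %= 7                      # n is now 6
    n ^= 2                      # n is now 36
    size  = (n > 10) ? "big" : "small"
    label = "n=" n              # a blank between operands concatenates them
    hit   = ("foobar" ~ /foo/)  # constant regular expression on the right side
    miss  = ("foobar" !~ /foo/)
    print label, size, hit, miss
}')
echo "$out"
```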
