-->
Previous | Table of Contents | Next |
Developed by three Bell Labs researchers (Alfred Aho, Peter Weinberger, and Brian Kernighanhence the acronym awk), awk is a programming language (with some strong similarities to the C programming language, discussed earlier in this chapter) but is used in much the same manner as other UNIX scripting tools. Hence its inclusion in this chapter.
Technically speaking, awk doesnt ship with Linux; instead, the GNU version, gawk, ships with Linux. (By now you shouldnt be surprised that Linux features software from the GNU Project.) Because gawk is virtually identical to other implementations of awk (there are a few extensions to awk in gawk, but you can ignore them if you choose), most users with experience with awk will have no problems with awk.
Gawks primary value is in the manipulation of structured text files, where information is stored in columnar form and is separated by consistent characters (such as tabs or spaces). Gawk takes these structured files and manipulates them through editing, sorting, and searching.
Lets use a data file named workers as an example:
Eric 286 555-6674 erc 8 Geisha 280 555-4221 geisha 10 Kevin 279 555-1112 kevin 2 Tom 284 555-2121 spike 12
Lets sink into the trap of abstraction for a minute and compare our example file output to a two-dimensional graph. Each row across is called a record, which in turn is made up of vertical fields or columns, almost like a database. Gawk allows us to manipulate the data in the file by either row or column. Using the gawk command is not a complicated process. The structure of the gawk command looks like:
$ gawk [option] pattern {action}'
(The only options available with gawk are -F, which allows you to specify a field separator other than the default of white space; -f, which allows you to specify a filename full of gawk commands instead of placing a complex pattern and action on the Linux command line, and -W, which runs gawk in total compatibility with awk.) Here we should define our terms. A pattern can be an ASCII string (which gawk treats numerically; instead of seeing the character e as an e, it sees it as the ASCII equivalent), a numeral, a combination of numerals, or a wildcard, while action refers to an instruction we provide. Essentially, gawk works by having us tell it to search for a particular pattern; when it has found that pattern, gawk is to do something with it, such as printing the pattern to another file.
The simplest gawk program merely prints out all lines in the file:
gilbert:/$ gawk '{ print }' workers Eric 286 555-6674 erc 8 Geisha 280 555-4221 geisha 10 Kevin 279 555-1112 kevin 2 Tom 284 555-2121 spike 12
Continuing our example, lets say we wanted to pull all records that began with the string geisha. Wed use the following:
gilbert:/$ gawk '$1 ~ /Geisha/ {print $0}' workers
Heres what the command means, part by part:
In our case, gawk would print the following to the screen:
Geisha 280 555-4221 geisha 10
Not every action has to be the result of matching a specific pattern, of course. In gawk, the tilde (~) acts as a relational operator, which sets forth a condition for gawk to use. There are a number of other relational operators available to gawk users that allow gawk to compare two patterns. (The relational operators are based on algebraic notation.) Gawk supports the same relational operators found in the C programming language; they are listed in Table 10.7.
Operator | Meaning | Usage |
---|---|---|
< | Less than | $1 < "Eric" returns every pattern with an ASCII value less than Eric. |
<= | Less than or equal to | $1 <= "Eric". |
== | Equals | $1 == "Eric" returns every instance of Eric. |
!= | Does not equal | $1 != "Eric" returns every field not containing the string Eric. |
>= | Greater than or equal to | $1 >= "Eric" returns every field equal to or greater than Eric. |
> | Greater than | $1 > "Eric" returns every field greater than Eric. |
Previous | Table of Contents | Next |