

Gawk

Developed by three Bell Labs researchers (Alfred Aho, Peter Weinberger, and Brian Kernighan—hence the acronym awk), awk is a programming language (with some strong similarities to the C programming language, discussed earlier in this chapter) but is used in much the same manner as other UNIX scripting tools. Hence its inclusion in this chapter.

Technically speaking, awk doesn’t ship with Linux; instead, the GNU version, gawk, ships with Linux. (By now you shouldn’t be surprised that Linux features software from the GNU Project.) Because gawk is virtually identical to other implementations of awk (there are a few extensions to awk in gawk, but you can ignore them if you choose), most users with experience with awk will have no problems with gawk.

Gawk’s primary value is in the manipulation of structured text files, where information is stored in columnar form and is separated by consistent characters (such as tabs or spaces). Gawk takes these structured files and manipulates them through editing, sorting, and searching.

Let’s use a data file named workers as an example:


     Eric     286   555-6674   erc       8

     Geisha   280   555-4221   geisha   10

     Kevin    279   555-1112   kevin     2

     Tom      284   555-2121   spike    12

Let’s sink into the trap of abstraction for a minute and compare our example file to a two-dimensional grid. Each row across is called a record, which in turn is made up of fields that line up vertically as columns, almost like a database. Gawk allows us to manipulate the data in the file by either row or column.

Using the gawk command is not a complicated process. The structure of the gawk command looks like:


     $ gawk [option] 'pattern {action}' file

The options you are most likely to use with gawk are -F, which lets you specify a field separator other than the default of white space; -f, which lets you name a file full of gawk commands instead of placing a complex pattern and action on the Linux command line; and -W compat (spelled --traditional in current releases), which runs gawk in compatibility mode, turning off the GNU extensions so that it behaves like traditional awk.

Here we should define our terms. A pattern can be a regular expression enclosed in slashes, a comparison involving a string or a number, or a combination of these. (Strings are compared character by character according to their character codes, so in ASCII every uppercase letter sorts before every lowercase one.) An action is an instruction we provide. Essentially, gawk works by having us tell it to search for a particular pattern; when it finds a record that matches, gawk does something with it, such as printing the record to the screen or to another file.
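As a rough sketch of the -F and -f options, the commands below work on a small colon-separated file. The file parts.txt, its contents, and the program file show.awk are invented for illustration and are not part of the workers example. The AWK variable is also an assumption of this sketch: it picks gawk when it is installed and otherwise falls back to the system awk, since nothing here depends on a GNU extension.

```shell
# Prefer gawk; fall back to the system awk (no GNU extensions are used here).
AWK=$(command -v gawk || command -v awk)

# Hypothetical colon-separated data, invented for this sketch.
cat > parts.txt <<'EOF'
bolt:12:0.25
nut:40:0.10
washer:7:0.05
EOF

# -F: makes the colon the field separator, so $1 is the part name.
"$AWK" -F: '{ print $1 }' parts.txt

# -f reads the pattern and action from a file instead of the command line.
cat > show.awk <<'EOF'
$2 > 10 { print $1, $2 }
EOF
"$AWK" -F: -f show.awk parts.txt
```

The first command prints the three part names; the second prints only the records whose second field is greater than 10.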

The simplest gawk program merely prints out all lines in the file:


     gilbert:/$ gawk '{ print }' workers

     Eric    286     555-6674        erc     8

     Geisha  280     555-4221        geisha  10

     Kevin   279     555-1112        kevin   2

     Tom     284     555-2121        spike   12
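Printing whole records is only half of the row-and-column picture. A minimal sketch of working by column follows; it recreates the workers file so it can be run on its own, and the AWK variable (an assumption of this sketch) falls back to the system awk if gawk is not installed.

```shell
# Prefer gawk; fall back to the system awk (no GNU extensions are used here).
AWK=$(command -v gawk || command -v awk)

# Recreate the sample data file from the text.
cat > workers <<'EOF'
Eric     286   555-6674   erc       8
Geisha   280   555-4221   geisha   10
Kevin    279   555-1112   kevin     2
Tom      284   555-2121   spike    12
EOF

# Print only the first and third fields of every record.
# The comma in print separates the two values with a single space.
"$AWK" '{ print $1, $3 }' workers
```

Each output line holds just the name and the phone-style number, for example "Eric 555-6674" for the first record.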

Continuing our example, let’s say we wanted to pull all records whose first field contains the string Geisha. We’d use the following:


     gilbert:/$ gawk '$1 ~ /Geisha/ {print $0}' workers

Here’s what the command means, part by part:

  $1: Tells gawk to use the first column for the basis of further action. gawk will perform some action on a file based on either records or fields; a number beginning with a $ tells gawk to work on a specific field. In this case, $1 refers to the first field.
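  ~: The match operator. It tells gawk to test whether the field on its left matches the regular expression on its right.
  /Geisha/: The regular expression to search for, enclosed in slashes. Matching is case-sensitive.
  {print $0}: Tells gawk to print out the entire record in which the match was found. A special use of the $ sign is with the character 0, which stands for the whole record rather than a single field.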
  workers: The file to use.

In our case, gawk would print the following to the screen:


     Geisha   280   555-4221   geisha   10
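A few variations on this command may make the behavior of the match operator clearer. This is only a sketch: it recreates the workers file so it can be run by itself, and the AWK variable (an assumption of the sketch) falls back to the system awk if gawk is not installed.

```shell
# Prefer gawk; fall back to the system awk (no GNU extensions are used here).
AWK=$(command -v gawk || command -v awk)

# Recreate the sample data file from the text.
cat > workers <<'EOF'
Eric     286   555-6674   erc       8
Geisha   280   555-4221   geisha   10
Kevin    279   555-1112   kevin     2
Tom      284   555-2121   spike    12
EOF

# The command from the text: the first field matches /Geisha/.
"$AWK" '$1 ~ /Geisha/ { print $0 }' workers

# Matching is case-sensitive, so lowercase geisha matches no first field
# and this command prints nothing.
"$AWK" '$1 ~ /geisha/ { print $0 }' workers

# The fourth field of the same record does contain lowercase geisha.
"$AWK" '$4 ~ /geisha/ { print $1 }' workers

# ~ matches anywhere inside the field: both Geisha and Kevin contain an e.
"$AWK" '$1 ~ /e/ { print $1 }' workers
```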

Not every pattern has to be a match against a specific string, of course. In gawk, the tilde (~) is the match operator: it sets forth a condition for gawk to test, much as a relational operator does. There are a number of relational operators available to gawk users that allow gawk to compare two values. (The relational operators are based on algebraic notation.) Gawk supports the same relational operators found in the C programming language; they are listed in Table 10.7.

Table 10.7 Gawk Relational Operators

Operator  Meaning                    Usage
<         Less than                  $1 < "Eric" matches every record whose first field sorts before "Eric" in ASCII order.
<=        Less than or equal to      $1 <= "Eric" matches every record whose first field is "Eric" or sorts before it.
==        Equals                     $1 == "Eric" matches every record whose first field is exactly "Eric".
!=        Does not equal             $1 != "Eric" matches every record whose first field is anything other than "Eric".
>=        Greater than or equal to   $1 >= "Eric" matches every record whose first field is "Eric" or sorts after it.
>         Greater than               $1 > "Eric" matches every record whose first field sorts after "Eric".
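The operators can be tried against the workers file as follows. This is a sketch rather than a complete tour: it recreates the file so it runs by itself, the AWK variable (an assumption of the sketch) falls back to the system awk if gawk is not installed, and the last command adds a numeric comparison on the fifth field, which the table does not cover.

```shell
# Prefer gawk; fall back to the system awk (no GNU extensions are used here).
AWK=$(command -v gawk || command -v awk)

# Recreate the sample data file from the text.
cat > workers <<'EOF'
Eric     286   555-6674   erc       8
Geisha   280   555-4221   geisha   10
Kevin    279   555-1112   kevin     2
Tom      284   555-2121   spike    12
EOF

# Exact string equality on the first field: prints Eric.
"$AWK" '$1 == "Eric" { print $1 }' workers

# Every record whose first field is not exactly Eric: Geisha, Kevin, Tom.
"$AWK" '$1 != "Eric" { print $1 }' workers

# String ordering: names that sort after Eric (Geisha, Kevin, Tom).
"$AWK" '$1 > "Eric" { print $1 }' workers

# Names that are Geisha or sort before it (Eric, Geisha).
"$AWK" '$1 <= "Geisha" { print $1 }' workers

# Numeric comparison: the fifth field looks like a number and 10 is a
# number, so the values are compared numerically, not as strings.
"$AWK" '$5 >= 10 { print $1, $5 }' workers
```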

