To append data to an existing file, you use the following:
printf ("hello world\n") >> "datafile"
In addition to redirecting your output to a file, you can send the output from your program to act as input for another command. You can code something like the following:
printf ("hello world\n") | "sort -t','"
Any other output statements that pipe data into the same command must specify exactly the same command string after the pipe character (|), because that is how awk keeps track of which command receives which output from your program.
Whenever you send output to a file or pipe, you should close it when you are done processing the data. There is a maximum number of open files allowed to awk that varies with operating system version or individual account configuration (a pipe counts as a file). By closing files when you are done with them, you reduce the chances of hitting the limit.
The syntax to close a file is simply
close ("filename")
where filename is the one specified on the output statement (which can also be stdout, a variable that contains the filename, or the exact command used with a pipe).
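As a minimal sketch of the output forms described above, the following shell snippet runs a small awk program that pipes three lines through sort, appends one line to a file, and closes both when it is done. The scratch file name and the sample words are invented for illustration.

```shell
# Hypothetical scratch file; any writable path would do.
datafile="${TMPDIR:-/tmp}/awk_datafile.$$"

sorted=$(awk -v out="$datafile" 'BEGIN {
    cmd = "sort"                     # keep the command in one variable so the
                                     # string is identical every time it is used
    print "pear"  | cmd
    print "apple" | cmd
    print "fig"   | cmd
    close(cmd)                       # sort sees end of input and writes its result

    printf("hello world\n") >> out   # append to the file named in out
    close(out)                       # release the file when finished
}')

printf '%s\n' "$sorted"
cat "$datafile"
```

Holding the command in a variable (cmd here) is one way to make sure the string given to close matches the one used after the pipe character.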
In addition to the built-in functions (like gsub or srand), gawk allows you to write your own. User-defined functions are a means of creating a block of code that is accessed in multiple places in your code. They can also be used to build a library of commonly used routines so you do not have to recode the same algorithms repeatedly.
User-defined functions are not a part of the original awk; they were added to nawk and are supported by gawk.
There are two parts to using a function: the definition and the call. The function definition contains the code to be executed (the function itself), and the call temporarily transfers control from the main code to the function. There are two ways that control is transferred back to the main code: implicit and explicit returns. When gawk reaches the end of a function (the close curly brace [}]), it automatically (implicitly) returns control to the calling routine. If you want to leave your function before the bottom, you can explicitly use the return statement to exit early.
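The two kinds of return can be sketched as follows. The function names safe_ratio and announce are hypothetical, chosen only for this illustration: the first leaves early with an explicit return, and the second simply runs to its closing brace.

```shell
result=$(awk '
# safe_ratio is a hypothetical helper, not a built-in function.
function safe_ratio(num, den) {
    if (den == 0)
        return 0            # explicit return: leave before the bottom
    return num / den
}
function announce(msg) {
    print "note: " msg
}                           # implicit return at the closing brace
BEGIN {
    announce("starting")
    print safe_ratio(10, 4)
    print safe_ratio(10, 0)
}')

printf '%s\n' "$result"
```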
The general form of a gawk function definition looks like the following:
function functionname(parameter list) {
    the function body
}
You code your function just as if it were any other set of action statements and can place it anywhere you would put a pattern/action set. If you think about it, the function functionname(parameter list) portion of the definition could be considered a pattern and the function body the action.
NOTE
gawk supports another form of function definition where the function keyword is abbreviated to func. The remaining syntax is the same:
func functionname(parameter list) {
    the function body
}
Listing 27.5 shows the defining and calling of a function.
Listing 27.5. Defining and calling functions.
BEGIN {
    print_header()
}

function print_header( ) {
    printf("This is the header\n");
    printf("this is a second line of the header\n");
}

The output is:

This is the header
this is a second line of the header

The code inside the function is executed only once, when the function is called from within the BEGIN action. This function uses the implicit return method.
CAUTION
When working with user-defined functions, you must place the parentheses that contain the parameter list immediately after the function name, with no space in between, when calling that function. When you use the built-in functions, this is not a requirement.
Like C, gawk passes parameters to functions by value. In other words, a copy of the original value is made and that copy is passed to the called function. The original is untouched, even if the function changes the value. (Arrays are the exception: they are passed by reference, so a function can change the elements of the caller's array.)
Any parameters are listed in the function definition separated by commas. If you have no parameters, you can leave the parameter list (contained in the parentheses) empty.
Listing 27.6 is an expanded version of Listing 27.5; it shows the pass-by-value nature of gawk function parameters.
Listing 27.6. Passing parameters.
BEGIN {
    pageno = 0;
    print_header(pageno);
    printf("the page number is now %d\n", pageno);
}

function print_header(page ) {
    page++;
    printf("This is the header for page %d\n", page);
    printf("this is a second line of the header\n");
}

The output is:

This is the header for page 1
this is a second line of the header
the page number is now 0
The page number is initialized before the first call to the print_header function and incremented in the function. But when it is printed after the function call, it remains at the original value.
CAUTION
gawk does not perform much parameter validation. When you call a function, you can list fewer parameters than the function expects; any missing ones default to zero or empty strings (depending on how they are used). Listing more parameters than the function defines is not accepted, however: gawk reports a fatal error.
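A short sketch of the defaulting behavior for missing parameters follows. The function show is hypothetical; it expects two parameters and is called with two, one, and none.

```shell
result=$(awk '
# show is a hypothetical function that expects two parameters.
function show(label, count) {
    # An omitted parameter acts as an empty string in string context
    # and as zero in numeric context.
    printf("label=[%s] count=%d\n", label, count + 0)
}
BEGIN {
    show("full", 3)
    show("short")      # count is not supplied
    show()             # neither parameter is supplied
}')

printf '%s\n' "$result"
```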
TIP
You can take advantage of the lack of function parameter validation to create local variables within the called function: just list more variables in the function definition than you use in the function call. I strongly suggest that you comment the fact that the extra parameters are really being used as local variables.
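The local-variable technique might look like the sketch below. The function sum_to is hypothetical; the extra spaces in its parameter list are only a common visual convention for separating the real parameter from the locals, not something awk requires.

```shell
result=$(awk '
# sum_to is a hypothetical function. n is the real parameter;
# i and total are extra parameters used only as local variables.
function sum_to(n,    i, total) {
    for (i = 1; i <= n; i++)
        total += i
    return total
}
BEGIN {
    i = 99                      # global variable with the same name
    print sum_to(4)             # the caller passes only n
    print i                     # the global is unchanged by the loop
}')

printf '%s\n' "$result"
```

Because i inside sum_to is a parameter, the loop works on a private copy and the global i keeps its value.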
There are two ways that a called function can change variables in the calling routine: through an explicit return or by using the variables in the calling routine directly. (These variables are normally global anyway.)
If you want to return a value or leave a function early, you need to code a return statement. If you don't code one, the function ends at the close curly brace (}). Personally, I prefer to code an explicit return at the bottom of each function.
If the calling code expects a returned value from your function, you must code the return statement in the following form:
return variable
Expanding on Listing 27.6 to let the function change the page number, Listing 27.7 shows the use of the return statement.
Listing 27.7. Returning values.
BEGIN {
    pageno = 0;
    pageno = print_header(pageno);
    printf("the page number is now %d\n", pageno);
}

function print_header(page ) {
    page++;
    printf("This is the header for page %d\n", page);
    printf("this is a second line of the header\n");
    return page;
}

The output is:

This is the header for page 1
this is a second line of the header
the page number is now 1
The updated page number is returned to the code that called the function.
NOTE
The return statement allows you to return only one value back to the calling routine.
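When a function has more than one result to hand back, the other approach mentioned earlier, setting global variables directly, is one option. The sketch below assumes a hypothetical function min_max and hypothetical global names lo and hi.

```shell
result=$(awk '
# min_max is a hypothetical function. It hands back two results by
# setting the global variables lo and hi instead of using return.
function min_max(a, b) {
    if (a < b) { lo = a; hi = b }
    else       { lo = b; hi = a }
}
BEGIN {
    min_max(7, 3)
    printf("lo=%d hi=%d\n", lo, hi)
}')

printf '%s\n' "$result"
```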
Generating a report in awk entails a sequence of steps, with each step producing the input for the next step. Report writing is usually a three-step process: Pick the data, sort the data, and make the output pretty.
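The three steps can be sketched as a shell pipeline. The sample records (item, region, quantity) and the choice of the east region are invented for illustration; a real report would read its data from a file.

```shell
# Sample data is invented for illustration: item,region,quantity
data='widget,east,40
gadget,west,15
gizmo,east,30
anvil,east,5'

# Step 1 picks the east-region records, step 2 sorts them by item name,
# and step 3 formats the sorted lines with a header and a total.
report=$(printf '%s\n' "$data" |
    awk -F, '$2 == "east" { print $1 "," $3 }' |
    sort -t, -k1,1 |
    awk -F, '
        BEGIN { printf("%-8s %5s\n", "ITEM", "QTY") }
              { printf("%-8s %5d\n", $1, $2); total += $2 }
        END   { printf("%-8s %5d\n", "TOTAL", total) }')

printf '%s\n' "$report"
```

Each stage produces the input for the next, so any one of them can be changed (a different selection test, a different sort key) without touching the others.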