Chapter 19

Generating Reports



This chapter presents techniques for generating reports with Perl using its built-in format specifiers. Perl stands for Practical Extraction and Report Language; this chapter covers the report half of that name, showing how to present extracted information as formatted reports. (The name is also expanded as "Pathologically Eclectic Rubbish Lister" in Larry Wall's book, Programming Perl.)

Formatted Output

Perl has excellent report-generation capabilities. You have already used the print and printf statements to write out text. Perl also has the capability to print out reports using formats. By using formats, you can actually visualize how your output will look because the definition of a format in Perl is very similar to what you see on the output.

There are three steps that you must take to use formats with Perl:

  1. Define the format and the variables that apply to the format.
  2. Initialize the variables used in the format.
  3. Output to the file handle of the format with the write() function.

We cover these steps in detail throughout the rest of the chapter. Let's start off with a quick example of how generating reports works with a sample letter writing application. This example will give you a quick overview of what's entailed in using formats. After this example, we will cover specific details of how to use the format specification.

Up to now, we have covered printing only with the use of printf or print statements. It's a bit difficult to see what you are actually printing out when you read print statements and the ensuing double quotes around the variables being printed. What would be nice is if you could lay out the approximate structure of a page and basically insert placeholders for where you want the output to go.

For example, let's say that you want to create a simple letter that you want to mail to your customers whose names and addresses are stored in a text file. You would first type in a generic letter like the one shown here:

FIRSTNAME LASTNAME
ADDRESS

Dear FIRSTNAME:
I am pleased to announce the new whizbang needle sharpening tool.
Give me a call if you are interested.


Sincerely,
Ipik Freely.

In the letter above, the fields FIRSTNAME, LASTNAME, and ADDRESS are placeholders for the actual first name, last name, and complete address of each individual who will receive the letter. Now that you have your letter defined, you would write code, in Perl naturally, to print one letter for each record in your database. Each printed letter would have the FIRSTNAME, LASTNAME, and ADDRESS placeholders replaced with what is in an input record.

Listing 19.1 is a sample application which generates a listing from the letter shown above.


Listing 19.1. A simple report generator without using formats.
 1 #!/usr/bin/perl
 2 open (NAMES,"names") || die "Cannot open names $!";
 3 while (<NAMES>) {
 4    ($fname,$lname,$address) = split(':',$_);
 5    print "\n\n";
 6    print "$fname $lname \n";
 7    print "$address \n\n";
 8    print "I am very pleased to announce the new TOOTHPX3000 now with a\n";
 9    print "rechargeable battery and direct 110-220 Volt adapter. Give me\n";
10    print "a call for a demo.\n" ;
11    print "\n";
12    print "Sincerely, \n";
13    print "Ipik Freely,\n";
14    print "Prezdet.\n\f";
15 }
16 close NAMES;

The data file for this is simple and is

Big:Wolf:1 Tree Lane
Wise:Pig :333 Brick house
NotSoWise:Pig:666 Straw House

There is one record per line in the file. Each field in every record is delimited by a colon (:) just like fields in the /etc/passwd file. The first field in the line is the first name; the second field is the last name; and the rest of the line is the address of the individual. This is a contrived example for this chapter, so your own database would be different. For example, the address could be split into street address, city, and state. Or you could have a phone number as the third field. For this example, let's take the first name, last name, and address values into three variables, $fname, $lname, and $address at line 4 in Listing 19.1. Then you will print out these lines. The output is as follows (with the page breaks called out as such):

Big Wolf
1 Tree Lane


I am very pleased to announce the new TOOTHPX3000 now with a
rechargeable battery and direct 110-220 Volt adapter. Give me
a call for a demo.

Sincerely,
Ipik Freely,
Prezdet.
--page break--


Wise Pig
333 Brick house


I am very pleased to announce the new TOOTHPX3000 now with a
rechargeable battery and direct 110-220 Volt adapter. Give me
a call for a demo.

Sincerely,
Ipik Freely,
Prezdet.
--page break--


NotSoWise Pig
666 Straw House


I am very pleased to announce the new TOOTHPX3000 now with a
rechargeable battery and direct 110-220 Volt adapter. Give me
a call for a demo.

Sincerely,
Ipik Freely,
Prezdet.
--page break--

In Listing 19.1, the print statements are cumbersome to read: the final form of the letter is not apparent from the code, and typing each line as a separate quoted string invites errors. Let's see how the three steps to define and use a format make the same program more readable, as shown in Listing 19.2.


Listing 19.2. A simple report generator using formats.
 1 #!/usr/bin/perl
 2
 3 open (NAMES,"names") || die "Cannot open names $!";
 4
 5 $~ = "NAME_FORMAT";
 6 while (<NAMES>) {
 7     ($fname,$lname,$address) = split(':',$_);
 8     $name = $fname . " " . $lname;
 9     write;
10     }
11 close NAMES;
12
13 format NAME_FORMAT =
14 @<<<<<<<<<<<<<<<<<<<
15 $name
16 @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
17 $address
18
19 I am very pleased to announce the new TOOTHPX3000 now with a
20 rechargeable battery and direct 110-220 Volt adapter. Give me
21 a call for a demo.
22
23 Sincerely,
24 Ipik Freely,
25 Prezdet.
26 .

The input file is opened in line 3. Each record is split into the first name, last name, and the address fields in line 7. A full name is created in line 8. The entry is written to STDOUT in line 9. The input file is closed in line 11. Lines 13 to 26 define the format with the placeholders for the variables.

Defining a Format

Formats can be defined anywhere in the code for a program. Here's the syntax for a format specification:

format FORMAT_NAME =
Presentation Line  #1
Values for  Presentation Line  #1
Presentation Line  #2
Values for  Presentation Line  #2
Presentation Line  #3
Values for  Presentation Line  #3
...
.

The FORMAT_NAME is the name of the format. Any valid Perl identifier can be used to name a format, but the convention is to use all capital letters. The solitary period at the end is required to signify the end of a format specification. Formats work with file handles: each file handle in a Perl program can have a format associated with it, and by default the format has the same name as the file handle. The default output file handle for a Perl script is STDOUT; therefore, the default format name for standard output is also STDOUT. For the error output, the default format name is STDERR. A format with any other name, such as NAME_FORMAT in Listing 19.2 (lines 13 through 26), must be selected explicitly by assigning its name to the $~ variable, as line 5 of that listing does.
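As a minimal sketch of the naming rule, the following script opens a handle called REPORT and defines a format with the same name, so write(REPORT) finds the format without any further setup. The handle name, the variables $item and $qty, and the sample records are made up for illustration. The handle writes to an in-memory string rather than a real file so that the result is easy to inspect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($item, $qty);          # package variables, visible inside the format
my $buffer = '';

# REPORT is a hypothetical handle; it writes into $buffer, not a file.
open(REPORT, '>', \$buffer) or die "Cannot open in-memory handle: $!";

for my $record (["Nails", 250], ["Screws", 75]) {
    ($item, $qty) = @$record;
    write(REPORT);          # picks up the format named REPORT by default
}
close(REPORT);
print $buffer;

format REPORT =
@<<<<<<<<< @>>>>
$item, $qty
.
```

Renaming the handle without renaming the format would break the pairing, which is why a format with a different name has to be selected through $~.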

Within the format definition, two lines of code are used for each line that is printed as output. The first line indicates how fields are to be displayed using placeholders; the second names the variables whose values fill those placeholders. The program in Listing 19.2 used two variables, $name and $address, to be printed out in two lines. See lines 14 through 17 in the source code. Line 14 defines where to put the value of the next variable, and line 15 names $name as the variable whose value fills the placeholder in line 14. Line 16 defines another placeholder, and line 17 names the variable that fills the placeholder specified in line 16.

By using the placeholders in a format statement, you can carefully place values of variables in specific locations in a series of lines in an output. Placeholders can left, center, or right justify the text, clip the output text, and so on to further beautify the text.

Using the format Statement

Let's look at another sample use of this format for printing out data. The data is on chemicals and is in a text file. The file contains a list of chemicals in common household products. The first script is for printing out this data in a nice, presentable fashion.

Here is the sample data file:

Acetic Acid: Vinegar
Ammonium Hydroxide: Ammonia Cleaners
Ammonium Nitrate: Salt Peter
Ammonium Oleate: Ammonia Soap
Barium Sulfide: Black Ash
Carbon Carbinate: Chalk
Carbontetrachloride: Cleaning Fluid
Calcium Oxide: Lime
Ferric Oxide: Rust
Glucose: Corn Syrup
Graphite: Pencil Lead
Hydrogen Peroxide: Peroxide
Naphthalene: Mothballs
Silver Nitrate:  Photographer's Hypo
Sodium Bicarbonate: Baking Soda
Sodium Borate: Borax
Sodium Carbonate: Washing Liquids
Sodium Chloride: Salt
Sodium Silicate: Glass
Sulfuric Acid: Battery Acid
Sucrose: Cane Sugar

To print this file in a clean fashion, the obvious choice is to use printf statements with a header and one printf statement per line in the file, as shown in the following fragment of code. The output from this run is too garbled to print in a book.

printf "\n %20s | %20s \n" , "Chemical", " Product ";
while(<>) {
    my ($chemical, $found) = split(':');
    printf " %20s | %20s \n", $chemical, $found;
    }

Note, too, that the printf code gives you no picture of what the finished output looks like. What if you wanted to see the layout of the output in the code itself by using format statements? The code in Listing 19.3 illustrates how to print the file to get the following output:

Chemical                    Product
=====================================================
Acetic Acid                 Vinegar
Ammonium Hydroxide          Ammonia Cleaners
Ammonium Nitrate            Salt Peter
Ammonium Oleate             Ammonia Soap
Barium Sulfide              Black Ash
Carbon Carbinate            Chalk
Carbontetrachloride         Cleaning Fluid
Calcium Oxide               Lime
Ferric Oxide                Rust
Glucose                     Corn Syrup
Graphite                    Pencil Lead
Hydrogen Peroxide           Peroxide
Naphthalene                 Mothballs
Silver Nitrate              Photographer's Hypo
Sodium Bicarbonate          Baking Soda
Sodium Borate               Borax
Sodium Carbonate            Washing Liquids
Sodium Chloride             Salt
Sodium Silicate             Glass
Sulfuric Acid               Battery Acid
Sucrose                     Cane Sugar

Listing 19.3. A simple report generator.
 1 #!/usr/bin/perl
 2 while(<>) {
 3     ($chemical, $found) = split(':');
 4     write;
 5     }
 6 format STDOUT_TOP =
 7 Chemical                    Product
 8 =====================================================
 9 .
10 format STDOUT =
11 @<<<<<<<<<<<<<<<<<<<<<     @<<<<<<<<<<<<<<<<<<<<<<<<
12 $chemical, $found
13 .

Line 4 is where each record is printed out with the use of the write() function. The write() function can take a FILEHANDLE parameter. Therefore, the following two lines are equivalent because the default file handle is STDOUT:

write;
write STDOUT;

There is no guarantee, however, that you'll be writing to STDOUT. You should check to see which file handle you are writing to in order to make sure you are using the correct format. I cover selection of formats shortly.

In the format specifications on lines 6 through 9, the name of the format is STDOUT_TOP. STDOUT is the name that corresponds to the file handle you happen to be writing to, which in this case is the standard output: STDOUT. The _TOP appendage is for the write() statement to print this header every time it starts a new page. The format specification is terminated with the solitary period on line 9.

Perl assumes that most output from reports will be sent to a printer. Therefore, Perl keeps a count of the lines left on the current page in the $- variable. When that count reaches zero, Perl starts a new page by sending a form feed (the string in the $^L variable) and printing the top-of-page format whose name is stored in the $^ variable. By default, $^ holds the name of the file handle with _TOP appended, and the page length, held in the $= variable, is 60 lines. If you want your output to go to a screen, you might not want the header repeated every 60 lines. In such a case, leave the top-of-page format undefined and you will get continuous output, or set $= to a value larger than the number of lines in your report so that the header is printed only once.

There are two points that should be mentioned here. One, neither the header format nor the record format is named in the write() call; Perl finds both from the file handle. Two, the header is printed automatically before the first record and again at the top of every new page.
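To watch the page mechanism at work, here is a small sketch that shrinks the page to four lines so that the header repeats after every two records. The REPORT handle, the stock data, and the variable names are invented for this illustration, and the output goes to an in-memory string so that the headers and form feeds can be counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($name, $qty);
my $buffer = '';
open(REPORT, '>', \$buffer) or die "Cannot open in-memory handle: $!";

# $= belongs to the selected handle, so select REPORT while setting it.
my $previous = select(REPORT);
$= = 4;                     # four lines per page: two header, two records
select($previous);

my @stock = (["Bolts", 10], ["Nuts", 20], ["Washers", 30],
             ["Rivets", 40], ["Pins", 50]);
for my $record (@stock) {
    ($name, $qty) = @$record;
    write(REPORT);          # REPORT_TOP is printed whenever a page starts
}
close(REPORT);

# Count how often the header and the form feed character were emitted.
my $headers    = () = $buffer =~ /Name\s+Qty/g;
my $form_feeds = () = $buffer =~ /\f/g;
print "headers printed: $headers, form feeds: $form_feeds\n";

format REPORT_TOP =
Name        Qty
===============
.
format REPORT =
@<<<<<<<<<< @##
$name, $qty
.
```

Five records at two per page need three pages, so the header appears three times, with a form feed in front of the second and third pages only.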

Lines 10 through 13 of Listing 19.3 contain the format specifications to be used for each record printed with the write() function in line 4. The format is specified in two lines for each line of output. The first line specifies how to print the information, and the second line specifies what variables are used to print the information. As with the header, the format specification is terminated with a solitary period (see line 13).

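The format specifiers in Perl have an at symbol (@) followed by one of these symbols:

  <   Left-justify the value in the field.
  >   Right-justify the value in the field.
  |   Center the value in the field.
  #   Right-justify a numeric value; a period (.) within the # characters marks the decimal point.

The symbol after @ is repeated to signify the number of columns to take up on the output; the @ itself counts as one column. For example, to create a left-justified field that is 10 characters long, you use @<<<<<<<<< (an @ followed by nine < characters) as the format specifier. Listing 19.4 right-justifies the chemical names and left-justifies the product names so that the two columns meet in the middle of the page.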


Listing 19.4. Right- and left-justifying code.
 1 #!/usr/bin/perl
 2 while(<>) {
 3     ($chemical, $found) = split(':');
 4     write;
 5     }
 6 format STDOUT_TOP =
 7              Chemical Product
 8 ===================================================
 9 .
10 format STDOUT =
11 @>>>>>>>>>>>>>>>>>>>> @<<<<<<<<<<<<<<<<<<<<<<<<<<<<
12 $chemical, $found
13 .

Here is the right- and left-justified text output.

             Chemical Product
===================================================
          Acetic Acid  Vinegar
   Ammonium Hydroxide  Ammonia Cleaners
     Ammonium Nitrate  Salt Peter
      Ammonium Oleate  Ammonia Soap
       Barium Sulfide  Black Ash
     Carbon Carbinate  Chalk
  Carbontetrachloride  Cleaning Fluid
        Calcium Oxide  Lime
         Ferric Oxide  Rust
              Glucose  Corn Syrup
             Graphite  Pencil Lead
    Hydrogen Peroxide  Peroxide
          Naphthalene  Mothballs
       Silver Nitrate  Photographer's Hypo
   Sodium Bicarbonate  Baking Soda
        Sodium Borate  Borax
     Sodium Carbonate  Washing Liquids
      Sodium Chloride  Salt
      Sodium Silicate  Glass
        Sulfuric Acid  Battery Acid
              Sucrose  Cane Sugar

Look at line 11 in Listing 19.4 and compare it with line 11 in Listing 19.3. Then, compare the output of each of those programs.
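The effect of each justification symbol can also be tried out on a single value. The sketch below uses Perl's built-in formline() function and the $^A accumulator, which are the machinery that write() itself relies on; the render() helper is a hypothetical convenience written for this example and not part of the chapter's listings. The square brackets are literal text in the picture, included only to make the padding visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# render() is a hypothetical helper: it fills one picture with values
# through formline() and returns the text collected in $^A.
sub render {
    my ($picture, @values) = @_;
    $^A = "";                      # clear the format accumulator
    formline($picture, @values);   # fill the placeholders
    my $text = $^A;
    $^A = "";
    return $text;
}

# Each field is six columns wide: the @ plus five fill characters.
my $left   = render('[@<<<<<]' . "\n", "ab");
my $center = render('[@|||||]' . "\n", "ab");
my $right  = render('[@>>>>>]' . "\n", "ab");
my $clip   = render('[@<<<<<]' . "\n", "abcdefghij");

print $left, $center, $right, $clip;
```

The last call shows the clipping behavior: a value longer than its field is cut off at the field width rather than pushing the rest of the line to the right.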

Caution
You can create the best-looking reports in your Perl script and have them present beautifully on your xterm. However, unless you ensure that your terminal and printing device both use fixed-width fonts, lining up the text columns will be a nightmare. When in doubt, use fonts like Courier (on printers) or Fixed (on X Window System terminals). Avoid fonts like Helvetica or New Century Schoolbook because these are variable-width fonts, and you'll never really be able to align the characters and columns as you would be able to with fixed-width fonts.
This type of inconsistency in outputs is fairly obvious when you try to print your Perl-formatted reports on Web browsers that are set on variable-width fonts. Do not expect a Web browser to be set on fixed-width fonts. If you must print such reports, consider using an HTML page table instead. See Part IV, "Working with the Web," for more information.

Numbers in the format field are specified with the hash mark (#). Let's try a different example with a new data file containing both numeric and text data. The idea is to print out the values in this data file in a nice report.

Here is the sample data file with text and numeric fields.

UK , 44 , Pound , 1.85 , 100
BELGIUM , 32 , Franc , 32.0 , 200
DENMARK , 45 , Krone , 6.0 , 2000
FINLAND , 358 , Markka , 4.69, 1000
FRANCE , 33 , Franc , 5.28, 50
ELSALVADOR , 503 , Colon , 8.74, 340
PHILIPPINES , 63 , Peso , 24.8, 1000
PAKISTAN , 92 , Rupee , 38.0, 1200
BAHRAIN , 973 , Dinar , 0.38 , 45
IRAQ , 964 , Dinar , 0.60, 10
JORDAN , 962 , Dinar , 0.70, 100
SAUDIARABIA , 966 , Riyal , 3.75, 1000

This file contains the names of countries, their international dialing codes, their currencies, and the value of the currencies with respect to the dollar. (Keep in mind that I made up these numbers.) The fifth value is the number of currency bills on hand.

Note that the values in the fourth field have differing numbers of digits to the right of the decimal point. The data is a fragment of a comma-delimited file exported from a spreadsheet, which wrote out unequal numbers of decimal digits.

Listing 19.5 takes this file as input and generates a nice, clean report. (I am deliberately not using the amount field in this program.)


Listing 19.5. Printing numeric and text fields.
 1 #!/usr/bin/perl
 2 $count = 0;
 3 while(<>) {
 4     ($country, $code, $currency, $value) = split(',');
 5     $count++;
 6     write;
 7     }
 8 format STDOUT_TOP =
 9 Id  Country           Code  Currency    Rate
10 =============================================
11 .
12 format STDOUT =
13 @## @<<<<<<<<<<<<<<<<<@####@<<<<<<<<@#####.##
14 $count, $country, $code, $currency, $value
15 .

Notice how in line 13 the $count variable is printed in a three-column numeric field using the @## format. The name of the country is left-justified with the @<<< field, followed immediately by the dialing code. If the country's name is too long to fit in the specified area, it is truncated to fit, with no spaces between the country's name and the code.

The $value is shown with two digits to the right of the decimal point. Even though the input did not have the same number of digits to the right of the decimal point, the output will be formatted with two digits to the right of the decimal point. In fact, the output will be cleanly aligned on the decimal point, as shown here:

Id  Country           Code  Currency    Rate
=============================================
  1 UK                   44 Pound        1.85
  2 BELGIUM              32 Franc       32.00
  3 DENMARK              45 Krone        6.00
  4 FINLAND             358 Markka       4.69
  5 FRANCE               33 Franc        5.28
  6 ELSALVADOR          503 Colon        8.74
  7 PHILIPPINES          63 Peso        24.80
  8 PAKISTAN             92 Rupee       38.00
  9 BAHRAIN             973 Dinar        0.38
 10 IRAQ                964 Dinar        0.60
 11 JORDAN              962 Dinar        0.70
 12 SAUDIARABIA         966 Riyal        3.75
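The alignment on the decimal point can be checked on a few records in isolation. This sketch again uses the built-in formline() function with a hypothetical render() helper, and it trims the spaces around each comma before formatting. The picture and the three sample records are chosen for illustration and are narrower than the ones in Listing 19.5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# render() is a hypothetical helper built on formline() and $^A.
sub render {
    my ($picture, @values) = @_;
    $^A = "";
    formline($picture, @values);
    my $text = $^A;
    $^A = "";
    return $text;
}

my @raw = ("UK , 44 , Pound , 1.85 , 100",
           "BELGIUM , 32 , Franc , 32.0 , 200",
           "IRAQ , 964 , Dinar , 0.60, 10");

my @rows;
for my $line (@raw) {
    # Split on commas and discard the surrounding spaces.
    my ($country, $code, $currency, $value) = split /\s*,\s*/, $line;
    # @### is a whole-number field; @##.## always shows two decimals.
    push @rows, render('@<<<<<<<<< @### @<<<<<< @##.##' . "\n",
                       $country, $code, $currency, $value);
}
print @rows;
```

Whatever the number of decimal digits in the input, every rate is printed with exactly two, so the decimal points land in the same column on every row.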

You are not limited to printing only variables in the formatted statement. Because the values in the format specification are evaluated by Perl, you can place expressions in there as well. Consider the program in Listing 19.6, which prints the result of a calculation.


Listing 19.6. Calculations in the format statement.
 1 #!/usr/bin/perl
 2 $count = 0;
 3 while(<>) {
 4     ($country, $code, $currency, $value, $amount)
 5         = split(',');
 6     $count++;
 7     write;
 8     }
 9 format STDOUT_TOP =
10 Id  Country        Currency  Rate   Amount  Value in $
11 =====================================================
12 .
13 format STDOUT =
14 @## @<<<<<<<<<<<<  @<<<<<< @#####.##   @####.##  $ @######.##
15 $count, $country,  $currency, $value, $amount, ($amount/$value)
16 .

Here's the output of the run on the data file.

Id  Country        Currency  Rate      Amount     Value in $
==============================================================
  1 UK              Pound       1.85     100.00  $      54.05
  2 BELGIUM         Franc      32.00     200.00  $       6.25
  3 DENMARK         Krone       6.00    2000.00  $     333.33
  4 FINLAND         Markka      4.69    1000.00  $     213.22
  5 FRANCE          Franc       5.28      50.00  $       9.47
  6 ELSALVADOR      Colon       8.74     340.00  $      38.90
  7 PHILIPPINES     Peso       24.80    1000.00  $      40.32
  8 PAKISTAN        Rupee      38.00    1200.00  $      31.58
  9 BAHRAIN         Dinar       0.38      45.00  $     118.42
 10 IRAQ            Dinar       0.60      10.00  $      16.67
 11 JORDAN          Dinar       0.70     100.00  $     142.86
 12 SAUDIARABIA     Riyal       3.75    1000.00  $     266.67

Notice how the value of the investment is calculated in the format specification itself. Look at line 15 in Listing 19.6 to see how the output fields are set up for use with the format specifiers. Note also that in line 14 the dollar sign is printed verbatim in the output. You can print anything you want as long as it's not misinterpreted as a specifier.
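A cut-down version of the same idea, with only two holdings, makes the expression in the value line easier to see. The PORTFOLIO handle and the variable names are assumptions made for this sketch, and the output is captured in a string instead of being sent to the terminal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($currency, $rate, $amount);
my $buffer = '';

# PORTFOLIO is a hypothetical in-memory handle used for this sketch.
open(PORTFOLIO, '>', \$buffer) or die "Cannot open in-memory handle: $!";

for my $holding (["Pound", 1.85, 100], ["Franc", 32.0, 200]) {
    ($currency, $rate, $amount) = @$holding;
    write(PORTFOLIO);
}
close(PORTFOLIO);
print $buffer;

# The last value is an expression, evaluated each time write() runs.
format PORTFOLIO =
@<<<<<<< @###.## @####.## @####.##
$currency, $rate, $amount, $amount / $rate
.
```

The division is never stored in a variable; Perl evaluates it afresh for each record as the format is filled in.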

You can even call subroutines that return values in place of variables in the format specification. Listing 19.7 defines a function that tells you if your funds are running too low or too high. The function returns a message indicating that your investment in a foreign currency is too low or too high based on certain criteria. The value returned from the function is a string and therefore is printed in a text field, here the right-justified @>>>>>>>>>>>>>> field.


Listing 19.7. Using subroutines in formats.
 1 #!/usr/bin/perl
 2 while(<>) {
 3  ($country, $code, $currency, $value, $amount) = split(',');
 4     write;
 5     }
 6 format STDOUT_TOP =
 7 Id  Country        Currency  Rate      Amount     Value in $
 8 ============================================================
 9 .
10 format STDOUT =
11 @<<<<<<<<<<<<  @<<<<<< @#####.##   @####.##  $ @>>>>>>>>>>>>>>
12 $country,$currency,$value,$amount,&checkAmount($amount,$value)
13 .
14 sub checkAmount {
15     my ($num, $val) = @_;
16     my $dollars;
17     my $ret;
18     $dollars = $num / $val;
19     $ret = sprintf "%6.2f     ", $dollars;
20     if ($dollars < 10)
         { $ret = sprintf "%6.2f low ", $dollars; }
21     if ($dollars > 200)
         { $ret = sprintf "%6.2f high", $dollars; }
22     $ret;
23 }

Note the format string passed to sprintf in lines 19, 20, and 21 of Listing 19.7. Each sprintf call is designed to return the same number of characters regardless of the value of $dollars, so the length of the $ret variable in the checkAmount subroutine is constant. If the length of $ret varied from record to record, there would be no guarantee that the output would be aligned on the decimal point. It's important that checkAmount returns a ready-formatted string and not a bare number: a bare number placed in a text field is printed with however many digits Perl's default conversion produces, and the columns no longer line up.
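The constant-length idea can be sketched on its own. The describe_amount() function below is a hypothetical variation on checkAmount that pads the flag with %-4s instead of writing three separate sprintf calls. The thresholds of 10 and 200 are the ones used in Listing 19.7; the field widths are chosen for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# describe_amount() is a hypothetical helper for this sketch.
# It returns a 14-character string for amounts below one million.
sub describe_amount {
    my ($dollars) = @_;
    my $flag = $dollars < 10  ? "low"
             : $dollars > 200 ? "high"
             :                  "";
    # %9.2f right-aligns the number; %-4s pads the flag to four columns.
    return sprintf("%9.2f %-4s", $dollars, $flag);
}

my @samples = map { describe_amount($_) } (6.25, 54.05, 333.33);
print "[$_]\n" for @samples;
```

Because every returned string has the same length, right-justifying it in a text field keeps the decimal points in one column whether or not a flag is present.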

Here's the output from Listing 19.7.

Id  Country        Currency  Rate      Amount     Value in $
============================================================
UK              Pound       1.85     100.00  $      54.05
BELGIUM         Franc      32.00     200.00  $       6.25 low
DENMARK         Krone       6.00    2000.00  $     333.33 high
FINLAND         Markka      4.69    1000.00  $     213.22 high
FRANCE          Franc       5.28      50.00  $       9.47 low
ELSALVADOR      Colon       8.74     340.00  $      38.90
PHILIPPINES     Peso       24.80    1000.00  $      40.32
PAKISTAN        Rupee      38.00    1200.00  $      31.58
BAHRAIN         Dinar       0.38      45.00  $     118.42
IRAQ            Dinar       0.60      10.00  $      16.67
JORDAN          Dinar       0.70     100.00  $     142.86
SAUDIARABIA     Riyal       3.75    1000.00  $     266.67 high

Using More Than One Format

There is a standard header that uses the _TOP suffix; however, there is no _END or _BOTTOM suffix for a footer to print when you are done. This is understandable, because Perl cannot know which write() call will be your last. All is not lost, though, because you can select a different format to print out results at the end.

Listing 19.8 defines a new format name, called ENDING. First, it selects the format by setting the internal Perl variable $~ to the name. Next, it uses the format ENDING to print out the total value of the foreign currency portfolio.


Listing 19.8. Using a different format.
 1 #!/usr/bin/perl
 2 while(<>) {
 3     ($country, $code, $currency, $rate, $amount)
 4         = split(',');
 5     $sum += ($amount/$rate) ;
 6     write;
 7     }
 8 $~ = "ENDING";
 9 write;
10 format STDOUT_TOP =
11 Id  Country     Currency   Rate      Amount     Value
12 ============================================================
13 .
14 format STDOUT =
15 @<<<<<<<<<<<<  @<<<<<< @#####.##   @####.##    @#####.##
16 $country,$currency,$rate,$amount,$amount/$rate
17 .
18 format ENDING =
19 ========================================================
20                           Total Value =    $@#######.###
21 $sum
22 ========================================================
23 .

Here is the output of Listing 19.8.

Id  Country     Currency   Rate      Amount     Value
========================================================
UK              Pound       1.85     100.00        54.05
BELGIUM         Franc      32.00     200.00         6.25
DENMARK         Krone       6.00    2000.00       333.33
FINLAND         Markka      4.69    1000.00       213.22
FRANCE          Franc       5.28      50.00         9.47
ELSALVADOR      Colon       8.74     340.00        38.90
PHILIPPINES     Peso       24.80    1000.00        40.32
PAKISTAN        Rupee      38.00    1200.00        31.58
BAHRAIN         Dinar       0.38      45.00       118.42
IRAQ            Dinar       0.60      10.00        16.67
JORDAN          Dinar       0.70     100.00       142.86
SAUDIARABIA     Riyal       3.75    1000.00       266.67
========================================================
                          Total Value =    $    1271.741
========================================================
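When the output handle is not the currently selected one, the $~ assignment has to be made while that handle is selected. The following sketch shows one way to do this, under the assumption of a hypothetical LEDGER handle with two formats, LEDGER for the records and LEDGER_TOTAL for the closing line; the names and the two sample entries are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($label, $value, $total);
my $buffer = '';

# LEDGER is a hypothetical in-memory handle used for this sketch.
open(LEDGER, '>', \$buffer) or die "Cannot open in-memory handle: $!";

$total = 0;
for my $entry (["Pound", 54.05], ["Franc", 6.25]) {
    ($label, $value) = @$entry;
    $total += $value;
    write(LEDGER);             # uses the default format, LEDGER
}

# $~ applies to the selected handle, so select LEDGER before setting it.
my $previous = select(LEDGER);
$~ = "LEDGER_TOTAL";
select($previous);
write(LEDGER);                 # now uses LEDGER_TOTAL
close(LEDGER);
print $buffer;

format LEDGER =
@<<<<<<<<< @####.##
$label, $value
.
format LEDGER_TOTAL =
===================
Total      @####.##
$total
.
```

Restoring the previously selected handle straight away keeps later print statements going where the rest of the script expects them.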

Controlling the Format

There are some very important internal Perl variables you must be aware of when working with formats. By setting the values in the following variables in a Perl script, you can control the output of where to write to, the number of lines per page, what to write on top of every new page and how to write every entry and so on:

Variable  Description
$~        The name of the current format for the selected file handle. The default is the name of the file handle, so it is STDOUT if you are writing to STDOUT. This variable is set explicitly in Listing 19.8.
$^        The name of the current top-of-page format. The default is the name of the file handle with _TOP appended. Just like the $~ variable, you can set $^ to the name of a different top-of-page format. If no format with that name is defined, no header is printed.
$=        The number of lines per page. The top-of-page header is printed again each time a page fills up. The default value is 60 lines per page.
$-        The number of lines left on the page. The value in $- is decremented every time you write to the file handle. When the lines left cannot hold the next record, the next write starts a new page, prints the top-of-page format, and resets $- from $=.
$%        The current page number. You can refer to the value in this variable to print page numbers in your output.
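As a small sketch of these variables working together, the script below sets a three-line page, prints the page number in the header through $%, and reads $% and $- back after the last record. All of these variables refer to the currently selected handle, so the hypothetical LISTING handle stays selected while the records are written. The handle name and the four sample entries are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $entry;
my $buffer = '';

# LISTING is a hypothetical in-memory handle used for this sketch.
open(LISTING, '>', \$buffer) or die "Cannot open in-memory handle: $!";

# Keep LISTING selected so that $=, $%, and $- all refer to it.
my $previous = select(LISTING);
$= = 3;                        # three lines per page: one header, two entries
for my $word (qw(alpha beta gamma delta)) {
    $entry = $word;
    write;                     # writes to the selected handle, LISTING
}
my $pages      = $%;           # page number reached
my $lines_left = $-;           # lines remaining on the current page
select($previous);
close(LISTING);

print $buffer;
print "pages: $pages, lines left: $lines_left\n";

format LISTING_TOP =
Page @#
$%
.
format LISTING =
@<<<<<<<<<
$entry
.
```

Had LISTING not been selected, the $% in the header would have reported the page number of whichever handle was selected at the time, not that of the report.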

Listing 19.9 is a simple program that provides a clean listing of a lengthy source file. There are two options to this program: -file to specify the filename and -lines to specify the number of lines per page.


Listing 19.9. A sample program to list files.
 1 #!/usr/bin/perl
 2 use Getopt::Long;
 3 $result = GetOptions('file=s','lines:i');
 4 if (!defined($opt_file))
         { print "Usage: $0 -file filename\n";
         exit 0; }
 5 open(SFILE,$opt_file) || die "Cannot open $opt_file: $!\n";
 6 $= = $opt_lines ? $opt_lines : 50;
 7 $date = `date +%D`;
 8 chop($date);
 9 $i = 1;
10 while (<SFILE>) {
11     write;
12     $i++;
13     }
14 close(SFILE);
15 format STDOUT_TOP =
16 Filename @<<<<<<<<<<<<<<   Date @<<<<<<<<  Page: @#####
17 $opt_file, $date, $%
18 =======================================================
19 .
20 format STDOUT =
21 @#### @*
22 $i,$_
23 .

The code in Listing 19.9 uses the Getopt::Long module to collect the filename and the number of lines per page. If you are not familiar with the use of the Getopt::Long module, please refer to Chapter 16, "Command-line Interface with Perl," for more details. The variable $opt_file contains the filename, and $opt_lines (if defined) contains the number of lines per page.

In line 6 the variable $= is set to the number of lines per page requested by the user. If no value is requested, the script uses 50 lines per page. A date value is generated and stored in $date on lines 7 and 8. The $i counter is used to print out the line number in the source file.

Because the script is writing to STDOUT with the write() command, it uses the formats STDOUT and STDOUT_TOP for the records and page headers, respectively. In line 17 the script uses $% to print the current page count along with the filename and the current date.

@* in the field specifier is used in line 21 to indicate that the entire value of the $_ variable should be printed as a string. The $_ variable is not truncated as it would be if the other string specifiers such as @<<<<, @>>>>, or @|||||| had been used.

In fact, @* even allows multiline fields. Had there been any newlines in the string, they would also be printed out. Therefore, if the string $_ is equal to "One\nTwo\nThree\nFour", it is printed as follows when used in a write() function:

One
Two
Three
Four
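The difference between @* and an ordinary text field can be seen by formatting the same string both ways. This sketch uses the built-in formline() function through a hypothetical render() helper; the field width of ten columns in the second call is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# render() is a hypothetical helper built on formline() and $^A.
sub render {
    my ($picture, @values) = @_;
    $^A = "";
    formline($picture, @values);
    my $text = $^A;
    $^A = "";
    return $text;
}

my $value = "One\nTwo\nThree\nFour";

# @* prints the whole value, embedded newlines included.
my $whole = render('@*' . "\n", $value);

# An ordinary text field stops at the first newline in the value.
my $first = render('@<<<<<<<<<' . "\n", $value);

print "With \@*:\n$whole";
print "With \@<<<<<<<<<:\n$first";
```

The ordinary field silently drops everything after the first line, so @* is the safer choice whenever a value may contain newlines.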

To suppress the printing of a top-of-page header altogether, leave the top-of-page format undefined. To delay the printing, you can keep resetting the value of $- to a non-zero value less than the value in $= as you print out text. When you are ready to print the header, set $- to 0 to signify the start of a new page.

Creating Multiple Lines with the Caret

There is more than one way to print long descriptive text in a field. For instance, you could have multiple lines and use the @* specifier. The problem is that you are not able to control the number of columns in the output at all. To fill in multiple lines, you have to use the caret (^) operator instead of the at (@) operator. Using the ^ operator enables you to specify the same text on multiple lines. Perl chops up the variable for you and presents as much as it can per line.

The ^ operator behaves the same way as the @ operator with one major difference: You can only use scalar variables with the ^ symbol. The reason for this is that Perl chops the scalar up into manageable pieces for printing, removing each piece from the variable as it is printed. This cannot be done on the result of an evaluated expression. Because the variable is consumed in this way, work on a copy if you need its original contents after the write() call.

There are two ways that you can use the ^ operator: to output only a fixed number of lines or output only as much as necessary. To output up to a fixed number of lines, you use the tilde (~) operator. Here's the syntax to print the contents of the $desc variable using up to three lines:

@<<<<<<<        ^<<<<<<<<<<<<<<<<<<<<<<<<
$hours,         $desc
~               ^<<<<<<<<<<<<<<<<<<<<<<<<
              $desc
~               ^<<<<<<<<<<<<<<<<<<<<<<<<
              $desc

If you omit the tildes, you will get exactly three lines of output, some of which may be blank. The tildes suppress blank lines, so you get up to three lines of output.
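Here is the three-line layout from above turned into a runnable sketch. The TIMESHEET handle, the 20-column description field, and the two sample entries are assumptions made for illustration. The first description is long enough to need all three lines, and the second fits on one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($hours, $desc);
my $buffer = '';

# TIMESHEET is a hypothetical in-memory handle used for this sketch.
open(TIMESHEET, '>', \$buffer) or die "Cannot open in-memory handle: $!";

my @jobs = ([3, "Fixed the leaking roof and repainted the hallway"],
            [1, "Swept floors"]);
for my $job (@jobs) {
    ($hours, $desc) = @$job;   # write() consumes $desc piece by piece
    write(TIMESHEET);
}
close(TIMESHEET);
print $buffer;

# The two ~ lines are printed only if some of $desc is left over.
format TIMESHEET =
@<<<<< ^<<<<<<<<<<<<<<<<<<<
$hours, $desc
~      ^<<<<<<<<<<<<<<<<<<<
$desc
~      ^<<<<<<<<<<<<<<<<<<<
$desc
.
```

The first job takes three lines and the second only one, so the report has four lines in all and no blank lines between the entries.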

A tilde can appear anywhere on a picture line; it is printed as a space, and placing it at the front of the line makes it easy to spot. You must specify the $desc variable on every line that you want it divided on using the ^ operator. If you are not sure of how many lines you'll be printing, you can use two tildes (~~) together, which repeat the line until the field is exhausted. Listing 19.10 presents a sample invoice-printing program that uses a field with two tildes.


Listing 19.10. Using the tilde to print on multiple lines.
 1 #!/usr/bin/perl
 2 use Getopt::Long;
 3 $result = GetOptions('file=s','invoice=i');
 4 if (!defined($opt_file)) { print "Usage: $0 -file filename\n"; exit 0; }
 5 open(SFILE,$opt_file) || die "Cannot open $opt_file: $!\n";
 6 $date = `date +%D`;
 7 chop($date);
 8 $i = $opt_invoice ? $opt_invoice : 200;
 9 $rate = 55;
10 while (<SFILE>) {
11      ($name,$addr,$hours,$desc)= split(':');
12     $i++;
13     write;
14     }
15 close(SFILE);
16 format STDOUT =
17                                      INVOICE#@#######
18 $i,
19 To:
20 @<<<<<<<<<<<<<<<                     My Company
21 $name
22 @<<<<<<<<<<<<<<<                     Any town, USA 99999
23 $addr
24 ==================================================================
25                      DESCRIPTION
26 ==================================================================
27 @<<<<<<<<<<<<<<<                     ^<<<<<<<<<<<<<<<<< <<<<<<<<<<<
28 $name, $desc
29 ~~                                   ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
30 $desc
31                                           Total Hours @######
32 $hours
33 ==================================================================
34                                           Total Due $ @######
35 $hours * $rate
36 ==================================================================
37 Have a nice day!
38 .

The important lines in Listing 19.10 are lines 27 through 35. Lines 27 through 30 specify the format for the client name and description fields. Note the two tildes (~~) in line 29, which formats the rest of the $desc variable named in line 30: they tell Perl to repeat that line as many times as necessary, so the description expands to as many lines as it needs. Lines 31 through 35 print out the totals on their own lines.
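Because Listing 19.10 depends on an input file, here is a smaller, self-contained sketch of the same ~~ technique. The format name REPEAT and the helper render() are assumptions made for this example; the output is collected in a string so the number of lines can be counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $desc;

# A single ~~ line: repeated until $desc has been used up.
format REPEAT =
~~ ^<<<<<<<<<<<<<<<<<<<
$desc
.

# Write one description through REPEAT and return the text produced.
sub render {
    my ($text) = @_;
    $desc = $text;            # write() consumes $desc, so work on a copy
    my $out = '';
    open(my $fh, '>', \$out) or die "Cannot open in-memory handle: $!\n";
    my $old = select($fh);
    $~ = 'REPEAT';
    select($old);
    write($fh);
    close($fh);
    return $out;
}

my $text  = 'Cratered project, killed hopes, destroyed goals, blinded visions';
my $short = render('Lions tamed');
my $long  = render($text);

printf "short text: %d line(s)\n", ($short =~ tr/\n//);
printf "long text:  %d line(s)\n", ($long  =~ tr/\n//);
print $long;
```

Unlike the single tilde, nothing is cut off here: the number of output lines follows the length of the text.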

Here is the output of Listing 19.10. The input file is not shown.

                                     INVOICE#     201

To:
El Dictator Corp                     My Company
South of Here                        Any town, USA 99999

==================================================================
                     DESCRIPTION
==================================================================
El Dictator Corp                     Revolutions quelled,
                                     uprisings started, lions
                                     tamed, sanity restored


                                          Total Hours     105
==================================================================
                                          Total Due $    5775

==================================================================
Have a nice day!

                                     INVOICE#     202

To:
ABC Corp                             My Company
2 Main St, USA                       Any town, USA 99999

==================================================================
                     DESCRIPTION
==================================================================
ABC Corp                             Cratered project, killed
                                     hopes, destroyed goals,
                                     blinded visions and ejected
                                     compentancy


                                          Total Hours      21
==================================================================
                                          Total Due $    1155

==================================================================
Have a nice day!

Using format in Modules

The Invest.pm module (from Chapter 4, "Introduction to Perl Modules") that we've been working on is in need of a face lift as well. It needs a function, called reportPortfolio(), that prints a formatted report of the portfolio. The code for the function, which goes in the Invest.pm file, is shown in Listing 19.11.


Listing 19.11. The reportPortfolio() function.
 1 sub Invest::reportPortfolio {
 2 #
 3 #  Save the values to the format parameters
 4 #  before proceeding with the call
 5 #
 6     my $hdrfmt = $~;
 7     my $topfmt = $^;
 8     my $pageCt = $=;
 9     my $lineCt = $-;
10     my $sym;           # symbol
11     my $shr;           # no. of shares
12     my ($key, $i);     # for looping
13     $= = 0;      # for header to print
14     $- = 0;
15     $~ = "PORT_RPT";
16     $^ = "PORT_RPT_TOP";
17 format PORT_RPT_TOP =
18     Report
19 STOCK     SHARES
20 =====   ======
21 .
22 format PORT_RPT =
23 @<<<<   @<<<<
24 $sym, $shr
25 .
26     while (($key,$i) = each(%portfolio)) {
27         $shr = $i->{'shares'};
28         $sym = $i->{'symbol'};
29         write ;
30     }
31 #
32 #  Restore the values to the format parameters
33 #  before the call
34 #
35     $= = $pageCt;
36     $- = $lineCt;
37     $~ = $hdrfmt;
38     $^ = $topfmt;
39 }

The first thing the report function does is save the current values of the four global variables that affect report generation ($~, $^, $=, and $-). This is done in lines 6 through 9. New values for these variables are then set in lines 13 through 16.

It is important that you set the format defaults ($~, $^, $=, and $-) here because you might be in the middle of a report when this function is called and you do not want to use those values. Rather, you have to reset the values to start a new report.

The original values, saved in lines 6 through 9, are restored in lines 35 through 38. Using this methodology of preserving the state of each report format, you can embed reports within reports.
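The save-and-restore pattern can be sketched in isolation. In the example below, the format names OUTER and INNER and the function inner_report() are hypothetical; only $~ and $^ are saved, to keep the sketch short, whereas Listing 19.11 also saves $= and $-. The output goes to an in-memory handle so that it can be printed at the end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($line, $sym, $shr);

format OUTER =
Outer: @<<<<<<<<<
$line
.

format INNER =
  @<<<< @>>>>
$sym, $shr
.

# Print a nested report on the currently selected handle without
# disturbing the caller's format settings.
sub inner_report {
    my (%holdings) = @_;
    my ($saved_fmt, $saved_top) = ($~, $^);    # save the caller's state
    $~ = 'INNER';
    for my $s (sort keys %holdings) {
        ($sym, $shr) = ($s, $holdings{$s});
        write;
    }
    ($~, $^) = ($saved_fmt, $saved_top);       # restore it
}

my $out = '';
open(my $fh, '>', \$out) or die "Cannot open in-memory handle: $!\n";
my $old = select($fh);
$~ = 'OUTER';
$line = 'before';
write;
inner_report(INTC => 100, MSFT => 150);
$line = 'after';
write;                     # uses OUTER again
select($old);
close($fh);
print $out;
```

The last write uses the OUTER format again only because inner_report() put the old value of $~ back before returning.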

Lines 17 through 25 define the format specification in the same block of code as the reportPortfolio() function. It's important that you declare the format in the same block of code in which the my variables are declared. The scope of a variable declared as my is limited to the curly braces in which it is declared. The format specification needs access to the my variables; therefore, it has to be declared in the curly braces as well. If you declare the format outside the curly braces in which the variables are declared, the format statement is not able to see them and therefore prints nothing.

Also, notice how each format specification is left-justified in the code. All white space in the format specification is significant and appears in the output.

Now let's look at the code in Listing 19.12 to see how a report is generated.


Listing 19.12. Testing the reportPortfolio() function.
 1 #!/usr/bin/perl
 2 push(@INC,`pwd`);
 3 use Invest;
 4 use Invest::Fund;
 5 use Invest::Stock;
 6
 7 $port = new Invest;
 8 @stocks = ( 'INTC','MSFT','XLNX','TSX','SERT');
 9 $n = 50;
10 foreach $x (@stocks) {
11     $n += 50;
12     $i = new Invest::Stock('symbol' => "$x", 'shares' => "$n");
13     $port->Invest::AddItem($i);
14 }
15 # Print the report.
16 $port->reportPortfolio();

In lines 3 through 5, you define the modules that the program is going to be using. At line 7 you create an Invest object. In line 8, you define the stock symbols to use. The $n variable in line 9 is set to a dummy value. The loop in lines 10 through 14 creates new stock objects to add to the Invest object. The dummy value in $n is incremented by 50 on every iteration in the loop. The reportPortfolio() function is then called on the Invest object, $port, to print out a report.

Listing 19.12 makes the call to the reportPortfolio() function at line 16. Here's the output from the reportPortfolio() function in the Invest.pm file:

    Report
STOCK     SHARES
=====   ======
INTC    100
MSFT    150
XLNX    200
TSX     250
SERT    300

Another Example of Report Generation

Here is another example of how to use formats. The code shown in Listing 19.13 prints a list of all the files larger than 150KB in a directory tree. It's often necessary to see which files in a directory tree are using up the most space on a disk, especially if you are running out of disk space.


Listing 19.13. Creating a list of files.
  1 #!/usr/bin/perl
  2 #           Print disk usage information
  3 #             Copy and modify freely
  4 # ----------------------------------------------------
  5 # The following entry will be printed out on new pages
  6 # ----------------------------------------------------
  7 format LargeFilesTop =
  8 Large files in @<<<<<<<<<<<<<<<<<<<<<<<  @<<<<<<<<<<<<<<<<<<  Page @>
  9                $itemName,                $today,                   ++$page
 10
 11 Owner        Block   mtime     file
 12 ---------------------------------------------------------------------
 13 .
 14 # -----------------------------------------
 15 # The format for the top of page ends here.
 16 # -----------------------------------------
 17
 18 # ------------------------------------------------------
 19 # The following entry will be printed out for every time
 20 # ------------------------------------------------------
 21 format LargeFilesEntry =
 22 @>>>>>>> @<<<<<<< @<<</@>/@>>>> @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
 23 $size,   $owner,  $mon, $day, $year, $fileName
 24 .
 25
 26 # -----------------------------------------------
 27 # The main program begins here
 28 # -----------------------------------------------
 29 $page = 0;   #
 30 select(STDOUT);
 31 foreach $itemName (@ARGV) {
 32      #
 33      # Do look in subdirectories but not in links
 34      #
 35      if (-d $itemName && !-l $itemName) {
 36      # $- = 0;  # start new page per directory
 37       open(FIND, "find $itemName -size +150  -ls |");
 38 #
 39 # LOOK how the format is used here .
 40 #
 41      $^ = "LargeFilesTop";
 42      $~ = "LargeFilesEntry";
 43      # $page = 0;   # if new page per directory.
 44      while ($line = <FIND>) {
 45     #
 46     # Extract all information
 47     #
 48     ($ino, $blks, $mod, $lnks, $owner,
 49     $grp, $size, $mon, $day,
 50     $year, $fileName) = split(' ', $line);
 51 #
 52 # use only what you need.
 53 #
 54     write;
 55     } # end of while loop.
 56   } # end of if-statement
 57 }

Two formats are defined in Listing 19.13. The first format is defined in lines 7 through 13, and the second in lines 21 through 24. Note that the format names are not written in all capital letters. You can use any valid identifier as a format name. By convention, the name of a format for the top of a page ends in the letters "TOP" (Perl's default is the name of the file handle followed by _TOP), and the name of the format for each entry does not.

The top-of-page format uses a $page variable to track the page number. The $page variable is set to 0 in line 29. Actually, an uninitialized variable is treated as 0 by the Perl interpreter when it is first incremented, so line 29 is not really required. However, explicitly initializing key variables, such as page numbers or loop counters, makes the code easier to read and understand. Line 30 is another example of code put in for the sake of making the program easier to follow: it explicitly selects the standard output handle, STDOUT, as the default handle for output. When no file handle is passed to it as an argument, the write() statement uses the currently selected handle, which is STDOUT unless you have selected another. Therefore, lines 29 and 30 are not necessary but are placed in the code only to make it more readable.

The foreach loop in line 31 steps through the names passed in on the command line. A name is used only if it is a directory and not a link, as shown in the condition in line 35.

We can modify the program to report the results from searching each directory on a separate page by uncommenting the code at line 36, which forces a new page on every directory by setting the counter of number of lines left on a page (in the $- register) to 0. When the $- register is 0, the top header format is also used by the next write() command to be executed.
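The effect of resetting $- can be tried without running find. The sketch below uses made-up data (the %tree hash) and hypothetical format names FILES_TOP and FILES; it writes to an in-memory handle and counts how often the header was printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($dir, $file);

format FILES_TOP =
Files in @<<<<<<<<<
$dir
-------------------
.

format FILES =
  @<<<<<<<<<<<<<<<<
$file
.

# Made-up sample data: directory name => files in it.
my %tree = (bin => ['ls', 'cat'], etc => ['passwd']);

my $out = '';
open(my $fh, '>', \$out) or die "Cannot open in-memory handle: $!\n";
my $old = select($fh);
$^ = 'FILES_TOP';
$~ = 'FILES';
for my $d (sort keys %tree) {
    $dir = $d;
    $-   = 0;    # no lines left, so the next write starts a new page
    for my $f (@{ $tree{$d} }) {
        $file = $f;
        write;
    }
}
select($old);
close($fh);

# Count how many times the top-of-page format was used.
my $headers = () = $out =~ /Files in /g;
print "headers printed: $headers\n";
```

Perl sends the contents of $^L (a form feed by default) ahead of every page after the first, so a printer or pager starts each directory on a fresh page.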

The find command in line 37 searches for all files larger than 150 blocks. What counts as a block depends on the system you run this script on, with sizes ranging from 512 to 4096 bytes. Where a block is 1024 bytes, the find command in line 37 finds files larger than 150KB. (POSIX-conforming and GNU versions of find count an unsuffixed -size value in 512-byte units, in which case +150 matches files larger than about 75KB.) The -ls option makes find print a verbose listing for each file it finds.

The formats for the header and for each entry in the report are selected in lines 41 and 42. Line 43 can be uncommented to reset the page count to 0 every time a new directory report is printed. With line 43 left commented out, the page count runs consecutively through the whole report.

The verbose listing from the find command is split into fields in lines 48 to 50. Only a few of the variables in the list from the split function call are used by the format when the write() function is called. Here's a sample run of this program, which is called dirUsage.pl:

$ dirUsage.pl /home/khusain/a3
 
Large files in /home/khusain/a3                               Page  1

Owner        Block   mtime     file
---------------------------------------------------------------------
 5611520 khusain  Jul / 4/12:45 /home/khusain/a3/n3b5.tar
 4414116 khusain  Jul / 1/21:31 /home/khusain/a3/netscape
  197472 khusain  Jul / 1/21:22 /home/khusain/a3/Netscape.ad
  921497 khusain  Jul / 1/21:30 /home/khusain/a3/moz3_0.zip
 2194589 khusain  Jul /17/11:00 /home/khusain/a3/pcninstl.exe
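The field extraction in lines 48 to 50 of Listing 19.13 can be tried on a single line. The sample line below is invented to resemble the output of find -ls (the inode number, block count, and group name are placeholders), so treat this as an illustration of split rather than a record of real output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# An invented line in the style of "find -ls" output.
my $line = "  4711 5480 -rw-r--r--   1 khusain  users  5611520 "
         . "Jul  4 12:45 /home/khusain/a3/n3b5.tar\n";

# split(' ', ...) skips leading white space and splits on runs of
# white space, so the ragged columns and the newline cause no trouble.
my ($ino, $blks, $mod, $lnks, $owner,
    $grp, $size, $mon, $day,
    $year, $fileName) = split(' ', $line);

print "size:  $size\n";
print "owner: $owner\n";
print "date:  $mon $day $year\n";
print "file:  $fileName\n";
```

Two details are worth noting. For recently modified files the tenth field holds a time of day rather than a year, which is why the sample run shows values such as 12:45 in the date column. Also, a file name containing spaces would be cut off at its first space by this split.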

Listing 19.13 currently shows only those files larger than 150KB because one block on my machine is 1024 bytes and the find command searches for files of more than 150 blocks. You can always change the size argument passed to the find command to use a different threshold.
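Because the meaning of a block differs between systems, one way to sidestep the question is to test file sizes in bytes from Perl itself. The following is a minimal sketch using the standard File::Find module in place of the external find command; the function name large_files() and the 150 * 1024 threshold are assumptions made for this example, not part of Listing 19.13.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return [size, path] pairs, largest first, for plain files under
# $dir that are larger than $min_bytes. Call it in list context.
sub large_files {
    my ($dir, $min_bytes) = @_;
    my @found;
    find(sub {
        return if -l $_;             # skip symbolic links
        return unless -f $_;         # plain files only
        my $size = (-s _) || 0;      # size in bytes from the same stat
        push @found, [$size, $File::Find::name] if $size > $min_bytes;
    }, $dir);
    return sort { $b->[0] <=> $a->[0] } @found;
}

# Report files larger than 150KB in each directory named on the
# command line.
for my $dir (@ARGV) {
    printf "%10d %s\n", @$_ for large_files($dir, 150 * 1024);
}
```

The pairs returned by large_files() could be fed to write() with a format such as LargeFilesEntry instead of being printed with printf.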

Summary

This chapter introduced you to using formats to produce reports from data on disk. Using formats requires setting up one specification for the top of each page and one for each record printed with the write() statement. The record format in use is named in the $~ variable, and the format for the top of the page is named in $^. By default, the record format has the same name as the file handle being written to. The number of lines of text per page is held in the $= variable, with $- holding the number of lines left on the current page. Page numbers are kept in the $% variable.

Within a format, use < to left-justify, > to right-justify, and | to center text. The hash mark (#) marks the positions of digits in numeric values. Use ^ in place of @ to spread a scalar variable over several lines. To suppress the printing of blank lines, put a tilde (~) on the format line; use double tildes (~~) instead to repeat the line until the variable is used up. When printing a report from within a module, be careful to save the state of the format variables before using a format so that you can restore the state when returning execution to the caller.