This chapter introduces you to the types of documentation available for Perl scripts. I cover two documentation methods: embedding nroff man pages and using the plain old documentation (POD) format. Documenting your Perl scripts is easy, and the process is standardized enough in Perl 5 to allow for generating LaTeX, HTML, and man pages from the source files.
Generally, you have to keep the software documentation for your program in a file separate from the source code. This separate storage forces you to remember two different files for every change you make: one for the source file and the other for the documentation. The consequence is that the source file is almost never in sync with the documentation.
Since Perl 4, you can keep your documentation and code in the same source file. The way to do this is to use the nroff tags in the man package and embed these codes in a Perl source file. This trick works only if you are using the -man package with nroff; therefore, if your system is not UNIX-like or if you abhor nroff, you should skip ahead to the next section, which is on POD formats.
The way to do this embedding was documented initially in the book Programming Perl by Larry Wall and Randal L. Schwartz, published by O'Reilly & Associates. For that book, Wall also wrote a shell script, called wrapman, that sets up the embedding as a template. However, it will be instructive to see how the method works.
The trick in embedding man pages is to use the .di and .ig requests in nroff. The .di request "diverts" text into an nroff macro. It works this way: There are two .di tags; one is placed at the top of the text to be diverted as .di X, and the other .di tag (with no arguments) is placed at the bottom of the text. The first tag, .di X, asks nroff to divert all the text into macro X until it sees .di at the start of a line. The .ig request works in a similar way, but it forces nroff to ignore all text between .ig X and the next .X tag. Now comes the important part: The leading period (.) in both the .ig and .di commands can be replaced with a single quote (') to get 'ig and 'di commands that do the same thing as the .ig and .di commands, except that nroff does not break the current output line when it processes them. Note also that the single quote is, interestingly enough, also the delimiting character for a string in Perl.
Another point to remember is that Perl stops interpreting your script when it sees an __END__ token. You can use this feature to your advantage by putting all of your man page text after the __END__ token.
So if you were to add the following two statements to the start of a Perl script, your script would still run:
'di';
'ig00 ';
As far as Perl is concerned, these two lines are simply strings. For nroff, the two lines are requests. The first line, 'di';, starts a diversion that lasts until nroff sees 'di on a line by itself. The next line, 'ig00 ';, tells nroff to ignore text until it sees .00 on a line by itself.
Now, at the end of the source file, place the following lines, which are valid in Perl and in nroff:
.00;
'di \" Terminates the diversion started by the first 'di
.nr nl 0-1 \" Sets the page to the start of the document
.nr % 0 \" Sets the page count back to zero
'; __END__ #### Terminates the Perl string and all Perl interpreting
To Perl, everything from 'di through the closing '; is one multi-line string in single quotes, which is why the annotations use the nroff comment marker \" rather than the Perl # character. The .00 line is just a numeric constant to Perl, which evaluates and discards it, while to nroff it ends the .ig processing. __END__ stops Perl from processing any further.
Now you can place the man page contents after the line containing the __END__ statement. Look at the sample listing shown in Listing 8.1. The output from this listing is shown in Figure 8.1.
Listing 8.1. Embedding man pages in Perl.
1 #!/usr/bin/perl
2 'di ';
3 'ig 00 ';
4
5 print "$#ARGV \n" ;
6 if ( $#ARGV ) {
7 print "\n Usage: $0 file \n";
8 exit 0;
9 }
10 $name = $ARGV[0];
11
12 print "\nTesting flags for $name \n";
13 print "\n========== Effective User ID tests ";
14 print "\n is readable" if ( -r $name);
15 print "\n is writable" if ( -w $name);
16 print "\n is executable" if ( -x $name);
17 print "\n is owned " if ( -o $name);
18 print "\n========== Real User ID tests ";
19 print "\n is readable" if ( -R $name);
20 print "\n is writable" if ( -W $name);
21 print "\n is executable" if ( -X $name);
22 print "\n is owned by you" if ( -O $name);
23
24 print "\n========== Reality Checks ";
25 print "\n exists " if ( -e $name);
26 print "\n has zero size " if ( -z $name);
27 print "\n has some bytes in it " if ( -s $name);
28
29 print "\n is a file " if (-f $name);
30 print "\n is a directory " if (-d $name);
31 print "\n is a link " if (-l $name);
32 print "\n is a socket " if (-S $name);
33 print "\n is a pipe " if (-p $name);
34
35 print "\n is a block device " if (-b $name);
36 print "\n is a character device " if (-c $name);
37
38
39 print "\n has setuid bit set " if (-u $name);
40 print "\n has sticky bit set " if (-k $name);
41 print "\n has gid bit set " if (-g $name);
42
43 print "\n is open to terminal " if (-t $name);
44 print "\n is a Binary file " if (-B $name);
45 print "\n is a Text file " if (-T $name);
46
47 print "\n is Binary to terminal " if (-t $name);
48 print "\n is open to terminal " if (-t $name);
49
50
51 .00 ;
52
53 'di \" finish diversion
54 .nr nl 0-1 \" Start new page with -1
55 .nr % 0 \" start at page 1
56 '; __END__ #### Start Man Page ####
57
58 .TH Test 1 "Apr 15, 1996"
59 .AT 3
60 .SH NAME
61 tf - Test file attributes
62 .SH SYNOPSIS
63 .B tf file
64 .P
65 .B tf directory
66 .SH DESCRIPTION
67 .I tf
68 Prints out the file attributes for a file.
69 .SH FILES
70 Just add perl.
71 .SH AUTHOR
72 Kamran Husain.
73 .SH BUGS
74 We don't believe in bugs, we introduce features.
Warning
You might have to adjust the spacing in lines 2, 3, and 51 if you are using a different version of nroff. The following form worked fine on a Sun but did not work with GNU groff on two machines:
'di';
To get the two lines to work properly, I had to introduce a space in the calls to the macros:
'di ';
You have been warned. The limitation of this method should be obvious by now: It's useful for generating one man page for one source file. In addition, it's too heavily tied to the nroff package. The man page will not be generated on NT machines that do not have the nroff packages installed by default. Obviously, something more generic is needed. This is where the POD format comes in.
The Perl plain old documentation (POD) format is designed to be an easier way to get your Perl files documented. Once you have documented your files in the POD format, you can use a translator program to convert your documents into HTML, LaTeX, or man pages. Nothing really prevents you from writing your own translator program; however, once you convert your documents into HTML, you can use off-the-shelf products to convert them into other word processing formats. For example, the Internet Assistant for Microsoft Word lets you read and convert HTML into a variety of formats.
The POD format lets you introduce some formatting directives into your source files. Note that the formatting terms in Table 8.1 all begin with an equal sign (=).
Term | Description |
=pod | Begins formatting. The Perl interpreter ignores all text until it sees the =cut directive. Only POD-related text is found between the =pod and =cut directives. |
=cut | Stops formatting and returns control to the Perl interpreter. Only POD-related text is found between the =pod and =cut directives. |
=head1 | Header level 1. |
=head2 | Header level 2. |
=over N | Starts indentation by moving the text to the right by N columns. By convention, the value of N is 4 to accommodate the translation programs; however, it does not have to be 4. |
=back | Nullifies a previous =over directive. An =over/=back pair is used to print lists of items. |
=item C | Specifies an item to be used between =over/=back pairs. C is a character or number to use as the bulleted item. There must be at least one =item in an =over/=back list. |
An example here will help. In Listing 8.2, a file called tf.pod is constructed to document the man page in POD format.
Listing 8.2. A sample POD file.
0 #!/usr/bin/perl
1 =pod
2 =head1 NAME
3 tf - Test file attributes
4
5 =head1 SYNOPSIS
6
7 Usage:
8
9 tf F<file>
10
11 tf F<directory>
12
13 =head1 DESCRIPTION
14
15 The first thing to remember is that text is not formatted in a pod
16 file but rather in the formatter. Paragraphs are left as they are.
17
18 The B<tf> program (notice how tf is in bold) works on these items:
19
20 =over 4
21
22 =item * Files
23
24 Just file names in your directory tree. The file name could be a
25 regular file, socket, device or a link.
26
27 =item * Directories
28
29 Yes, it'll work on directories too.
30
31 =back
32
33 Ship it!
34
35 =head1 BUGS
36
37 Remember the note about features?
38
39 =head1 Header 1
40
41 This is a header 1
42
43 =head2 Header 2
44
45 This is header 2 in I<Italics>.
46
47 =head2 Another Header 2
48
49 This is header 2 in B<BOLD>.
50
51 Another list with non-bulleted items.
52
53 =over 5
54
55 =item First
56
57 This is the First item.
58
59 =item Second
60
61 This is the Second item.
62
63 =item Third
64
65 This is the Third item.
66
67 =back
68
69 =cut
70 ... the rest of the script will be here ...
Line 1 begins the POD portion, and line 69 is where POD processing is cut. Line 70 is where the executable code would start; that is, right after the line that contains the =cut tag. Line 0 is present if this is an executable script and absent if the file is a library or module that is not run directly. Most of the tags in the listing are set off by blank lines. Some translators tolerate directives that run into the neighboring text, but POD treats each directive as a paragraph of its own, so it is safest (and, in my opinion, more readable) to surround every tag with blank lines.
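To see that the interpreter really does skip a POD block, here is a minimal sketch. It is not taken from Listing 8.2: the sub name describe and the NAME text are made up for illustration. The script runs as ordinary Perl even though documentation sits between the pragmas and the code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=pod

=head1 NAME

demo - shows that Perl skips POD blocks

=cut

# Execution resumes here, right after the =cut directive.
# describe is a hypothetical helper used only for this demonstration.
sub describe {
    my ($name) = @_;
    return -d $name ? "$name is a directory" : "$name is not a directory";
}

print describe('.'), "\n";
```

Each directive starts in the first column and is surrounded by blank lines, which is the form every POD translator accepts.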
Now, look at line 18 in Listing 8.2. The B<text> tag is used here to place text in bold typeface. Several tags exist for formatting text. Table 8.2 lists these tags.
Tag | Description |
B<text> | The text is placed in bold. |
I<text> | The text is placed in italics. |
S<text> | The text contains non-breaking spaces. |
C<code> | The text is literal code, such as a command or a program fragment. |
L<name> | A link to a man page referred to by name. |
L<name/sec> | A link to a section sec in a man page referred to by name. |
L<name/"sec"> | A link to a section sec in the man page referred to by name. |
L<"sec"> | A link to a section in this man page. |
F<file> | A file name. |
X<index> | An indexed entry. |
Z<> | A zero width character. |
In most cases you only wind up using the B<> and I<> tags, as you'll see in the documentation that comes with Perl. Refer to Listing 8.2 to see how some of the formatting codes are used in POD files.
The POD information in a file can be included just about anywhere in a source file, although it's best to place this information either at the top or bottom of the source file. As long as you keep your =pod/=cut and =over/=back pairs matched, you shouldn't run into any problems.
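Mismatched pairs are easy to overlook in a long file, so a quick check can help. The following is only a sketch: check_pod_pairs is a hypothetical helper written for this example, not something that ships with Perl, and it looks at nothing but =over, =back, and =item. The podchecker utility distributed with Perl does a far more thorough job.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# check_pod_pairs is a hypothetical helper, not part of Perl.
# It scans POD text and reports unbalanced =over/=back directives
# and any =item that appears outside an =over/=back list.
sub check_pod_pairs {
    my ($text) = @_;
    my @problems;
    my $depth  = 0;    # how many =over lists are currently open
    my $lineno = 0;
    for my $line (split /\n/, $text) {
        $lineno++;
        if ($line =~ /^=over\b/) {
            $depth++;
        }
        elsif ($line =~ /^=back\b/) {
            if ($depth == 0) {
                push @problems, "line $lineno: =back without =over";
            }
            else {
                $depth--;
            }
        }
        elsif ($line =~ /^=item\b/ && $depth == 0) {
            push @problems, "line $lineno: =item outside =over/=back";
        }
    }
    push @problems, "$depth =over directive(s) never closed" if $depth > 0;
    return @problems;
}

# Sample input with one =back too many. The directives are built from
# quoted strings so that Perl does not treat them as real POD.
my $pod = join "\n",
    '=over 4', '', '=item * Files', '', '=back', '', '=back', '';
print "$_\n" for check_pod_pairs($pod);
```

An empty return list means the pairs that the helper knows about are balanced; it says nothing about the rest of the markup.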
Three filters exist that convert POD formatted documents into three different formats. Here's a list of these filters:
Filter | Description |
pod2html | Used to convert POD files to HTML files |
pod2man | Used to convert POD files to man pages |
pod2latex | Used to convert POD files to LaTeX files |
To run these programs, simply type the command and the filename. For example, to generate HTML files from the POD file shown in Listing 8.2, run this command:
pod2html tf.pod
You'll find that running the pod2html program on tf.pod created a file called tf.html in your directory. (Later releases of pod2html write to standard output unless you name a file with the --outfile option.) The output for Listing 8.2 is shown in Figure 8.2.
Figure 8.2. HTML output from pod2html.
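The filters can also be driven from inside a Perl program. The sketch below assumes a reasonably current Perl installation, where the Pod::Man and Pod::Text modules are part of the standard distribution; the render helper and the man page name TF are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Pod::Man;
use Pod::Text;

# A small POD document assembled from quoted strings, so that Perl
# does not mistake these lines for documentation of this script.
my $pod = join "\n",
    '=head1 NAME', '', 'tf - Test file attributes', '',
    '=head1 SYNOPSIS', '', 'B<tf> F<file>', '';

# render is a hypothetical helper: it feeds a POD string to a
# Pod::Simple-based formatter and returns the formatted result.
sub render {
    my ($parser, $source) = @_;
    my $output = '';
    $parser->output_string(\$output);          # collect output in memory
    $parser->parse_string_document($source);
    return $output;
}

my $man  = render(Pod::Man->new(name => 'TF', section => 1), $pod);
my $text = render(Pod::Text->new(width => 60), $pod);

print $text;
```

Here $man holds nroff source suitable for the man macros, and $text holds a plain-text rendering of the same document.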
This chapter covered two ways of documenting Perl files: one using man pages and the other using POD documentation. man pages can be embedded in the source file, but they require the use of nroff with the man package. POD files are more generic in that you can use translators to convert from POD to one of three known formats: HTML, man, or LaTeX. In extreme cases, you can even write your own Perl script to decode the POD format and write files in your own format. If you really need to do something elaborate, you might want to consider taking the formatted HTML output from a pod2html program and placing the output in a word processor, such as Microsoft Word, to edit the HTML file directly.
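As a rough idea of what writing your own translator involves, here is a deliberately tiny sketch. The names pod_to_text and format_codes are hypothetical, the output layout is an arbitrary choice, and only =head1, =head2, =item, and the B<>, I<>, C<>, and F<> codes are handled; a real translator has much more to deal with.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# pod_to_text is a hypothetical, minimal translator. It walks the
# source one paragraph at a time and ignores anything outside POD.
sub pod_to_text {
    my ($source) = @_;
    my @out;
    my $in_pod = 0;
    # POD paragraphs are separated by one or more blank lines.
    for my $para (split /\n(?:[ \t]*\n)+/, $source) {
        if ($para =~ /^=(\w+)[ \t]*(.*)/s) {
            my ($command, $rest) = ($1, $2);
            $in_pod = $command eq 'cut' ? 0 : 1;
            $rest = format_codes($rest);
            if    ($command eq 'head1') { push @out, uc $rest }
            elsif ($command eq 'head2') { push @out, "  $rest" }
            elsif ($command eq 'item')  { push @out, "    $rest" }
            # =pod, =cut, =over, and =back produce no output here.
        }
        elsif ($in_pod) {
            my $body = format_codes($para);
            $body =~ s/\s*\n\s*/ /g;    # join wrapped lines
            $body =~ s/\s+$//;          # drop trailing whitespace
            push @out, "      $body";
        }
    }
    return join("\n", @out) . "\n";
}

# Replace formatting codes with plain-text markers, innermost first.
sub format_codes {
    my ($text) = @_;
    my %wrap = (B => '*', I => '_', C => '`', F => '');
    1 while $text =~ s/([BICF])<([^<>]*)>/$wrap{$1}$2$wrap{$1}/g;
    return $text;
}

# Sample input built from quoted strings so Perl does not read it as POD.
my $sample = join "\n",
    '=pod', '', '=head1 Name', '', 'B<tf> tests I<file> attributes.', '',
    '=over 4', '', '=item * Files', '', '=back', '', '=cut', '';
print pod_to_text($sample);
```

Splitting on blank lines mirrors the way POD itself is organized, which keeps the command handling down to a single pattern match per paragraph.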