Chapter 15

Perl Modules



In the last chapter, you were introduced to object-oriented programming. Along the way, you learned some aspects of programming with modules, although you may not have realized it. I believe the shortest definition of a module is a namespace defined in a file. For example, the English module is defined in the English.pm file and the Find module is defined in the Find.pm file.

Of course, modules are more than simply a namespace in a file. But don't be concerned; there's not much more.

Perl 4, the previous version of Perl, depended on libraries to group functions in units. Thirty-one libraries shipped with Perl 4.036. These have been replaced with a standard set of modules. However, the old libraries are still available in case you run across some old Perl scripts that need them.

Libraries-and modules-are generally placed in a subdirectory called Lib. On my machine, the library directory is c:\perl5\lib. If you don't know what your library directory is, ask your system administrator. Some modules are placed in subdirectories like Lib/Net or Lib/File. The modules in these subdirectories are loaded using the subdirectory name, two colons, and the module name. For example, Net::Ping or File::Basename.
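The mapping from a module name to a file name can be sketched in a few lines. The moduleToPath() helper is a name I made up for this illustration; File::Basename is one of the standard modules, and %INC is the hash in which Perl records every file it has loaded.

```perl
use strict;
use warnings;
use File::Basename;    # loads File/Basename.pm from a library directory

# Hypothetical helper: convert a module name into the relative
# path that Perl looks for in each library directory.
sub moduleToPath {
    my($module) = @_;
    (my $path = $module) =~ s{::}{/}g;    # Net::Ping -> Net/Ping
    return "$path.pm";
}

my($relPath) = moduleToPath('File::Basename');
print("Relative path: $relPath\n");
print("Loaded from:   $INC{$relPath}\n");
```

The second line of output shows where the module lives on your machine, which is a quick way to find your library directory.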

Libraries are made available to your script by using the require compiler directive. Directives may seem like functions, but they aren't. The difference is that compiler directives are carried out when the script is compiled and functions are executed while the script is running.

Note
You might think the distinction between compiler directives and functions is minor. And you might be right. I like to be as precise as possible when using computer terminology. After all, the computer is precise; why shouldn't we be, too?
Unfortunately, Perl doesn't make it easy to create simple definitions and place every feature into a nice orderly category. So don't get hung up on attaching a label to everything. If you know what something does, the names won't matter a whole lot.

Some modules are just collections of functions-like the libraries-with some "module" stuff added. Modules should follow these guidelines:

Modules are loaded by the use directive, which is similar to require except it automates the importing of function and variable names.

Modules that are simply a collection of functions can be thought of as classes without constructors. Remember that the package name is the class name. Whenever you see a package name, you're also seeing a class-even if none of the object-oriented techniques are used.

Object-oriented modules keep all function and variable names close to the vest, so to speak. They are not available directly; you access them through the module name. Remember the Inventory_item->new() notation?

However, simple function collections don't have this object-oriented need for secrecy. They want your script to directly access the defined functions. This is done using the Exporter class, @EXPORT, and @EXPORT_OK.

The Exporter class supplies basic functionality that gives your script access to the functions and variables inside the module. The import() function, defined inside the Exporter class, is executed at compile-time by the use compiler directive. The import() function takes function and variable names from the module namespace and places them into the main namespace. Thus, your script can access them directly.

Note
I can almost hear your thoughts at this point. You're thinking, "The exporting of function and variable names is handled by the import() function?" Well, I sympathize. But, look at it this way: The module is exporting and your script is importing.

You may occasionally see a reference to what may look like a nested module. For example, $Outer::Inner::foo. This really refers to a module named Outer::Inner, so named by the statement: package Outer::Inner;. Module designers sometimes use this technique to simulate nested modules.

Module Constructors and Destructors

You may recall constructors and destructors from the discussion about objects in the last chapter. Constructors are used to initialize something and destructors are used to write log messages, close files, and do other clean-up type duties.

Perl has constructors and destructors that work at the module level as well as the class level. The module constructor is called the BEGIN block, while the module destructor is called the END block.

The BEGIN Block

The BEGIN block is evaluated as soon as it is defined. Therefore, it can include other functions using do() or require statements. Since the blocks are evaluated immediately after definition, multiple BEGIN blocks will execute in the order that they appear in the script.

Define a BEGIN block for the main package.
Display a string indicating the begin block is executing.
Start the Foo package.
Define a BEGIN block for the Foo package.
Display a string indicating the begin block is executing.

Listing 15.1  15LST01.PL-Using BEGIN Blocks

BEGIN {
    print("main\n");
}

package Foo;
    BEGIN {
        print("Foo\n");
    }

This program displays:

main
Foo
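To see that BEGIN blocks really do run before the ordinary statements around them, you can record the order of events in an array. This is only a sketch; the @order array and the strings stored in it are names chosen for the illustration.

```perl
use strict;
use warnings;

our(@order);    # records the order in which the pieces run

push(@order, 'first ordinary statement');
BEGIN { push(@order, 'BEGIN block one'); }
push(@order, 'second ordinary statement');
BEGIN { push(@order, 'BEGIN block two'); }

# Both BEGIN entries come first, in the order the blocks were defined.
print(join("\n", @order), "\n");
```

Even though each BEGIN block appears after an ordinary push(), both blocks have already run by the time the first ordinary statement executes.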

The END Block

The END blocks are the last thing to be evaluated. They are even evaluated after exit() or die() functions are called. Therefore, they can be used to close files or write messages to log files. Multiple END blocks are evaluated in reverse order.


Listing 15.2  15LST02.PL-Using END Blocks

END {
    print("main\n");
}

package Foo;
    END {
        print("Foo\n");
    }

This program displays:

Foo
main

Note
Signals that are sent to your script can bypass the END blocks. So, if your script is in danger of stopping due to a signal, be sure to define a signal-handler function. See Chapter 13, "Handling Errors and Signals," for more information.
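The behavior of END blocks after exit() is easier to observe from the outside. The sketch below runs a small script in a second copy of perl and collects what it prints. The runChild() helper is hypothetical, $^X holds the path of the running perl, and I am assuming that your platform supports the list form of a piped open().

```perl
use strict;
use warnings;

# The child script is kept in a string so that its END blocks run
# when the child exits, not when this demonstration does.
my($childScript) = <<'END_OF_CHILD';
END { print("first END block defined\n"); }
END { print("second END block defined\n"); }
print("body of the script\n");
exit(0);
END_OF_CHILD

# Hypothetical helper: run the script in a second copy of perl and
# return the lines it prints.
sub runChild {
    my($script) = @_;
    open(my $pipe, '-|', $^X, '-e', $script)
        or die("Can't start child perl: $!");
    my(@lines) = <$pipe>;
    close($pipe);
    return @lines;
}

print(runChild($childScript));
```

The body of the script prints first, and then the END blocks print in the reverse of the order in which they were defined, even though the script called exit().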

Symbol Tables

Each namespace-and therefore, each module, class, or package-has its own symbol table. A symbol table, in Perl, is a hash that holds all of the names defined in a namespace. All of the variable and function names can be found there. The hash for each namespace is named after the namespace with two colons. For example, the symbol table for the Foo namespace is called %Foo::. Listing 15.3 shows a program that displays all of the entries in the Foo:: namespace.

Define the dispSymbols() function.
Get the hash reference that should be the first parameter.
Declare local temporary variables.
Initialize the %symbols variable. This is done to make the code easier to read.
Initialize the @symbols variable. This variable is also used to make the code easier to read.
Iterate over the symbols array displaying the key-value pairs of the symbol table.
Call the dispSymbols() function to display the symbols for the Foo package.
Start the Foo package.
Initialize the $bar variable. This will place an entry into the symbol table.
Define the baz() function. This will also create an entry into the symbol table.

Listing 15.3  15LST03.PL-How to Display the Entries in a Symbol Table

sub dispSymbols {
    my($hashRef) = shift;
    my(%symbols);
    my(@symbols);

    %symbols = %{$hashRef};
    @symbols = sort(keys(%symbols));

    foreach (@symbols) {
        printf("%-10.10s| %s\n", $_, $symbols{$_});
    }
}

dispSymbols(\%Foo::);

package Foo;
    $bar = 2;

    sub baz {
        $bar++;
    }

This program displays:

bar       | *Foo::bar
baz       | *Foo::baz

This example shows that there are only two things in the %Foo:: symbol table: only those things that the script placed there. This is not the case with the %main:: symbol table. When I display the entries in %main::, I see over 85 items. Part of the reason for the large number of names in the main package is that some names are forced there. For example, STDIN, STDOUT, STDERR, @ARGV, ARGVOUT, %ENV, @INC, and %SIG are forced into the main namespace regardless of where they are used.
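Because a symbol table is just a hash, you can also filter its entries. The following sketch lists only the names that have a function attached to them. The listSubs() helper and the qux() function are names invented for the example, and the lookup by name needs strict refs switched off inside the helper.

```perl
use strict;
use warnings;

package Foo;
    our $bar = 2;
    sub baz { return $bar + 1; }
    sub qux { return $bar * 2; }

package main;

# Hypothetical helper: return the names in a package's symbol
# table that have a subroutine attached to them.
sub listSubs {
    my($package) = @_;
    my(@names);

    no strict 'refs';    # symbol-table lookups by name need this
    foreach my $name (sort(keys(%{"${package}::"}))) {
        push(@names, $name) if defined(&{"${package}::$name"});
    }
    return @names;
}

print(join(", ", listSubs('Foo')), "\n");
```

The $bar variable has an entry in %Foo:: too, but it is skipped because no function is defined under that name.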

The require Compiler Directive

The require directive is used to load Perl libraries. If you needed to load a library called Room.pl, you would do so like this:


require "Room.pl";

No exporting of symbols is done by the require directive. So all symbols in the libraries must be explicitly placed into the main namespace. For example, you might see a library that looks like this:


package abbrev;

sub main'abbrev {
    # code for the function
}

Two things in this code snippet point out that it is Perl 4 code. The first is that the package name is in all lowercase. And the second is that a single quote is used instead of double colons to indicate a qualifying package name. Even though the abbrev() function is defined inside the abbrev package, it is not part of the %abbrev:: namespace because of the main' in front of the function name.
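If you'd like to watch require load a library without creating files by hand, the sketch below writes a tiny library to a temporary directory first. The Room.pl contents and the roomArea() function are made up for this illustration, and File::Temp is assumed to be available (it ships with current versions of Perl).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a tiny library to a temporary directory. The file name
# Room.pl and the roomArea() function are made up for this sketch.
my($dir)     = tempdir(CLEANUP => 1);
my($library) = "$dir/Room.pl";

open(my $out, '>', $library) or die("Can't write $library: $!");
print $out <<'END_OF_LIBRARY';
sub roomArea {
    my($width, $length) = @_;
    return $width * $length;
}
1;    # a library must end with a true value
END_OF_LIBRARY
close($out);

# The file name is a string, so Perl loads exactly that file.
require $library;

print("Area: ", roomArea(3, 4), "\n");
```

The library has no package statement, so roomArea() lands in the main namespace and the script can call it directly.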

The require directive can also indicate that your script needs a certain version of Perl to run. For example, if you are using references, you should place the following statement at the top of your script:


require 5.000;

And if you are using a feature that is available only with Perl 5.002-like prototypes-use the following:


require 5.002;

Perl 4 will generate a fatal error if these lines are seen.
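You can see the version check fail without hunting for an old interpreter by asking for a version that doesn't exist and trapping the error with eval. This is only a sketch; the version number 99.000 is arbitrary.

```perl
use strict;
use warnings;

# Succeeds quietly on any Perl 5 interpreter.
require 5.000;
print("Running Perl version $]\n");

# Ask for a version that does not exist and trap the fatal error.
eval { require 99.000; };
print("Trapped: $@") if $@;
```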

Note
Prototypes are not covered in this book. If you are using Perl 5.002 or later, prototypes should be discussed in the documentation that comes with the Perl distribution.

The use Compiler Directive

When it came time to add modules to Perl, thought was given to how this could be done and still support the old libraries. It was decided that a new directive was needed. Thus, use was born.

The use directive will automatically export function and variable names to the main namespace by calling the module's import() function. Most modules don't have their own import() function; instead they inherit it from the Exporter module. You have to keep in mind that the import() function is not applicable to object-oriented modules. Object-oriented modules should not export any of their functions or variables.

You can use the following lines as a template for creating your own modules:


package Module;
    require(Exporter);
    @ISA = qw(Exporter);
    @EXPORT = qw(funcOne $varOne @variable %variable);
    @EXPORT_OK = qw(funcTwo $varTwo);

The names in the @EXPORT array will always be moved into the main namespace. Those names in the @EXPORT_OK array will be moved only if you request them. This small module can be loaded into your script using this statement:


use Module;

Since use is a compiler directive, the module is loaded as soon as the compiler sees the directive. This means that the variables and functions from the module are available to the rest of your script.

If you need to access some of the names in the @EXPORT_OK array, use a statement like this:


use Module qw(:DEFAULT funcTwo);     # $varTwo is not exported.

Once you add optional elements to the use directive you need to explicitly list all of the names that you want to use. The :DEFAULT is a short way of saying, "give me everything in the @EXPORT list."
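Here is one way to try out @EXPORT and @EXPORT_OK without creating a separate .pm file. The Geometry package and its two functions are hypothetical. The module is defined inside a BEGIN block, and import() is called by hand, which is my stand-in for what the use directive would do if Geometry lived in its own Geometry.pm file.

```perl
use strict;
use warnings;

# The module is defined in a BEGIN block so that @EXPORT and
# @EXPORT_OK are filled in before import() is called below.
BEGIN {
    package Geometry;
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(area);         # exported by default
    our @EXPORT_OK = qw(perimeter);    # exported only on request

    sub area      { my($w, $h) = @_; return $w * $h; }
    sub perimeter { my($w, $h) = @_; return 2 * ($w + $h); }
}

package main;

# Stands in for "use Geometry qw(:DEFAULT perimeter);".
BEGIN { Geometry->import(qw(:DEFAULT perimeter)); }

print("Area:      ", area(3, 4), "\n");
print("Perimeter: ", perimeter(3, 4), "\n");
```

If you remove perimeter from the import list, the call to perimeter() fails because that name is never moved into the main namespace.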

What's a Pragma?

In a (hopefully futile) effort to confuse programmers, the use directive was given a second job to do. It turns other compiler directives on and off. For example, you might want to force Perl to use integer math instead of floating-point math to speed up certain sections of your program.

Remember all of the new terminology that was developed for objects? The computer scientists have also developed their own term for a compiler directive. And that term is pragma. The use statement controls the other pragmas. Listing 15.4 shows a program that uses the integer pragma.


Listing 15.4  15LST04.PL-Using the integer Pragma

print("Floating point math: ", 10 / 3, "\n");
use integer;
print("Integer math:        ", 10 / 3, "\n");

This program displays:

Floating point math: 3.33333333333333
Integer math:        3

Pragmas can be turned off using the no compiler directive. For example, the following statement turns off the integer pragma:


no integer;
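A pragma such as integer stays in effect only until the end of the block in which it appears, so you can confine it to a single function. The three function names in this sketch are invented for the illustration.

```perl
use strict;
use warnings;

# Ordinary division: the result keeps its fractional part.
sub floatDivide {
    my($top, $bottom) = @_;
    return $top / $bottom;
}

# The pragma is in effect only until the closing brace of this
# function, so the rest of the script is unaffected.
sub intDivide {
    my($top, $bottom) = @_;
    use integer;
    return $top / $bottom;
}

# The no directive switches the pragma back off for an inner block.
sub mixedDivide {
    my($top, $bottom) = @_;
    use integer;
    my($whole) = $top / $bottom;
    {
        no integer;
        return ($whole, $top / $bottom);
    }
}

printf("floatDivide: %.4f\n", floatDivide(10, 3));
printf("intDivide:   %d\n",   intDivide(10, 3));
printf("mixedDivide: %d and %.4f\n", mixedDivide(10, 3));
```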

Table 15.1 shows a list of the pragmas that you can use.

Table 15.1  Perl's Pragmas

Pragma     Description
integer    Forces integer math instead of floating point or double precision math.
less       Requests less of something (like memory or CPU time) from the compiler. This pragma has not been implemented yet.
sigtrap    Enables stack backtracing on unexpected signals.
strict     Restricts unsafe constructs. This pragma is highly recommended! Every program should use it.
subs       Lets you predeclare function names.

The strict Pragma

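The most important pragma is strict. This pragma generates errors if unsafe programming is detected. There are three specific things that are detected:

Symbolic references
Non-local variables (those not declared with my()) that aren't fully qualified
Non-quoted words that aren't subroutine names or file handles

Symbolic references use the name of a variable as the reference to the variable. Perl allows them by default, which makes them easy to create by accident when a variable that should hold a real reference holds an ordinary string instead. Listing 15.5 shows a program that uses symbolic references.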

Declare two variables.
Initialize $ref with a reference to $foo.
Dereference $ref and display the result.
Initialize $ref to $foo.
Dereference $ref and display the result.
Invoke the strict pragma.
Dereference $ref and display the result.

Listing 15.5  15LST05.PL-Detecting Symbolic References

my($foo) = "Testing.";
my($ref);

$ref = \$foo;
print("${$ref}\n");     # Using a real reference

$ref = $foo;
print("${$ref}\n");     # Using a symbolic reference

use strict;
print("${$ref}\n");

When run with the command perl 15lst05.pl, this program displays:

Testing.

Can't use string ("Testing.") as a SCALAR ref while "strict refs" in use at 15lst05.pl line 14.

The second print statement, even though obviously wrong, does not generate any errors. Imagine if you were using a complicated data structure such as the ones described in Chapter 8 "References." You could spend hours looking for a bug like this. After the strict pragma is turned on, however, a runtime error is generated when the same print statement is repeated. Perl even displays the value of the scalar that attempted to masquerade as the reference value.
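When you really do need a symbolic reference, you can allow it in one small block and leave strict on everywhere else. The sketch below shows both cases. The $greeting variable and the two helper functions are names I chose for the example.

```perl
use strict;
use warnings;

our $greeting = "Testing.";    # a package variable: $main::greeting

# Hypothetical helper: fetch a package variable by name, which
# requires a symbolic reference.
sub fetchByName {
    my($name) = @_;
    no strict 'refs';          # allow the symbolic reference here only
    return ${"main::$name"};
}

# The same dereference with strict refs left on is a fatal runtime
# error, which eval traps and leaves in $@.
sub fetchStrictly {
    my($name) = @_;
    return eval { ${"main::$name"} };
}

print("Symbolic lookup: ", fetchByName('greeting'), "\n");
print("Strict lookup failed: $@") unless defined(fetchStrictly('greeting'));
```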

The strict pragma also ensures that every variable you use is either declared with my() or fully qualified. Fully qualifying a variable name simply means adding the name of the package where the variable was defined to the variable name. For example, you would specify the $numTables variable in package Room by saying $Room::numTables. If you are not sure which package a variable is defined in, try using the dispSymbols() function from Listing 15.3. Call the dispSymbols() function once for each package that your script uses.

The last thing that strict detects is the non-quoted word that is not used as a subroutine name or file handle. For example, the following line is good:

$SIG{'PIPE'} = 'Plumber';

And this line is bad:

$SIG{'PIPE'} = Plumber;

Perl 5, without the strict pragma, will do the correct thing in the bad situation and assume that you meant to create a string literal. However, this is considered bad programming practice.
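You can check how strict treats a line of code without letting the error stop your script by compiling the line inside a string eval. The compileStrictly() helper below is hypothetical, and the once category of warnings is switched off inside it only to keep the output quiet.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a snippet under "use strict" and
# return the error message, or an empty string if it compiled.
sub compileStrictly {
    my($code) = @_;
    eval("use strict; no warnings 'once'; $code; 1;");
    return $@;
}

my(%snippets) = (
    'undeclared variable' => '$numTables = 4',
    'qualified variable'  => '$Room::numTables = 4',
    'bareword string'     => 'my $handler = Plumber',
    'quoted string'       => 'my $handler = "Plumber"',
);

foreach my $label (sort(keys(%snippets))) {
    my($error) = compileStrictly($snippets{$label});
    printf("%-20s %s\n", $label, $error ? "rejected" : "accepted");
}
```

The undeclared variable and the bareword are rejected, while the fully qualified variable and the quoted string are accepted.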

Tip
Always use the strict pragma in your scripts. It will take a little longer to declare everything, but the time saved in debugging will more than make up for it.

The Standard Modules

Table 15.2 lists the modules that should come with all distributions of Perl. Some of these modules are not portable across all operating systems, however. The descriptions for the modules mention the incompatibility if I know about it.

Table 15.2  Perl's Standard Modules

Module               Description
Text::Abbrev         Creates an abbreviation table from a list. The abbreviation table consists of the shortest sequence of characters that can uniquely identify each element of the list.
AnyDBM_File          Provides a framework for accessing multiple DBMs. This is a UNIX-based module.
AutoLoader           Loads functions on demand. This enables your scripts to use less memory.
AutoSplit            Splits a package or module into its component parts for autoloading.
Benchmark            Tracks the running time of code. This module can be modified to run under Windows but some of its functionality will be lost.
Carp                 Provides an alternative to the warn() and die() functions that report the line number of the calling routine. See "Example: The Carp Module" later in the chapter for more information.
I18N::Collate        Compares 8-bit scalar data according to the current locale. This helps to give an international viewpoint to your script.
Config               Accesses the Perl configuration options.
Cwd                  Gets the pathname of the current working directory. This module will generate a warning message when used with the -w command line option under the Windows and VAX VMS operating systems. You can safely ignore the warning.
DynaLoader           Lets you dynamically load C libraries into Perl code.
English              Lets you use English terms instead of the normal special variable names.
Env                  Lets you access the system environment variables using scalars instead of a hash. If you make heavy use of the environment variables, this module might improve the speed of your script.
Exporter             Controls namespace manipulations.
Fcntl                Loads file control definitions used by the fcntl() function.
FileHandle           Provides an object-oriented interface to filehandles.
File::Basename       Separates a file name and path from a specification.
File::CheckTree      Runs filetest checks on a directory tree.
File::Find           Traverses a file tree. This module will not work under the Windows operating systems without modification.
Getopt               Provides basic and extended options processing.
ExtUtils::MakeMaker  Creates a Makefile for a Perl extension.
IPC::Open2           Opens a process for both reading and writing.
IPC::Open3           Opens a process for reading, writing, and error handling.
POSIX                Provides an interface to IEEE 1003.1 namespace.
Net::Ping            Checks to see if a host is available.
Socket               Loads socket definitions used by the socket functions.

strict, my() and Modules

In order to use the strict pragma with modules, you need to know a bit more about the my() function and how it creates lexical variables instead of local variables. You may be tempted to think that variables declared with my() are local to a package, especially since you can have more than one package statement per file. However, my() does the exact opposite; in fact, variables that are declared with my() are never stored inside the symbol table.

If you need to declare variables that are local to a package, fully qualify your variable name in the declaration or initialization statement, like this:


use strict;

$main::foo = '';

package Math;
    $Math::PI = 3.1415 unless defined($Math::PI);

This code snippet declares two variables: $foo in the main namespace and $PI in the Math namespace. The unless defined($Math::PI) part of the second declaration is used to avoid getting warning messages from the -w command line option. Since the variable is inside a package, there is no guarantee that it will be used by the calling script and the -w command line option generates a warning about any variable that is only used once. By mentioning the variable a second time in a harmless test, the warning messages are avoided.
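You can confirm that my() variables stay out of the symbol table by asking the %Math:: hash directly. This is a minimal sketch; the $scratch variable and the circleArea() function are invented for the example.

```perl
use strict;
use warnings;

package Math;
    my $scratch = 42;                  # lexical: lives in no symbol table
    $Math::PI   = 3.1415;              # package variable: stored in %Math::
    sub circleArea { return $Math::PI * $_[0] ** 2; }

package main;

foreach my $name (qw(PI scratch circleArea)) {
    printf("%-10s %s\n", $name,
        exists($Math::{$name}) ? "in the symbol table" : "not in the symbol table");
}
```

Only PI and circleArea have entries; the lexical $scratch is visible from its declaration to the end of the file no matter which package statement is in effect.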

Module Examples

This section shows you how to use the Carp, English, and Env modules. After looking at these examples, you should feel comfortable about trying the rest.

Example: The Carp Module

This useful little module lets you do a better job of analyzing runtime errors, like when your script can't open a file or when an unexpected input value is found. It defines the carp(), croak(), and confess() functions. These are similar to warn() and die(). However, instead of reporting the exact script line where the error occurred, the functions in this module will display the line number that called the function that generated the error. Confused? So was I, until I did some experimenting. The results of that experimenting can be found in Listing 15.6.

Load the Carp module.
Invoke the strict pragma.
Start the Foo namespace.
Define the foo() function.
Call the carp() function.
Call the croak() function.
Switch to the main namespace.
Call the foo() function.

Listing 15.6  15LST06.PL-Using the carp() and croak() from the Carp Module

use Carp;
use strict;

package Foo;
    sub foo {
        main::carp("carp called at line " . __LINE__ .
            ",\n    but foo() was called");

        main::croak("croak called at line " . __LINE__ .
            ",\n    but foo() was called");
    }

package main;
    Foo::foo();

This program displays:

carp called at line 9, 
    but foo() was called at e.pl line 18
croak called at line 10, 
    but foo() was called at e.pl line 18

This example uses a compiler symbol, __LINE__, to incorporate the current line number in the string passed to both carp() and croak(). This technique enables you to see both the line number where carp() and croak() were called and the line number where foo() was called.
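In practice, croak() is most useful when a function in one package rejects bad input from a caller in another package. The following sketch traps the error with eval so that the message can be examined. The Room package, setTables(), and trySetTables() are all hypothetical names.

```perl
use strict;
use warnings;
use Carp;

package Room;
    # Hypothetical function that rejects bad input with croak().
    sub setTables {
        my($count) = @_;
        Carp::croak("table count must be positive") if $count < 1;
        return $count;
    }

package main;

# Call Room::setTables() and hand back whatever croak() reported,
# or an empty string if the call succeeded.
sub trySetTables {
    my($count) = @_;
    eval { Room::setTables($count) };
    return $@;
}

print("croak() reported: ", trySetTables(-1));
```

The line number in the message belongs to the call inside trySetTables(), not to the croak() statement inside the Room package.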

The Carp module also defines a confess() function which is similar to croak() except that a function call history will also be displayed. Listing 15.7 shows how this function can be used. The function declarations were placed after the foo() function call so that the program flow reads from top to bottom with no jumping around.

Load the Carp module.
Invoke the strict pragma.
Call foo().
Define foo().
Call bar().
Define bar().
Call baz().
Define baz().
Call confess().

Listing 15.7  15LST07.PL-Using confess() from the Carp Module

use Carp;
use strict;

foo();

sub foo {
    bar();
}

sub bar {
    baz();
}

sub baz {
    confess("I give up!");
}

This program displays:

I give up! at e.pl line 16
        main::baz called at e.pl line 12
        main::bar called at e.pl line 8
        main::foo called at e.pl line 5

This daisy-chain of function calls was done to show you how the function call history looks when displayed. The function call history is also called a stack trace. As each function is called, the address from which it is called gets placed on a stack. When the confess() function is called, the stack is unwound or read. This lets Perl print the function call history.
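Since confess() ends the script just like die(), you may want to trap the stack trace and keep running. Here is a small sketch that does so with eval. The function names are made up, and the exact layout of each line in the trace can vary between versions of the Carp module.

```perl
use strict;
use warnings;
use Carp;

sub outer  { return middle(@_); }
sub middle { return inner(@_);  }
sub inner  { confess("I give up!"); }

# Trap the fatal error so that the trace can be examined.
sub traceOf {
    eval { outer() };
    return $@;
}

my(@frames) = grep { /called at/ } split(/\n/, traceOf());

print(traceOf());
print("Frames in the trace: ", scalar(@frames), "\n");
```

The trace also includes entries for the eval block and for traceOf() itself, because they are on the stack when confess() is called.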

Example: The English Module

The English module is designed to make your scripts more readable. It creates aliases for all of the special variables that were discussed in Chapter 12, "Using Special Variables." Table 15.3 lists all of the aliases that are defined. After the table, some examples show you how the aliases are used.

Note
Some of the same concepts embodied by the special variables are used by the UNIX-based awk program. The English module also provides aliases that match what the special variables are called in awk.

Tip
I think that this module is especially useful because it provides aliases for the regular expression matching special variables and the formatting special variables. You'll use the other special variables often enough so that their use becomes second nature. Or else you won't need to use them at all.

Table 15.3  Aliases Provided by the English Module

Special Variable    Alias

Miscellaneous
$_                  $ARG
@_                  @ARG
$"                  $LIST_SEPARATOR
$;                  $SUBSCRIPT_SEPARATOR or $SUBSEP

Regular Expression or Matching
$&                  $MATCH
$`                  $PREMATCH
$'                  $POSTMATCH
$+                  $LAST_PAREN_MATCH

Input
$.                  $INPUT_LINE_NUMBER or $NR
$/                  $INPUT_RECORD_SEPARATOR or $RS

Output
$|                  $OUTPUT_AUTOFLUSH
$,                  $OUTPUT_FIELD_SEPARATOR or $OFS
$\                  $OUTPUT_RECORD_SEPARATOR or $ORS

Formats
$%                  $FORMAT_PAGE_NUMBER
$=                  $FORMAT_LINES_PER_PAGE
$-                  $FORMAT_LINES_LEFT
$~                  $FORMAT_NAME
$^                  $FORMAT_TOP_NAME
$:                  $FORMAT_LINE_BREAK_CHARACTERS
$^L                 $FORMAT_FORMFEED

Error Status
$?                  $CHILD_ERROR
$!                  $OS_ERROR or $ERRNO
$@                  $EVAL_ERROR

Process Information
$$                  $PROCESS_ID or $PID
$<                  $REAL_USER_ID or $UID
$>                  $EFFECTIVE_USER_ID or $EUID
$(                  $REAL_GROUP_ID or $GID
$)                  $EFFECTIVE_GROUP_ID or $EGID
$0                  $PROGRAM_NAME

Internal Variables
$]                  $PERL_VERSION
$^A                 $ACCUMULATOR
$^D                 $DEBUGGING
$^F                 $SYSTEM_FD_MAX
$^I                 $INPLACE_EDIT
$^P                 $PERLDB
$^T                 $BASETIME
$^W                 $WARNING
$^X                 $EXECUTABLE_NAME

Listing 15.8 shows a program that uses one of the English variables to access information about a matched string.

Load the English module.
Invoke the strict pragma.
Initialize the search space and pattern variables.
Perform a matching operation to find the pattern in the $searchSpace variable.
Display information about the search.
Display the matching string using the English variable names.
Display the matching string using the standard Perl special variables.

Listing 15.8  15LST08.PL-Using the English Module

use English;
use strict;

my($searchSpace) = "TTTT BBBABBB DDDD";
my($pattern)     = "B+AB+";

$searchSpace =~ m/$pattern/;

print("Search space:   $searchSpace\n");
print("Pattern:        /$pattern/\n");
print("Matched String: $English::MATCH\n");  # the English variable
print("Matched String: $&\n");               # the standard Perl variable

This program displays:

Search space:   TTTT BBBABBB DDDD
Pattern:        /B+AB+/
Matched String: BBBABBB
Matched String: BBBABBB

You can see that the $& and $MATCH variables are equivalent. This means that you can use another programmer's functions without renaming their variables and still use the English names in your own functions.
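The other aliases work the same way. This short sketch uses three of the names from Table 15.3 outside of pattern matching. The errorFrom() helper is hypothetical.

```perl
use strict;
use warnings;
use English;

# Hypothetical helper: run a block of code and return the error
# text, if any.
sub errorFrom {
    my($codeRef) = @_;
    eval { $codeRef->(); };
    return $EVAL_ERROR;                 # same value as $@
}

print("Process ID:  $PROCESS_ID\n");    # same value as $$
print("Script name: $PROGRAM_NAME\n");  # same value as $0
print("Trapped:     ", errorFrom(sub { die("out of chairs\n") }));
```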

Example: The Env Module

If you use environment variables a lot, then you need to look at the Env module. It will enable you to directly access the environment variables as Perl scalar variables instead of through the %ENV hash. For example, $PATH is equivalent to $ENV{'PATH'}.

Load the Env module.
Invoke the strict pragma.
Declare the @files variable.
Open the temporary directory and read all of its files.
Display the name of the temporary directory.
Display the names of all files that end in tmp.

Listing 15.9  15LST09.PL-Displaying Temporary Files Using the Env Module

use Env;
use strict;

my(@files);

opendir(DIR, $main::TEMP);
    @files = readdir(DIR);
closedir(DIR);

print "$main::TEMP\n";
foreach (@files) {
    print("\t$_\n") if m/\.tmp$/i;
}

This program displays:

C:\WINDOWS\TEMP
        ~Df182.TMP
        ~Df1B3.TMP
        ~Df8073.TMP
        ~Df8074.TMP
        ~WRS0003.tmp
        ~Df6116.TMP
        ~DFC2C2.TMP
        ~Df9145.TMP

This program is pretty self-explanatory, except perhaps for the manner in which the $main::TEMP variable is specified. The strict pragma requires all variables to be lexically declared or to be fully qualified. The environment variables are declared in the Env package, but exported into the main namespace. Therefore, they need to be qualified using the main:: notation.
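The connection between an Env scalar and the %ENV hash runs in both directions, as this sketch shows. DEMO_TEMP is a made-up environment variable that the script sets for itself so that the example does not depend on your machine, and the import list asks Env for that one variable only.

```perl
use strict;
use warnings;

# DEMO_TEMP is a made-up variable, set here so that the sketch does
# not depend on the environment of the machine that runs it.
BEGIN { $ENV{'DEMO_TEMP'} = '/tmp/demo'; }

use Env qw(DEMO_TEMP);    # import only the variables that are needed

print("Through Env:  $main::DEMO_TEMP\n");
print("Through %ENV: $ENV{'DEMO_TEMP'}\n");

# Assigning to the scalar changes the environment as well.
$main::DEMO_TEMP = '/tmp/other';
print("After change: $ENV{'DEMO_TEMP'}\n");
```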

Summary

In this chapter, you learned about Perl modules. You read about several guidelines that should be followed when creating modules. For example, package names should have their first letter capitalized and module files should use a file extension of pm.

The require compiler directive is used to load Perl libraries that were distributed with Perl 4. Modules, however, are loaded with the use directive. In addition to loading the module, use will move variable and function names into the main namespace where your script can easily access them. The name movement is done by using the @EXPORT and @EXPORT_OK arrays.

Next, you read about the BEGIN and END blocks which are like module constructors and destructors. The BEGIN block is evaluated as soon as it is defined. END blocks are evaluated just before your program ends-in reverse order. The last END block defined is the first to be evaluated.

Symbol tables are used to hold the function and variable names for each package. You learned that each symbol table is stored in a hash named after the package name. For example, the symbol table for the Room package is stored in %Room::. Listing 15.3 contained a function, dispSymbols(), that displays all of the names in a given symbol table.

Libraries are loaded using the require compiler directive and modules are loaded with the use directive. Unlike the require directive, use will automatically call a module's import() function to move function and variable names from the module's namespace into the main namespace. The name movement is controlled using the @EXPORT and @EXPORT_OK array. Names in @EXPORT are always exported. Those in @EXPORT_OK must be explicitly mentioned in the use statement.

The use directive also controls other directives which are called pragmas. The most useful pragmas are integer and strict. Use the integer pragma when you need fast integer math. And use strict all of the time to enforce good programming habits-like using local variables.

Table 15.2 shows the 25 modules that are distributed with Perl. And then some more light was shed on how the my() function won't create variables that are local to a package. In order to create variables in the packages' namespace, you need to fully qualify them with the package name. For example, $Math::PI or $Room::numChairs.

The last section of the chapter looked at specific examples of how to use modules. The Carp, English, and Env modules were discussed. Carp defines three functions: carp(), croak(), and confess() that aid in debugging and error handling. English provides aliases for all of Perl's special variables so that Perl code is easier to understand. Env provides aliases for environment variables so that you can access them directly instead of through the %ENV hash variable.

In the next chapter, you learn about debugging Perl code. You read about syntax or compile-time errors versus runtime errors. The strict pragma will be discussed in more detail.

Review Questions

Answers to Review Questions are in Appendix A.

  1. What is a module?
  2. How is a module different from a library?
  3. What is the correct file extension for a module?
  4. What is a pragma?
  5. What is the most important pragma and why?
  6. What does the END block do?
  7. What is a symbol table?
  8. How can you create a variable that is local to a package?

Review Exercises

  1. Write a program that uses BEGIN and END blocks to write a message to a log file about the start and end times for the program.
  2. Use the English module to display Perl's version number.
  3. Modify the dispSymbols() function from Listing 15.3 to display only function and variable names passed as arguments.
  4. Execute the program in Listing 15.5 with the -w command line option. Describe the results.
  5. Write a module to calculate the area of a rectangle. Use the @EXPORT array to export the name of your function.