to uppercase until a \E sequence is encountered.
\v The vertical tab character.
\w Match a single word character. Word characters are the alphanumeric and underscore characters.
\W Match a single non-word character.
\xnn Any hexadecimal byte.
\Z This meta-sequence represents the end of the string, or the position just before a newline at the end of the string. Its meaning is not affected by the /m option.
\$ The dollar sign character.
\@ The at sign character.
\% The percent character.



Chapter 12 -- Perl Reference Tables


This chapter includes tables for two important areas of Perl programming. First, although regular expressions are explained in Chapter 7, "Perl Overview," it is useful to have a quick reference table for the symbols used in regular expressions and their meanings. Second, a list of the Perl 5 standard modules is included.

Regular Expressions

A regular expression is a way of specifying a pattern so that some strings match the pattern and some strings do not. Parts of the matching pattern can be marked for use in operations such as substitution. This is a powerful tool for processing text, especially when producing text-based reports. Many UNIX utilities, such as egrep, use a form of regular expressions as a pattern-matching mechanism, and Perl has adopted the concept and made it a core part of the language.
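
As a minimal sketch of matching and substitution with marked parts, consider the following. The sample line and its field layout are assumptions made up for this illustration; they do not come from the chapter.

```perl
use strict;
use warnings;

# Hypothetical report line; the layout is an assumption for this sketch.
my $line = "total: 42 widgets";

# Match: the parentheses mark the parts of the pattern to remember.
my ($count, $item);
if ($line =~ m/^total:\s+(\d+)\s+(\w+)$/) {
    ($count, $item) = ($1, $2);
}

# Substitute: the remembered parts are reused in the replacement text.
# The copy keeps the original $line unchanged.
(my $report = $line) =~ s/^total:\s+(\d+)\s+(\w+)$/$2=$1/;

print "$item: $count\n";    # prints "widgets: 42"
print "$report\n";          # prints "widgets=42"
```

Strings that do not fit the pattern are left alone: the match returns false and the substitution makes no change.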

Like arithmetic expressions, regular expressions are made up of a sequence of legal symbols linked with legal operators. Table 12.1 collects these operators and symbols for easy reference. If you are new to regular expressions, you may find the description in Chapter 7 informative.

Table 12.1  Regular Expression Meta-Characters, Meta-Brackets, and Meta-Sequences

Expression    Description
Meta-Characters 
^
This meta-character, the caret, matches the beginning of a string or, if the /m option is used, matches the beginning of a line. It is one of two pattern anchors; the other anchor is the $.
.
This meta-character will match any single character except for the newline character unless the /s option is specified. If the /s option is specified, then the newline will also be matched.
$
This meta-character will match the end of a string or, if the /m option is used, match the end of a line. It is one of two pattern anchors; the other anchor is the ^.
|
This meta-character, called alternation, lets you specify two values that can cause the match to succeed. For instance, m/a|b/ means that the $_ variable must contain the "a" or "b" character for the match to succeed.
*
This meta-character indicates that the "thing" immediately to the left should be matched zero or more times in order to be evaluated as true (thus, .* matches any number of characters).
+
This meta-character indicates that the "thing" immediately to the left should be matched one or more times in order to be evaluated as true.
?
This meta-character indicates that the "thing" immediately to the left should be matched zero or one times to be evaluated as true. When it follows the *, +, ?, or {n,m} meta-characters and -brackets, it means that the regular expression should be non-greedy and match the smallest possible string.
Meta-Brackets
()
The parentheses let you affect the order of pattern evaluation and act as a form of pattern memory. See Chapter 8, "Perl Special Variables," for more details.
(?...)
If a question mark immediately follows the left parenthesis, it indicates that an extended mode component is being specified; this is new to Perl 5.
(?#comment)
Extension: comment is any text.
(?:regx)
Extension: regx is any regular expression but () are not saved as a backreference.
(?=regx)
Extension: A zero-width positive lookahead assertion. The match succeeds only if regx matches at this position, but the text matched by regx is not included in what is returned as matched.
(?!regx)
Extension: A zero-width negative lookahead assertion (that is, the negated form of (?=regx)). The match succeeds only if regx does not match at this position.
(?options)
Extension: Applies the specified options to the pattern, bypassing the need for the options to be specified in the normal way. Valid options are: i (case-insensitive), m (treat as multiple lines), s (treat as single line), and x (allow whitespace and comments).
{n,m}
Braces let you specify how many times the "thing" immediately to the left should be matched. {n} means that it should be matched exactly n times. {n,} means it must be matched at least n times. {n,m} means that it must be matched at least n times but not more than m times. Do not put a space after the comma; many versions of Perl then treat the braces as literal text rather than as a quantifier.
[]
Square brackets let you create a character class. For instance, m/[abc]/ evaluates to true if any of a, b, or c is contained in $_. For single characters, the square brackets are a more readable alternative to the alternation meta-character.
Meta-Sequences
\
This meta-character "escapes" the character which follows. This means that any special meaning normally attached to that character is ignored. For instance, if you need to include a dollar sign in a pattern, you must use \$ to avoid Perl's variable interpolation. Use \\ to specify the backslash character in your pattern.
\nnn
Any octal byte where nnn represents the octal number; this allows any character to be specified by its octal number.
\a
The alarm character; this is a special character which, when printed, produces a warning bell sound.
\A
This meta-sequence represents the beginning of the string. Its meaning is not affected by the /m option.
\b
This meta-sequence represents the backspace character inside a character class; otherwise, it represents a word boundary. A word boundary is the spot between word (\w) and non-word (\W) characters. For this purpose, Perl treats the imaginary characters just before the beginning and just after the end of the string as if they matched \W.
\B
Match a non-word boundary.
\cn
Any control character where n is the character (for example, \cY for Ctrl+Y).
\d
Match a single digit character.
\D
Match a single non-digit character.
\e
The escape character.
\E
Terminate the \L or \U sequence.
\f
The form feed character.
\G
Match only where the previous m//g left off.
\l
Change the next character to lowercase.
\L
Change the following characters to lowercase until a \E sequence is encountered.
\n
The newline character.
\Q
Quote regular expression meta-characters literally until the \E sequence is encountered.
\r
The carriage return character.
\s
Match a single whitespace character.
\S
Match a single non-whitespace character.
\t
The tab character.
\u
Change the next character to uppercase.
\U
Change the following characters to uppercase until a \E sequence is encountered.
\v
The vertical tab character.
\w
Match a single word character. Word characters are the alphanumeric and underscore characters.
\W
Match a single non-word character.
\xnn
Any hexadecimal byte.
\Z
This meta-sequence represents the end of the string, or the position just before a newline at the end of the string. Its meaning is not affected by the /m option.
\$
The dollar sign character.
\@
The at sign character.
\%
The percent character.
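
A few of the entries in Table 12.1 can be sketched in code. The sample strings and variable names below are assumptions chosen for illustration only.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# ^ anchors at the start of the string; with /m it also anchors at
# the start of each line.
my @starts_plain = $text =~ m/^(\w+)/g;     # ("first")
my @starts_multi = $text =~ m/^(\w+)/mg;    # ("first", "second")

# Greedy versus non-greedy: a ? after + asks for the smallest match.
my $html = "<b>bold</b>";
my ($greedy) = $html =~ m/<(.+)>/;          # "b>bold</b"
my ($lazy)   = $html =~ m/<(.+?)>/;         # "b"

# (?=regx): USD must follow the digits but is not part of the match.
my ($amount) = "cost 100USD" =~ m/(\d+)(?=USD)/;    # "100"

# \Q...\E quotes meta-characters, so the . in $literal matches only
# a literal period.
my $literal      = "a.c";
my $quoted_hit   = ("abc" =~ m/^\Q$literal\E$/) ? 1 : 0;    # 0
my $unquoted_hit = ("abc" =~ m/^$literal$/)     ? 1 : 0;    # 1

# \b: "cat" inside "concat" is not at a word boundary on its left.
my @cats = "cat concat cat" =~ m/\bcat\b/g;         # two matches

# {n}: exactly five digits between the two anchors.
my $five_digits = ("12345" =~ m/^\d{5}$/) ? 1 : 0;  # 1

print "plain: @starts_plain\n";
print "multi: @starts_multi\n";
print "greedy: $greedy, lazy: $lazy\n";
```

Capturing into a list, as in my ($lazy) = ..., avoids reading $1 after a match that may have failed.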

Perl 5 Standard Modules

This is a list of the standard modules that come with Perl 5 along with a brief description.

For a list of all current modules, including many extra non-standard modules other than those listed here, see the CPAN archive. The contents of the Perl Module List are at ftp://ftp.funet.fi/pub/languages/perl/CPAN/modules/00modlist.long.html. The modules of the Perl Module List sorted by authors are at ftp://ftp.funet.fi/pub/languages/perl/CPAN/modules/by-authors. The modules of the Perl Module List sorted by category are at ftp://ftp.funet.fi/pub/languages/perl/CPAN/modules/by-category. The modules of the Perl Module List sorted by module are at ftp://ftp.funet.fi/pub/languages/perl/CPAN/modules/by-module.

Module Name    Description
AnyDBM_File Accesses external databases.
AutoLoader Special way of loading subroutines on demand.
AutoSplit Special way to set up modules for use of AutoLoader.
Benchmark Times code for benchmarking.
Carp Reports errors across modules.
Config Reports compiler options used when Perl was installed.
Cwd Functions to manipulate current directory.
DB_File Accesses Berkeley DB files.
Devel::SelfStubber Allows correct inheritance of autoloaded methods.
diagnostics pragma; enables diagnostic warnings.
DynaLoader Used by modules which link to C libraries.
English pragma; allows the use of long special variable names.
Env Allows access to environment variables.
Exporter Standard way for modules to export subroutines.
ExtUtils::Liblist Examines C libraries.
ExtUtils::MakeMaker Creates makefiles for extension modules.
ExtUtils::Manifest Helps maintain a MANIFEST file.
ExtUtils::Miniperl Used by makefiles generated by ExtUtils::MakeMaker.
ExtUtils::Mkbootstrap Used by makefiles generated by ExtUtils::MakeMaker.
Fcntl Provides access to the definitions in C's fcntl.h.
File::Basename Parses filenames according to various operating system rules.
File::CheckTree Multiple file tests.
File::Find Finds files according to criteria.
File::Path Creates/deletes directories.
FileHandle Allows object syntax for file handles.
Getopt::Long Uses POSIX style command line options.
Getopt::Std Uses single letter command line options.
I18N::Collate Uses POSIX locale rules for sorting 8-bit strings.
integer pragma; uses integer arithmetic.
IPC::Open2 Inter-Process Communications (process with