Today's lesson describes the options you can specify to control how your Perl program operates. These options provide many features, each of which is described in its own section below.
Today's lesson begins with a description of how to supply options to your Perl program.
There are two ways to supply options to a Perl program: on the command line, or on the header comment line at the top of the program.
The following sections describe these methods of supplying options.
One way to specify options for a Perl program is to enter them on the command line when you enter the command that starts your program.
The syntax for specifying options on the command line is
perl options program
Here, program is the name of the Perl program you want to run, and options is the list of options you want to supply to the program.
For example, the following command runs the Perl program named test1 and passes it the options -s and -w. (You'll learn about these and other options later today.)
$ perl -s -w test1
Some options need to be specified along with a value. For example, the -0 option takes an integer (written in octal):
$ perl -026 test1
Here, the integer 26 is associated with the option -0. The digits must follow -0 immediately, with no intervening space; if you type -0 26 instead, the Perl interpreter treats -0 as having no value and 26 as the name of the program to run.
Some other options that take a value, such as -I (described later today), do accept a space between the option and its value. In every case, the value associated with an option must always immediately follow the option.
NOTE:
If an option does not require an associated value, you can put another option immediately after it without specifying an additional - character or space. For example, the following commands are equivalent:
$ perl -s -w test1
$ perl -sw test1
You can put an option that requires a value as part of a group of options, provided that it is last in the group. For example, the following commands are equivalent:
$ perl -s -w -026 test1
$ perl -sw026 test1
Another way to specify a command option is to include it as part of the header comment for the program. For example, suppose that the first line of your Perl program is this:
#!/usr/local/bin/perl -w
In this case, the -w option is automatically specified
when you start the program.
Perl 4 enables you to specify only one option (or group of options) on the header comment line. This means that the following line generates an "unrecognized switch" error message:
#!/usr/local/bin/perl -w -s
Perl 5 enables as many switches as you like on the header comment line. However, some operating systems chop the header line after 32 characters, so be careful if you are planning to use a large number of switches.
NOTE:
Options specified on the command line override options specified in the header comment. For example, if your header comment is
#!/usr/local/bin/perl -w
and you start your program with the command
$ perl -s test1
the program will run with the -s option specified but not the -w option.
The -v option enables you to find out what version of Perl is running on your machine. When the Perl interpreter sees this option, it prints information on itself and then exits without running your program.
This means that if you supply a command such as the following, the file test1 is not executed:
$ perl -v test1
Here is sample output from the -v option:
This is perl, version 5.001
Unofficial patch level 1m
Copyright (c) 1987-1994, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5.0 source kit.
The only really useful things here, besides the copyright notice, are the version number of the Perl you are running (in this case, 5.001) and the patch level, which indicates how many repairs, or patches, have been made to this version. Here, the patch level is 1m.
No other options should be specified if you specify the -v option, because none of them would do anything in this case anyway.
The -c option tells the Perl interpreter to check whether your Perl program is correct without actually running it. If it is correct, the Perl interpreter prints the following message (in which filename is the name of your program) and then exits without executing your program:
filename syntax OK
If the Perl interpreter detects errors, it displays them just as it normally does. After printing the error messages, it prints the following message, in which filename is the name of your program:
filename had compilation errors
Again, there is no point in supplying other options if you specify the -c option because the Perl interpreter isn't actually running the program; the only exception is the -w option, which prints warnings. This option is described in the following section.
As you have seen on the preceding days, some mistakes are easy to make when you are writing a Perl program, such as accidentally typing the wrong variable name, or using == when you really mean to use eq. Because certain mistakes crop up frequently, the Perl interpreter provides an option that checks for them.
This option, the -w option, prints a warning every time the Perl interpreter sees something that might cause a problem. For example, if the interpreter sees the statement
$y = $x;
and hasn't seen $x before (which means that $x is undefined), it prints a warning message in the following form if you are running Perl 4:
Possible typo: "x" at filename line linenum.
Here, filename is the name of your Perl program, and linenum is the number of the line on which the interpreter has detected a potential problem.
If you are running Perl 5, the message is similar, but also includes the name of the current package:
Identifier "main::x" used only once: possible typo at filename line linenum.
For more information on packages, see Day 19, "Object-Oriented Programming in Perl."
The following sections provide a partial list of the potential
problems detected by the -w option. (If you are running
Perl 5, the -w option provides dozens of useful warnings.
Consult the Perl manual pages for a complete list.)
NOTE:
The -w option can be combined with the -c option to provide a means of checking your syntax for errors and problems before you actually run the program.
As you have seen, a statement such as the following one leads to a warning message if $x has not been previously defined:
$y = $x;
The "possible typo" error message also appears in the following circumstances, among others:
Of course, the possible-typo message might flag lines that don't actually contain typos. Following are two of the most common situations in which a possible typo actually is correct code:
format BLANK =
.
Possible typo: "BLANK" at file1 line 26.
$~ = "BLANK";
($d1, $d2, $groupid) = getgrnam ($groupname);
One useful feature of the -w option is that it checks whether two subroutines of the same name have been defined in the program. (Normally, if the Perl interpreter sees two subroutines of the same name, it quietly replaces the first subroutine with the second one and carries on.)
If, for example, two subroutines named x are defined in a program, the -w option prints a message similar to the following one:
Subroutine x redefined at file1 line 46.
The line number specified is the line that starts the second subroutine.
When the -w option has detected this problem, you can decide which subroutine to rename or throw away.
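The quiet replacement described above is easy to see in a short program. The following is a minimal sketch; the subroutine name greet is made up for illustration, and the program is meant to be run without -w so that no warning appears:

```perl
#!/usr/local/bin/perl

# Two definitions of the same (hypothetical) subroutine name.
sub greet {
        return ("first definition");
}

sub greet {
        return ("second definition");
}

# Both definitions are compiled before this line runs,
# so the second one has already replaced the first.
my $which = &greet();
print ("$which\n");
```

Run with -w, the same program also reports that the subroutine greet has been redefined, in the form shown earlier in this section.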
Another really helpful feature of the -w option is that it checks whether you are trying to compare a string using the == operator.
In a statement such as the following:
if ($x == "humbug") { ... }
the conditional expression
$x == "humbug"
is equivalent to the expression
$x == 0
because a character string that does not begin with a number is converted to 0 when used in a numeric context (a place where a number is expected). This is correct in Perl, but it is not likely to be what you want.
If the -w option is specified and the Perl interpreter sees a statement such as this one, it prints a message similar to the following if you are running Perl 4:
Possible use of == on string value at file1 line 26.
In Perl 5, the following warning is printed:
Argument "humbug" isn't numeric for numeric eq at file1 line 26.
In either case, this warning enables you to detect these incorrect
== operators and replace them with eq operators,
which compare strings.
The -w option doesn't detect the opposite problem, namely:
if ($x eq 46) {
In this case, the Perl interpreter converts 46 to the string 46 and performs a string comparison. Because a number and its string equivalent usually mean the same thing, this normally doesn't cause a problem.
Watch out, though, for octal numbers in string comparisons, as in the following example:
if ($x eq 046) {
Here, the octal value 046 is converted to the number 38 before being converted to a string. If you really want to compare $x to the string 046, this code will not produce the results you expect.
Another thing to watch out for is this: In Perl 4, the -w option does not check for conditional expressions such as the following:
if ($x = 0) {
because there are many cases in Perl in which the = assignment operator belongs inside a conditional expression. You will have to manually check that you are not specifying = (assignment) when you really mean to use == (equality comparison). Perl 5 flags this with the following message:
Found = in conditional, should be == at filename line linenum.
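The difference between the two operators can be checked directly. The sketch below is illustrative only: same_number and same_string are hypothetical helper names that simply wrap == and eq, and the program is meant to be run without -w so that the numeric conversion happens silently.

```perl
#!/usr/local/bin/perl

# Hypothetical helpers: each returns 1 if the comparison succeeds, 0 otherwise.
sub same_number {
        my ($left, $right) = @_;
        return (($left == $right) ? 1 : 0);
}

sub same_string {
        my ($left, $right) = @_;
        return (($left eq $right) ? 1 : 0);
}

# Both strings convert to 0 in a numeric context, so == says they match.
print ("bah == humbug: ", &same_number ("bah", "humbug"), "\n");

# eq compares the characters themselves.
print ("bah eq humbug: ", &same_string ("bah", "humbug"), "\n");

# The octal literal 046 is the number 38, which becomes the string "38".
print ("'38' eq 046: ", &same_string ("38", 046), "\n");
print ("'046' eq 046: ", &same_string ("046", 046), "\n");
```

The first and third comparisons succeed; the second and fourth fail.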
The -e option enables you to execute a Perl program from your shell command line. For example, the command
$ perl -e "print ('Hello');"
prints the following string on your screen:
Hello
You can also specify multiple -e options. In this case, the Perl statements are executed left to right. For example, the command
$ perl -e "print ('Hello');" -e "print (' there');"
prints the following string on your screen:
Hello there
By itself, the -e option is not all that useful. It becomes
useful, however, when you use it in conjunction with some of the
other options you'll see in today's lesson.
You can leave off the closing semicolon in a Perl statement passed via the -e option, if you want to:
$ perl -e "print ('Hello')"
If you are supplying two or more -e options, however, the Perl interpreter strings them together and treats them as though they are a single Perl program. This means that the following command generates an error because there must be a semicolon after the statement specified with the first -e option:
$ perl -e "print ('Hello')" -e "print (' there')"
As you can see from this chapter, you can control the behavior of Perl by specifying various command-line options. You can control the behavior of your own Perl programs by specifying command-line options for them too. To do this, specify the -s option when you call the program.
Here's an example of a command that passes an option to a Perl program:
$ perl -s testfile -q
This command starts the Perl program testfile and passes
it the -q option.
To be able to pass options to your program, you must specify the Perl -s option. The following command does not pass -q as an option:
$ perl testfile -q
In this case, -q is just an ordinary argument that is passed to your program and stored in the built-in array variable @ARGV.
The easiest way to remember to include -s is to specify it as part of your header comment:
#!/usr/local/bin/perl -s
This ensures that your program always will check for options. (Unless, of course, you override the option check by providing other Perl options on the command line when you invoke the program.)
If an option is specified when you invoke your Perl program, the scalar variable whose name is the same as the option is automatically set to 1 before program execution begins. For example, if a Perl program named testfile is called with the -q option, as in the following, the scalar variable $q is automatically set to 1:
$ perl -s testfile -q
You then can use this variable in a conditional expression to
test whether the option has been set.
NOTE:
If -q is treated as an option, it does not appear in the system variable @ARGV. A command-line argument either sets an option or is added to @ARGV.
Options can be longer than a single character. For example, the following command sets the value of the scalar variable $potato to 1:
$ perl -s testfile -potato
You also can set an option to a value other than 1 by specifying = and the desired value on the command line:
$ perl -s testfile -potato="hot"
This line sets the value of $potato to hot.
Listing 16.1 is a simple example of a program that uses command-line
options to control its behavior. This program prints information
about the user currently logged in.
Listing 16.1. An example of a program that uses command-line options.
1:  #!/usr/local/bin/perl -s
2:
3:  # This program prints information as specified by
4:  # the following options:
5:  # -u: print numeric user ID
6:  # -U: print user ID (name)
7:  # -g: print group ID
8:  # -G: print group name
9:  # -d: print home directory
10: # -s: print login shell
11: # -all: print everything (overrides other options)
12:
13: $u = $U = $g = $G = $d = $s = 1 if ($all);
14: $whoami = `whoami`;
15: chop ($whoami);
16: ($name, $d1, $userid, $groupid, $d2, $d3, $d4,
17:         $homedir, $shell) = getpwnam ($whoami);
18: print ("user id: $userid\n") if ($u);
19: print ("user name: $name\n") if ($U);
20: print ("group id: $groupid\n") if ($g);
21: if ($G) {
22:         ($groupname) = getgrgid ($groupid);
23:         print ("group name: $groupname\n");
24: }
25: print ("home directory: $homedir\n") if ($d);
26: print ("login shell: $shell\n") if ($s);
$ program16_1 -U -d
user name: dave
home directory: /ag1/dave
$
The header comment in line 1 specifies that the -s option is to be automatically specified when this Perl program is invoked. This ensures that options can always be passed to this program (unless, of course, you override the -s option on the command line, as described earlier).
The comments in lines 3-11 provide information on what options the program supports. This information is useful when someone is reading or modifying the program because there is no other way to tell which scalar variables are used to test options.
The option -all indicates that the program is to print everything; if this option is specified, the scalar variable $all is set to 1. To cut down on the number of comparisons later, line 13 checks whether $all is 1; if it is, the other scalar variables corresponding to command-line options are set to 1. This technique ensures that the following commands are equivalent (assuming that your program is named program16_1):
$ program16_1 -all
$ program16_1 -u -U -g -G -d -s
The scalar variables listed in line 13 can be assigned to, even though they correspond to possible command-line options, because they behave just like other Perl scalar variables.
Lines 14-17 provide the raw material for the various print operations in this program. To start, when the Perl interpreter sees the backquoted string `whoami` in line 14, it calls the system command whoami, which returns the name of the user running the program. Line 15 removes the trailing newline from this name, which is then passed to getpwnam; getpwnam searches the password file /etc/passwd and retrieves the entry for this particular user.
Line 18 checks whether the -u option has been specified. To do this, it checks whether $u has a nonzero value. If it does, the user ID is printed. (The user ID is also printed if -all has been specified because line 13 sets $u to a nonzero value in this case.)
Similarly, line 19 prints the user name if -U has been specified, line 20 prints the group ID if -g has been specified, line 25 prints the home directory if -d has been specified, and line 26 prints the filename of the login shell if -s has been specified.
Lines 21-24 check whether to print the group name. If -G
has been specified, $G is nonzero, and line 22 calls
getgrgid to retrieve the group name.
NOTE:
Because command-line options can change the initial values of scalar variables, it is a good idea to always assign a value to a scalar variable before you use it. Consider the following example:
#!/usr/local/bin/perl
This program normally prints the numbers from 0 to 9 because $count is assumed to have an initial value of 0. However, if this program is called with the -count option, the initial value of $count becomes something other than 0, and the program behaves differently. If you add the following statement before the while loop, the program always prints the numbers 0 to 9 regardless of what options are specified on the command line:
$count = 0;
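A minimal sketch of the kind of program this note has in mind follows. The loop body and the -s in the header comment are assumptions made for illustration; the point is only the explicit assignment before the loop.

```perl
#!/usr/local/bin/perl -s

# Without this assignment, starting the program with -count=5
# would make the loop begin at 5 instead of 0.
# ($count is deliberately a global variable, because -s sets globals.)
$count = 0;
while ($count < 10) {
        print ("$count\n");
        $count++;
}
```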
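You can supply both options and command-line arguments to your program (provided that you supply the -s option to Perl). These are the rules that the Perl interpreter follows: any argument that starts with - and appears immediately after the program name, or after another option, is treated as an option. Option processing stops at the first argument that does not start with -; that argument, and every argument after it, is treated as an ordinary argument.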
This means, for example, that the following command treats -w as an option to testfile, and foo and -e as ordinary arguments:
$ perl -s testfile -w foo -e
The special argument -- also indicates "end of options." For example, the following command treats -w as an option and -e as an ordinary argument. The -- is thrown away.
$ perl -s testfile -w -- -e
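The rules above can be sketched as an ordinary Perl subroutine. This is only an illustration of the rules, not the Perl interpreter's own implementation: split_switches is a hypothetical name, and treating a lone - as an ordinary argument is an assumption that goes beyond what the text states.

```perl
#!/usr/local/bin/perl

# split_switches (hypothetical): separates leading -name and -name=value
# arguments from the ordinary arguments that follow them.
# Returns a reference to a hash of switches, then the remaining arguments.
sub split_switches {
        my (@args) = @_;
        my (%switch);

        while (@args && $args[0] =~ /^-./) {
                my ($arg) = shift (@args);
                last if ($arg eq "--");         # "end of options"; thrown away
                if ($arg =~ /^-([^=]+)=(.*)$/) {
                        $switch{$1} = $2;       # for example, -potato=hot
                } else {
                        $switch{substr ($arg, 1)} = 1;  # for example, -w
                }
        }
        return (\%switch, @args);
}

my ($switches, @rest) = &split_switches ("-w", "foo", "-e");
print ("switches: ", join (" ", sort (keys (%$switches))), "\n");
print ("arguments: @rest\n");
```

Here the call mirrors the first command shown above: w is recorded as a switch, and foo and -e are left as ordinary arguments.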
The C preprocessor is a program that takes code written in the C programming language and searches for special preprocessor statements. In Perl, the -P option enables you to use this preprocessor with your Perl program:
$ perl -P myprog
Here, the Perl program myprog is first run through the
C preprocessor. The resulting output is then passed to the Perl
interpreter for execution.
NOTE:
Perl provides no way to just run the C preprocessor on a Perl program. To do this, you'll need a C compiler that provides an option which specifies "preprocessor only." Refer to the documentation for your C compiler for details about how to do this.
The following sections describe some of the most commonly used C preprocessor commands.
C preprocessor statements always employ the following syntax:
#command value
Each C preprocessor statement starts with a # character. command is the preprocessor operation to perform, and value is the (optional) value associated with this operation.
The most common preprocessor statement is #define. This statement tells the preprocessor to replace every occurrence of a particular character string with a specified value.
The syntax for #define is
#define macro value
This statement replaces all occurrences of the character string macro with the value specified by value. This operation is known as macro substitution. macro can contain letters, digits, or underscores.
The value specified in a #define statement can be any character string or number. For example, the following statement replaces all occurrences of USERNAME with the string "dave" (including the quotation marks):
#define USERNAME "dave"
This statement replaces EXPRESSION with the string (14+6), including the parentheses:
#define EXPRESSION (14+6)
NOTE:
When you are using #define with a value that is an expression, it is usually a good idea to enclose the value in parentheses. For example, consider the following Perl statement:
$result = EXPRESSION * 5;
If your preprocessor command is
#define EXPRESSION 14+6
the resulting Perl statement becomes
$result = 14 + 6 * 5;
which assigns 44 to $result (because the multiplication is performed first). If you enclose the preprocessor expression in parentheses, as in
#define EXPRESSION (14+6)
the statement becomes
$result = (14 + 6) * 5;
which yields the result 100, which is likely what you want.
Also, you always should enclose any parameters (described in the following section) in parentheses, for the same reason.
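The two results quoted in the note can be confirmed with plain Perl arithmetic. In this sketch the macro has been expanded by hand, and the variable names are made up for illustration:

```perl
#!/usr/local/bin/perl

# What "$result = EXPRESSION * 5;" becomes with "#define EXPRESSION 14+6".
my $without_parens = 14 + 6 * 5;

# What it becomes with "#define EXPRESSION (14+6)".
my $with_parens = (14 + 6) * 5;

print ("without parentheses: $without_parens\n");
print ("with parentheses: $with_parens\n");
```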
You can specify one or more parameters with your #define statement. This capability enables you to treat the preprocessor command like a simple function that accepts arguments. For example, the following preprocessor statement takes a specified value and uses it as an exponent:
#define POWEROFTWO(val) (2 ** (val))
In the Perl statement
$result = POWEROFTWO(1.3 + 2.6) + 4;
the preprocessor substitutes the expression 1.3 + 2.6 for val and produces this:
$result = (2 ** (1.3 + 2.6)) + 4;
You can supply more than one parameter with a #define statement. For example, consider the following statement:
#define EXPONENT(base, exp) ((base) ** (exp))
Now, the statement
$result = EXPONENT(4, 11);
yields the following result after preprocessing:
$result = ((4) ** (11));
The Perl interpreter ignores the extra parentheses.
TIP:
By convention, macros defined using #define normally use all uppercase letters (plus occasional digits and underscores). This makes it easier to distinguish macros from other variable names or character strings.
Listing 16.2 is an example of a Perl program that uses a #define
statement to perform macro substitution. This listing is just
Listing 15.4 with the preprocessor statement added.
Listing 16.2. A program that uses a #define statement.
1:  #!/usr/local/bin/perl -P
2:
3:  #define AF_INET 2
4:  print ("Enter an Internet address:\n");
5:  $machine = <STDIN>;
6:  $machine =~ s/^\s+|\s+$//g;
7:  @addrbytes = split (/\./, $machine);
8:  $packaddr = pack ("C4", @addrbytes);
9:  if (!(($name, $altnames, $addrtype, $len, @addrlist) =
10:         gethostbyaddr ($packaddr, AF_INET))) {
11:         die ("Address $machine not found.\n");
12: }
13: print ("Principal name: $name\n");
14: if ($altnames ne "") {
15:         print ("Alternative names:\n");
16:         @altlist = split (/\s+/, $altnames);
17:         for ($i = 0; $i < @altlist; $i++) {
18:                 print ("\t$altlist[$i]\n");
19:         }
20: }
$ program16_2
Enter an Internet address:
128.174.5.59
Principal name: ux1.cso.uiuc.edu
$
Line 3 defines the macro AF_INET and assigns it the value 2. When the C preprocessor sees AF_INET in line 10, it replaces it with 2, which is the value of AF_INET on the current machine (as specified in the header file /usr/include/netdb.h or /usr/include/bsd/netdb.h).
If this program is moved to a machine that defines a different value for AF_INET, all you need to do to get this program to work is change line 3 to use the value on that machine.
You can use a previously defined macro as the value in another #define statement. The following is an example:
#define FIRST 1
#define SECOND FIRST
$result = 43 + SECOND;
Here, the macro FIRST is defined to be equivalent to the value 1, and SECOND is defined to be equivalent to FIRST. This means that the statement following the macro definitions is equivalent to the following statement:
$result = 43 + 1;
The #ifdef and #endif statements control whether a given group of statements is to be included as part of your program.
The syntax for the #ifdef and #endif statements is
#ifdef macro
code
#endif
Here, macro is any character string that can appear in a #define statement. code is one or more lines of your Perl program.
When the C preprocessor sees an #ifdef statement, it
checks whether the macro has been defined using the #define
statement. If it has, the code specified by code is included
as part of the program. If it has not, the code specified by code
is skipped.
NOTE:
The code enclosed by #ifdef and #endif does not have to be a complete Perl statement. For example, the following code is legal:
$result = 14 + 2
#ifdef PLUSONE
+ 1
#endif
;
Here, $result is assigned 17 if PLUSONE is defined, 16 if it's not. Be careful, though: If you abuse #ifdef, the resulting program might become difficult to read.
The #ifndef and #else statements provide additional control over when parts of your program are to be executed.
The #ifndef statement enables you to define code that is to be executed when a particular macro is not defined.
The syntax for #ifndef is the same as for #ifdef:
#ifndef macro
code
#endif
For example:
#ifndef MYMACRO
$result = 26;
#endif
The assignment is performed only if MYMACRO has not appeared in a #define statement.
The #else statement enables you to specify code to be executed if a macro is defined and an alternative to choose if the macro is not defined. For example:
#ifdef MYMACRO
$result = 47;
#else
print ("Hello, world!\n");
#endif
Here, if MYMACRO has been defined by a #define statement, the following statement is executed:
$result = 47;
If MYMACRO has not been defined, the following statement is executed:
print ("Hello, world!\n");
You can use #else with #ifndef, as in the following:
#ifndef MYMACRO
print ("Hello, world!\n");
#else
$result = 47;
#endif
This code is equivalent to the #ifdef-#else-#endif sequence shown earlier in this section.
The #if statement enables you to specify that certain lines of your program are to be included only if the expression included with the statement is nonzero.
The syntax for the #if statement is
#if expr
code
#endif
Here, expr is the expression to be evaluated, and code is the code to be executed if expr is nonzero.
For example, the following statement is executed only if the expression 14 + 3 is nonzero (which it always is, of course):
#if 14 + 3
$result = 26;
#endif
You can use a macro as part of an #if expression. The preprocessor replaces a defined macro with its value, and treats a name that has not been defined as zero. Consider the following example:
#if MACRO1 || MACRO2
$result = 47;
#endif
When the preprocessor sees the #if statement, it evaluates the expression MACRO1 || MACRO2. This expression has a nonzero value if either MACRO1 or MACRO2 is nonzero. Therefore, the following statement is included if either MACRO1 or MACRO2 is defined with a nonzero value:
$result = 47;
The #if statement provides a quick way to remove lines of code from your program temporarily:
#if 0
$result = 46;
print ("This line is not printed right now.\n");
#endif
Here, the expression included with the #if statement is always zero, which means that the statements between #if and #endif are always skipped.
You can use #else with #if, as in the following example:
#if MACRO1 || MACRO2
print ("MACRO1 or MACRO2 is defined.\n");
#else
print ("MACRO1 and MACRO2 are not defined.\n");
#endif
This code includes the first print statement if MACRO1
or MACRO2 has been defined using #define, and
it includes the second print statement if neither has been defined.
You cannot use the ** (exponentiation) operator in an #if statement because ** is not supported in the C programming language.
You can put one #ifdef-#else-#endif construct inside another. For example:
#ifdef MACRO1
#ifdef MACRO2
print ("MACRO1 yes, MACRO2 yes\n");
#else
print ("MACRO1 yes, MACRO2 no\n");
#endif
#else
#ifdef MACRO2
print ("MACRO1 no, MACRO2 yes\n");
#else
print ("MACRO1 no, MACRO2 no\n");
#endif
#endif
You also can put an #if-#else-#endif construct or an #ifndef-#else-#endif construct inside an #ifdef-#else-#endif construct, or vice versa. The only restriction is that the inner construct must be completely contained in one part of the outer construct.
Another preprocessor command that is quite useful is the #include command. This command tells the C preprocessor to include the contents of the specified file as part of the program.
The syntax for the #include command is
#include filename
filename is the name of the file to be included.
For example, the following command includes the contents of myincfile.h as part of the program:
#include <myincfile.h>
When an #include statement is found in a Perl program, the C preprocessor searches for the file in the current directory and the /usr/local/lib/perl directory. (The -I option, described in the following section, enables you to search in other directories.) To instruct the C preprocessor to search only the current directory, enclose the filename in double quotation marks rather than angle brackets.
#include "myincfile.h"
This command limits the search for myincfile.h to the current directory.
You can specify an entire pathname in an #include statement, as in the following example:
#include "/u/dave/myincfile.h"
This command retrieves the contents of /u/dave/myincfile.h
and adds them to the program.
NOTE |
Perl also enables you to include other files as part of a program using the require statement. For more information on require, refer to |
You use the -I option with the -P option. It enables you to specify where to look for include files to be processed by the C preprocessor. For example:
perl -P -I /u/dave/myincdir testfile
This command tells the Perl interpreter to search the directory /u/dave/myincdir for include files (as well as the default directories).
To specify multiple directories to search, repeat the -I option:
perl -P -I /u/dave/dir1 -I /u/dave/dir2 testfile
This command searches in both /u/dave/dir1 and /u/dave/dir2.
NOTE:
The directories specified in the -I option also are added to the system variable @INC. This technique ensures that the require function can search in the same directories as the C preprocessor. For more information on @INC, refer to Day 17, "System Variables." For more information on require, refer to Day 19.
One of the most common tasks in Perl programs and in UNIX commands is to read the contents of several input files one line at a time and process each input line as it is read. In these programs and commands, the names of the input files are supplied on the command line. A simple example is the UNIX command cat:
$ cat file1 file2 file3 ...
This command reads one line of input at a time and writes it to the standard output file.
In Perl, one way to read the contents of several input files, one line at a time, is to enclose the <> operator in a while loop:
while ($line = <>) {
        # process $line in here
}
Another method is to specify the -n option. This option takes your program and executes it once for each line of input in each of the files specified on the command line.
Listing 16.3 is a simple example of a program that uses the -n
option. It puts asterisks around each input line and then prints
it.
Listing 16.3. A simple program that uses the -n option.
1: #!/usr/local/bin/perl -n
2:
3: # input line is stored in the system variable $_
4: $line = $_;
5: chop ($line);
6: printf ("* %-52s *\n", $line);
$ program16_3
* This test file has only one line in it.              *
$
The -n option encloses the program shown here in an invisible while loop. Each time the program is executed, the next line of input from one of the input files is read and is stored in the system variable $_. Line 4 takes this line and copies it into another scalar variable, $line; line 5 then removes the last character (the trailing newline character) from this line.
Line 6 uses printf to write the input line to the standard
output file. Because printf is formatting the input,
the asterisks all appear in the same columns (column 1 and column
56) on your screen.
NOTE:
The previous program is equivalent to the following Perl program (which does not use the -n option):
#!/usr/local/bin/perl

while (<>) {
        $line = $_;
        chop ($line);
        printf ("* %-52s *\n", $line);
}
The -n and -e options work well together. For example, the following command is equivalent to the cat command:
$ perl -n -e 'print $_;' file1 file2 file3
The print $_; argument supplied with the -e option is a one-line Perl program. Because the -n option executes the program once for each input line and reads each input line into the system variable $_, the statement
print $_;
prints each input line in turn, which is exactly what the cat command does. (Note that the parentheses that normally enclose the argument passed to print have been omitted in this case.)
The previous command can be made even simpler:
$ perl -n -e "print" file1 file2 file3
By default, if no argument is supplied, print assumes that it is to print the contents of $_. And, if the program consists of a single statement, there is no need to include the closing semicolon.
The pattern matching and substitution operators also operate on $_ by default. For example, the following statement examines the contents of $_ and searches for a digit:
$found = /[0-9]/;
This default behavior makes it easy to include a search or a substitution in a single-line command. For example:
$ perl -n -e "print if /[0-9]/" file1 file2 file3
This command reads each line of the files file1, file2,
and file3. If an input line contains a digit, it is printed.
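The same test works inside an ordinary program. The sketch below assumes a small hard-coded list of lines in place of file input, and collects the matching lines in a made-up variable, $printed, before printing them:

```perl
#!/usr/local/bin/perl

my @lines = ("no digits here\n", "route 66\n", "line 3\n");
my $printed = "";

# foreach with no loop variable sets $_ to each element in turn,
# much as -n sets $_ to each input line.
foreach (@lines) {
        $printed .= $_ if /[0-9]/;      # the match operator examines $_ by default
}
print ($printed);
```

Only the second and third lines contain a digit, so only those two are printed.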
NOTE:
Several other functions use $_ as the default scalar variable to operate on, which makes those functions ideal for use with the -n and -e options. A full list of these functions is provided in the description of the $_ system variable, which is contained in Day 17.
The -p option is similar to the -n option: it reads each line of its input files in turn. However, the -p option also prints each line it reads.
This means, for example, that you can simulate the behavior of the UNIX cat command with the following command:
$ perl -p -e ";" file1 file2 file3
Here, the ; is a Perl program consisting of one statement that does nothing.
The -p option is designed for use with the -i
option, described in the following section.
NOTE |
If both the -p and the -n options are specified, the -n option is ignored |
As you have seen, the -n and -p options read lines from the files specified on the command line. The -i option, when used with the -p option, takes the input lines being read and writes them back out to the files from which they came. This process enables you to edit files using commands similar to those used in the UNIX sed command.
For example, consider the following command:
$ perl -p -i -e "s/abc/def/g;" file1 file2 file3
This command contains a one-line Perl program that examines the
scalar variable $_ and changes all occurrences of abc
into def. (Recall that the substitution operator operates
on $_ if the =~ operator is not specified.)
The -p option ensures that $_ is assigned each
line of each input file in turn and that the program is executed
once for each input line. Thus, this command changes all occurrences
of abc in the files file1, file2, and
file3 to def.
Do not use the -i option with the -n option unless you know what you're doing. The following command also changes all occurrences of abc to def, but it doesn't write out the input lines after it changes them:
$ perl -n -i -e "s/abc/def/g;" file1 file2 file3
Because the -i option specifies that the input files are to be edited, the result is that the contents of file1, file2, and file3 are completely destroyed |
The -i option also works on programs that do not use the -p option but do contain the <> operator inside a loop. For example, consider the following command, in which prog1 stands for such a program:
$ perl -i prog1 file1 file2 file3
In this case, the Perl interpreter copies the first file, file1, to a temporary file and opens the temporary file for reading. Then, it opens file1 for writing and sets the default output file (the file used by calls to print, write, and printf) to be file1.
After the program finishes reading the temporary file to which file1 was copied, it then copies file2 to a temporary file, opens it for reading, opens file2 for writing, and sets the default output file to be file2. This process continues until the program runs out of input files.
Listing 16.4 is a simple example of a program that edits using
the -i option and the <> operator. This
program evaluates any arithmetic expressions (containing integers)
it sees on a single line and replaces them with their results.
Listing 16.4. A program that edits files using the -i option.
1: #!/usr/local/bin/perl -i
2:
3: while ($line = <>) {
4:     while ($line =~
5:            s#\d+\s*[-*+/]\s*\d+(\s*[-*+/]\s*\d+)*#<x>#) {
6:         eval ("\$result = $&;");
7:         $line =~ s/<x>/$result/;
8:     }
9:     print ($line);
10: }
This program produces no output because output is written to the files specified on the command line.
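The <> operator at the beginning of the while loop (line 3) reads a line at a time from the input file or files. Each line is searched using the pattern shown in line 5. This pattern matches any substring containing the following elements (in the order given):
One or more digits
Zero or more white-space characters
One of the characters *, +, -, or /
Zero or more white-space characters
One or more digits
Zero or more further repetitions of white space, an operator, white space, and digits
The matched substring is replaced by a placeholder substring, <x>.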
Lines 6 and 7 are executed once for each pattern matched in the input line. The matched pattern, an arithmetic expression, is automatically stored in the system variable $&; line 6 substitutes this expression into a character string and passes this character string to the function eval. The call to eval creates a subprogram that evaluates the expression and returns the result in the scalar variable $result. Line 7 replaces the placeholder, <x>, with the result returned in $result.
When all the arithmetic expressions have been evaluated and substituted
for, the inner while loop terminates, and line 9 calls
print. Because the -i option has been set, the
line is written back to the original input file from which it
came.
NOTE |
Even though you do not know the name of the file variable that represents the file being edited, you can still set the default output file to some other file and restore it afterward. To perform this task, recall that the select function returns the file variable associated with the current default file:
$editfile = select (MYFILE);   # change default file
select ($editfile);            # restore the file being edited
After the second select call has been performed, the default output file is, once again, the file being edited |
By default, the -i option overwrites the existing input files. If you wish, you can save a copy of the original input file or files before overwriting them. To do this, specify a file extension immediately after the -i option, with no space in between (here, prog1 stands for your program):
$ perl -i.old prog1 file1 file2 file3
Here, the .old file extension specified with the -i option tells the Perl interpreter to copy file1 to file1.old before overwriting it. Similarly, the interpreter copies file2 to file2.old, and file3 to file3.old.
The file extension specified with the -i option can be
any character string. By convention, file extensions usually begin
with a period; this convention makes it easier for you to spot
them when you list the files in your directory.
TIP |
If you are using the -i option with a program you are not familiar with, it is a good idea to specify a file extension. Doing so ensures that your files are not damaged if the program does not work the way you expect |
The -a option is used with the -n or -p option. If the -a option is set, each input line that is read is automatically split into a list of "words" (sequences of characters that are not white space); this list of words is stored in a special system array variable named @F.
For example, if your input file contains the line
This is a test.
and if a program that is called with the -a option reads
this line, the array @F contains
the list
("This", "is", "a", "test.")
The -a option is useful for extracting information from files. Suppose that your input files contain records of the form
company_name quantity_ordered total_cost
such as, for example,
JOHN H. SMITH 10 47.32
Listing 16.5 shows how you can use the -a option to easily
produce a program that extracts the quantity and total cost fields
from these files.
Listing 16.5. An example of the -a option.
1: #!/usr/local/bin/perl
2:
3: # This program is called with the -a and -n options.
4: while ($F[0] =~ /[^\d.]/) {
5:     shift (@F);
6:     next LINE if (!defined($F[0]));   # LINE labels the loop supplied by -n
7: }
8: print ("$F[0] $F[1]\n");
$ perl -a -n program16_5
10 47.32
106 11.54
$
Because the program is called with the -a option, the array variable @F contains a list, each element of which is a word from the current input line.
Because the company name in the input file might consist of more than one word (such as JOHN H. SMITH), the while loop in lines 4-7 is needed to get rid of everything that isn't a quantity field or a total cost field. After these fields have been eliminated, line 8 can print the useful fields.
Note that this program just skips over any nonstandard input lines.
The -F option, defined only in Perl 5, is designed to be used in conjunction with the -a option, and specifies the pattern to use when you split input lines into words. For example, suppose Listing 16.5 is called as follows:
$ perl -a -n -F:: program16_5
In this case, the words in the input file are assumed to be separated by a pair of colons, which means that the program is expecting to read lines such as the following:
JOHN H. SMITH::10::47.32
NOTE |
The -F option ignores opening and closing slashes if they are present because it interprets them as pattern delimiters. This means that the following program invocations are identical:
$ perl -a -n -F:: program16_5
$ perl -a -n -F/::/ program16_5 |
In all the programs you have seen so far, when the Perl interpreter reads a line from an input file or from the keyboard, it reads until it sees a newline character. You can tell Perl that you want the "end-of-line" input character to be something other than the newline character by specifying the -0 option. (The 0 here is the digit zero, not the letter O.)
With the -0 option, you specify which character is to be the end-of-line character for your input file by providing its ASCII representation in base 8 (octal). For example, the command
$ perl -0040 prog1 infile
calls the Perl program named prog1 and specifies that it is to use the space character (ASCII 32, or 40 octal) as the end-of-line character when it reads the input file infile (or any other input file).
This means, for example, that if this program reads an input file containing the following:
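Test input.
Here's another line.
it will read a total of four input lines (here, <newline> marks a newline character that remains inside an input line, and each of the first three lines ends with the space that terminated it):
Test
input.<newline>Here's
another
line.<newline>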
The -0 option provides a quick way to read an input file
one word at a time, assuming that each line ends with at least
one blank character. (If it doesn't, you can quickly write a Perl
program that uses the -i and -p options to add
a space to the end of each line in each file.) Listing 16.6 is
an example of a program that uses -0 to read an input
file one word at a time.
Listing 16.6. A program that uses the -0 option.
1: #!/usr/local/bin/perl -0040
2:
3: while ($line = <>) {
4:     $line =~ s/\n//g;
5:     next if ($line eq "");
6:     print ("$line\n");
7: }
$ program16_6 file1
This
line
contains
five
words.
$
The header comment (line 1) specifies that the -0 option is to be used and that the space character is to become the end-of-line character. (Recall that you do not need a space between an option and the value associated with an option.) This means that line 3 reads from the input file until it sees a blank space.
Not everything read by line 3 is a word, of course. There are two types of lines that are not particularly useful that the program must check for:
Line 4 removes any newline characters contained in the current input line. The substitution in this line is a global substitution, because an input line can contain two or more newline characters. (This occurs when an input file contains a blank line.)
After all the newline characters have been eliminated, line 5
checks whether the resulting input line is empty. If it is, the
program continues with the next input line. If the resulting input
line is not empty, the input line must be a useful word, and line
6 prints it.
NOTE |
If you specify the value 00 (octal zero) with the -0 option, the Perl interpreter reads until it sees two newline characters. This enables you to read an entire paragraph at a time. If you specify no value with the -0 option, the null character (ASCII 0) is assumed |
The -l option enables you to specify an output end-of-line character for use in print statements.
Like the -0 option, the -l option accepts a base-8 (octal) integer that indicates the ASCII representation of the character you want to use.
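When the -l option is specified, the Perl interpreter does two things:
If the -n or -p option is also specified, it removes the end-of-line character from each input line as the line is read.
It appends the output end-of-line character to everything written by a print statement.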
If you do not specify a value with the -l option, the
Perl interpreter uses the character specified by the -0
option, if it is defined. If -0 has not been specified,
the end-of-line character is defined to be the newline character.
If you are using both the -l and the -0 option and you do not provide a value with the -l option, the order of the options becomes significant because the options are processed from left to right. If the -l option appears first, the output end-of-line character is set to the newline character. If the -0 option appears first, the output end-of-line character (set by -l) becomes the same as the input end-of-line character (set by -0) |
Listing 16.7 is a simple example of a program that uses -l.
Listing 16.7. A program that uses the -l option.
1: #!/usr/local/bin/perl -l012
2:
3: print ("Hello!");
4: print ("This is a very simple test program!");
$ program16_7
Hello!
This is a very simple test program!
$
The -l012 option in the header comment
in line 1 sets the output end-of-line character to the newline character
(ASCII 10, or 12 octal). This means that every print statement in the program
will have a newline character added to it. As a consequence, the
output from lines 3 and 4 appears on separate lines.
NOTE |
You can control the input and output end-of-line characters also by using the system variables $/ and $\. For a description of these system variables, refer to Day 17 |
The -x option enables you to process a Perl program that
appears in the middle of a file (such as a file containing an
electronic mail message, which usually contains some mail routing
information). When the -x option is specified, the Perl
interpreter ignores every line in the program until it sees a
header comment (a comment beginning with the #! characters).
If you are using Perl 5, the header comment must also contain the word "perl." |
After the Perl interpreter sees the header comment, it then processes the program as usual until one of the following three conditions occurs:
__END__
If the Perl interpreter reads one of the end-of-program lines (the second and third conditions listed previously), it ignores everything appearing after that line in the file.
Listing 16.8 is a simple example of a program that works if run
with the -x option.
Listing 16.8. A Perl program contained in a file.
1: Here is a Perl program that appears in the middle
2: of a file.
3: The stuff up here is junk, and the Perl interpreter
4: will ignore it.
5: The next line is the start of the actual program.
6: #!/usr/local/bin/perl
7:
8: print ("Hello, world!\n");
9: __END__
10: This line is also ignored, because it is not part
11: of the program.
$ perl -x program16_8
Hello, world!
$
If this program is started with the -x option, the Perl interpreter skips over everything until it sees line 6. (Needless to say, if you try to run this program without specifying the -x option, the Perl interpreter will complain.) Line 8 then prints the message Hello, world!
Line 9 is the special end-of-program line. When the Perl interpreter
sees this line, it skips the rest of the file.
NOTE |
Of course, you can't specify the -x option in the header comment itself because the Perl interpreter has to know in advance that the program contains lines that must be skipped |
The following sections describe some of the more exotic options you can pass to the Perl interpreter. You are not likely to need any of these options unless you are doing something unusual (and you really know what you are doing).
The -u option tells the Perl interpreter to compile your program and then generate a core dump file. This file can then be examined and manipulated.
The -U option tells the Perl interpreter to enable you to perform "unsafe" operations in your program. (Basically, you'll know that an operation is considered unsafe when the Perl interpreter doesn't let you perform it without specifying the -U option!)
The -S option tells the Perl interpreter that your program
might be contained in any of the directories specified by your
PATH environment variable. The Perl interpreter checks
each of these directories in turn, in the order in which they
are specified, to see whether your program is located there. (This
is the normal behavior of the shell for commands in the UNIX environment.)
NOTE |
You need to use -S only if you are running your Perl program using the perl command, as in
$ perl myprog
If you are running the program using a command such as
$ myprog
your shell (normally) treats it like any other command and searches the directories specified in your PATH environment variable even if you don't specify the -S option |
The -D option sets the Perl interpreter's internal debugging flags. This option is specified with an integer value that immediately follows it (for example, -D256).
For details on this option, refer to the online manual page for
Perl.
NOTE |
The internal debugging flags specified by -D have nothing to do with the Perl debugger, which is specified by the -d option. The debugging flags specified by -D provide information on how Perl itself works, not on how your program works |
The -T option specifies that data obtained from the outside world cannot be used in any command that modifies your file system. This feature enables you to write secure programs for system administration tasks.
This option is only available in Perl 5. If you are running Perl 4, use a special version of Perl named taintperl. For details on taintperl, see the online documentation supplied with your Perl distribution.
One final option that is quite useful is -d. This option
tells the Perl interpreter to run your program using the Perl
debugger. For a complete description of the Perl debugger and
how to use it, refer to Day 21, "The Perl Debugger."
NOTE |
If you are specifying the -d option, you still can use other options |
Today you learned how to specify options when you run your Perl programs. An option is a dash followed by a single letter, and optionally followed by a value to be associated with the option. Options lacking associated values can be grouped together.
You can specify options in two ways: on the command line and in the header comment. Only one option or group of options can be supplied in the header comment.
Available options include those that list the Perl version number, check your syntax, display warnings, allow single-line programs on the command line, invoke the C preprocessor, automatically read from the input files, and edit files in place.
Q: | Why can you specify only one option in the header comment? |
A: | This is a restriction imposed by the UNIX operating system. |
Q: | Why does -v display the Perl version number without running the program? |
A: | This option enables you to check whether the version of Perl you are running is capable of running your program. If an old copy of Perl is running on your machine, your program might not work properly. |
Q: | What options enable me to write a program that edits every line of a file? |
A: | Use the -i (edit in place) and -p (print each line) options. (These options are often used with the -e option to perform an editing command similar to those used by the UNIX sed command.) |
Q: | I have a program that needs to run on two or more different machines. Is there a way of writing the program that ensures that I don't have to change the program each time I change machines? |
A: | Here's how to carry out this task:
|
Q: | Why does the -p option override the -n option? |
A: | The -p option tells the Perl interpreter that you want to print each input line that you read, and the -n option tells it that you don't want to do so. These options basically contradict one another. -p overrides -n because -p is safer; if you really want -n, you can throw away the output from -p. If you really want -p and get -n, you won't get the output you want. |
The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.