Chapter 2

A Brief Introduction to Perl


This chapter offers a very brief introduction to Perl programming and syntax. If this is the first time you are working with Perl, do not despair at the barrage of information in this chapter. As you progress through the book, any new or elaborate syntax will be explained. This chapter is intended as an introduction to Perl, not a complete tutorial; you'll learn more about the advanced features of Perl in the subsequent chapters. If you are already familiar with Perl, you might want to glance through this chapter to get a quick overview of the syntax and reserved words.

Note
Please refer to the inside front cover for a quick reference of all the special variables in Perl.

Running Perl

Perl is a program just like any other program on your system, only it's more powerful than most other programs! To run Perl, you can simply type perl at the prompt and then type your code. In almost all cases, you'll want to keep your Perl code in files just like shell scripts. A Perl program is referred to as a script.

Normally, the Perl program on your machine will be located in the /usr/bin, /usr/bin/perl5, or /usr/local/bin/perl5 directory. Use a find command to see whether you can locate Perl on your system. If you are certain that you do not have Perl on your system, turn to Chapter 24, "Building and Installing the Perl 5 Interpreter," for information on how to install Perl on your machine. Perl scripts are of the following form:

#!/usr/bin/perl
... insert code here ...
# comments are text after the # mark.
                  # comments can begin anywhere on a line.

Here's a simple Perl script:

#!/usr/bin/perl
print "\n Whoa! That was good!\n";

If the path to the Perl program on your system is different, you'll have to use that pathname instead of /usr/bin/perl. You also can specify programs on the command line with the -e switch to Perl. For example, entering the following command at the prompt will print Howdy!

$ perl -e 'print "Howdy!\n";'

In all but the shortest of Perl programs, you'll use a file to store your Perl code as a script. Using a script file saves you from having to type all the commands interactively, where typing errors cannot be corrected easily. Also, a script file provides a written record of what commands to use to accomplish a certain task.

To fire off a command on every line of the input, use the -n option together with the -e option. Thus, the line

$ perl -n -e 's/old/new/g' test.txt

runs the command to substitute all strings old with new on each line from the file test.txt. The -n option does not print anything by itself; if you use the -p option instead, Perl prints each line after the command has been run on it. The -v option prints the version number of Perl you are running. This book is written for Perl 5.002.
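
As a rough sketch of what the -p switch does behind the scenes, the loop below applies the same substitution to each line and then prints it. The @lines array is a hypothetical stand-in for the contents of test.txt, and the strict and warnings pragmas assume a more recent Perl 5 than the 5.002 release this book targets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the lines of test.txt.
my @lines = ("old hat\n", "nothing here\n", "old and older\n");

my @result;
for my $line (@lines) {
    (my $copy = $line) =~ s/old/new/g;   # the command given with -e
    push @result, $copy;
    print $copy;                         # -p prints every line; -n would not
}
```

With a real file, Perl reads the lines from the files named on the command line rather than from an array.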

Now, let's begin the introduction to the Perl language.

Variables in Perl

Perl has three basic types of variables: scalars, arrays, and associative arrays. A scalar variable is anything that can hold one number (either as a floating-point number or as an integer) or a string. An array stores many scalars in a sequence, where each scalar can be indexed using a number starting with 0 on up. An associative array is like an array in that it stores many scalars, but it uses a string instead of a number as the index to address individual items, and the items are not kept in any particular order. I cover how to use these three types of variables in this chapter.

The syntax for a scalar variable is $variable_name. A variable name follows much the same rules as a Bourne shell variable name, except that in Perl the $ is used both when you assign to the variable and when you read it. To assign values to a scalar, you use statements like these:

$name = "Kamran";
$number= 100;
$phone_Number = '555-1232';

A variable in Perl is evaluated at runtime to derive a value that is one of the following: a string, a number, or a pointer to scalar. (To see the use of pointers and references, refer to Chapter 3, "References.")

To print out the value of a variable, you use a print statement. Therefore, to print the value of $name, you would make the following call:

print $name;

The value of $name is printed to the screen. By default, Perl scripts read input from the standard input (the keyboard) and write to the standard output (the screen). Of course, you can also use the print statement to print the values of special variables that are built into Perl.

Special Variables

Table 2.1 lists the special variables in Perl. The first column contains the variable, and the second contains a verbose name that you can use to make the code readable. The third column in the table describes the contents of each variable.

You can use the verbose names (in column 2) by including the following line in the beginning of your code:

use English;

This statement will let you use the English.pm module in your code. (I cover the use of modules in Chapter 4, "Introduction to Perl Modules.") Not all Perl variables have an equivalent name in the English.pm module. The entry "n/a" in the second column indicates that there is not an English name for the variable.

Table 2.1. Special variables in Perl.
Variable  English Name                   Description
$_        $ARG                           The default input and pattern-searching space
$1-$9     n/a                            The subpatterns captured by the sets of parentheses in the last pattern match (RO)
$&        $MATCH                         The last pattern matched (RO)
$`        $PREMATCH                      The string preceding a pattern match (RO)
$'        $POSTMATCH                     The string following a pattern match (RO)
$+        $LAST_PAREN_MATCH              The last bracket matched in a pattern (RO)
$*        $MULTILINE_MATCHING            Set to 1 to enable multi-line matching; set to 0 by default
$.        $INPUT_LINE_NUMBER             The current input line number; reset on close() call only
$/        $INPUT_RECORD_SEPARATOR        The input record separator; the newline by default
$|        $OUTPUT_AUTOFLUSH              If set to 1, forces a flush on every write or print; 0 by default
$,        $OUTPUT_FIELD_SEPARATOR        Specifies what is printed between fields
$\        $OUTPUT_RECORD_SEPARATOR       The output record separator for the print operator
$"        $LIST_SEPARATOR                The separator for elements within a list
$;        $SUBSCRIPT_SEPARATOR           The character for multidimensional array emulation
$#        $OFMT                          Output format for printed numbers
$%        $FORMAT_PAGE_NUMBER            The current page number
$=        $FORMAT_LINES_PER_PAGE         The number of lines per page
$-        $FORMAT_LINES_LEFT             The number of lines still left to draw on the page
$~        $FORMAT_NAME                   The name of the current format being used
$^        $FORMAT_TOP_NAME               The name of the current top-of-page format
$:        $FORMAT_LINE_BREAK_CHARACTERS  The set of characters after which a string can be broken up to fill with continuation characters
$^L       $FORMAT_FORMFEED               The default form feed operator
$^A       $ACCUMULATOR                   The current format line accumulator for format() lines
$?        $CHILD_ERROR                   The status from the last backtick command, pipe close, or system() call
$!        $ERRNO                         The last errno value
$@        $EVAL_ERROR                    The Perl error message from the last eval statement
$$        $PROCESS_ID                    The process number of this Perl script
$<        $REAL_USER_ID                  The real UID of this process
$>        $EFFECTIVE_USER_ID             The effective UID of this process
$(        $REAL_GROUP_ID                 The real GID of this process
$)        $EFFECTIVE_GROUP_ID            The effective GID of this process
$0        $PROGRAM_NAME                  The name of the file containing the script being run
$[        n/a                            Index of the first element in an array (0 by default)
$]        $PERL_VERSION                  The Perl version string
$^D       $DEBUGGING                     The current value of the debugging flag
$^F       $SYSTEM_FD_MAX                 The maximum system file descriptor (ordinarily 2)
$^I       $INPLACE_EDIT                  The in-place edit extension
$^P       $PERLDB                        The value of the internal debugger flag
$^T       $BASETIME                      The time at which the script started running
$^W       $WARNING                       The value of the -w switch
$^X       $EXECUTABLE_NAME               The name under which the Perl interpreter itself was run
$ARGV     n/a                            The name of the current file while reading from the <> in a while loop
$VERSION  n/a                            The version number of a module, set by convention in the module's package
%ENV      n/a                            The hash of the environment variables for the process
%INC      n/a                            The hash of filenames that have been included in the current program
%SIG      n/a                            The hash of all signal handlers for the current process
@ARGV     n/a                            The command-line arguments for the script
@EXPORT   n/a                            The names of all functions and variables exported by default from a module
@F        n/a                            The fields of each input line when the -a (autosplit) switch is used
@INC      n/a                            The pathnames of places to look in for all included files
@ISA      n/a                            The names of the base classes to search when a method is not found in the current package

Don't worry if you do not recognize some of these strange characters. I will be covering them all in the course of this book.
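
To make the verbose names a little more concrete, here is a minimal sketch that uses two of them next to their short forms. The @parts array and the other variable names are made up for illustration, and the strict and warnings pragmas assume a more recent Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use English;

# $PROCESS_ID is the verbose name for $$.
my $same_pid = ($PROCESS_ID == $$) ? 'yes' : 'no';
print "Process ID $PROCESS_ID; same as \$\$: $same_pid\n";

# $LIST_SEPARATOR is the verbose name for $", the string placed
# between array items interpolated into a double-quoted string.
my @parts = ('computer', 'rat', 'kbd');
my $old_separator = $LIST_SEPARATOR;
$LIST_SEPARATOR = '-';
my $joined = "@parts";
$LIST_SEPARATOR = $old_separator;      # put the default back
print "$joined\n";
```

The short and verbose names refer to the same variable, so a change made through one is seen through the other.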

Now let's see how you can use these built-in variables as well as your own variables in code.

Code Blocks

Variables and assignment statements exist in code blocks. Each code block is a section of code between two curly braces. Recognizing code blocks matters when you are concerned about the scope of influence of code on the value of a variable. (More on scope in a moment.) Normally, you see code blocks in loop constructs and conditionals, but a block can also stand on its own. It's syntactically correct to use statements like this in Perl programs:

{
print something;
print more of something;
more statements;
}

This coding style is rare and is usually used only when the programmer explicitly wants to keep some variables within the curly braces. Usually, most of an application's code is in some type of block (a subroutine, loop, or conditional), and the only lines outside such blocks are those that are global to the rest of the program.

Here are some examples of code blocks available in Perl:

{
# a simple code block with statements in here.
}

while(condition) {
    ... execute code here while condition is true;
}

until(condition) {  # opposite of while statement.
    ... execute code here while condition is false;
}

do {
    ... do this at least once ...
    ... stop if condition is false ...
} while(condition);

do {
    ... do this at least once ...
    ... stop if condition is true ...
} until(condition);

if (condition1) {
    ... execute this code if condition1 is true;
} else {
    ... execute this code if condition1 is false;
}

if (condition1) {
    ... execute this code if condition1 is true;
} elsif (condition2) {
    ... execute this code if condition2 is true;
...
} elsif (conditionN) {
    ... execute this code if conditionN is true;
} else {
    ... execute this code if no condition from 1 up to N is true;
}

unless (condition1) { # opposite of "if" statement.
    ... do this if condition1 is false;
}

The condition in these blocks of code is anything from a Perl variable to an expression that returns either a true or false value. A true value is a non-zero number or a non-empty string other than "0"; zero, the empty string, the string "0", and an undefined value are all false.
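
The following sketch shows how a few sample values behave when used as a condition, and runs a short until loop. The truth() subroutine and the sample values are invented for this illustration; the strict and warnings pragmas assume a more recent Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify a value the way a Perl condition would.
sub truth {
    my ($value) = @_;
    if ($value) {
        return 'true';
    } else {
        return 'false';
    }
}

my %seen = (
    'number 0'     => truth(0),
    'number 7'     => truth(7),
    'empty string' => truth(''),
    'string "0"'   => truth('0'),
    'string "0.0"' => truth('0.0'),
    'string "abc"' => truth('abc'),
);

for my $label (sort keys %seen) {
    print "$label is $seen{$label}\n";
}

# until keeps looping while its condition is false.
my $n = 0;
until ($n >= 3) {
    $n++;
}
print "n = $n\n";
```

Note that the string "0.0" counts as true even though it is numerically zero, because only the exact string "0" and the empty string are false.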

Code blocks can be declared within code blocks to create levels of code blocks. Variables that are simply used in a code block, without any declaration, are global to the rest of the program. To keep the scope of the variable limited to the code block in which it is declared, use the my $variableName syntax. If you declare with the local $variableName syntax, $variableName gets a temporary value that is available in the code block and in any code called from it (the lower levels), but not outside the code block; the previous value is restored when the block ends.
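
A minimal sketch of the difference between local and my follows. The variable and subroutine names are invented for the example, and the our declaration (used here to name a global variable under the strict pragma) assumes a more recent Perl 5 than 5.002.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $level = 'global';              # a global (package) variable

# Returns whatever value $level has at the moment of the call.
sub show_level { return $level; }

my ($inside_local, $after_local);
{
    local $level = 'temporary';     # also seen by subroutines called from here
    my $private = 'lexical';        # exists only until the closing brace
    $inside_local = show_level();
}
$after_local = show_level();        # the old value has been restored

print "inside: $inside_local, after: $after_local\n";
```

The subroutine sees the temporary value only while the block that declared it with local is still running.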

Figure 2.1 illustrates how the scoping rules work in Perl. The main block declares two variables, $x and $y. There are two blocks of code between curly braces, block A and block B. The variable $x is not available to either of these blocks, but $y will be available.

Figure 2.1 : Scoping rules in Perl

Because block A is declared in the main block, the code in it will be able to access $y but not $x because $x is declared as "my". The variable $f will not be available to other blocks of code even if they are declared within block A. The variable $g is not declared as "local" or "my", so it is a global variable: once the code in block A has run, $g is visible to the main program and to block B.

The code in block B declares two variables, $k and $m. The variable $k can be assigned the value of $g, provided that the code in block A is called before the code in block B. If the code in block B is called before the code in block A, the variable $g will not be declared, and a value of 'undef' will be assigned to $k. Also, $m cannot use the value of $f because $f is declared in block A as a "my" variable. The values of $y and $g are available to code in block B.

Finally, another code block (call it C) could be declared in block B. Block C is not shown in the figure. All variables in this new block C that are declared as neither "my" nor "local" would be available to blocks A and B and the main program. Code in block C would not be able to access variables $f, $k, and $m because they are declared as "my". The variable $g, being global, would be available to code in blocks B and C once the code in block A has run.

Keep in mind that variables in code blocks are also created the first time they are assigned a value. This applies to arrays as well as to scalars. Variables are then evaluated by Perl wherever they appear in code, even inside double-quoted strings. There are times when you do not want a variable to be evaluated, and this is when you need to be aware of the quoting rules in Perl.

Quoting Rules

Three different types of quotes can be used in Perl. Double quotes (") are used to enclose strings. Any scalars in double-quoted strings are evaluated by Perl. To force Perl not to evaluate anything in a string, you'll have to use single quotes ('). Anything that is not quoted is interpreted as code by the Perl interpreter, which attempts to evaluate it as an expression or a set of executable statements. Finally, to run a command in a shell and get its output back, use the back quote (`) symbol. See the Perl script in Listing 2.1 for an example.


Listing 2.1. Quoting in a Perl script.
1 #!/usr/bin/perl
2 $folks="100";
3 print "\$folks = $folks \n";
4 print '\$folks = $folks \n';
5 print "\n\n BEEP! \a  \LSOME BLANK \ELINES HERE \n\n";
6 $date = `date +%D`;
7 print "Today is [$date] \n";
8 chop $date;
9 print "Date after chopping off carriage return: [".$date."]\n";

The output from the code in Listing 2.1 is as follows:

$folks = 100
\$folks = $folks \n

BEEP!  some blank LINES HERE

Today is [03/29/96
]
Date after chopping off carriage return: [03/29/96]

Let's go over the code shown in Listing 2.1. First of all, note that the actual listing did not have line numbers. The line numbers in this and subsequent scripts are used to identify specific lines of code.

Line 1 is the mandatory first line of the Perl script. Change the path shown in Listing 2.1 to where your Perl interpreter is located if the script does not run. Be sure to make a similar change to the rest of the source listings in this book.

Line 2 assigns a string value to the $folks variable. Note that you did not have to declare the variable $folks because it was created when used for the first time.

Line 3 prints the value of $folks in between double quotes. The first $ sign has to be escaped with a backslash to prevent Perl from evaluating $folks at that point. The second, unescaped $folks is replaced with its value, so the statement prints the following line:

$folks = 100

In line 4, Perl does not evaluate anything between the single quotes. Therefore, the entire contents of the line are left untouched and printed here:

\$folks = $folks \n

Perl has several special characters to format text data for you. Line 5 prints multiple blank lines with the \n character and beeps at the terminal. Notice how the words SOME BLANK are printed in lowercase letters. This is because they are encased between the \L and \E special characters, which force all characters to be lowercase. Some of these special characters are listed in Table 2.2.

Table 2.2. Special characters in Perl.
Character  Meaning
\\         Backslash.
\ooo       Octal number in ooo (for example, \033).
\a         Beep.
\b         Backspace.
\char      Inserts char literally when char is a special character such as $, @, or " (for example, \$ puts $).
\cC        Inserts the control character for C (for example, \cC is Ctrl-C).
\l         Next character is lowercase.
\L \E      All characters between \L and \E are lowercase.
\n         New line (line feed).
\r         Carriage return (MS-DOS).
\t         Tab.
\u         Next character is uppercase.
\U \E      All characters between \U and \E are uppercase.
\x##       Hex number in ## (for example, \x1d).
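
A short sketch of a few of these special characters in action follows. The variable names and the sample text are made up for illustration, and the strict and warnings pragmas assume a more recent Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "perl";

# \u uppercases the next character; \U and \L apply up to the next \E.
my $title = "\u$name \Uis fun\E, \LREALLY\E";
print "$title\n";

# \x## inserts the character with the given hex code.
my $hex = "\x41\x42";
print "$hex\n";

# Inside single quotes the backslash sequence is left alone.
my $raw = '\n';
print "Length of raw: ", length($raw), "\n";
```

The single-quoted '\n' is two characters long (a backslash and the letter n) because nothing inside single quotes is evaluated.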

In line 6, the script uses the back quotes (`) to execute a command and return the results in the $date variable. The string in between the two back quotes is what you would type at the command line, with one exception: if you use Perl variables in the command line for the back quotes, Perl evaluates these variables before passing them off to the shell for execution. For example, line 6 could be rewritten as this:

$parm = "+%D";
$date = `date $parm`;

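The returned value in $date is printed out in line 7. Note that there is an extra newline at the end of the text for the date. To remove it, use the chop command as shown in line 8. Be aware that chop removes the last character whatever it is; the chomp command removes the last character only if it is a newline.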

Then in line 9 the $date output is shown to print correctly. Note how the period (.) is used to concatenate three strings together for the output.
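
The difference between chop and the related chomp function is easy to see in a small sketch. The fixed date string below stands in for the output of the date command, so that the example does not depend on running a shell; the variable names are illustrative, and the strict and warnings pragmas assume a more recent Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the text returned by `date +%D`.
my $with_newline    = "03/29/96\n";
my $without_newline = "03/29/96";

# chop always removes the last character, whatever it is.
my $chop_a = $with_newline;     chop $chop_a;
my $chop_b = $without_newline;  chop $chop_b;

# chomp removes the last character only if it is a newline.
my $chomp_a = $with_newline;    chomp $chomp_a;
my $chomp_b = $without_newline; chomp $chomp_b;

print "chop:  [$chop_a] [$chop_b]\n";
print "chomp: [$chomp_a] [$chomp_b]\n";
```

Calling chop on a string that has no trailing newline loses the final digit of the date, which is why chomp is the safer choice when you only want to strip a newline.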

It's easy to construct strings in Perl with the period (.) operator. Given two strings, $first and $last, you can construct the string $fullname like this to get "Jim Smith":

$first = "Jim";
$last = "Smith";
$fullname = $first . " " . $last;

Numbers in Perl are stored as floating-point numbers; even variables used as integers are really stored as floating-point numbers. There is a set of operations you can do with numbers. These operations are listed in Table 2.3, which also lists the Boolean operators.

Table 2.3. Numeric operations with Perl.
Operation        Description
$r = $x + $y     Adds $x to $y and assigns the result to $r
$r = $x - $y     Subtracts $y from $x and assigns the result to $r
$r = $x * $y     Multiplies $y and $x and assigns the result to $r
$r = $x / $y     Divides $x by $y and assigns the result to $r
$r = $x % $y     Modulo; divides $x by $y and assigns the remainder to $r
$r = $x ** $y    Raises $x to the power of $y and assigns the result to $r
$r = $x << $n    Shifts bits in $x left $n times and assigns to $r
$r = $x >> $n    Shifts bits in $x right $n times and assigns to $r
$r = ++$x        Increments $x and assigns $x to $r
$r = $x++        Assigns $x to $r and then increments $x
$r += $x;        Adds $x to $r and then assigns to $r
$r = --$x        Decrements $x and assigns $x to $r
$r = $x--        Assigns $x to $r and then decrements $x
$r -= $x;        Subtracts $x from $r and then assigns to $r
$r /= $x;        Divides $r by $x and then assigns to $r
$r *= $x;        Multiplies $r by $x and then assigns to $r
$r = $x <=> $y   $r is 1 if $x > $y; 0 if $x == $y; -1 if $x < $y
$r = $x || $y    $r is the logical OR of variables $x and $y
$r = $x && $y    $r is the logical AND of variables $x and $y
$r = ! $x        $r is the opposite Boolean value of $x

You can compare values of variables to check results of operations. Table 2.4 lists the comparison operators for numbers and strings.

Table 2.4. Comparison operations with Perl.
Operation      Description
$x == $y       True if $x is equal to $y
$x != $y       True if $x is not equal to $y
$x < $y        True if $x is less than $y
$x <= $y       True if $x is less than or equal to $y
$x > $y        True if $x is greater than $y
$x >= $y       True if $x is greater than or equal to $y
$x eq $y       True if string $x is equal to string $y
$x ne $y       True if string $x is not equal to string $y
$x lt $y       True if string $x is less than string $y
$x le $y       True if string $x is less than or equal to string $y
$x gt $y       True if string $x is greater than string $y
$x ge $y       True if string $x is greater than or equal to string $y
$x x $y        Repeats $x, $y times
$x . $y        Returns the concatenated value of $x and $y
$x cmp $y      Returns 1 if $x gt $y; 0 if $x eq $y; -1 if $x lt $y
$w ? $x : $y   Returns $x if $w is true; $y if $w is false
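
Because the same scalar can be treated as a number or as a string, it matters which set of comparison operators you pick. The sketch below compares the same two values both ways; the values and variable names are chosen only for illustration, and the strict and warnings pragmas assume a more recent Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "10";
my $y = "9";

my $numeric = $x <=> $y;    # compares 10 with 9 as numbers
my $string  = $x cmp $y;    # compares "10" with "9" character by character

# The same pair of values can be equal as numbers but not as strings.
my $equal_num = ("1.0" == "1") ? 'yes' : 'no';
my $equal_str = ("1.0" eq "1") ? 'yes' : 'no';

# The conditional operator picks one of two values.
my $bigger = ($x > $y) ? $x : $y;

print "numeric: $numeric, string: $string\n";
print "equal as numbers: $equal_num, equal as strings: $equal_str\n";
print "bigger: $bigger\n";
```

As numbers, 10 is greater than 9, but as strings "10" sorts before "9" because the character 1 comes before the character 9.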

Arrays and Associative Arrays

Perl has arrays to let you group items using a single variable name. Perl offers two types of arrays: those whose items are indexed by number (arrays) and those whose items are indexed by a string (associative arrays). An index into an array is referred to as the subscript of the array.

Tip
An associative array is also referred to as a "hash" because of the way it's stored internally in Perl.

Arrays are referred to with the @ symbol. Individual items in an array are derived with a $ and the subscript. Therefore, the first item in an array @count would be $count[0], the second item would be $count[1], and so on. See Listing 2.2 for usage of arrays.


Listing 2.2. Using arrays.
 1 #!/usr/bin/perl
 2 #
 3 # An example to show how arrays work in Perl
 4 #
 5 @amounts = (10,24,39);
 6 @parts = ('computer', 'rat', "kbd");
 7
 8 $a = 1; $b = 2; $c = '3';
 9 @count = ($a, $b, $c);
10
11 @empty = ();
12
13 @spare = @parts;
14
15 print '@amounts = ';
16 print "@amounts \n";
17
18 print '@parts = ';
19 print "@parts \n";
20
21 print '@count = ';
22 print "@count \n";
23
24 print '@empty = ';
25 print "@empty \n";
26
27 print '@spare = ';
28 print "@spare \n";
29
30
31 #
32 # Accessing individual items in an array
33 #
34 print '$amounts[0] = ';
35 print "$amounts[0] \n";
36 print '$amounts[1] = ';
37 print "$amounts[1] \n";
38 print '$amounts[2] = ';
39 print "$amounts[2] \n";
40 print '$amounts[3] = ';
41 print "$amounts[3] \n";
42
43 print "Items in \@amounts  = $#amounts \n";
44 $size = @amounts; print "Size of Amount  = $size\n";
45 print "Item 0 in \@amounts = $amounts[$[]\n";
46

Here's the output from Listing 2.2:

@amounts = 10 24 39
@parts = computer rat kbd
@count = 1 2 3
@empty =
@spare = computer rat kbd
$amounts[0] = 10
$amounts[1] = 24
$amounts[2] = 39
$amounts[3] =
Items in @amounts  = 2
Size of Amount  = 3
Item 0 in @amounts = 10

In line 5, three integer values are assigned to the @amounts array. In line 6, three strings are assigned to the @parts array. In line 8, the script assigns both string and numeric values to variables and then, in line 9, assigns the values of the variables to the @count array. An empty array is created in line 11. In line 13, the @spare array is assigned the same values as those in @parts.

Lines 15 through 28 print out the first five lines of the output. In lines 34 to 41, the script addresses individual items of the @amounts array. Note that $amounts[3] does not exist; therefore, it is printed as an empty item.

The $#array syntax is used in line 43 to print the last index in an array, so the script prints 2. The size of the amounts array is ($#amounts + 1). If an array is assigned to a scalar, as shown in line 44, the size of the array is assigned to the scalar.

Line 45 shows the use of a special Perl variable, $[, which is the base subscript (0) of an array.

What Are Associative Arrays?

An associative array is really an array with two items per index. The first item at each index is called a key and the other item is called a value. You index into an associative array using keys to get values. An associative array name is preceded with a percent (%) sign and indexed items are enclosed within curly braces ({}). See Listing 2.3 for some sample uses of associative arrays.


Listing 2.3. Using associative arrays.
 1 #!/usr/bin/perl
 2 #
 3 # Associative Arrays.
 4 #
 5
 6 %subscripts = (
 7      'bmp', 'Bitmap',
 8      "cpp", "C++ Source",
 9      "txt", 'Text file' );
10
11 $bm = 'asc';
12 $subscripts{$bm} = 'Ascii File';
13
14 print "\n =========== Raw dump of hash  ========= \n";
15 print %subscripts;
16
17 print "\n =========== using foreach  ========= \n";
18 foreach $key (keys (%subscripts)) {
19     $value = $subscripts{$key};
20     print "Key = $key, Value = $value \n";
21     }
22
23 print "\n === using foreach with sort ========= \n";
24 foreach $key (sort keys (%subscripts)) {
25     $value = $subscripts{$key};
26     print "Key = $key, Value = $value \n";
27     }
28
29 print "\n =========== using each()  ========= \n";
30 while (($key,$value) = each(%subscripts)) {
31     print "Key = $key, Value = $value \n";
32     }
33

Here's the output from Listing 2.3:

=========== Raw dump of hash  =========
txtText filecppC++ SourceascAscii FilebmpBitmap
=========== using foreach  =========
Key = txt, Value = Text file
Key = cpp, Value = C++ Source
Key = asc, Value = Ascii File
Key = bmp, Value = Bitmap

=== using foreach with sort =========
Key = asc, Value = Ascii File
Key = bmp, Value = Bitmap
Key = cpp, Value = C++ Source
Key = txt, Value = Text file

=========== using each()  =========
Key = txt, Value = Text file
Key = cpp, Value = C++ Source
Key = asc, Value = Ascii File
Key = bmp, Value = Bitmap

An associative array called %subscripts is created in lines 6 through 9. Three (key,value) pairs are added to %subscripts as a list. At lines 11 and 12, a new item is added to the %subscripts array by assigning a key to $bm and then using $bm as the index. We could have just as easily added the string 'Ascii File' with this hard-coded statement:

$subscripts{'asc'} = 'Ascii File';

Items in an associative array are referred to as items stored in a hash, because this is the way items are stored internally. Look at the output from line 15, which dumps out the associative array items.

In line 18, the script uses a foreach statement to loop over the keys in the %subscripts array. The keys() function returns a list of keys for a given hash. The value of the item at $subscripts{$key} is assigned to $value at line 19. You could combine lines 19 and 20 into one statement like this without loss of meaning:

print "Key = $key, Value = $subscripts{$key} \n";

Using the keys alone did not list the contents of the %subscripts hash in the order we want. To sort the output, you should sort the keys of the hash. This is shown in line 24. The sort() function takes a list of items and returns a text-sorted version. The foreach statement loops over the output from the sort() function applied to the value returned by the keys() function. To sort in decreasing order, you can apply the reverse function to the returned value of sort() to get this line:

for $i (reverse sort (keys %array)) {
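
Written out in full for the %subscripts hash of Listing 2.3, a reverse-sorted loop might look like the following sketch. The @order array is an addition made only to record the order in which the keys are visited, and the strict and warnings pragmas assume a more recent Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %subscripts = (
    'bmp' => 'Bitmap',
    'cpp' => 'C++ Source',
    'txt' => 'Text file',
    'asc' => 'Ascii File',
);

my @order;    # records the order in which the keys are visited
for my $key (reverse sort keys %subscripts) {
    push @order, $key;
    print "Key = $key, Value = $subscripts{$key} \n";
}
```

The keys come out in the order txt, cpp, bmp, asc, which is the sorted order of Listing 2.3 turned around.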

It's more efficient to use the each() function when working with associative arrays because only one lookup is required per item to get both the key and its value. See line 30, where the list ($key,$value) is assigned the values returned by the each() function. The variable $key is assigned the first item, and the variable $value is assigned the second item that is returned from the each() function call.

The code in line 30 is important and deserves some explaining. First of all, the while() loop is used here. The format for a while loop is defined as this:

while( conditionIsTrue) {
    codeInLOOP
}

codeOutOfLOOP

If the condition in the while loop is a nonzero number, a nonempty string, or a nonempty list, the code in the area codeInLOOP is executed. Otherwise, the next statement outside the loop (that is, after the curly brace) is executed.

Second, look at how the list ($key,$value) is mapped onto the list returned by the each() function. The first item of the returned list is assigned to $key, the next item to $value. This is an example of the list assignment operations available in Perl.

Array Operations

When working with arrays in Perl, you are really working with lists. You can add or remove items from the front or back of the list. Items in the middle of the list can be indexed using subscripts or keys. Sublists can be created by extracting items from lists, and lists can be concatenated to create one or more new lists.

Let's view some examples of how they fit together. See Listing 2.4, which uses some of these concepts.


Listing 2.4. Array operations.
 1 #!/usr/bin/perl
 2 #
 3 # Array operations
 4 #
 5
 6 $a = 'RFI';
 7 $b = 'UPS';
 8 $c = 'SPIKE';
 9
10 @words = ('DC','AC','EMI','SURGE');
11
12 $count = @words;  # Get the count
13
14 #
15 # Using the for operator on a list
16 #
17 print "\n \@words = ";
18 for $i (@words) {
19     print "[$i] ";
20     }
21
22 print "\n";
23 #
24 # Using the for loop for indexing
25 #
26 for ($i=0;$i<$count;$i++) {
27     print "\n Words[$i] : $words[$i];";
28     }
29 #
30 # print 40 equal signs
31 #
32 print "\n";
33 print "=" x 40;
34 print "\n";
35 #
36 # Extracting items into scalars
37 #
38 ($x,$y) = @words;
39 print "x = $x, y = $y \n";
40 ($w,$x,$y,$z) = @words;
41 print "w = $w, x = $x, y = $y, z = $z\n";
42
43 ($anew[0], $anew[3], $anew[9], $anew[5]) = @words;
44
45 $temp = @anew;
46
47 #
48 # print 40 equal signs
49 #
50 print "=" x 40;
51 print "\n";
52
53 print "Number of elements in anew = ". $temp, "\n";
54 print "Last index in anew = ". $#anew, "\n";
55 print "The newly created Anew array is: ";
56 $j = 0;
57 for $i (@anew) {
58     print "\n \$anew[$j] = is $i ";
59     $j++;
60     }
61 print "\n";
62
63

Here's the output from Listing 2.4:

 @words = [DC] [AC] [EMI] [SURGE]

 Words[0] : DC;
 Words[1] : AC;
 Words[2] : EMI;
 Words[3] : SURGE;
========================================
x = DC, y = AC
w = DC, x = AC, y = EMI, z = SURGE
========================================
Number of elements in anew = 10
Last index in anew = 9
The newly created Anew array is:
 $anew[0] = is DC
 $anew[1] = is
 $anew[2] = is
 $anew[3] = is AC
 $anew[4] = is
 $anew[5] = is SURGE
 $anew[6] = is
 $anew[7] = is
 $anew[8] = is
 $anew[9] = is EMI

Lines 6, 7, and 8 assign values to scalars $a, $b, and $c, respectively. In line 10, four values are assigned to the @words array. At line 12, you get a count of the number of elements in the array.

The for() loop statement in line 18 is used to cycle through each element in the list. Perl takes each item in the @words array, assigns it to $i, and then executes the statements in the block of code between the curly braces. You could rewrite line 18 as the following and get the same result:

for $i ('DC','AC','EMI','SURGE') {

In the example in Listing 2.4, the value of each item is printed with square brackets around it. Line 22 simply prints a new line.

Now look at line 26, where the for loop is defined. The syntax in the for loop will be very familiar to C programmers:

for (startingCondition; endingCondition; at_end_of_every_loop) {
        execute_statements_in_this_block;
    }

In line 26, $i is set to zero when the for loop is started. Before Perl executes the statements within the block, it checks to see whether $i is less than $count. If $i is less than $count, the print statement is executed. If $i is greater than or equal to $count, the next statement following the ending curly brace is executed. After executing the last statement in the for loop code block (see line 27), Perl runs the statement for the end of the loop, $i++, which increments $i. Perl then goes back to the top of the loop to test the ending condition and see what to do next.
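
The two loop forms can be put side by side in a small sketch that totals the items of an array. The @amounts values are borrowed from Listing 2.2, the other variable names are illustrative, and the strict and warnings pragmas assume a more recent Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @amounts = (10, 24, 39);
my $count   = @amounts;

# C-style loop: start, test, and end-of-loop statement.
my $total_indexed = 0;
for (my $i = 0; $i < $count; $i++) {
    $total_indexed += $amounts[$i];
}

# List-style loop: each item is handed to the loop variable in turn.
my $total_listed = 0;
for my $amount (@amounts) {
    $total_listed += $amount;
}

print "indexed: $total_indexed, listed: $total_listed\n";
```

Both loops visit the same three items; the list form is usually the simpler choice when the index itself is not needed.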

In lines 32 through 34, an output-delimiting line is printed with 40 equal signs. The x operator in line 33 causes = to be repeated by the number following it. Another way to print a somewhat fancier line would be to use the following in lines 32 through 34:

32 print "[\n";
33 print "-=" x 20;
34 print "]\n";

Next, in line 38 the first two items in @words are assigned to variables $x and $y, respectively. The rest of the items in @words are not used. In line 40, four items from @words are assigned to four variables. The mapping of items from @words to variables is done on a one-to-one basis, based on the type of parameter on the left side of the equal sign.

Had I used the following line in place of line 40, I would get the value of $words[0] in $x and the rest of @words in @sublist:

($x,@sublist) = @words;
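
The following sketch tries out three variations of this kind of list assignment on the @words array from Listing 2.4. The variable names other than @words are made up for the example, and the strict and warnings pragmas assume a more recent Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = ('DC', 'AC', 'EMI', 'SURGE');

# A scalar takes one item; an array on the left takes everything that is left.
my ($first, @sublist) = @words;
print "first = $first, sublist = @sublist\n";

# Fewer variables than items: the extra items are ignored.
my ($x, $y) = @words;

# More variables than items: the extra variables stay undefined.
my ($p, $q, $r, $s, $extra) = @words;
my $extra_state = defined $extra ? 'defined' : 'undefined';
print "x = $x, y = $y, extra is $extra_state\n";
```

An array on the left side soaks up all remaining items, so it only makes sense as the last variable in the list.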

In line 43 a new array, @anew, is created and assigned values from the @words array, but not on a one-to-one basis. In fact, you'll see that the @anew array is not even the same size as @words. Perl automatically resizes the @anew array to be at least as large as the largest index. In this case, because $anew[9] is being assigned a value, @anew will be at least 10 items long to cover items from 0 to 9.

In lines 53 and 54, the script prints out the number of elements in the array and the highest valid index in the array. Lines 57 through 60 print out the value of each item in the @anew array. Notice that the items in the @anew array that were not assigned any values are printed as empty items.
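
A smaller sketch of the same automatic resizing follows. It assigns only two items and then counts how many slots actually hold a value; the variable names other than @anew are invented for the example, and the strict and warnings pragmas assume a more recent Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @anew;
$anew[9] = 'EMI';     # the array grows to cover indexes 0 through 9
$anew[0] = 'DC';

my $size       = @anew;                     # number of elements
my $last_index = $#anew;                    # highest valid index
my $filled     = grep { defined $_ } @anew; # slots that hold a value

for my $i (0 .. $#anew) {
    my $shown = defined $anew[$i] ? $anew[$i] : '(empty)';
    print "\$anew[$i] = $shown\n";
}
print "size = $size, last index = $last_index, filled = $filled\n";
```

Only two of the ten slots hold a value; the rest are undefined, which is why they print as empty in Listing 2.4.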

You can create other lists from lists, as well. See the example in Listing 2.5.


Listing 2.5. Creating sublists.
 1 #!/usr/bin/perl
 2 #
 3 # Array operations
 4 #
 5
 6 $a = 'RFI';
 7 $b = 'UPS';
 8 $c = 'SPIKE';
 9
10 @words = ('DC','AC','EMI','SURGE');
11
12 $count = @words;  # Get the count
13 #
14 # Using the for operator on a list
15 #
16 print "\n \@words = ";
17 for $i (@words) {
18     print "[$i] ";
19     }
20
21 print "\n";
22 print "=" x 40;
23 print "\n";
24
25 #
26 # Concatenate lists together
27 #
28 @more = ($c,@words,$a,$b);
29 print "\n  Putting a list together: ";
30 $j = 0;
31 for $i (@more) {
32     print "\n \$more[$j] = is $i ";
33     $j++;
34     }
35 print "\n";
36
37 @more = (@words,($a,$b,$c));
38 $j = 0;
39 for $i (@more) {
40     print "\n \$more[$j] = is $i ";
41     $j++;
42     }
43 print "\n";
44
45
46 $fourth = ($a x 4);
47 print " $fourth\n";

Here's the output from Listing 2.5:

 @words = [DC] [AC] [EMI] [SURGE]
========================================

  Putting a list together:
 $more[0] = is SPIKE
 $more[1] = is DC
 $more[2] = is AC
 $more[3] = is EMI
 $more[4] = is SURGE
 $more[5] = is RFI
 $more[6] = is UPS

 $more[0] = is DC
 $more[1] = is AC
 $more[2] = is EMI
 $more[3] = is SURGE
 $more[4] = is RFI
 $more[5] = is UPS
 $more[6] = is SPIKE

 RFIRFIRFIRFI

In Listing 2.5, one list is created from another list. In line 10, the script creates and fills the @words array. In lines 16 through 19, the script prints the array. Lines 21 through 23 print a delimiter line, repeating code used earlier (we will convert this code into a subroutine soon).

At line 28, the @more array is created by placing together the value of $c, all the items in the entire @words array, followed by the values $a and $b. The size of the @more array will therefore be 7. The items in the @more array are printed in lines 31 through 35.

The code at line 37 creates another @more array with a different ordering. The previously created @more array is freed back to the memory pool. The newly ordered @more list is printed in lines 39 through 43.

The script then uses the x operator in line 46 to create another item by concatenating four copies of $a into the variable $fourth.

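I have covered how to add items to arrays but not how to remove them. To remove an item from an array, use the splice() function, which is covered in more detail later in this section. (The delete command is meant for hash entries; Perl versions that accept it on an array item only reset that item to the undefined value, and the other items keep their indexes.) For example, to remove $more[2] and move the later items down, you would use the command:

splice(@more, 2, 1);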

If you are like me, you probably do not want to type the same lines of code again and again. For example, the code in lines 21 through 23 of Listing 2.5 could be made into a function that looks like this:

sub printLine {
  print "\n";
  print "=" x 40;
  print "\n";
}

Now when you want to print the line, call the subroutine with this line of code:

&printLine;

I cover other aspects of subroutines in the section "Subroutines" of this chapter, and a bit more in Chapter 3.

Now let's get back to some of the things you can do with arrays using the functions supplied with Perl. See Listing 2.6 for a script that uses the array functions I discuss here.


Listing 2.6. Using array functions.
 1 #!/usr/bin/perl
 2 #
 3 # Functions for Arrays
 4 #
 5 sub printLine {
 6 print "\n"; print "=" x 60; print "\n";
 7 }
 8
 9 $quote= 'Listen to me slowly';
10
11 #
12 # USING THE SPLIT function
13 #
14 @words = split(' ',$quote);
15
16 #
17 # Using the for operator on a list
18 #
19 &printLine;
20 print "The quote from Sam Goldwyn: $quote ";
21 &printLine;
22 print "The words \@words = ";
23 for $i (@words) {
24     print "[$i] ";
25     }
26
27 #
28 # CHOP
29 #
30 &printLine;
31 chop(@words);
32 print "The chopped words \@words = ";
33 for $i (@words) {
34     print "[$i] ";
35     }
36 print "\n .. restore";
37 #
38 # Restore!
39 #
40 @words = split(' ',$quote);
41
42 #
43 # Using PUSH
44 #
45 @temp = push(@words,"please");
46 &printLine;
47 print "After pushing \@words = ";
48 for $i (@words) {
49     print "[$i] ";
50     }
51
52 #
53 # USING POP
54 #
55 $temp = pop(@words);  # Take the 'please' off
56 $temp = pop(@words);  # Take the 'slowly' off
57 &printLine;
58 print "Popping twice \@words = ";
59 for $i (@words) {
60     print "[$i] ";
61     }
62 #
63 # SHIFT from the front of the array.
64 #
65 $temp = shift @words;
66 &printLine;
67 print "Shift $temp off, \@words= ";
68 for $i (@words) {
69     print "[$i] ";
70     }
71 #
72 # Restore words
73 #
74 @words = ();
75 @words = split(' ',$quote);
76 &printLine;
77 print "Restore words";
78 #
79 # SPLICE FUNCTION
80 #
81 @two = splice(@words,1,2);
82 print "\n Words after splice = ";
83 for $i (@words) {
84     print " [$i]";
85     }
86 print "\n Returned from splice = ";
87 for $i (@two) {
88     print " [$i]";
89     }
90 &printLine;
91
92 #
93 # Using the join function
94 #
95 $joined = join(":",@words,@two);
96 print "\n Returned from join = $joined ";
97 &printLine;

The split() function is used in line 14 to split the items in the string $quote into the @words array.

Next, the script uses chop() on a list. This function removes the last character from a string. When applied to an array, chop() removes the last character from each item on the list. See lines 31 through 35.

You can add items to or remove items from the end of an array using the push(ARRAY,LIST) and pop(ARRAY) functions. The pop() function removes the last item from a list and returns it as a scalar. The push() function takes an array as the first parameter and treats the rest of the parameters as items to place at the end of the array. At line 45, the push() function pushes the word please onto the back of the @words array. In lines 55 and 56, two words are popped off the @words list. The size of the array @words changes with each command.
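As a small sketch of how the array size changes, the following lines push two items and pop one. The quote is the one from Listing 2.6; the extra word now and the variables $size and $last are assumptions made for the example.

```perl
use strict;
use warnings;

my @words = split(' ', 'Listen to me slowly');

# push() accepts any number of items and returns the new element count.
my $size = push(@words, 'please', 'now');

# pop() removes the last item and returns it.
my $last = pop(@words);

print "size after push = $size\n";
print "popped = $last, remaining = @words\n";
```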

Let's look at how the shift() function is used in line 65. The shift(ARRAY) function removes the first element of an array and returns it, so the size of the array is decreased by 1. You can use shift() in one of three ways:

shift (@mine); # remove and return first item of @mine
shift @mine; # remove and return first item of @mine
shift; # remove and return first item of @ARGV (or of @_ inside a subroutine)

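The special variable @ARGV is the argument vector for your Perl program. Perl has no built-in $ARGC variable, but the number of elements in @ARGV is easily found by assigning @ARGV to a scalar of your own, such as $ARGC, before any operations are applied to @ARGV. The value is equal to $#ARGV + 1, because $#ARGV is the highest index in @ARGV.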
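A minimal sketch of counting the arguments before shifting them off follows. To keep the example runnable without a command line, it fills @ARGV with two made-up arguments; a real script would leave @ARGV alone.

```perl
use strict;
use warnings;

# Hypothetical stand-in for real command-line arguments.
@ARGV = ('-v', 'input.txt');

my $argc       = @ARGV;     # number of arguments
my $last_index = $#ARGV;    # highest index, always one less than the count

my $first = shift;          # at file scope, a bare shift works on @ARGV

print "argc = $argc, last index = $last_index\n";
print "first argument = $first, ", scalar(@ARGV), " left\n";
```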

Then, after restoring @words to its original value, the script uses the splice() function to remove items from the @words array. The splice() function is a very important function and is really the key behind the pop(), push(), and shift() functions. Here's the syntax for the splice() function:

splice(@array,$offset,$length,@list)

The splice() function returns the items removed in the form of a list. It replaces the $length items in @array starting from $offset with the contents of @list. If you leave out the @list parameter and just use splice(@array,$offset,$length), nothing is inserted in the original array; the $length items are simply removed. Any removed items are returned from splice(). If you also leave out the $length parameter and use splice(@array,$offset), every item from $offset to the end of @array is removed.
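The following sketch tries two of these forms on the quote from Listing 2.6. The replacement words and the names @removed and @tail are made up for the example.

```perl
use strict;
use warnings;

my @words = split(' ', 'Listen to me slowly');

# Replace the two items starting at offset 1 with a three-item list.
my @removed = splice(@words, 1, 2, 'very', 'carefully', 'and');
print "removed = @removed\n";
print "words   = @words\n";

# With no length, everything from the offset to the end is removed.
my @tail = splice(@words, 2);
print "tail    = @tail\n";
print "words   = @words\n";
```

Because splice() can insert, remove, or replace at any offset, the same call can stand in for push(), pop(), and shift() when needed.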

File Handles and Operators

Now that I have covered basic array and numeric operations, let's cover some of the input/output operations where files are concerned. A Perl program has three file handles when it starts up: STDIN (for standard input), STDOUT (for standard output), and STDERR (for standard error message output). Note the use of capitals and the lack of a dollar ($) sign to signify that these are file handles. For a C/C++ programmer, the three handles are akin to stdin, stdout, and stderr.

To open a file for I/O you have to use the open statement. Here's the syntax for the open call:

open(HANDLE, $filename);

HANDLE is then used for all the operations on a file. To close a file, you use the function close HANDLE;.

For writing text to a file given a handle, you can use the print() statement to write to the file:

print HANDLE $output;

The HANDLE defaults to STDOUT (or to the handle most recently chosen with select) if no handle is specified. To read one line from the file given a HANDLE, you use the <> operators:

$line = <HANDLE>;

In this code, $line will be assigned all the input up to and including the next newline, or up to eof. When writing interactive scripts, you normally use the chop() function to remove the end-of-line character; the chomp() function in Perl 5 is safer because it removes the last character only if it really is a newline. To read from the standard input into a variable $response, you use these statements in sequence:

$response = <STDIN>;
chomp $response; # remove the trailing newline, if there is one
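The same pattern works for any handle. The sketch below writes two lines to a scratch file and reads them back one line at a time. It assumes the standard File::Temp module is available for creating the scratch file, and the handle names OUT and IN are chosen only for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, so the sketch does not touch real data.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close($tmp);

open(OUT, ">$filename") || die "Cannot open $filename for writing: $!\n";
print OUT "first line\n";
print OUT "second line\n";
close(OUT);

open(IN, $filename) || die "Cannot open $filename for reading: $!\n";
my @lines;
while (my $line = <IN>) {    # <IN> returns one line per pass, then undef at eof
    chomp $line;             # strip the trailing newline, if any
    push(@lines, $line);
}
close(IN);

print "read ", scalar(@lines), " lines: @lines\n";
```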

You can perform binary read operations on a file using the read() function. Despite its name, Perl's write() function is not the counterpart of read(); it prints records laid out with a format. To write raw bytes, use print() or the lower-level syswrite() function. (Because syswrite() bypasses Perl's buffering, it is best paired with sysread() rather than mixed with read() and print() on the same handle.) Here's the syntax for each type of function:

read(HANDLE,$buffer,$length[,$offset]);
syswrite(HANDLE,$buffer,$length[,$offset]);

The read function is used to read up to $length bytes from HANDLE into $buffer, starting at the current location in the file. The $offset is optional and refers to a position in $buffer, not in the file: the bytes read are placed in $buffer starting at $offset, and if $offset is left out they are placed at the start of $buffer. The location in the file to read from is advanced by the number of bytes actually read, which read() returns. To check if you have reached the end of the file, use the command:

eof(HANDLE);

A true (nonzero) value returned signifies the end of the file; a false value indicates that there is more to read in the file.

The syswrite function is used to write the contents of $buffer to HANDLE. The number of bytes to write is set in $length. The optional $offset again refers to $buffer: writing starts with the byte at $offset within $buffer, or with the first byte if $offset is left out. The data is written at the current location in the file, which is advanced by the number of bytes written.

You can move to a position in the file using the seek() function:

seek(HANDLE,$offset,$base)

The $offset is from the location specified in $base. The seek function behaves exactly like the C function call in that if $base is 0, the $offset is from the start of the file. If $base is set to 1, the program uses the current location of the file pointer. If $base is 2, the program uses an offset from the end of the file, in which case the value of $offset is normally negative.
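Here is a minimal sketch of seek() combined with read() and eof(). It assumes the standard File::Temp module for a scratch file holding ten known bytes; the handle name BIN and the variable names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file holding ten known bytes.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode($fh);
print $fh "0123456789";
close($fh);

open(BIN, $filename) || die "Cannot open $filename: $!\n";
binmode(BIN);

seek(BIN, 4, 0);                    # base 0: 4 bytes from the start of the file
my $buffer = '';
my $got = read(BIN, $buffer, 3);    # read up to 3 bytes from that position

seek(BIN, -2, 2);                   # base 2: 2 bytes before the end of the file
my $tail = '';
read(BIN, $tail, 10);               # asks for 10 bytes, but only 2 remain

my $at_end = eof(BIN) ? 1 : 0;
close(BIN);

print "read $got bytes: $buffer\n";
print "tail: $tail, at end of file: $at_end\n";
```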

There can be errors associated with opening files. It's a good idea to see what the errors are before proceeding further in a program. To print error messages before a script crashes, the die function is used. A call to open a file called test.data would look like this:

open(TESTFILE,"test.data") || die "\n $0 Cannot open $! \n";

This line literally reads Open test.data for input or die if you cannot open it. The $0 is the Perl special variable for the process name, and the special variable $! is set to a string corresponding to the value of the system variable, errno.
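If you would rather report the problem and carry on than stop the script, you can test the value returned by open() yourself. The sketch below assumes a made-up filename that does not exist, and the variable $message is used only for this example.

```perl
use strict;
use warnings;

# Hypothetical filename, assumed not to exist in the current directory.
my $missing = "no-such-file-$$.data";
my $message = '';

if (open(TESTFILE, $missing)) {
    close(TESTFILE);
} else {
    # $! holds the system's reason for the failure.
    $message = "$0: cannot open $missing: $!";
}
print "$message\n";

# Note on precedence: with parentheses around the arguments, || works as shown
# in the text. Without them, use the looser "or" operator instead:
#   open TESTFILE, $missing or die "...";
# because || would otherwise be applied to $missing, not to the result of open.
```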

The syntax in the string used for the filename also signifies the type of operation you intend to perform with the file. Table 2.5 shows some of the ways you can open a file.

Table 2.5. File open types.
File           Action
test.data      Opens test.data for reading. The file must exist.
>test.data     Opens test.data for writing. Creates the file if it does not exist and destroys any previous file called test.data.
>>test.data    Opens test.data for writing. Creates the file if it does not exist and appends to any existing file called test.data.
+>test.data    Opens test.data for reading and writing. Creates the file if it does not exist and destroys the contents of any previous file called test.data.
| cmd          Opens a pipe to write to. (Chapter 14, "Signals, Pipes, FIFOs, and Perl," covers pipes.)
cmd |          Opens a pipe to read from.

When working with multiple files, you can have more than one unique handle to write to or read from. Use the select HANDLE; call to set the default file handle to use with print statements. For example, suppose you have two file handles, LARRY and CURLY; here's how to switch between handles:

select LARRY;
print "Whatsssa matter?\n"; # write to LARRY
select CURLY;
print "Whoop, whoop, whoop!"; # write to CURLY
select LARRY;
print "I oughta.... "; # write to LARRY again

Of course, by explicitly stating the handle name you could get the same result with these three lines of code:

print LARRY "Whatsssa matter?\n"; # write to LARRY
print CURLY "Whoop, whoop, whoop!"; # write to CURLY
print LARRY "I oughta.... "; # write to LARRY again
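One detail worth sketching is that select returns the handle that was the default before the call, which lets you put things back when you are done. The example below is an illustration under an assumption: it uses in-memory file handles (available in Perl 5.8 and later) in place of real files, and the variables $larry_text, $curly_text, and $previous are made up for the example.

```perl
use strict;
use warnings;

my ($larry_text, $curly_text) = ('', '');

# In-memory handles stand in for real files here.
open(LARRY, '>', \$larry_text) || die "Cannot open LARRY: $!\n";
open(CURLY, '>', \$curly_text) || die "Cannot open CURLY: $!\n";

my $previous = select(LARRY);      # select returns the old default handle
print "Whatsssa matter?\n";        # goes to LARRY
select(CURLY);
print "Whoop, whoop, whoop!\n";    # goes to CURLY
select($previous);                 # restore the original default

close(LARRY);
close(CURLY);

print "LARRY got: $larry_text";
print "CURLY got: $curly_text";
```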

This is a very brief introduction to using file handles in Perl. I cover the use of file handles throughout the rest of this book, so don't worry if this pace of information is too quick. You'll see plenty of examples throughout the book.

You can also check for the status of a file given a filename. The available tests are shown in the sample script in Listing 2.7.


Listing 2.7. Testing file parameters.
#!/usr/bin/perl
# Report the results of Perl's file test operators for one file.

$name = "test.txt";
print "\nTesting flags for $name \n";
print "\n========== Effective User ID tests ";
print "\n is readable" if ( -r $name);
print "\n is writable" if ( -w $name);
print "\n is executable" if ( -x $name);
print "\n is owned " if ( -o $name);
print "\n========== Real User ID tests ";
print "\n is readable" if ( -R $name);
print "\n is writable" if ( -W $name);
print "\n is executable" if ( -X $name);
print "\n is owned " if ( -O $name);

print "\n========== Reality Checks ";
print "\n exists " if ( -e $name);
print "\n has zero size " if ( -z $name);
print "\n has some bytes in it " if ( -s $name);

print "\n is a file " if (-f $name);
print "\n is a directory " if (-d $name);
print "\n is a link " if (-l $name);
print "\n is a socket " if (-S $name);
print "\n is a pipe " if (-p $name);

print "\n is a block device " if (-b $name);
print "\n is a character device " if (-c $name);

print "\n has setuid bit set " if (-u $name);
print "\n has sticky bit set " if (-k $name);
print "\n has gid bit set " if (-g $name);

# -t tests a file handle, not a filename, so check STDIN here.
print "\n STDIN is open to terminal " if (-t STDIN);
print "\n is a Binary file " if (-B $name);
print "\n is a Text file " if (-T $name);

print "\n";
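Some of these tests return more than true or false. As a small sketch, the following lines create a scratch file (assuming the standard File::Temp module), test it while it is empty, and test it again after writing five bytes. The variable names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; it starts out empty.
my ($fh, $name) = tempfile(UNLINK => 1);
my $empty_before = (-z $name) ? 1 : 0;

print $fh "hello";
close($fh);

my $size    = -s $name;             # -s returns the size in bytes when nonzero
my $is_file = (-f $name) ? 1 : 0;
my $is_dir  = (-d $name) ? 1 : 0;

print "empty before writing: $empty_before\n";
print "size now: $size bytes\n";
print "plain file: $is_file, directory: $is_dir\n";
```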

Working with Patterns

Perl has a very powerful regular expression parser as well as a powerful string search-and-replace function. To search for a substring, you use the following syntax (normally within an if block):

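if ($a =~ /menu/) {
    print "\n Found menu in $a! \n";
}

The search does not change the value in $a; the expression itself is true if the pattern was found and false otherwise. Do not put quotes around the pattern unless you want to match the quote characters themselves. To search in a case-insensitive manner, use an i at the end of the search statement, like this:

if ($a =~ /mEnU/i) {
    print "\n Found menu in $a! \n";
}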

You can even search for items in an array, although not by applying =~ to the array directly. Use the grep() function, described later in this section, which returns a list of all the matching items. If you do not specify the $a =~ portion of a search, Perl searches the default variable $_.

To search and replace strings, use the following syntax:

$expr =~ s/old/new/gie;

The g, i, and e are optional parameters. If g is not specified, only the first match to the old string will be replaced with new. The i flag specifies a case-insensitive search, and e forces Perl to evaluate the new string as a Perl expression and use the result as the replacement. As with searches, no quotes are needed around the old and new strings. In the following example, the value of $a will be "HIGHWAY":

$a = "DRIVEWAY";
$a =~ s/DRIVE/HIGH/;
print $a;
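The following sketch tries the i, g, and e flags on made-up strings. The variables $text, $copy, and $price are assumptions for the example only.

```perl
use strict;
use warnings;

my $text = "Main Menu: File, Edit, MENU bar";

# A match in scalar context is simply true or false.
my $found = ($text =~ /menu/i) ? 1 : 0;

# s/// returns the number of substitutions made; g replaces every match.
my $copy = $text;
my $changed = ($copy =~ s/menu/list/gi);

# With e, the replacement is evaluated as a Perl expression.
my $price = "cost: 21";
$price =~ s/(\d+)/$1 * 2/e;

print "found = $found\n";
print "changed $changed times: $copy\n";
print "$price\n";
```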

Perl has a grep() function that is very similar to the grep command in UNIX. Perl's grep function takes a regular expression and a list. The return value from grep can be handled one of two ways: if assigned to a scalar, it's the number of matches found, or if assigned to a list, it's a sublist of all the items found via grep.
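A minimal sketch of the two ways to use the return value, with a made-up list of words:

```perl
use strict;
use warnings;

my @words = ('DC', 'AC', 'EMI', 'SURGE', 'dc-dc');

# List context: grep returns the items that match.
my @matches = grep(/dc/i, @words);

# Scalar context: grep returns the number of matches.
my $count = grep(/dc/i, @words);

print "matches = @matches\n";
print "count   = $count\n";
```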

Please check the perlfunc man page for details on using grep. Some of the main types of predefined patterns are shown in the following list:

Code    Pattern
*       Zero or more of the previous pattern
+       One or more of the previous pattern
.       Any character except a newline
?       Zero or one of the previous pattern
\0      Null
\000    Octal character code
\cX     ASCII control character
\d      Digits [0-9]
\D      Anything but digits
\f      Formfeed
\n      Newline
\r      Carriage return
\s      Whitespace: space, tab, return, newline, or formfeed
\S      Anything but \s
\t      Tab
\w      Word characters [0-9a-zA-Z_]
\W      Anything but \w
\x00    Hex character code
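A few of these codes in action, as a sketch on a made-up input line (the variable names are assumptions for the example):

```perl
use strict;
use warnings;

my $line = "order 66 shipped\ton 2024-01-15";

# \d+ matches one or more digits; the parentheses capture the match.
my ($number) = $line =~ /(\d+)/;

# \w+ with the g flag collects every run of word characters.
my @tokens = $line =~ /(\w+)/g;

# \s+ matches any run of whitespace, so it splits on spaces and tabs alike.
my @fields = split(/\s+/, $line);

print "first number = $number\n";
print scalar(@tokens), " tokens: @tokens\n";
print scalar(@fields), " fields: @fields\n";
```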

Perl uses a special variable called $_. This is the default variable to use in Perl if you do not explicitly specify a variable name and Perl expects a variable. For example, the grep() function places each item of LIST in $_ in turn, and a pattern written without the =~ operator is matched against the string in $_. The $_ variable is Perl's default place in which to search for a pattern, to hold a line of input, or to pass a value to many of the built-in functions.

Subroutines

Perl 5 supports subroutines and functions with the sub keyword. You can use references (pointers) to subroutines, too. Here's the syntax for subroutines:

sub Name {

}

The ending curly brace does not require a semicolon to terminate it. If you are using a reference to a subroutine, it can be declared without a Name, as shown here:

$ptr = sub {

};

Note the use of the semicolon to terminate the end of the subroutine. To call this function, you use the following line:

&$ptr(argument list);
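As a short sketch, here is an unnamed subroutine being called through its reference. The body (adding two numbers) is made up for the example, and the arrow form in the second call is an equivalent Perl 5 syntax.

```perl
use strict;
use warnings;

# An unnamed subroutine; $ptr holds a reference to it.
my $ptr = sub {
    my ($x, $y) = @_;
    return $x + $y;
};

my $sum1 = &$ptr(2, 3);     # call through the reference with &
my $sum2 = $ptr->(2, 3);    # the same call written with the arrow syntax

print "sum1 = $sum1, sum2 = $sum2\n";
```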

Parameters to subroutines are passed in the @_ array. To get the individual items in the array, you can use $_[0], $_[1], and so on. You can define your own local variables with the local keyword. Here's an example:

sub sample {
    local ($a, $b, @c, $x) = @_;
    &lowerFunc();
}

In this subroutine, you'll find that $a = $_[0], $b = $_[1], and @c point to the rest of the arguments as one list with $x empty. Generally, an array is the last assignment in such an assignment because it chews up all your parameters.

The local variables will all be available for use in the lowerFunc() function. To hide $a, $b, @c, and $x from lowerFunc, use the my keyword like this:

my ($a, $b, @c, $x) = @_;

Remember, $x is empty. Now, the code in lowerFunc() is not able to access $a, $b, @c, or $x.
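The difference can be sketched with one package variable and two callers. The names $level, withLocal, and withMy are made up for the example; lowerFunc simply reports the value it can see.

```perl
use strict;
use warnings;

our $level = 'global';       # a package variable, visible by name everywhere

sub lowerFunc {
    return $level;           # sees whatever the package variable holds right now
}

sub withLocal {
    local $level = 'local';  # temporary value, visible to subroutines called here
    return lowerFunc();
}

sub withMy {
    my $level = 'my';        # private variable, hidden from lowerFunc
    return lowerFunc();
}

print "with local: ", withLocal(), "\n";
print "with my:    ", withMy(), "\n";
print "afterward:  $level\n";
```

The value set with local is undone when withLocal returns, so the package variable is back to its original value afterward.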

From the looks of it, parameters in Perl can take any form. Since Perl 5.002, however, you can define prototypes for subroutine arguments with the following syntax:

sub   Name (parameters) {

}

If the parameters in a call are not what the prototype specifies, Perl bails out with an error when it compiles the call. (Prototypes are not checked when the subroutine is called with the & prefix.) The parameter format is as follows: $ for a scalar, @ for an array, % for a hash, & for a reference to a subroutine, and * for a typeglob (such as a file handle name). Therefore, if you want your function to accept only three scalars, you would declare it as this:

sub func1($$$) {
    my ($x,$y,$z) = @_;
    # code here
}

To pass the value of an array by reference (by pointer), you would use a backslash (\). If you pass two arrays without the backslash specifier, the contents of the two arrays will be concatenated into one long array in @_. The function prototype to pass three arrays, a hash, and the rest in an array, would look like this:

sub func2(\@\@\@\%@);
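The effect of the backslash can be sketched with two small subroutines. The names countBoth, countFlat, @volts, and @amps are made up for the example. Note that the prototyped subroutine is defined before it is called, so that Perl has seen the prototype by the time it compiles the call.

```perl
use strict;
use warnings;

# With \@ in the prototype, each array arrives as a reference.
sub countBoth(\@\@) {
    my ($first, $second) = @_;
    return scalar(@$first) . ':' . scalar(@$second);
}

# Without a prototype, both arrays are flattened into one list in @_.
sub countFlat {
    return scalar(@_);
}

my @volts = (1, 2, 3);
my @amps  = (4, 5);

my $separate = countBoth(@volts, @amps);
my $flat     = countFlat(@volts, @amps);

print "separate sizes = $separate\n";
print "flattened size = $flat\n";
```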

The returned value from a subroutine is the value of the last expression evaluated in the subroutine, unless you use an explicit return statement. The value can be a scalar, array, hash, or reference to an array.

A Final Note

The Perl distribution comes with two programs: a2p to convert awk programs to Perl, and s2p to convert sed programs to Perl. It's often convenient to write a sed script or an awk program to do a certain task. To see how to do the same thing in Perl, run the a2p or s2p program. For example, to convert mine.awk to mine.pl, you use the following command:

$ a2p mine.awk > mine.pl

Summary

This chapter has been a whirlwind introduction to Perl. I must admit that this chapter does not cover every aspect of Perl programming basics. As you progress through the book, you'll learn more ways to do things than are described here. Even if you are new to Perl, you should not have any problems understanding how to use Perl because the programming paradigms in Perl are not that different from any other programming language.

For more information, consult the following books: