Chapter 2 Principles of General Text Processing:
The Backbone of Perl

CONTENTS

Scalar Data
Scalar Variables
Arrays Defined

In the first chapter of this book there was a brief mention of what text is, how it is the primary building block of CGI communication, and how good Perl is at dealing with text. Text, to reiterate, is data made up of characters (letters, digits, punctuation, and other symbols) of the kind you use in creating text files, HTML files, and Perl scripts.

Getting into Perl means getting into text manipulation, which is what you're going to do in this chapter. You are also going to explore basic programming concepts as they apply to Perl and its building blocks, also called data structures.

The areas of text processing and programming you should understand from this chapter are scalar data, arrays and list data, control structures, associative arrays, regular expressions, functions, filehandles and file tests, and formats. These are covered in this chapter and the next, as they apply to Perl.

Scalar Data

The term scalar in Perl applies to a single value: either a number, like 12 or 4.3213e32, or a string of characters, like the words "Hey, now!" or the whole text of the play Hamlet. Perl converts freely between numbers and strings as needed, so for the most part it treats them the same way. Any such single value is called scalar data.

Scalar values are manipulated with operators, and the result is usually another scalar value. Scalar values are stored in scalar variables, and they can be read from files or written to them.

While numbers and strings are treated the same by Perl, there are some fine details that you should be aware of, if only for the fact that knowing fine details is something that separates the programmers from the hackers.

Floats, Integers, and Literals

When dealing with numbers, some are whole numbers, like 4, and some have a fractional part or are written in exponential (scientific) notation, like 2.5 (two and a half) or -3.453e32 (negative 3.453 times 10 to the power of 32). Exponential notation is the obvious choice for very large or very small numbers. In Perl, whole numbers written without a decimal point or exponent are called integers; numbers written with a decimal point or an exponent are called floats (floating-point numbers).

Both integers and floats can be written as literals. A literal is the way a value is written directly in the code of a program, as opposed to being computed or read in while the program runs. Perl accepts the following kinds of float literals (fractions, negatives, and exponents):

2.5: two and a half
5.321e7: 5.321 times 10 to the power of 7
-8.34e8: negative 8.34 times 10 to the power of 8
-4.76e-13: negative 4.76 times 10 to the power of negative 13

When using this notation you can substitute an uppercase "E" for the lowercase "e" without changing the value of the number.
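A quick way to see these literals in action is to print them. The following minimal sketch (the array name @floats is just for illustration) shows that Perl accepts each form, and that 5.321E7 and 5.321e7 are the same number:

```perl
use strict;
use warnings;

# Each of these is a valid floating-point literal.
my @floats = (2.5, 5.321e7, -8.34e8, -4.76e-13);

# print shows each value in Perl's default number format.
print "$_\n" for @floats;

# An uppercase E means exactly the same thing as a lowercase e.
print "E and e agree\n" if 5.321E7 == 5.321e7;
```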

Integers

Integers use the familiar notation:

18
-32
1000032458

but don't start a decimal integer literal with the digit 0. Perl can also read octal (base 8) and hexadecimal (base 16) literals, and it recognizes them by their leading zero: a literal starting with 0 is octal (0377 is 255), and one starting with 0x is hexadecimal (0xff is also 255).
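Here is a small sketch of what a leading zero does to an integer literal; the variable names are only illustrative:

```perl
use strict;
use warnings;

my $decimal = 377;     # an ordinary decimal integer
my $octal   = 0377;    # leading 0: octal, equal to 255
my $hex     = 0xff;    # leading 0x: hexadecimal, also 255

print "$decimal $octal $hex\n";   # prints: 377 255 255
```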

Character Strings

The characters that make up character strings (or just strings) each have an 8-bit value, so Perl recognizes a set of 256 possible characters.

A string can range in length from no characters at all (the empty string) to as many characters as will fit in your computer's memory. This reflects one of the premises of Perl: to have "no built-in limits" on its abilities wherever possible.

This ability to process strings, regardless of the characters that make them up, is what makes Perl adept at CGI programming.

Strings, like numbers, can be written as literals. There are two kinds of literal strings: single-quoted and double-quoted (see Figure 2.1).

Figure 2.1 : Examples of single- and double-quoted strings.

Single-Quoted Strings

If a string is enclosed in single quotes, like 'Hey, now!', it is called a single-quoted string. The quote marks are not part of the string; they merely tell Perl where the string starts and ends. If you want to put a single quote inside the string (without having it end the string), precede it with a backslash. If you want to put a backslash into the string, precede that backslash with another backslash. These are the only two cases in which a backslash has a special meaning inside a single-quoted string; any other backslash stands for itself, so '\n' is a backslash followed by the letter n.
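The two single-quote escapes, and the fact that nothing else is special, can be checked with a short sketch like this one (the variable names are hypothetical):

```perl
use strict;
use warnings;

my $plain  = 'Hey, now!';
my $quote  = 'Don\'t stop';        # \' puts a single quote in the string
my $slash  = 'C:\\temp';           # \\ puts one backslash in the string
my $no_esc = 'Hey, now!\n';        # \n is NOT special here: a backslash and an n

print length($no_esc), "\n";       # 11: the 9 characters of Hey, now! plus \ and n
```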

Double-Quoted Strings

When a string is enclosed in double quotes, like "Hey, now!", it is a double-quoted string. With double-quoted strings the backslash has much more "oomph." Inside double quotes, a backslash can introduce a control character, or an octal or hexadecimal code for a special character. For example:

"Hey, now!\n": the string Hey, now! followed by a newline character
"No flipping! \177": the string No flipping! followed by octal 177, the delete character
"Live \tto tape": the string Live to tape with a tab inserted after "Live ", so the output is the word Live, a space, a tab, and then to tape

The backslash can place many special characters and commands inside a string. These are called backslash escapes. Table 2.1 outlines them.

Table 2.1 Double-Quoted String Backslash Escapes

Backslash Escape	Meaning
\n	Newline
\r	Carriage return
\t	Tab
\f	Formfeed
\b	Backspace
\a	Bell
\e	Escape
\177	Any octal ASCII value (here 177, the delete character)
\x7f	Any hex ASCII value (here 7f, the delete character)
\cC	Any control character (here Control-C)
\\	Backslash
\"	Double-quote
\l	Make the next letter lowercase
\L	Make all the next letters lowercase until \E
\u	Make the next letter uppercase
\U	Make all the next letters uppercase until \E
\E	Terminate \L or \U
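A few of the escapes from Table 2.1 can be checked directly. This minimal sketch (variable names are illustrative) confirms that the octal and hex forms produce the same delete character and that \cC is character number 3:

```perl
use strict;
use warnings;

my $line  = "Live \tto tape\n";   # a tab in the middle, a newline at the end
my $del8  = "\177";               # delete character, written in octal
my $del16 = "\x7f";               # the same character, written in hex
my $ctrl  = "\cC";                # Control-C, character number 3
my $quote = "She said \"Hey, now!\"";

print "octal and hex agree\n" if $del8 eq $del16;
```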

One other feature of double-quoted strings is that they are variable interpolated: any variable named inside the string is replaced by its current value when the string is evaluated.

Relating this back to CGI, we can write a script that demonstrates the difference between single- and double-quoted strings and sends the result to the browser as an HTML page. Listing 2.1 also introduces some Perl commands, which are explained in comments (text following the "#" symbol).

Listing 2.1 Perl Command Script

#!/usr/bin/perl
# quotes_examples.pl
use strict;
use warnings;

# print outputs data. This string is the standard CGI header for
# returning an HTML document; the blank line after it (\n\n) is required.
print "Content-type: text/html\n\n";

# The backquotes run the system command date and capture its output
# in the scalar variable $date.
my $date = `date`;

# chop removes the last character: the newline that date prints.
chop($date);

# A here document with eop in double quotes: $date is interpolated.
# The text runs until the line containing only eop.
print <<"eop";
<HTML>
<HEAD>
<TITLE>Examples of single and double quoted strings</TITLE>
</HEAD>
<BODY>
<H2>Examples of single and double quoted strings</H2>
<P>
Hey, now!
<BR>
Today the date is $date.
<HR NOSHADE>
eop

# The same text with eop in single quotes: $date is printed literally.
print <<'eop';
<H2>Examples of single and double quoted strings</H2>
<P>
Hey, now!
<BR>
Today the date is $date.
<HR NOSHADE>
</BODY>
</HTML>
eop

Right away you will see that putting eop in double quotes has a very different effect from putting it in single quotes. The scalar variable $date is set by the system command date enclosed in backquotes (`date`). Backquotes tell Perl to run the command and capture its output. The "=" symbol is the assignment operator, which tells Perl to assign that output to the scalar variable $date.

The Perl operator chop removes the last character from the variable given as its argument. In quotes_examples.pl, chop takes the last character, the newline that date adds to its output, off the end of $date. There are other handy uses for chop, described in "The Chop Operator" later in this chapter.

The Perl operator print sends its arguments to standard output. In this script the argument is a "here document": the text that begins on the line after print <<"eop"; and runs up to the line containing only eop. The name eop is just the label that marks the end of the text, not a variable. You can also put parentheses around the arguments to print, so the first print statement could be written as:

print(<<"eop");

which is slightly more explicit than:

print <<"eop";

but in almost all cases leaving off the parentheses will not affect your script. The parentheses help get rid of any ambiguity that may exist in a larger Perl program. Keep this in mind if you are having trouble with your larger scripts.

The first print statement puts eop in double quotes. This tells Perl to interpolate any variables that occur in the text up to the eop line, so $date is replaced by the current date captured from the system command date.

When eop is in single quotes, Perl does no interpolation in the text. The characters $date become part of the HTML document text, and they are displayed on the page along with the rest of the text. Amazing what one little pair of quotes can do to you if you're not careful, eh?
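The quoting difference can be seen without a Web server. This sketch stores both here documents in variables instead of printing them, and assumes a fixed, made-up date string in place of the output of the date command:

```perl
use strict;
use warnings;

# A fixed, made-up date stands in for the output of the date command.
my $date = "Mon Jan  6 12:00:00 1997";

my $interpolated = <<"eop";
Today the date is $date.
eop

my $literal = <<'eop';
Today the date is $date.
eop

print $interpolated;   # Today the date is Mon Jan  6 12:00:00 1997.
print $literal;        # Today the date is $date.
```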

Both chop and print are Perl operators. There are more commands like this in Perl that will help you get things done in your scripts.

Operators

An operator in Perl makes a new value, called a result, from one or more operands, or other values. An example of this might be the plus sign used in simple addition. The operator "+" can take two values, like "1" and "2," and make the result "3," as in 1 + 2 = 3. Operators work on both numbers and character strings in conjunction with the suitable operands.

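If you use a numeric operator on strings, Perl converts the strings to numbers because of the operator, not because of what the values look like. To convert a string, Perl skips any leading white space and reads as much of a number as it finds at the start of the string; everything after that is ignored, and a string that doesn't start with a number becomes 0. So if you put a "+" operator between "90210 Beverly Hills" and "11 Oceans" you'll end up with the numeric result 90221, but "Beverly Hills 90210" + "Oceans 11" gives 0.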
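A short sketch of how Perl converts strings to numbers under a numeric operator; the variable names are illustrative, and the "isn't numeric" warnings that use warnings would normally print are switched off for the demonstration:

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence the "isn't numeric" warnings for this demo

my $sum  = "90210 Beverly Hills" + "11 Oceans";   # leading numbers: 90210 + 11
my $none = "Beverly Hills 90210" + "Oceans 11";   # no leading numbers: 0 + 0
my $pad  = "  42abc" + 0;                          # leading spaces skipped: 42

print "$sum $none $pad\n";   # prints: 90221 0 42
```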

If, with equal abandon, you use a string operator on numbers, each number is converted into its string equivalent. An example would be putting the string concatenation operator "." (concatenate is a pretty fancy word that means putting two strings together) between "The Dirty" and (2*6) like this:

"The Dirty" . (2*6)

which will give you this string result:

"The Dirty12"

Remember, Perl evaluates the expression inside the parentheses (2*6, giving 12) before it applies the operator outside them.
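Parentheses matter more than this one example suggests. In this sketch (variable names are illustrative), "." and "+" have the same precedence and group from the left, so leaving the parentheses off the addition changes the result completely:

```perl
use strict;
use warnings;
no warnings 'numeric';   # "The Dirty2" is used as a number below on purpose

my $title = "The Dirty" . (2*6);    # 2*6 is computed first: "The Dirty12"
my $sum   = "The Dirty" . (2 + 6);  # parentheses force the addition first: "The Dirty8"
my $oops  = "The Dirty" . 2 + 6;    # grouped as ("The Dirty" . 2) + 6;
                                    # "The Dirty2" is 0 as a number, so the result is 6
```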

Numeric Operators

Operators for numbers are pretty much what you might expect. One odd one, borrowed from the C programming language, is the modulus operator, "%". It first truncates both operands to integers, divides the first by the second, and returns the remainder as the new value. For example,

25.3 % 4.4378742

is first converted to

25 % 4

and then the new value is 1, which is what remains after 4 divides into 25. A full listing of numeric operators appears in Table 2.2.

Table 2.2 Numeric Operators

Operator	Action	Example
+	Addition	1+2, or 3
-	Subtraction	1-2, or -1
*	Multiplication	2*2, or 4
/	Division	2/2, or 1
**	Exponentiation	2**3, or 8
%	Modulus	2.3%3.2, or 2
<	Less than	2<3
<=	Less than or equal to	2<=3
==	Equal to	2==4/2
>=	Greater than or equal to	3>=2
>	Greater than	5>1
!=	Not equal to	5!=1

When the comparison operators (the last six in the table) are used, Perl compares the two values and returns true or false: 1 for true and the empty string for false. Operators for strings are a little different.
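Several rows of Table 2.2 can be checked directly, including the truncating behavior of "%". A minimal sketch:

```perl
use strict;
use warnings;

print 1 + 2, "\n";              # 3
print 2 ** 3, "\n";             # 8
print 25.3 % 4.4378742, "\n";   # 1: operands are truncated to 25 % 4
print 2.3 % 3.2, "\n";          # 2: operands are truncated to 2 % 3

# Comparisons return 1 for true and the empty string for false.
my $yes = (2 < 3);
my $no  = (2 == 3);
print "[$yes] [$no]\n";         # prints: [1] []
```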

String Operators

String operators work much the same way as the numeric ones, but most of them are spelled with letters (eq, lt, and so on) rather than symbols. The concatenate operator used previously can have these different results:

"Hey," . "now!" produces "Hey,now!"
"Hey, now!" . "\n" produces "Hey, now!\n"
"Hey," . " " . "now!" produces "Hey, now!"

Other string operators are listed in Table 2.3.

Table 2.3 String Operators

Operator	Action	Example
.	Concatenate	"bi" . "g", or "big"
eq	Equal	"small" eq "small"
ne	Not equal	"small" ne "tiny"
lt	Less than	"30" lt "7"
gt	Greater than	"50" gt "300"
le	Less than or equal to	"ten" le "ten"
ge	Greater than or equal to	"eleven" ge "eleven"
x	String repetition	"more" x 2, or "moremore"

You might have noticed something odd in Table 2.3 with the less than and greater than examples. They seem confusing, but remember, we're dealing with strings now, not numbers. With these operators Perl treats the digits as characters and compares the strings one character at a time by their ASCII values. This makes "30" less than "7" because the character 3 comes before the character 7 in ASCII, and makes "50" greater than "300" because 5 comes after 3.
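This sketch runs the Table 2.3 examples and sets the string comparison beside the numeric one; the variable names are only illustrative:

```perl
use strict;
use warnings;

my $big  = "bi" . "g";        # "big"
my $echo = "more" x 2;        # "moremore"

# String comparison goes character by character, so the first digit decides it.
print "\"30\" lt \"7\" as strings\n"    if "30" lt "7";
print "\"50\" gt \"300\" as strings\n"  if "50" gt "300";
print "30 > 7 as numbers\n"             if 30 > 7;
```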

It should be noted that, just as Perl evaluates what's inside parentheses first, it also gives each of its operators a precedence (which operators are applied first) and an associativity (whether a chain of operators with equal precedence is grouped from the left or from the right). For a full list of these relationships check Appendix A.

Scalar Variables

While a scalar value may change during the run of a program, the scalar variable remains the same. The variable is merely the jug into which the sweet wine of value is poured. A scalar variable can contain only one value at a time, but that value can change as the program demands. A scalar variable has this format:

$the_variables_name   $another_variable

where the name after the "$" starts with a letter or an underscore and continues with letters, digits, and underscores; the first character that can't be part of a name (such as white space) ends it. Variables are case-sensitive, so $JAZZ is different from $jazz, which is different again from $Jazz. When choosing a variable name it is highly recommended to pick something that reflects the value or task that the variable is fulfilling in your script, to make it easier for anyone (including yourself) to read the script.

Variables get their values through operators, most often the assignment operator "=". The most common use you will encounter is something like this:

$IQ = 130;

$weight = $IQ + 20;

$page = $weight - 17;

where the different scalar variables are assigned values based on integers, or other scalar variables. You can also use scalar variables on both sides of an operator, like so:

$income = $income - $food;

where $income is assigned a new value: its old value minus $food. This type of relationship is so common in Perl, as in other languages, that Perl has a shortcut for it called a binary assignment operator. Each binary assignment operator is written as an ordinary operator followed by "=". For example, $income = $income - $rent can be written with a binary assignment like so:

$income -= $rent;

where the operator "-" becomes a binary assignment when it is placed in front of the "=" operator. These are called binary assignment operators because they are built from binary operators, that is, operators such as "-" that take two operands. Most binary operators can be used this way, giving +=, -=, *=, /=, .=, **=, and so on.
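A minimal sketch of binary assignment on both numbers and strings; the values are made up for illustration:

```perl
use strict;
use warnings;

my $income = 1000;
my $rent   = 400;
$income -= $rent;          # same as $income = $income - $rent; now 600
$income *= 2;              # same as $income = $income * 2; now 1200

my $greeting = "Hey,";
$greeting .= " now!";      # same as $greeting = $greeting . " now!"
print "$income $greeting\n";   # prints: 1200 Hey, now!
```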

Beyond the binary assignment are the autoincrement and autodecrement operators. These add one to, or subtract one from, a variable. Autoincrement is written with the "++" symbols. It could be used like this:

$debt += 1;    # is the same as
++$debt;

with the value of $debt going up by one each time the statement runs. Autodecrement is very similar, with the "--" symbols placed in front of a variable to lower its value by one each time.

You can only use the autoincrement and autodecrement operators to move your variable up or down one step.
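The sketch below steps a hypothetical $debt up and down. It also shows one detail worth knowing: "++" and "--" can follow the variable instead of preceding it, and in that case the expression's value is the old value, before the change.

```perl
use strict;
use warnings;

my $debt = 5;
++$debt;                   # prefix: $debt is now 6
my $before = $debt++;      # postfix: $before gets the old value 6, then $debt becomes 7
--$debt;                   # $debt is back to 6
```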

The Chop Operator

This operator, as has been described before, takes a scalar variable and chops the last character from it. And what use does this have?

It all has to do with Perl's unending search for truth. Much of Perl's power lies in the way it tests values, finds them "true" or "false," and acts on the result. Here is one of the catches: you might expect Perl to read a blank line of input as a null (empty) string, which would be false. Instead, Perl reads a blank line as a string containing just the newline character, "\n", and that string is true.

However, you might not want that newline there, so you can use the chop operator to get rid of it. chop() always removes the last character of the string, whatever it is. You will see chop used, especially to remove "\n", in several of the scripts in this book.
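This sketch shows chop removing a newline, and a blank line going from true to false once its newline is gone. It also relies on the fact that chop returns the character it removed; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "Hey, now!\n";      # a line as it might be read from input
my $gone = chop($line);        # chop returns the character it removed
print "[$line]\n";             # prints: [Hey, now!]

my $blank = "\n";              # a blank line of input
print "a blank line is true\n" if $blank;
chop($blank);                  # now the empty string, which is false
print "an empty string is false\n" unless $blank;
```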

Interpolation of Scalars into Strings

Interpolation is related to double-quoted strings. When a string literal like "Hey, now!" is double-quoted, Perl looks through the string for any scalar variables. When Perl finds one, it puts the variable's current value in its place. For example:

$uid = "user_id";
$new = "new $uid";     # $new has the value
                       # "new user_id"
$x = "$new and $data"; # $x has the value "new user_id and "
                       # (ending in a space) because $data is
                       # an undefined variable

Interpolation, a fancy word that in Perl means "putting a variable's value into a string," happens only once: the value that is substituted in is not itself searched for more variables. You can see this in the following example:

$min = '$time';    # single quotes: $min holds the text $time
$clock = "$min left to go";

The value of $clock is "$time left to go." There is no double substitution.

If you want to keep a variable name from being replaced with its value, either put that part of the string in single quotes (and join it on with the concatenation operator), or put a backslash in front of the "$" symbol to turn off its special meaning. This might look like:

$x = "The variable used for the user id is ".'$user';
$y = "The variable used for the user id is \$user";

where both $x and $y hold "The variable used for the user id is $user."
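Both points, single substitution and the two ways to keep a "$" literal, can be checked in a few lines. This is a minimal sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $min   = '$time';                 # single quotes: the literal text $time
my $clock = "$min left to go";       # $min is replaced once; $time is not expanded again

my $x = "The variable used for the user id is " . '$user';
my $y = "The variable used for the user id is \$user";
print "$clock\n$x\n$y\n";
```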

Curly braces help you avoid tripping up the Perl interpreter when interpolating your variables. Say you want some constant text to follow a variable immediately, with no space in between. Perl takes the longest possible variable name it can find, so it may read your text as part of the name. This example might help clear up what I mean. In this script we are trying to join a changing variable directly to a constant piece of text:

$old = "name_one";
$new = "name_two";
$previous = "You are $olduser.";
$current = "You are ${new}user.";

If we print $previous we get "You are ." because Perl reads the name as $olduser, a different variable that has no value. But if we add curly braces around the variable's name, we get a more satisfactory result. When $current is printed we get "You are name_twouser." We avoided the confusion by enclosing the name of the variable in curly braces.
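The sketch below contrasts the braced form with two lookalikes. Under use strict, a reference to an undeclared $olduser is a compile-time error, so that case appears only in a comment:

```perl
use strict;
use warnings;

my $old = "name_one";
my $new = "name_two";

# "You are $olduser." would look for a variable named $olduser, which
# under "use strict" is a compile-time error, so it is left out here.
my $previous = "You are $old user.";   # the space ends the name
my $current  = "You are ${new}user.";  # braces mark where the name ends
my $wrong    = "You are {$new}user.";  # braces outside the $ are plain text

print "$current\n";   # prints: You are name_twouser.
print "$wrong\n";     # prints: You are {name_two}user.
```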

Curly braces, also referred to as curly brackets, perform a number of duties in Perl, from delimiting variable names, as above, to marking the beginning and end of blocks of code such as loops and subroutines. They are even involved with regular expressions (a part of Perl explored in the next chapter). Generally they act as delimiters, but their exact meaning depends on where they appear, so close attention must be paid to when and where they are used.

You can also change the case of your variables and values using the case-shifting backslash operators mentioned earlier. The following are all ways of changing case:

$user = "\LNAME";        # changes the value
                         # from 'NAME' to 'name'
$new = "NAME"; $user = "\L$new";
                         # gives us 'name' for the value of
                         # $user

or the operators can be used together:

$user = "\LNAME"; # $user is 'name'
$biguser = "\u\LNAME"; # $biguser is 'Name'
$biguser = "\u\L$user"; # $biguser is 'Name'

This method works because Perl remembers case-shifting escapes in a string until they can be applied, so you can put \L between \u and $user, and \u will treat the first letter of $user's value as the letter it should modify.

As with variable interpolation, only double-quoted strings can make use of these escapes. That may be why variable interpolation is also called double-quote interpolation.
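Here are the case-shifting examples as a runnable sketch, plus one with \E and one in single quotes, where no case shifting happens:

```perl
use strict;
use warnings;

my $user    = "\LNAME";      # "name"
my $biguser = "\u\LNAME";    # "Name"
my $again   = "\u\L$user";   # "Name": \u applies to the first letter of $user's value
my $shout   = "\U$user\E!";  # "NAME!": \E ends the \U before the "!"
my $single  = '\LNAME';      # single quotes: no case shift, just a backslash and LNAME
```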

Standard Input <STDIN>

In Perl there is one special input source called standard input, read with <STDIN>. Strictly speaking, STDIN is not a scalar variable at all; it is a filehandle, and the angle brackets around it read a line from it. Other common filehandles you might use in Perl are standard output, STDOUT, and standard error, STDERR. A filehandle is a way for Perl to make an input and output, or I/O, connection between a running script, or process, and a user. A user can be a human, or another process.

NOTE

A process is sort of like an application in the UNIX world, except it's more complicated than that. You will see processes referred to in discussions of Perl, CGI, and HTTP servers because of the UNIX link they all share. A process is a fascinating way to harness the resources a computer has, but knowing this is not fundamental to what we are looking at in this book. A full discussion of processes isn't necessary here, and we will treat them like the executables with which you are more familiar.

When <STDIN> is used where a scalar value is expected, Perl reads the next full line of input, up to and including the newline character, and that line becomes the value. <STDIN> usually represents the user's terminal. This means that the string value of <STDIN> usually has a newline at the end of it, where the user hit Return on his or her keyboard, or clicked the "done" button. Remember the chop operator? It can be used to get rid of the newline character that has come in as part of the value of <STDIN>. It might work like this:

$name = <STDIN>;   # the user is asked
                   # to input their name
chop($name);       # to get rid of the
                   # newline character

and you can amalgamate these two lines into this form:

chop($name = <STDIN>);

which reads the line into $name and then chops $name, all in one statement.
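To try the combined form without typing at a keyboard, this sketch assumes an in-memory filehandle holding one line of pretend input in place of STDIN; in a real script the line would read chop(my $name = <STDIN>);.

```perl
use strict;
use warnings;

# Assumption for this sketch: instead of waiting at the keyboard, we read
# from an in-memory filehandle that holds one line of pretend user input.
my $pretend_input = "Magdalene\n";
open(my $in, '<', \$pretend_input) or die "Cannot open in-memory input: $!";

chop(my $name = <$in>);     # read one line, then remove its trailing newline
close($in);

print "Hello, $name!\n";    # prints: Hello, Magdalene!
```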

When working with HTML forms, all kinds of data come in from the user. You can use <STDIN> to deal with some of this. Every time a form sends data with the POST method, the Web server passes that data to the CGI script on standard input, where the script reads it through STDIN. A full discussion of HTML forms (including the POST method) and their relation to CGI can be found in Chapter 10.

A Note about Filehandles

When a file is opened, Perl associates it with a filehandle. The Perl script then refers to the filehandle, instead of the file itself, whenever it needs to read from or write to that file. Perl treats your keyboard, or data passed in through CGI, as just another file, read through the filehandle STDIN. The Perl interpreter also treats your screen as a file, giving it the filehandle STDOUT.
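As a sketch of filehandles in practice, this example writes two lines to a throwaway file and reads them back through a second filehandle. It assumes the core File::Temp module for the scratch file; the handle and variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway file; UNLINK => 1 deletes it when the script exits.
my ($out, $filename) = tempfile(UNLINK => 1);
print $out "first line\nsecond line\n";   # write through the filehandle
close($out);

# Reopen the same file for reading through a new filehandle.
open(my $in, '<', $filename) or die "Cannot open $filename: $!";
my @lines = <$in>;                        # in list context, read every line
close($in);

print STDOUT "Read ", scalar(@lines), " lines from $filename\n";
```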

The Print Operator

This operator will allow you to use the values and variables we've been covering here. Once you get the data you need into your script, you need to get it out. By "getting it out" I mean moving values around inside your script, and then outputting the results to the desired location, whether that be to another script, a file, printer, e-mail address, or a user's screen.

We've briefly touched on STDOUT and the print operator that works with it to output the script's data. Perl uses print to put data where it needs to go. By default, print takes a list of values (one or more scalars, or the elements of an array) and writes them to STDOUT. Whatever STDOUT is connected to, such as the user's screen, a file, or, in a CGI script, the Web server that passes the output on to the browser, receives the data.
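This sketch prints an array two ways. So that the result can be inspected, it assumes an in-memory filehandle as the destination; with no filehandle named, print would write to STDOUT instead. The join function used in the second print is standard Perl but isn't covered in this chapter.

```perl
use strict;
use warnings;

# Assumption for this sketch: output goes to an in-memory filehandle so it
# can be inspected; with no filehandle named, print would use STDOUT.
my $captured = '';
open(my $out, '>', \$captured) or die "Cannot open in-memory output: $!";

my @items = ("bread", "milk", "eggs");
print $out "Items: ", @items, "\n";        # elements printed with nothing between them
print $out join(", ", @items), "\n";       # join puts a separator between them
close($out);

print STDOUT $captured;   # same as: print $captured;
```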

The Value undef

If no value is assigned to a scalar variable, your script will not necessarily crash. Perl gives the variable the value undef, short for undefined, which acts as zero when used as a number and as a zero-length (empty) string when used as a string.

One special use of the undef value is when <STDIN> has finished reading its input. When Perl tries to read the next line and there is none left (the end of the file), <STDIN> returns the value undef. A blank line, by contrast, is read as "\n", which is not undef. This means you have to account for the end of input in your script. More of this is explained in detail in the section dealing with input and output, or I/O, in Chapter 3.
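The usual way to account for the end of input is to loop until the read returns undef. This sketch assumes an in-memory filehandle with made-up input in place of STDIN, and also shows undef acting as 0 and as an empty string:

```perl
use strict;
use warnings;

# Pretend input, assumed for this sketch: three lines, the middle one blank.
my $pretend_input = "first\n\nthird\n";
open(my $in, '<', \$pretend_input) or die "Cannot open in-memory input: $!";

my $count = 0;
while (defined(my $line = <$in>)) {   # stops only when <$in> returns undef at end of input
    $count++;                         # the blank line ("\n") is counted too
}
close($in);

my $never_set;                        # no value assigned, so it is undef
my ($as_number, $as_string);
{
    no warnings 'uninitialized';      # using undef is legal; this just silences the warnings
    $as_number = $never_set + 5;      # undef acts like 0: 5
    $as_string = "[$never_set]";      # undef acts like "": []
}
print "Read $count lines\n";          # prints: Read 3 lines
```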

Arrays Defined

Scalar data can be made into an ordered list, which in Perl is called an array. An array is made up of separate elements, each of which is a scalar value. The elements are ordered in the array, in sequence from the lowest index to the highest. Arrays have no built-in size limit; they can hold as many elements as memory allows.

Arrays use literals, too; otherwise, how could we write them in our scripts? An array literal conforms to this format:

(1,2,3); # or,
($A,$B,$C,); # or,
($A,1,$B,2,$C,3);

where the different values are separated with commas inside a pair of parentheses. The values do not have to be constants; they can be expressions that are evaluated each time the literal is used, as with:

($A+1,$B,$C-1);

When dealing with arrays, Perl provides a list constructor operator, written as two periods between two values, which can be used in different ways, for example:

(1..10); # is equal to (1,2,3,4,5,6,7,8,9,10)

or,

(1.2..4.2); # is equal to (1,2,3,4): the fractions are dropped

or,

(1..4,8,10); # is equal to (1,2,3,4,8,10)

or,

($A..$B); # is equal to the range of whole numbers from $A up to $B

If the value to the left of the list constructor operator is greater than the value to the right, an empty list is created. The list constructor operator also works only on whole numbers; fractional values at either end are truncated to whole numbers before the list is built:

(1.3..4.2); # creates (1,2,3,4)
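Running these ranges in Perl 5 confirms how the list constructor behaves. This is a minimal sketch with illustrative variable names; note the truncated fractions and the empty list when the left value is larger:

```perl
use strict;
use warnings;

my @ten    = (1..10);        # 1 through 10
my @mixed  = (1..4, 8, 10);  # (1,2,3,4,8,10)
my @floats = (1.2..4.2);     # ends are truncated to whole numbers: (1,2,3,4)
my @empty  = (5..1);         # left value greater than right: empty list

my ($A, $B) = (3, 6);
my @span = ($A..$B);         # (3,4,5,6)

print "@mixed\n";            # prints: 1 2 3 4 8 10
```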

You can use the print operator with arrays to output a message that combines constant text with a changing value, as below:

$total = @user_list;    # $total gets the length of
                        # @user_list
print("You are visitor number ", $total, "\n");

where each time a new user accesses the site, a new value is added to the array @user_list. The number of values in the array, or its length, is printed using the variable $total.

Array Variables

Just like scalars can be variables, so can arrays. While the scalar variable is marked with the "$" symbol, an array variable is marked with an "@" symbol, like @total, @DATE, or @A_very_long_name_for_an_array_variable. Other than this difference, they follow the same naming rules, and their names are not limited in length. An array variable will not replace a scalar variable that has the same name: $total and @total are two separate variables, and Perl keeps their values apart.

Two other points about array variables are that an empty array gives a null value, or empty list, and that an expression can modify either an entire array variable or only a part of an array variable. This is different from the scalar variable, which only holds one value, so it is either completely modified or left the same.

Array Operators

Array operators work in much the same way as scalar operators. They return values, which can then be used as operands of another operator or assigned to another array variable. What sets them apart from scalar operators is that array operators work on a list of values, whereas scalar operators work on only one value.

Among the most important array operators are assignment, subscripting, push and pop, shift and unshift, reverse, sort, and chop. Other array features deal with accessing elements and with <STDIN>. An element is the name given to each value in an array's list. Elements also come up with associative arrays (covered in Chapter 3), where each element is a key/value pair.

The Assignment Operator

This array operator places a value into an array variable. It uses the same "=" symbol as scalar assignment; Perl tells the two apart by the variable being assigned to, treating an assignment to a scalar variable as a scalar assignment and an assignment to an array variable as an array assignment. When a single scalar value is assigned to an array variable, the array ends up with that value as its only element. Arrays can hold any number of elements, including none.

Array variables can be included inside an array list. They are converted to their array literal value when the list is determined, as in this example:

@olduser = ("Bob","John");
@newuser = ("Nancy","Lisa",@olduser);
@alluser = (@newuser,"Magdalene");
@alluser = ("Admin","Jack",@alluser);

which gives the array list @alluser the literal values of

 Admin,Jack,Nancy,Lisa,Bob,John,Magdalene

You can see that including an array variable in its own list lets you place new values in front of, or behind, the current list.

A list made up only of variables (with no expressions such as $A+1) can be used on the left side of an assignment, as if it were a variable itself. Array literals working as variables would look like this:

($F,$Y,$I) = (1,2,3); # here $F is 1,
# $Y is 2 and $I is 3.
($P,$C) = ($C,$P); # here $P and $C
# trade values
($P,@C) = ($F,$Y,$I); # here $P is $F
# and @C is ($Y,$I)
($A,@C) = @C; # here the first value
# in @C is transferred to $A, making
# $A = $Y and @C = ($I)

where any extra variables on the left side of the assignment operator are given the undef value, and an extra value on the right disappears into the ether of Perl.

If you put an array variable in a list on the left side of an assignment, it should be the last item. The array takes all of the remaining values, so any variable listed after it gets the undef value (or, for another array, an empty list). This is important to keep in mind, because undef is treated as a false value in Perl.

When you assign an array variable to a scalar variable, the scalar gets the number of elements in the array, that is, the length of the list, as with:

@IMHO = (5,4,6,7,2);
$IMHO = @IMHO;   # where $IMHO now has
                 # the value of 5, the length of
                 # the list of @IMHO.

This is also true whenever an array variable appears where a single scalar value is expected, but putting the scalar inside parentheses on the left side turns the assignment into a list assignment, as with:

$IMHO = @PGP;    # where $IMHO gets the
                 # length of @PGP

($SSE) = @VTR;   # where $SSE gets
                 # the first element of @VTR as
                 # its value

In the last example, if @VTR has other values, they are dropped because there is only one scalar variable on the left.

You can nest array assignments inside other array assignments to get this result:

@BBQ = (@ORA = (4,8,6)); # where
# @BBQ and @ORA have the list (4,8,6)

@BBQ = @ORA = (4,8,6); # will give
# you the same result.
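The assignment rules above can be collected into one runnable sketch. It uses my declarations under use strict, and the variable names follow the examples where possible:

```perl
use strict;
use warnings;

my ($F, $Y, $I) = (1, 2, 3);       # $F is 1, $Y is 2, $I is 3
my ($P, $C) = ("p", "c");
($P, $C) = ($C, $P);               # swap: $P is "c", $C is "p"

my ($head, @rest) = ($F, $Y, $I);  # $head is 1, @rest is (2, 3)
my ($x, $y, $z) = (7, 8);          # not enough values: $z is undef
my ($only) = (4, 5, 6);            # too many values: 5 and 6 are dropped

my @ORA;
my @BBQ = (@ORA = (4, 8, 6));      # both arrays now hold (4, 8, 6)
```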

Array and Scalar Context

As mentioned earlier, whether an assignment is scalar or array depends on what appears on the left side of the assignment operator. More generally, every operand is evaluated in a context: if the operator expects a scalar, the operand is evaluated in a scalar context, and if it expects a list, the operand is evaluated in an array (list) context.

Context can change the values your expressions produce, so be careful: an array in a scalar context gives its length, not its elements. You can make an expression evaluate an array in a scalar context by concatenating it with a string, even a null string, like so:

@Z = ("d","a","v","i","s");
print (@Z." is a magic number\n");
to output
% 5 is a magic number

NOTE

The "%" symbol is used here to denote the command-line prompt of a UNIX shell, much like the C:\> prompt in an MS-DOS window; it marks the command line and is not part of the script's output. Perl also uses this symbol for the modulus operator, so please don't confuse the Perl scripting use of the "%" symbol with my use of it to signify the command line.
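This sketch shows the same array in several contexts. The scalar() function used on one line is a standard way to ask for scalar context outright, though the chapter hasn't introduced it:

```perl
use strict;
use warnings;

my @Z = ("d", "a", "v", "i", "s");

my $length  = @Z;                          # scalar assignment: 5
my $message = @Z . " is a magic number";   # . needs a scalar, so @Z gives 5
my $count   = scalar(@Z);                  # scalar() asks for scalar context outright
my ($first) = @Z;                          # list assignment: "d"

print "$message\n";                        # prints: 5 is a magic number
```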

The Subscripting Operator

In an array's list, each element is assigned an integer marker, or index value, starting on the left with 0 and going up by one to the end of the list. These index values are used by the element access operator, or subscripting operator, to get at individual elements. It can be useful to copy elements from an array into a scalar for another purpose, or to modify the existing elements in an array. These abilities are useful for updating any kind of list of data. The following example demonstrates the various ways to use the subscripting operator.

@numbers = (3,5,7);
$pick_one = $numbers[0];      # this
                              # transfers 3 to $pick_one
$numbers[0] = 4;              # gives @numbers the
                              # elements (4,5,7)
$pick_another = $numbers[2];  # transfers 7 to $pick_another
$numbers[1]++;                # autoincrement the
                              # second element in @numbers
$numbers[0] += 8;             # adds 8 to the
                              # first element, list value now
                              # (12,6,7)
($numbers[1],$numbers[0]) = ($numbers[0],$numbers[1]);
                              # the first and second elements in
                              # @numbers are switched, list value
                              # now (6,12,7)

The last line of the previous example accesses several elements of the same array at once. Perl gives this kind of access its own short form, called a slice:

@numbers[1,0] = @numbers[0,1];

where the @ prefix replaces the $ prefix, because a slice produces a list rather than a single scalar. This one line swaps the first and second elements just as the longer version did. Slices can be used to change several elements of an array quickly. The index list of a slice can also be an array expression, as in:

@hand1 = (3,5,7,10,4);
@hand2 = (0,2,1,3,4);
@kitty = @hand1[@hand2];   # giving @kitty the value of
                           # @hand1[0,2,1,3,4], which puts the
                           # values in the order (3,7,5,10,4)

where the values in @hand2 become the index values in the slice @hand1[@hand2].
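Both slice techniques can be checked in one short script. This is a sketch assuming Perl 5 (use strict and my are additions); the index values in @hand2 are chosen so they all fall within @hand1's range of 0 through 4, since an index past the end would simply return undef.

```perl
use strict;
use warnings;

my @numbers = (12, 6, 7);

# Swap the first two elements with one slice assignment.
# The right side is evaluated first, so no temporary variable is needed.
@numbers[0, 1] = @numbers[1, 0];     # @numbers is now (6, 12, 7)

# Use the elements of one array as the indexes of a slice of another.
my @hand1 = (3, 5, 7, 10, 4);
my @hand2 = (0, 2, 1, 3, 4);         # every index is between 0 and 4
my @kitty = @hand1[@hand2];          # (3, 7, 5, 10, 4)

print "@numbers\n";                  # prints "6 12 7"
print "@kitty\n";                    # prints "3 7 5 10 4"
```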

When you read an element beyond the end of an array (an index greater than the last element's index value), only the undef value is returned, and the array is left unchanged. If you assign to an element beyond the end, the array is extended to that new index, and every element in between is given the undef value. An example of this is:

@images = (4,5,6);
$images[3] = "moon"; # making @images
# now (4,5,6,"moon")
$images[5] = "sun"; # producing 
     # (4,5,6,"moon",undef,"sun")

NOTE

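In Perl 5, a negative subscript counts back from the end of the array, so $images[-1] is the last element and $images[-2] the one before it. A negative subscript that reaches past the start of the array returns undef when read, and assigning to one is a fatal error, so avoid it.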
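The difference between reading and assigning past the end is easy to confirm. The following is a minimal sketch assuming Perl 5 (use strict and my are additions), reusing the @images example:

```perl
use strict;
use warnings;

my @images = (4, 5, 6);

# Reading past the end returns undef but does not extend the array.
my $missing = $images[10];
my $before  = scalar(@images);      # still 3

# Assigning past the end extends the array, filling the gap with undef.
$images[5] = "sun";
my $after = scalar(@images);        # now 6; $images[3] and $images[4] are undef

# A negative subscript counts back from the end.
my $last = $images[-1];             # "sun"

print defined($missing) ? "defined\n" : "undef\n";   # prints "undef"
print "$before $after $last\n";                      # prints "3 6 sun"
```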

One last trick in element accessing is to use the "$#" symbols to find the index value of the last element of an array variable. The format for this is:

@numbers = (1,2,4,3);   # an array variable with 3 as
                        # the last index value
$winner = $#numbers;    # gives $winner the value of 3
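Because indexes start at 0, the last index of a four-element array is 3, one less than the element count. This sketch (assuming Perl 5, with use strict and my added) shows that relationship, and also that assigning a value to $#numbers changes the array's length:

```perl
use strict;
use warnings;

my @numbers = (1, 2, 4, 3);

my $winner = $#numbers;          # index of the last element: 3
my $size   = scalar(@numbers);   # number of elements: 4, always $#numbers + 1

# Assigning to $#numbers sets the last index; here it drops the last element.
$#numbers = 2;                   # @numbers is now (1, 2, 4)

print "$winner $size @numbers\n";   # prints "3 4 1 2 4"
```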

The Operators Push and Pop

A popular use of arrays is to store large amounts of data. To make it easier to add and remove elements at the end of an array variable (its highest index value), Perl has the push and pop operators. With push a new element is added to the end of the array, and with pop the last element is taken away. It works like this:

push(@months,$april);   # adds the value of $april as the
                        # last element in @months

$last = pop(@months);   # removes the last element of
                        # @months and transfers it to $last

You might notice that using the push statement is the same as writing

@months = (@months,$april);

and this is true.
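Both operators also return a value, which the examples above don't use: push returns the new number of elements, and pop returns the element it removed. A minimal sketch assuming Perl 5 (use strict and my are additions; the month names are illustrative):

```perl
use strict;
use warnings;

my @months = ("january", "february", "march");
my $april  = "april";

my $count = push(@months, $april);   # push returns the new number of elements: 4
my $last  = pop(@months);            # pop removes and returns "april"

# The long-hand equivalent of push(@months, $april):
@months = (@months, $april);

print "$count $last @months\n";      # prints "4 april january february march april"
```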

NOTE

In Perl you can solve the same problem in many ways. This is one of the best all-around aspects of Perl; it can conform to how you view problem solving and how you understand Perl. It might also give you more options than you need, leading to more confusion than necessary. In this book I try to promote the simplest, and (I hope) the quickest, ways to solve your scripting problems. That doesn't mean they are the only ways, just one of many. Don't ever be afraid to experiment with Perl; it is one of the best ways to gain a better understanding of the language, and you may create new tools for your problem-solving bag of tricks.

Another use for the push operator is to add a number of new elements to an array variable. This is done like so:

@images = (4,5,8);
push(@images,3,8,1); # giving @images
# the new list value of (4,5,8,3,8,1)

The Shift and Unshift Operators

Similar to push and pop, the shift and unshift operators add and remove array elements, but they work on the left side, or start, of the array, while push and pop work on the right side, or end.

Building on this understanding, we would expect shift and unshift to work like this:

@images = (4,3,6);
unshift(@images,1,8,4); # makes
# @images = (1,8,4,4,3,6)

and

$minus = shift(@images); # takes the
# value 1 away from @images and
# places it in $minus

When working with the pop, push, shift, and unshift operators, the first argument must be an array variable; otherwise Perl reports an error. If pop or shift is given an empty array variable, it returns the undef value.
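Here is a sketch of shift and unshift in action (assuming Perl 5, with use strict and my added). Like push, unshift returns the new number of elements, and shift on an empty array returns undef:

```perl
use strict;
use warnings;

my @images = (4, 3, 6);

my $count = unshift(@images, 1, 8, 4);   # returns new length 6; @images is (1, 8, 4, 4, 3, 6)
my $minus = shift(@images);              # removes and returns 1

my @empty   = ();
my $nothing = shift(@empty);             # shift on an empty array returns undef

print "$count $minus @images\n";         # prints "6 1 8 4 4 3 6"
print defined($nothing) ? "defined\n" : "undef\n";   # prints "undef"
```

Combining push with shift gives a first-in, first-out queue: new items go on the end and old items come off the front.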

The Reverse Operator

This operator performs the obvious by reversing the order of an array variable's elements, as with:

@up = (1,2,3);
@down = reverse(@up); # giving @down
# the value (3,2,1)

The reverse operator leaves the original array variable intact. If you want to reverse an array variable's order and keep the result as its new value, try the following:

@up = reverse(@up);
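A quick check that reverse returns a new list and leaves its argument alone until you assign back to it. A minimal sketch assuming Perl 5, with use strict and my added:

```perl
use strict;
use warnings;

my @up   = (1, 2, 3);
my @down = reverse(@up);    # (3, 2, 1); @up itself is unchanged

print "@up | @down\n";      # prints "1 2 3 | 3 2 1"

@up = reverse(@up);         # assign back to replace the original order
print "@up\n";              # prints "3 2 1"
```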

The Sort Operator

This is another way to reorder the elements in your arrays. The sort operator arranges the elements in ascending order based on their ASCII values, comparing them as strings. A typical use of sort is:

@users = sort("bob","tim","anne","daisy");
@alpha_users = @users;
print "@alpha_users\n";
% anne bob daisy tim

Remember, the "%" signifies the command line of a Perl terminal screen. The sort operator works with numbers too, but it sorts them in ASCII order, not numeric order, as with:

@count = (1,2,8,34,67,15);
@count = sort(@count);
print "@count\n";
% 1 15 2 34 67 8

so be careful using sort with numbers. Perl can sort numerically using a comparison routine with the sort operator. This is covered in the next chapter.
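As a preview of that comparison routine, here is a minimal sketch assuming Perl 5 (use strict and my are additions). The block { $a <=> $b } tells sort to compare pairs of elements numerically; $a and $b are the special variables sort sets for each comparison.

```perl
use strict;
use warnings;

my @count = (1, 2, 8, 34, 67, 15);

my @ascii   = sort(@count);                 # string order: (1, 15, 2, 34, 67, 8)
my @numeric = sort { $a <=> $b } @count;    # numeric order: (1, 2, 8, 15, 34, 67)

print "@ascii\n";      # prints "1 15 2 34 67 8"
print "@numeric\n";    # prints "1 2 8 15 34 67"
```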

The Chop Operator Again

Yes, you've seen it before, and here it is again, good old chop. When working with array variables, or in an array context, the chop operator behaves in the same way as with scalar variables. By using chop on an array you remove the last character of each element. For example:

@word_guess = ("honk\n","feed\n","feathers");
chop @word_guess;
print "You guessed @word_guess";

produces this at the command line:

% You guessed honk feed feather

As you can see, chop can be very useful for eliminating the \n newline characters from an entire array. Be careful, though: chop always removes the last character, so "feathers", which had no newline, lost its final "s".
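The same example as a self-contained script, assuming Perl 5 with use strict and my added, makes that side effect easy to see:

```perl
use strict;
use warnings;

my @word_guess = ("honk\n", "feed\n", "feathers");
chop(@word_guess);              # removes the last character of every element

# "feathers" had no newline, so chop removed its final "s" instead.
print "You guessed @word_guess\n";   # prints "You guessed honk feed feather"
```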

Arrays and <STDIN>

Like the chop operator, <STDIN> is also used with arrays. With arrays, <STDIN> will return all the remaining lines of the file. Every line is read as a separate element. In a Perl terminal environment, <STDIN> could be used with an array, like so:

# word_guess.pl
print "Your guess?";
@guess = <STDIN>; # where the user's
# guess is input to <STDIN>
% moo 
% milk
% udder

and then the user types Control-D (end-of-file on Unix systems; on MS-DOS and Windows it is Control-Z followed by Enter) to end the input. This gives the script these array values:

@guess = ("moo\n","milk\n","udder\n")

And now you can see where chop comes in, getting rid of the pesky \n's, like so:

chop (@guess);
# so now @guess = ("moo","milk","udder")
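The sketch below runs without anyone at the keyboard. As an assumption for testing, it swaps <STDIN> for an in-memory filehandle (a Perl 5.8 feature) holding the three lines the user typed; in a real script you would read from <STDIN> directly. use strict and my are also additions.

```perl
use strict;
use warnings;

# Stand-in for keyboard input: an in-memory filehandle holding
# the three lines from the example above.
my $typed = "moo\nmilk\nudder\n";
open(my $in, '<', \$typed) or die "cannot open in-memory file: $!";

my @guess = <$in>;    # array context: every remaining line becomes one element
close($in);

chop(@guess);         # strip the trailing "\n" from each element
print "@guess\n";     # prints "moo milk udder"
```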

Array Interpolation

Array elements can be interpolated into double-quoted strings the same way scalar variables are. When you do this, it might look like this:

@weather = ("sunny","rainy");
$forecast = "Tomorrow will be $weather[0]";
# giving $forecast the literal value
# of "Tomorrow will be sunny"

When you interpolate an array element, the expression inside the brackets is evaluated as ordinary Perl code, even inside the string. This means you can use a computed index. To illustrate:

@weather = ("sunny","rainy");
$change = (2*2);
$forecast = "Tomorrow will be $weather[$change - 3]";
# gives $forecast the literal value
# of "Tomorrow will be rainy"

where $change - 3 is computed as 4 - 3, or 1, so element 1 of @weather ("rainy") is interpolated.

You can also interpolate an entire array variable. Its elements are inserted into the string, separated by single spaces, like this:

@pig = ("corn","hay","slop");
$eat = "Pigs like to eat @pig when they are hungry";
print $eat;
% Pigs like to eat corn hay slop when they are hungry

With some clever editing, you can include commas, periods, and spaces to produce grammatical sentences in your interpolated array variables.
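One way to do that editing, sketched under the assumption of Perl 5 (use strict and my are additions): the special variable $" holds the separator placed between interpolated elements, a space by default, and the join function builds a string from a list with any separator you choose.

```perl
use strict;
use warnings;

my @pig = ("corn", "hay", "slop");

my $plain = "Pigs like to eat @pig when they are hungry";

my $listed;
{
    local $" = ", ";    # change the separator only inside this block
    $listed = "Pigs like to eat @pig when they are hungry";
}

# join plus a slice: commas between all but the last item, then "and".
my $sentence = "Pigs like to eat " . join(", ", @pig[0 .. $#pig - 1])
             . " and $pig[-1].";

print "$plain\n";      # prints "Pigs like to eat corn hay slop when they are hungry"
print "$listed\n";     # prints "Pigs like to eat corn, hay, slop when they are hungry"
print "$sentence\n";   # prints "Pigs like to eat corn, hay and slop."
```

Using local keeps the change to $" from leaking into the rest of the script.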

If you don't want to include the entire array, you can use a slice to pick out only the elements you want, as with:

@pig = ("corn","hay","slop");
$eat = "Pigs like to eat @pig[1,2] when they are hungry";
print $eat;
% Pigs like to eat hay slop when they are hungry

which brings us to the end of arrays. For now, anyway.

Believe it or not, we've learned a lot about Perl and text manipulation in general. We distinguished between scalar and array variables, scalars being able to hold one value, while arrays can hold a series, or list, of values. Both scalars and arrays have their own literal values. These variables can be modified by various operators, whether their literal values are strings or numbers.

Before we move ahead and start programming with examples that lead us into the CGI routines, we need to have an understanding of the one feature of Perl that provides a great deal of power: regular expressions. Learning what regular expressions are and how they work in Perl is the focus of the next chapter. You'll also explore the other important features of Perl.
