Chapter 13

Process, String, and Mathematical Functions


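Today's lesson describes three groups of built-in Perl functions: functions that manipulate processes and programs, functions that perform mathematical operations, and functions that manipulate character strings.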

Many of the functions described today use features of the UNIX operating system. If you are using Perl on a machine that is not running UNIX, some of these functions might not be defined or might behave differently.
Check the documentation supplied with your version of Perl for details on which functions are supported or emulated on your machine.

Process- and Program-Manipulation Functions

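Perl provides a wide range of functions that manipulate both the program currently being executed and other programs (also called processes) running on your machine. These functions are divided into four groups: functions that start processes, functions that terminate programs or processes, functions that control program execution, and miscellaneous control functions.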

The following sections describe these four groups of process- and program-manipulation functions.

Starting a Process

Several built-in functions provide different ways of creating processes: eval, system, fork, pipe, exec, and syscall. These functions are described in the following subsections.

The eval Function

The eval function treats a character string as an executable Perl program.

The syntax for the eval function is


eval (string);

Here, string is the character string that is to become a Perl program.

For example, these two lines of code:


$print = "print (\"hello, world\\n\");";

eval ($print);

print the following message on your screen:


hello, world

The character string passed to eval can be a character-string constant or any expression that has a value which is a character string. In this example, the following string is assigned to $print, which is then passed to eval:


print ("hello, world\n");

The eval function uses the special system variable $@ to indicate whether the Perl program contained in the character string has executed properly. If no error has occurred, $@ contains the null string. If an error has been detected, $@ contains the text of the error message.

The subprogram executed by eval affects the program that called it; for example, any variables that are changed by the subprogram remain changed in the main program. Listing 13.1 provides a simple example of this.


Listing 13.1. A program that illustrates the behavior of eval.

1:  #!/usr/local/bin/perl

2:  

3:  $myvar = 1;

4:  eval ("print (\"hi!\\n\"); \$myvar = 2;");

5:  print ("the value of \$myvar is $myvar\n");



$ program13_1

hi!

the value of $myvar is 2

$

The call to eval in line 4 first executes the statement


print ("hi!\n");

Then it executes the following assignment, which assigns 2 to $myvar:


$myvar = 2;

The value of $myvar remains 2 in the main program, which means that line 5 prints the value 2. (The backslash preceding the $ in $myvar ensures that the Perl interpreter does not substitute the value of $myvar for the name before passing it to eval.)

NOTE
If you like, you can leave off the final semicolon in the character string passed to eval, as follows:
eval ("print (\"hi!\\n\"); \$myvar = 2");
As before, this prints hi! and assigns 2 to $myvar.

The eval function has one very useful property: If the subprogram executed by eval encounters a fatal error, the main program does not halt. Instead, the subprogram terminates, copies the error message into the system variable $@, and returns to the main program.
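
As a minimal sketch of this behavior, the following fragment passes eval a string that divides by zero, which is a fatal error. The variable names $divisor and $result are chosen here only for illustration.

```perl
# Illustrative sketch: a fatal error inside eval does not halt the caller.
my $divisor = 0;
my $result  = eval ("10 / \$divisor;");   # dividing by zero is fatal inside eval

if ($@ ne "") {
    # $@ holds the text of the error message, including a trailing newline
    print ("eval failed: $@");
} else {
    print ("result is $result\n");
}
print ("the main program is still running\n");
```

Because the error is trapped, the final print statement still executes.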

This feature is very useful if you are moving a Perl program from one machine to another and you are not sure whether the new machine contains a built-in function you need. For example, Listing 13.2 tests whether the tell function is implemented.


Listing 13.2. A program that uses eval to test whether a function is implemented.

1:  #!/usr/local/bin/perl

2:  

3:  open (MYFILE, "file1") || die ("Can't open file1");

4:  eval ("\$start = tell(MYFILE);");

5:  if ($@ eq "") {

6:          print ("The tell function is defined.\n");

7:  } else {

8:          print ("The tell function is not defined!\n");

9:  }



$ program13_2

The tell function is defined.

$

The call to eval in line 4 creates a subprogram that calls the function tell. If tell is defined, the subprogram assigns the current position in the file (which, in this case, is the beginning of the file) to the scalar variable $start. If tell is not defined, the subprogram places the error message in $@.

Line 5 checks whether $@ is the null string. If $@ is empty, the subprogram in line 4 executed without generating an error, which means that the tell function is implemented. (Because assignments performed in the subprogram remain in effect in the main program, the main program can call seek using the value in $start, if desired.) If $@ is not empty, the program assumes that tell is not defined, and it prints a message proclaiming that fact. (This program is assuming that the only reason the subprogram could fail is because tell is not defined. This is a reasonable assumption, because you know that the file referenced by MYFILE has been successfully opened.)

Although eval is very useful, it is best to use it only for small programs. If you need to generate a larger program, it might be better to write the program to a file and call system to execute it. (The system function is described in the following section.)
Because statements executed by eval affect the program that calls it, the behavior of complicated programs might become difficult to track if eval is used to excess.

The system Function

You have seen examples of the system function in earlier lessons.

The syntax for the system function is


system (list);

This function is passed a list as follows: The first element of the list contains the name of a program to execute, and the other elements are arguments to be passed to the program.

When system is called, it starts a process that runs the program and waits until the process terminates. When the process terminates, its exit code is shifted left eight bits, and the resulting value becomes system's return value. Listing 13.3 is a simple example of a program that calls system.


Listing 13.3. A program that calls system.

1:  #!/usr/local/bin/perl

2:  

3:  @proglist = ("echo", "hello, world!");

4:  system(@proglist);



$ program13_3

hello, world!

$

In this program, the call to system executes the UNIX program echo, which displays its arguments. The argument passed to echo is hello, world!.
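
Because the exit code is shifted left eight bits, you can recover it by shifting the return value right eight bits. The following is a minimal sketch; it assumes that the system variable $^X (the path of the running Perl interpreter) is available, and it uses a second copy of perl as the program to run. The names $retval and $exit_code are illustrative.

```perl
# Illustrative sketch: recover a program's exit code from system's return value.
# The program started here is another copy of perl that exits with code 3.
my $retval    = system ($^X, "-e", "exit 3");
my $exit_code = $retval >> 8;     # undo the eight-bit left shift

print ("system returned $retval; the exit code is $exit_code\n");
```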

TIP
When you start another program using system, output data might be mixed, out of sequence, or duplicated.
To get around this problem, set the system variable $|, defined for each file, to 1. The following is an example:
select (STDOUT);
$| = 1;
select (STDERR);
$| = 1;
When $| is set to 1, no buffer is defined for that file, and output is written out right away. This ensures that the output behaves properly when system is called.
See "Redirecting One File to Another" on Day 12, "Working with the File System," for more information on select and $|.

The fork Function

The fork function creates two copies of your program: the parent process and the child process. These copies execute simultaneously.

The syntax for the fork function is


procid = fork();

fork returns zero to the child process and a nonzero value to the parent process. This nonzero value is the process ID of the child process. (A process ID is an integer that enables the system to distinguish this process from the other processes currently running on the machine.)

The return value from fork enables you to determine which process is the child process and which is the parent. For example:


$retval = fork();

if ($retval == 0) {

        # this is the child process

        exit;   # this terminates the child process

} else {

        # this is the parent process

}

If fork is unable to execute, the return value is a special undefined value for which you can test by using the defined function. (For more information on defined, see Day 14, "Scalar-Conversion and List-Manipulation Functions.")

To terminate a child process created by fork, use the built-in function exit, which is described later in today's lesson.
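
The three possible outcomes of fork (failure, child, and parent) can be sketched as follows. This is an illustration only; it uses waitpid, which is described later in today's lesson, so that the parent does not finish before the child.

```perl
# Illustrative sketch: distinguish fork failure, the child, and the parent.
my $retval = fork();

if (!defined ($retval)) {
    # fork could not create a new process
    die ("Cannot fork: $!\n");
} elsif ($retval == 0) {
    # this is the child process
    exit (0);
}

# only the parent process reaches this point
my $reaped = waitpid ($retval, 0);   # wait for the child to finish
print ("child process $retval has finished\n");
```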

Be careful when you use the fork function. The following are a few examples of what can go wrong:
  • If both copies of the program execute calls to print or any other output-generating function, the output from one copy might be mixed with the output from the other copy. There is no way to guarantee that output from one copy will appear before output from the other, unless you force one process to wait for the other.
  • If you use fork in a loop, the program might wind up generating many copies of itself. This can affect the performance of your system (or crash it completely).
  • Your child process might wind up executing code that your parent process is supposed to execute, or vice versa.

The pipe Function

The pipe function is designed to be used in conjunction with the fork function. It provides a way for the child and parent processes to communicate.

The syntax for the pipe function is


pipe (infile, outfile);

pipe requires two arguments, each of which is a file variable that is not currently in use (in this case, infile and outfile). After pipe has been called, information sent via the outfile file variable can be read using the infile file variable. In effect, the output from outfile is piped to infile.

To use pipe with fork, do the following:

  1. Call pipe.
  2. Call fork to split the program into parent and child processes.
  3. Have one of the processes close infile, and have the other close outfile.

The process in which outfile is still open can now send data to the process in which infile is still open. (The child can send data to the parent, or vice versa, depending on which process closes infile and which closes outfile.)

Listing 13.4 shows how pipe works. It uses fork to create a parent and child process. The parent process reads a line of input, which it passes to the child process. The child process then prints it.


Listing 13.4. A program that uses fork and pipe.

 1:  #!/usr/local/bin/perl

 2:  

 3:  pipe (INPUT, OUTPUT);

 4:  $retval = fork();

 5:  if ($retval != 0) {

 6:          # this is the parent process

 7:          close (INPUT);

 8:          print ("Enter a line of input:\n");

 9:          $line = <STDIN>;

10:         print OUTPUT ($line);

11: } else {

12:         # this is the child process

13:         close (OUTPUT);

14:         $line = <INPUT>;

15:         print ($line);

16:         exit (0);

17: }



$ program13_4

Enter a line of input:

Here is a test line

Here is a test line

$

Line 3 defines the file variables INPUT and OUTPUT. Data sent to OUTPUT can now be read from INPUT.

Line 4 splits the program into a parent process and a child process. Line 5 then determines which process is which.

The parent process executes lines 7-10. Because the parent process is sending data through OUTPUT, it has no need to access INPUT; therefore, line 7 closes INPUT.

Lines 8 and 9 obtain a line of data from the standard input file. Line 10 then sends this line of data to the child process via the file variable OUTPUT.

The child process executes lines 13-16. Because the child process is receiving data through INPUT, it does not need access to OUTPUT; therefore, line 13 closes OUTPUT.

Line 14 reads data from INPUT. Because data from OUTPUT is piped to INPUT, the program waits until the data is actually sent before continuing with line 15.

Line 16 uses exit to terminate the child process. This also automatically closes INPUT.

Note that the <INPUT> operator behaves like any other operator that reads input (such as, for instance, <STDIN>). If there is no more data to read, INPUT is assumed to be at the "end of file," and <INPUT> returns the null string.

Traffic through the file variables specified by pipe can flow in only one direction. You cannot have a process both send and receive on the same pipe.
If you need to establish two-way communication, you can open two pipes, one in each direction.
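
Two-way communication with a pair of pipes might look like the following sketch. The file variable names are hypothetical, and the child's task (converting a line to uppercase with uc) is chosen only to make the reply visibly different from the request.

```perl
# Illustrative sketch: two pipes, one for each direction of traffic.
# First pipe: parent writes to $parent_out, child reads from $child_in.
pipe (my $child_in, my $parent_out) || die ("Cannot create first pipe: $!\n");
# Second pipe: child writes to $child_out, parent reads from $parent_in.
pipe (my $parent_in, my $child_out) || die ("Cannot create second pipe: $!\n");

my $pid = fork();
die ("Cannot fork: $!\n") unless defined ($pid);

if ($pid == 0) {
    # child: close the ends that belong to the parent
    close ($parent_out);
    close ($parent_in);
    my $line = <$child_in>;          # wait for the parent's request
    print {$child_out} uc ($line);   # send the reply
    close ($child_out);
    exit (0);
}

# parent: close the ends that belong to the child
close ($child_in);
close ($child_out);
print {$parent_out} "hello, world\n";
close ($parent_out);                 # flushes the data to the child
my $reply = <$parent_in>;            # wait for the child's reply
waitpid ($pid, 0);
print ("the child replied: $reply");
```

Closing each output file variable after writing ensures that buffered data is actually sent before the other process tries to read it.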

The exec Function

The exec function is similar to the system function, except that it terminates the current program before starting the new one.

The syntax for the exec function is


exec (list);

This function is passed a list as follows: The first element of the list contains the name of a program to execute, and the other elements are arguments to be passed to the program.

For example, the following statement terminates the Perl program and starts the command mail dave:


exec ("mail dave");

Like system, exec accepts additional arguments that are assumed to be passed to the command being invoked. For example, the following statement executes the command vi file1:


exec ("vi", "file1");

You can specify the name that the system is to use as the program name. To do this, store the program you actually want to run in a scalar variable, and place the variable in front of the list with no comma after it:


$prog = "mail";

exec $prog "maildave", "dave";

Here, the program mail is invoked with the argument dave, but the program name is set to maildave, which is the first element of the list. (This is the value of argv[0] if the program to be invoked was originally written in C.)

exec often is used in conjunction with fork: when fork splits into two processes, the child process starts another program using exec.

exec has the same output-buffering problems as system. See the description of system, earlier in today's lesson, for a description of these problems and how to deal with them.
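
The combination of fork and exec can be sketched as follows. This is an illustration under stated assumptions: the program that the child starts is another copy of the Perl interpreter (located through the system variable $^X), and the parent uses waitpid and the system variable $?, which holds the child's exit code shifted left eight bits.

```perl
# Illustrative sketch: the child replaces itself with another program.
my $pid = fork();
die ("Cannot fork: $!\n") unless defined ($pid);

if ($pid == 0) {
    # child: start the new program in place of this one
    exec ($^X, "-e", "exit 5");
    die ("Cannot exec: $!\n");    # reached only if exec fails
}

# parent: wait for the new program to finish
waitpid ($pid, 0);
my $exit_code = $? >> 8;
print ("the child's program exited with code $exit_code\n");
```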

The syscall Function

The syscall function calls a system function.

The syntax for the syscall function is


syscall (list);

syscall expects a list as its argument. The first element of the list is the name of the system call to invoke, and the remaining elements are arguments to be passed to the call.

If an argument in the list passed to syscall is a numeric value, it is converted to a C integer (type int). Otherwise, a pointer to the string value is passed. See the syscall UNIX manual page or the Perl documentation for more details.

NOTE
The Perl header file syscall.ph must be included in order to use syscall:
require ("syscall.ph");
For more information on require, see Day 20, "Miscellaneous Features of Perl."

Terminating a Program or Process

The following sections describe the functions that terminate either the currently executing program or a process running elsewhere on the system: die, warn, exit, and kill.

The die and warn Functions

The die and warn functions provide a way for programs to pass urgent messages back to the user who is running them.

The die function terminates the program and prints an error message on the standard error file.

The syntax for the die function is


die (message);

message is the error message to be displayed.

For example, the call


die ("Cannot open input file\n");

prints the following message and then exits:


Cannot open input file

die can accept a list as its argument, in which case all elements of the list are printed.


@diemsg = ("I'm about ", "to die\n");

die (@diemsg);

This prints out the following message and then exits:


I'm about to die

If the last argument passed to die ends with a newline character, the error message is printed as is. If the last argument to die does not end with a newline character, the program filename and line number are printed, along with the line number of the input file (if applicable). For example, if line 6 of the file myprog is


die ("Cannot open input file");

the message it prints is


Cannot open input file at myprog line 6.

The warn function, like die, prints a message on the standard error file.

The syntax for the warn function is


warn (message);

As with die, message is the message to be displayed.

warn, unlike die, does not terminate. For example, the statement


warn ("Input file is empty");

sends the following message to the standard error file, and then continues executing:


Input file is empty at myprog line 76.

If the string passed to warn is terminated by a newline character, the warning message is printed as is. For example, the statement


warn("Danger! Danger!\n");

sends


Danger! Danger!

to the standard error file.

NOTE
If eval is used to invoke a program that calls die, the error message from die is not printed; instead, it is assigned to the system variable $@.
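
The following minimal sketch shows die being trapped by eval. The variable name $message is illustrative.

```perl
# Illustrative sketch: die inside eval sets $@ instead of ending the program.
eval ("die (\"Cannot open input file\\n\");");
my $message = $@;                 # save the message before anything resets it

if ($message ne "") {
    print ("eval trapped this message: $message");
}
```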

The exit Function

The exit function terminates a program.

If you like, you can specify a return code to be passed to the system by passing exit an argument using the following syntax:


exit (retcode);

retcode is the return code you want to pass.

For example, the following statement terminates the program with a return code of 2:


exit(2);

The kill Function

The kill function enables you to send a signal to a list of processes.

The syntax for invoking the kill function is


kill (signal, proclist);

In this case, signal is the numeric signal to send. (For example, a signal of 9 kills the listed processes.) proclist is a list of process IDs (such as the child process ID returned by fork).

signal also can be a signal name enclosed in quotes, as in "INT".

For more details on the signals you can send, refer to the kill UNIX manual page.
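
As a hedged illustration, the following sketch creates a child process with fork and then stops it with the TERM signal. It assumes a UNIX system on which the default action for TERM is to terminate the process, and it relies on two details from the Perl documentation: kill returns the number of processes signalled, and the low seven bits of the system variable $? hold the number of the signal that ended the child.

```perl
# Illustrative sketch: send a signal to a child process created by fork.
my $pid = fork();
die ("Cannot fork: $!\n") unless defined ($pid);

if ($pid == 0) {
    # child: do nothing until a signal arrives
    sleep (60);
    exit (0);
}

my $signalled = kill ("TERM", $pid);   # number of processes signalled
waitpid ($pid, 0);                     # collect the terminated child
my $signal = $? & 127;                 # signal that ended the child

print ("signalled $signalled process; child was ended by signal $signal\n");
```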

Execution Control Functions

The sleep, wait, and waitpid functions delay the execution of a particular program or process.

The sleep Function

The sleep function suspends the program for a specified number of seconds.

The syntax for the sleep function is


sleep (time);

time is the number of seconds to suspend program execution.

The function returns the number of seconds that the program was actually stopped.

For example, the following statement puts the program to sleep for five seconds:


sleep (5);

The wait and waitpid Functions

The wait function suspends execution and waits for a child process to terminate (such as a process created by fork).

The wait function requires no arguments:


procid = wait();

When a child process terminates, wait returns the process ID, procid, of the process that has terminated. If no child processes exist, wait returns -1.
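
Because wait returns -1 when no child processes remain, it can be called in a loop to collect every child. The following is a sketch only; the number of children and the variable names are arbitrary.

```perl
# Illustrative sketch: create several children, then wait for all of them.
my $children = 3;
foreach my $n (1 .. $children) {
    my $pid = fork();
    die ("Cannot fork: $!\n") unless defined ($pid);
    exit (0) if ($pid == 0);      # each child terminates immediately
}

# only the parent reaches this point
my $reaped = 0;
while (wait() != -1) {
    $reaped++;                    # one more child has terminated
}
print ("collected $reaped child processes\n");
```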

The waitpid function waits for a particular child process.

The syntax for the waitpid function is


waitpid (procid, waitflag);

procid is the process ID of the process to wait for, and waitflag is a special wait flag (as defined by the waitpid or wait4 manual page). By default, waitflag is 0 (a normal wait). waitpid returns the process ID of the child process if the process is found and has terminated, and it returns -1 if the child process does not exist.

Listing 13.5 shows how waitpid can be used to control process execution.


Listing 13.5. A program that uses waitpid.

 1:  #!/usr/local/bin/perl

 2:  

 3:  $procid = fork();

 4:  if ($procid == 0) {

 5:          # this is the child process

 6:          print ("this line is printed first\n");

 7:          exit(0);

 8:  } else {

 9:          # this is the parent process

10:         waitpid ($procid, 0);

11:         print ("this line is printed last\n");

12: }



$ program13_5

this line is printed first

this line is printed last

$

Line 3 splits the program into a parent process and a child process. The parent process is returned the process ID of the child process, which is stored in $procid.

Lines 6 and 7 are executed by the child process. Line 6 prints the following line:


this line is printed first

Line 7 then calls exit, which terminates the child process.

Lines 10 and 11 are executed by the parent process. Line 10 calls waitpid and passes it the ID of the child process; therefore, the parent process waits until the child process terminates before continuing. This means that line 11, which prints the second line, is guaranteed to be executed after the first line is printed.

As you can see, waitpid can be used to force the order of execution of processes.

NOTE
For more information on the possible values that can be passed as waitflag, examine the file wait.ph, which is available from the same place you retrieved your copy of Perl. (It might already be on your system.) You can also find out more by investigating the waitpid and wait4 manual pages.

Miscellaneous Control Functions

The caller, chroot, local, and times functions perform various process and program-related actions.

The caller Function

The caller function returns the name and the line number of the program that called the currently executing subroutine.

The syntax for the caller function is


subinfo = caller();

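caller returns a three-element list, subinfo, consisting of the following: the name of the package from which the subroutine was called, the name of the file containing the call, and the line number of the call.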

This routine is used by the Perl debugger, which you'll learn about on Day 21, "The Perl Debugger." For more information on packages, refer to Day 20, "Miscellaneous Features of Perl."
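
A minimal sketch of calling caller from inside a subroutine follows. The subroutine name whocalled and the variable names are hypothetical; the three values received are the package name, the file name, and the line number of the call.

```perl
# Illustrative sketch: a subroutine that reports where it was called from.
sub whocalled {
    my ($package, $filename, $line) = caller();
    return ("called from package $package, file $filename, line $line");
}

my $info = whocalled();
print ("$info\n");
```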

The chroot Function

The chroot function duplicates the functionality of the chroot system call.

The syntax for the chroot function is


chroot (dir);

dir is the new root directory.

In the following example, the specified directory becomes the root directory for the program:


chroot ("/u/jqpublic");

For more information, refer to the chroot manual page.

The local Function

The local function was introduced on Day 9, "Using Subroutines." It declares that a copy of a named variable is to be defined for a subroutine. (Refer to that day for examples that use local inside a subroutine.)

local can be used also to define a copy of a variable for use inside a statement block (a collection of statements enclosed in brace brackets), as follows:


if ($var == 14) {

        local ($localvar);

        # stuff goes here

}

This defines a local copy of the variable $localvar for use inside the statement block. Any other copies of $localvar that exist are not affected by the changes to this local copy.
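
The following sketch illustrates this. The variable name $count and its values are arbitrary; the variable is declared with our here so that the fragment also runs under Perl 5's strict mode, because local applies only to global (package) variables.

```perl
# Illustrative sketch: a local copy disappears at the end of the block.
our $count = 10;

if ($count == 10) {
    local ($count);               # new, temporary copy of $count
    $count = 99;                  # changes only the local copy
    print ("inside the block, \$count is $count\n");
}

# the original value is back in effect here
print ("after the block, \$count is $count\n");
```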

DON'T use local inside a loop, as in this example:
while ($var <= 14) {
local ($myvar);
# stuff goes here
}
Here, a new copy of $myvar is defined each time the loop iterates. This is probably not what you want.

The times Function

The times function returns the amount of job time consumed by this program and any child processes of this program.

The syntax for the times function is


timelist = times;

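As you can see, times accepts no arguments. It returns timelist, a list consisting of the following four floating-point numbers: the user time consumed by this program, the system time consumed by this program, the user time consumed by child processes, and the system time consumed by child processes.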
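
A short sketch of reading the four values follows. The variable names are illustrative, and the loop exists only to consume a measurable amount of time; the actual numbers depend on your machine.

```perl
# Illustrative sketch: report the time consumed so far.
my $total = 0;
foreach my $i (1 .. 200000) {
    $total += sqrt ($i);          # busy work so that some time is used
}

# user and system time for this program, then for its child processes
my ($user, $system, $child_user, $child_system) = times;

print ("this program: user $user, system $system\n");
print ("child processes: user $child_user, system $child_system\n");
```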

Mathematical Functions

Perl provides functions that perform the standard trigonometric operations, plus some other useful mathematical operations. The following sections describe these functions: sin, cos, atan2, sqrt, exp, log, abs, rand, and srand.

The sin and cos Functions

The sin and cos functions are passed a scalar value and return the sine and cosine, respectively, of the value.

The syntax of the sin and cos functions is


retval = sin (value);

retval = cos (value);

value is a placeholder here. It can be the value stored in a scalar variable or the result of an expression; it is assumed to be in radians. See the following section, "The atan2 Function," to find out how to convert from degrees to radians.

The atan2 Function

The atan2 function calculates and returns the arctangent of one value divided by another, in the range -π to π.

The syntax of the atan2 function is


retval = atan2 (value1, value2);

If value1 and value2 are equal and positive, retval is the value of π divided by 4.

Listing 13.6 shows how you can use this to convert from degrees to radians.


Listing 13.6. A program that contains a subroutine that converts from degrees to radians.

 1:  #!/usr/local/bin/perl

 2:  

 3:  $rad90 = &degrees_to_radians(90);

 4:  $sin90 = sin($rad90);

 5:  $cos90 = cos($rad90);

 6:  print ("90 degrees:\nsine is $sin90\ncosine is $cos90\n");

 7:  

 8:  sub degrees_to_radians {

 9:          local ($degrees) = @_;

10:         local ($radians);

11: 

12:         $radians = atan2(1,1) * $degrees / 45;

13: }



$ program13_6

90 degrees:

sine is 1

cosine is 6.1230317691118962911e-17

$

The subroutine degrees_to_radians converts from degrees to radians by multiplying by π divided by 180. Because atan2(1,1) returns π divided by 4, all the subroutine needs to do after that is divide by 45 to obtain the number of radians.

In the main body of the program, line 3 converts 90 degrees to the equivalent value in radians (π divided by 2). Line 4 then passes this value to sin, and line 5 passes it to cos.

NOTE
The trigonometric operations provided here are sufficient to enable you to perform the other important trigonometric operations. For example, to obtain the tangent of a value, obtain the sine and cosine of the value by calling sin and cos, and then divide the sine by the cosine.
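
For instance, a tangent subroutine might be sketched as follows. The subroutine name tangent is hypothetical, and the sketch does not guard against values whose cosine is zero.

```perl
# Illustrative sketch: tangent computed from sin and cos.
sub tangent {
    my ($radians) = @_;
    return (sin ($radians) / cos ($radians));
}

my $quarter_pi = atan2 (1, 1);            # 45 degrees, expressed in radians
my $tan45      = tangent ($quarter_pi);   # very close to 1, subject to round-off
print ("the tangent of 45 degrees is $tan45\n");
```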

The sqrt Function

The sqrt function returns the square root of the value it is passed.

The syntax for the sqrt function is


retval = sqrt (value);

value can be any non-negative number.

The exp Function

The exp function returns the number e ** value, where e is the standard mathematical constant (the base for the natural logarithm) and value is the argument passed to exp.

The syntax for the exp function is


retval = exp (value);

To retrieve e itself, pass exp the value 1.

The log Function

The log function takes a value and returns the natural (base e) logarithm of the value.

The syntax for the log function is


retval = log (value);

The log function undoes exp; the expression


$var = log (exp ($var));

always leaves $var with the value it started with (if you factor in round-off error).

The abs Function

The abs function returns the absolute value of a number. This is defined as follows: if a value is less than zero, abs negates it and returns the result.


$result = abs(-3.5);    # returns 3.5

Otherwise, the result is identical to the value:


$result = abs(3.5);     # returns 3.5

$result = abs(0);       # returns 0

The syntax for the abs function is


retval = abs (value);

value can be any number.

NOTE
abs is not defined in Perl 4.

The rand and srand Functions

The rand and srand functions enable Perl programs to generate random numbers.

The rand function is passed an integer value and generates a random floating-point number between 0 and the value.

The syntax for the rand function is


retval = rand (num);

num is the integer value passed to rand, and retval is a random floating-point number greater than or equal to 0 and less than num.

For example, the following statement generates a number between 0 and 10 and returns it in $retval:


$retval = rand (10);
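
If you need a random integer rather than a floating-point number, one common approach is to truncate the result with the int function. The following sketch simulates the roll of a six-sided die; the subroutine name roll_die is hypothetical.

```perl
# Illustrative sketch: a random integer from 1 to 6.
sub roll_die {
    # rand (6) is at least 0 and less than 6, so int yields 0 through 5
    return (int (rand (6)) + 1);
}

my $roll = roll_die();
print ("you rolled a $roll\n");
```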

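srand initializes the random-number generator used by rand. This ensures that the random numbers generated are, in fact, random. (In Perl 4 and early versions of Perl 5, if you do not use srand, you'll get the same set of random numbers each time. Perl 5.004 and later versions call srand automatically the first time rand is used.)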

The syntax for the srand function is


srand (value);

srand accepts an integer value as an argument; if no argument is supplied, srand calls the time function and uses its return value as the random-number seed.
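
The effect of the seed can be seen by passing srand the same value twice. In the following sketch, the seed 42 is arbitrary; the two lists contain the same numbers because the generator is restarted from the same point.

```perl
# Illustrative sketch: the same seed produces the same sequence.
srand (42);
my @first = (rand (10), rand (10), rand (10));

srand (42);                       # restart the generator with the same seed
my @second = (rand (10), rand (10), rand (10));

if ("@first" eq "@second") {
    print ("both sequences are identical\n");
}
```

A fixed seed such as this is useful for testing; use a varying seed when you want a different sequence on each run.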

For an example that uses rand and srand, see the section titled "Returning a Value from a Subroutine" on Day 9.

NOTE
The following values and functions return numbers that can make useful random-number seeds:
  • The system variable $$ contains the process ID of the current program. (See Day 17, "System Variables," for more information on $$.)
  • time returns the current time value.
  • Many of the functions described on Day 15, "System Functions," return useful values. For example, getppid returns the process ID of the program's parent process.
For best results, combine two or more of these using the | (bitwise OR) operator.

String-Manipulation Functions

This section describes the built-in Perl functions that manipulate character strings. These functions enable you to do the following:

The index Function

The index function provides a way of indicating the location of a substring in a string.

The syntax for the index function is


position = index (string, substring);

string is the character string to search in, and substring is the character string being searched for. position returns the number of characters skipped before substring is located; if substring is not found, position is set to -1.

Listing 13.7 is a program that uses index to locate a substring in a string.


Listing 13.7. A program that uses the index function.

1:  #!/usr/local/bin/perl

2:  

3:  $input = <STDIN>;

4:  $position = index($input, "the");

5:  if ($position >= 0) {

6:          print ("pattern found at position $position\n");

7:  } else {

8:          print ("pattern not found\n");

9:  }



$ program13_7

Here is the input line I have typed.

pattern found at position 8

$

This program searches for the first occurrence of the word the. If it is found, the program prints the location of the pattern; if it is not found, the program prints pattern not found.

You can use the index function to find more than one copy of a substring in a string. To do this, pass a third argument to index, which tells it how many characters to skip before starting to search. For example:


$position = index($line, "foo", 5);

This call to index skips five characters before starting to search for foo in the string stored in $line. As before, if index finds the substring, it returns the total number of characters skipped (including the number specified by the third argument to index). If index does not find the substring in the portion of the string that it searches, it returns -1.

This feature of index enables you to find all occurrences of a substring in a string. Listing 13.8 is a modified version of Listing 13.7 that searches for all occurrences of the in an input line.


Listing 13.8. A program that uses index to search a line repeatedly.

 1:  #!/usr/local/bin/perl

 2:  

 3:  $input = <STDIN>;

 4:  $position = $found = 0;

 5:  while (1) {

 6:          $position = index($input, "the", $position);

 7:          last if ($position == -1);

 8:          if ($found == 0) {

 9:                  $found = 1;

10:                 print ("pattern found - characters skipped:");

11:         }

12:         print (" $position");

13:         $position++;

14: }

15: if ($found == 0) {

16:         print ("pattern not found\n");

17: } else {

18:         print ("\n");

19: }



$ program13_8

Here is the test line containing the words.

pattern found - characters skipped: 8 33

$

Line 6 of this program calls index. Because the initial value of $position is 0, the first call to index starts searching from the beginning of the string. Eight characters are skipped before the first occurrence of the is found; this means that $position is assigned 8.

Line 7 tests whether a match has been found by comparing $position with -1, which is the value index returns when it does not find the string for which it is looking. Because a match has been found, the loop continues to execute.

When the loop iterates again, line 6 calls index again. This time, index skips nine characters before beginning the search again, which ensures that the previously found occurrence of the is skipped. A total of 33 characters are skipped before the is found again. Once again, the loop continues, because the conditional expression in line 7 is false.

On the final iteration of the loop, line 6 calls index and skips 34 characters before starting the search. This time, the is not found, index returns -1, and the conditional expression in line 7 is true. At this point, the loop terminates.

NOTE
To extract a substring found by index, use the substr function, which is described later in today's lesson.
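
As a brief preview, the following sketch combines index and substr. It assumes the two-argument form of substr, which returns everything from the given position to the end of the string; the variable names are illustrative.

```perl
# Illustrative sketch: use the position from index to extract text.
my $line     = "Here is the test line containing the words.";
my $position = index ($line, "test");
my $rest     = "";

if ($position >= 0) {
    $rest = substr ($line, $position);   # from the match to the end
    print ("found at position $position: $rest\n");
}
```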

The rindex Function

The rindex function is similar to the index function. The only difference is that rindex starts searching from the right end of the string, not the left.

The syntax for the rindex function is


position = rindex (string, substring);

This syntax is identical to the syntax for index. string is the character string to search in, and substring is the character string being searched for. position returns the number of characters skipped before substring is located; if substring is not found, position is set to -1.

The following is an example:


$string = "Here is the test line containing the words.";

$position = rindex($string, "the");

In this example, rindex finds the second occurrence of the. As with index, rindex returns the number of characters between the left end of the string and the location of the found substring. In this case, 33 characters are skipped, and $position is assigned 33.

You can specify a third argument to rindex, indicating the maximum number of characters that can be skipped. For example, if you want rindex to find the first occurrence of the in the preceding example, you can call it as follows:


$string = "Here is the test line containing the words.";

$position = rindex($string, "the", 32);

Here, the second occurrence of the cannot be matched, because it is to the right of the specified limit of 32 skipped characters. rindex, therefore, finds the first occurrence of the. Because there are eight characters between the beginning of the string and the occurrence, $position is assigned 8.

Like index, rindex returns -1 if it cannot find the string it is looking for.
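The third argument also lets you walk through a string from right to left. The following sketch is one way to do this; the array @found and the early exit at offset 0 are choices made for this example, not something the text prescribes.

```perl
use strict;
use warnings;

my $string = "Here is the test line containing the words.";
my @found;
my $position = length($string);

# Search leftward; each pass caps the search just left of the previous match.
while (($position = rindex($string, "the", $position)) != -1) {
    push(@found, $position);
    last if ($position == 0);   # nothing can lie to the left of offset 0
    $position--;
}
print("found at: @found\n");
```

Here the occurrences are reported in right-to-left order: first 33, then 8.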

The length Function

The length function returns the number of characters contained in a character string.

The syntax for the length function is


num = length (string);

string is the character string for which you want to determine the length, and num is the returned length.

Here is an example using length:


$string = "Here is a string";

$strlen = length($string);

In this example, length determines that the string in $string is 16 characters long, and it assigns 16 to $strlen.

Listing 13.9 is a program that calculates the average word length used in an input file. (This is sometimes used to determine the "complexity" of the text.) Numbers are skipped.


Listing 13.9. A program that demonstrates the use of length.

 1:  #!/usr/local/bin/perl

 2:  

 3:  $wordcount = $charcount = 0;

 4:  while ($line = <STDIN>) {

 5:          @words = split(/\s+/, $line);

 6:          foreach $word (@words) {

 7:                  next if ($word =~ /^\d+\.?\d*$/);

 8:                  $word =~ s/[,.;:]$//;

 9:                  $wordcount += 1;

10:                 $charcount += length($word);

11:         }

12: }

13: print ("Average word length: ", $charcount / $wordcount, "\n");



$ program13_9

Here is the test input.

Here is the last line.

^D

Average word length: 3.5

$

This program reads a line of input at a time from the standard input file, breaking the input line into words. Line 7 tests whether the word is a number, and skips it if it is. Line 8 strips any trailing punctuation character from the word, which ensures that the punctuation is not counted as part of the word length.

Line 10 calls length to retrieve the number of characters in the word. This number is added to $charcount, which contains the total number of characters in all of the words that have been read so far. To determine the average word length of the file, line 13 takes this value and divides it by the number of words in the file, which is stored in $wordcount.
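The same calculation can be packaged as a subroutine. This is only a sketch: average_word_length is a hypothetical helper, it uses split(' ', ...) so that leading white space does not produce an empty word, and it returns 0 instead of dividing by zero when no words are found.

```perl
use strict;
use warnings;

# average_word_length is a hypothetical helper, not part of Listing 13.9.
sub average_word_length {
    my (@lines) = @_;
    my ($wordcount, $charcount) = (0, 0);
    foreach my $line (@lines) {
        # split(' ', ...) ignores leading white space, so no empty words appear
        foreach my $word (split(' ', $line)) {
            next if ($word =~ /^\d+\.?\d*$/);   # skip numbers
            $word =~ s/[,.;:]$//;               # drop one trailing punctuation mark
            $wordcount += 1;
            $charcount += length($word);
        }
    }
    return $wordcount ? $charcount / $wordcount : 0;
}

my $average = average_word_length("Here is the test input.\n",
                                  "Here is the last line.\n");
print("Average word length: $average\n");
```

With the two sample input lines, this prints the same average, 3.5.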

Retrieving String Length Using tr

The tr operator provides another way of determining the length of a character string, in conjunction with the built-in system variable $_.

The syntax for the tr operator is


tr/sourcelist/replacelist/

sourcelist is the list of characters to replace, and replacelist is the list of characters to replace with. (For details, see the following listing and the explanation provided with it.)

Listing 13.10 shows how tr works.


Listing 13.10. A program that uses tr to retrieve the length of a string.

1:  #!/usr/local/bin/perl

2:  

3:  $string = "here is a string";

4:  $_ = $string;

5:  $length = tr/a-zA-Z /a-zA-Z /;

6:  print ("the string is $length characters long\n");



$ program13_10

the string is 16 characters long

$

Line 3 of this program assigns the string here is a string to the scalar variable $string. Line 4 copies this string into a built-in scalar variable, $_.

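Line 5 exploits two features of the tr operator that have not yet been discussed: when the =~ operator is not used to name the string to translate, tr operates on the built-in variable $_; and tr returns the number of characters it translates.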

In line 5, both the search pattern (the set of characters to look for) and the replacement pattern (the characters to replace them with) are the same. This pattern, /a-zA-Z /, tells tr to search for all lowercase letters, uppercase letters, and blank spaces, and then replace them with themselves. This pattern matches every character in the string, which means that every character is being translated.

Because every character is being translated, the number of characters translated is equivalent to the length of the string. This string length is assigned to the scalar variable $length.
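Keep in mind that the count equals the string length only when every character in the string appears in the search list. The sketch below uses a sample string (made up for this example) that contains digits and a comma. It also leaves the replacement list empty, which tells tr to reuse the search list, so that characters are counted without being changed.

```perl
use strict;
use warnings;

my $string = "Room 101, floor 3";
$_ = $string;
my $counted = tr/a-zA-Z //;      # counts letters and blanks only; $_ is unchanged
my $length  = length($string);
print("tr counted $counted of $length characters\n");
```

Here tr counts 12 characters, but length reports 17, because the digits and the comma are not in the search list.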

tr can be used also to count the number of occurrences of a specific character, as shown in Listing 13.11.


Listing 13.11. A program that uses tr to count the occurrences of specific characters.

 1:  #!/usr/local/bin/perl

 2:  

 3:  $punctuation = $blanks = $total = 0;

 4:  while ($input = <STDIN>) {

 5:          chop ($input);

 6:          $total += length($input);

 7:          $_ = $input;

 8:          $punctuation += tr/,:;.-/,:;.-/;

 9:          $blanks += tr/ / /;

10: }

11: print ("In this file, there are:\n");

12: print ("\t$punctuation punctuation characters,\n");

13: print ("\t$blanks blank characters,\n");

14: print ("\t", $total - $punctuation - $blanks);

15: print (" other characters.\n");



$ program13_11

Here is a line of input.

This line, another line, contains punctuation.

^D

In this file, there are:

         4 punctuation characters,

         10 blank characters,

         56 other characters.

$

This program uses the scalar variable $total and the built-in function length to count the total number of characters in the input file (excluding the trailing newline characters, which are removed by the call to chop in line 5).

Lines 8 and 9 use tr to count the number of occurrences of particular characters. Line 8 replaces all punctuation characters with themselves; the number of replacements performed, and hence the number of punctuation characters found, is added to the total stored in $punctuation. Similarly, line 9 replaces all blanks with themselves and adds the number of blanks found to the total stored in $blanks. In both cases, tr operates on the contents of the scalar variable $_, because the =~ operator has not been used to specify another value to translate.

Line 14 uses $total, $punctuation, and $blanks to calculate the total number of characters that are not blank and not punctuation.
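If you prefer not to copy the input into $_, you can use the =~ operator to tell tr which variable to examine. The following is a small sketch of that approach using one of the sample input lines; the variable $other is introduced here for the example.

```perl
use strict;
use warnings;

my $input = "This line, another line, contains punctuation.";
my $punctuation = ($input =~ tr/,:;.-//);   # count only; $input is not modified
my $blanks      = ($input =~ tr/ //);
my $other       = length($input) - $punctuation - $blanks;
print("$punctuation punctuation, $blanks blank, $other other\n");
```

For this line, the counts are 3 punctuation characters, 5 blanks, and 38 other characters.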

NOTE
Many other functions and operators accept $_ as the default variable on which to work. For example, lines 4-7 of this program also can be written as follows:
while (<STDIN>) {
chop();
$total += length();
For more information on $_, refer to Day 17, "System Variables."

The pos Function

The pos function, defined only in Perl 5, returns the location at which the last pattern match in a string ended, which is where the next match will start. It is ideal for use when repeated pattern matches are specified using the g (global) pattern-matching option.

The syntax for the pos function is


offset = pos(string);

string is the string whose pattern is being matched. offset is the number of characters already matched or skipped.

Listing 13.12 illustrates the use of pos.


Listing 13.12. A program that uses pos to display pattern match positions.

1: #!/usr/local/bin/perl

2:

3: $string = "Mississippi";

4: while ($string =~ /i/g) {

5:         $position = pos($string);

6:         print("matched at position $position\n");

7: }



$ program13_12

matched at position 2

matched at position 5

matched at position 8

matched at position 11

This program loops every time an i in Mississippi is matched. The number displayed by line 6 is the number of characters to skip to reach the point at which pattern matching resumes. For example, the first i is the second character in the string, so the second pattern search starts at position 2.

NOTE
You can also use pos to change the position at which pattern matching is to resume. To do this, put the call to pos on the left side of an assignment:
pos($string) = 5;
This tells the Perl interpreter to start the next pattern search with the sixth character in the string. (To restart searching from the beginning, use 0.)
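As a brief sketch of this technique, the following code assigns to pos before the loop starts, so that matches in the first five characters are ignored. The array @positions is introduced here only to collect the results.

```perl
use strict;
use warnings;

my $string = "Mississippi";
my @positions;

pos($string) = 5;               # skip the first five characters
while ($string =~ /i/g) {
    push(@positions, pos($string));
}
print("resumed matches end at: @positions\n");
```

Only the last two occurrences of i are matched, so the positions recorded are 8 and 11.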

The substr Function

The substr function lets you assign a part of a character string to a scalar variable (or to a component of an array variable).

The syntax for calls to the substr function is


substr (expr, skipchars, length)

expr is the character string from which a substring is to be copied; this character string can be the value stored in a variable or the value resulting from the evaluation of an expression. skipchars is the number of characters to skip before starting copying. length is the number of characters to copy; length can be omitted, in which case the rest of the string is copied.

Listing 13.13 provides a simple example of substr.


Listing 13.13. A program that demonstrates the use of substr.

1:  #!/usr/local/bin/perl

2:  

3:  $string = "This is a sample character string";

4:  $sub1 = substr ($string, 10, 6);

5:  $sub2 = substr ($string, 17);

6:  print ("\$sub1 is \"$sub1\"\n\$sub2 is \"$sub2\"\n");



$ program13_13

$sub1 is "sample"

$sub2 is "character string"

$

Line 4 calls substr, which copies a portion of the string stored in $string. This call specifies that ten characters are to be skipped before copying starts, and that a total of six characters are to be copied. This means that the substring sample is copied and stored in $sub1.

Line 5 is another call to substr. Here, 17 characters are skipped. Because the length field is omitted, substr copies the remaining characters in the string. This means that the substring character string is copied and stored in $sub2.

Note that lines 4 and 5 do not change the contents of $string.
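substr is often paired with index: index locates the substring, and substr copies it. The sketch below shows one way to combine them; $target, $found, and $tail are names chosen for this example. It also shows that a negative skip count, which measures from the right end of the string, can be used when extracting.

```perl
use strict;
use warnings;

my $string   = "This is a sample character string";
my $target   = "sample";
my $position = index($string, $target);
my $found    = "";
if ($position != -1) {
    # copy exactly as many characters as the search string contains
    $found = substr($string, $position, length($target));
}
my $tail = substr($string, -6);   # a negative skip count measures from the right end
print("found \"$found\" at $position; tail is \"$tail\"\n");
```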

String Insertion Using substr

In Listing 13.13, which you've just seen, calls to substr appear to the right of the assignment operator =. This means that the return value from substr (the extracted substring) is assigned to the variable appearing to the left of the =.

Calls to substr can appear also on the left of the assignment operator =. In this case, the portion of the string specified by substr is replaced by the value appearing to the right of the assignment operator.

The syntax for these calls to substr is basically the same as before:


substr (expr, skipchars, length) = newval;

Here, expr must be something that can be assigned to (for example, a scalar variable or an element of an array variable). skipchars represents the number of characters to skip before beginning the overwriting operation, which cannot be greater than the length of the string. length is the number of characters to be replaced by the overwriting operation. If length is not specified, the remainder of the string is replaced.

newval is the string that replaces the substring specified by skipchars and length. If newval is larger than length, the character string automatically grows to hold it, and the rest of the string is pushed aside (but not overwritten). If newval is smaller than length, the character string automatically shrinks. Basically, everything appears where it is supposed to without you having to worry about it.

NOTE
By the way, things that can be assigned to are sometimes known as lvalues, because they appear to the left of assignment statements (the l in lvalue stands for "left"). Things that appear to the right of assignment statements are, similarly, called rvalues.
This book does not use the terms lvalue and rvalue, but you might find that knowing them will prove useful when you read other books on programming languages.

Listing 13.14 is an example of a program that uses substr to replace portions of a string.


Listing 13.14. A program that replaces parts of a string using substr.

1:  #!/usr/local/bin/perl

2:  

3:  $string = "Here is a sample character string";

4:  substr($string, 0, 4) = "This";

5:  substr($string, 8, 1) = "the";

6:  substr($string, 19) = "string";

7:  substr($string, -1, 1) = "g.";

8:  substr($string, 0, 0) = "Behold! ";

9:  print ("$string\n");



$ program13_14

Behold! This is the sample string.

$

This program illustrates the many ways you can use substr to replace portions of a string.

The call to substr in line 4 specifies that no characters are to be skipped before overwriting, and that four characters in the original string are to be overwritten. This means that the substring Here is replaced by This, and that the following is the new value of the string stored in $string:


This is a sample character string

Similarly, the call to substr in line 5 specifies that eight characters are to be skipped and one character is to be replaced. This means that the word a is replaced by the. Now, $string contains the following:


This is the sample character string

Note that the character string is now larger than the original, because the new substring, the, is larger than the substring it replaced.

Line 6 is an example of a call to substr that shrinks the string. Here, 19 characters are skipped, and the rest of the string is replaced by the substring string (because no length field has been specified). Now, the following is the value stored in $string:


This is the sample string

In line 7, the call to substr is passed -1 in the skipchars field and is passed 1 in the length field. This tells substr to replace the last character of the string with the substring g. (g followed by a period). $string now contains


This is the sample string.

NOTE
If substr is passed a skipchars value of -n, where n is a positive integer, substr skips to n characters from the right end of the string. For example, the following call replaces the last two characters in $string with the string hello:
substr($string, -2, 2) = "hello";

Finally, line 8 specifies that no characters are to be skipped and no characters are to be replaced. This means that the substring "Behold! " (including a trailing space) is added to the front of the existing string and that $string now contains the following:


Behold! This is the sample string.

Line 9 prints this final value of $string.

TIP
If you are a C programmer and are used to manipulating strings using pointers, note that substr with a length field of 1 can be used to simulate pointer-like behavior in Perl.
For example, you can simulate the C statement
char = *str++;
as follows in Perl:
$char = substr($str, $offset++, 1);
You'll need to define a counter variable (such as $offset) to keep track of where you are in the string. However, this is no more of a chore than remembering to initialize your C pointer variable.
You can simulate the following C statement:
*str++ = char;
by assigning values using substr in the same way:
substr($str, $offset++, 1) = $char;
You shouldn't use substr in this way unless you really have to. Perl supplies more powerful and useful tools, such as pattern matching and substitution, to get the job done more efficiently.
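To make the comparison concrete, here is a small sketch that walks a string one character at a time with substr and a counter, and then obtains the same list with a single call to split. The names @chars and @split_chars are chosen for this example.

```perl
use strict;
use warnings;

my $str = "abc";
my $offset = 0;
my @chars;
while ($offset < length($str)) {
    push(@chars, substr($str, $offset++, 1));   # like  c = *str++;  in C
}
# For comparison, split with an empty pattern yields the same list.
my @split_chars = split(//, $str);
print("@chars\n");
```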

The study Function

The study function is a special function that tells the Perl interpreter that the specified scalar variable is about to be searched many times.

The syntax for the study function is


study (scalar);

scalar is the scalar variable to be "studied." The Perl interpreter takes the value stored in the specified scalar variable and represents it in an internal format that allows faster access.

For example:


study ($myvar);

Here, the value stored in the scalar variable $myvar is about to be repeatedly searched.

You can call study for only one scalar variable at a time. Previous calls to study are superseded if study is called again.

TIP
To check whether study actually makes your program more efficient, use the function times, which returns the user and system CPU times used so far by your program; call it before and after the program fragment you want to measure. (times is discussed earlier today.)
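The following sketch shows one way to take such a measurement. The test string, the list of patterns, and the number of passes are all made up for this example. Note also that recent Perl releases treat study as a no-op, so you might see no difference when you run the fragment with and without the call.

```perl
use strict;
use warnings;

my $myvar = "the quick brown fox jumps over the lazy dog " x 1000;
my @patterns = ("fox", "dog", "cat", "the");

study($myvar);                      # hint that $myvar will be searched repeatedly
my ($user_before) = times();        # first element is this process's user CPU time
my $hits = 0;
foreach my $pass (1 .. 200) {
    foreach my $pattern (@patterns) {
        $hits++ if ($myvar =~ /$pattern/);
    }
}
my ($user_after) = times();
printf("%d hits, %.2f seconds of user time\n", $hits, $user_after - $user_before);
```

To judge the effect of study, run the fragment once with the call and once with it commented out, and compare the reported user times.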

Case Conversion Functions

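Perl 5 provides functions that perform case conversion on strings. These are lc and uc, which convert an entire string, and lcfirst and ucfirst, which convert only the first character of a string.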

The lc and uc Functions

The syntax for the lc and uc functions is


retval = lc(string);

retval = uc(string);

string is the string to be converted. retval is a copy of the string, converted to either lowercase or uppercase:


$lower = lc("aBcDe");  # $lower is assigned "abcde"

$upper = uc("aBcDe");  # $upper is assigned "ABCDE"

The lcfirst and ucfirst Functions

The syntax for the lcfirst and ucfirst functions is


retval = lcfirst(string);

retval = ucfirst(string);

string is the string whose first character is to be converted. retval is a copy of the string, with the first character converted to either lowercase or uppercase:


$lower = lcfirst("HELLO");  # $lower is assigned "hELLO"

$upper = ucfirst("hello");  # $upper is assigned "Hello"
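These functions can be combined. As a sketch, the following code lowercases each word of a string and then capitalizes its first letter; the sample string and the variable names are made up for this example.

```perl
use strict;
use warnings;

my $title = "pROCESS, sTRING, and mathematical FUNCTIONS";
# lowercase each word, then capitalize its first letter
my @words = map { ucfirst(lc($_)) } split(' ', $title);
my $fixed = join(" ", @words);
print("$fixed\n");
```

This prints Process, String, And Mathematical Functions.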

The quotemeta Function

The quotemeta function, defined only in Perl 5, places a backslash character in front of any non-word character in a string. The following statements are equivalent:


$string = quotemeta($string);

$string =~ s/(\W)/\\$1/g;

The syntax for quotemeta is


newstring = quotemeta(oldstring);

oldstring is the string to be converted. newstring is the string with backslashes added.

quotemeta is useful when a string is to be used in a subsequent pattern-matching operation. It ensures that no character in the string is treated as a special pattern-matching character.
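For instance, suppose you want to search for text that contains a dollar sign, a period, and a parenthesis. The sketch below (with sample strings invented for this example) quotes the search text before using it as a pattern.

```perl
use strict;
use warnings;

my $text    = "The price is \$4.50 (plus tax).";
my $literal = "\$4.50 (plus";
my $quoted  = quotemeta($literal);

# Without quoting, $ ( and . would be treated as pattern-matching characters.
my $matched = ($text =~ /$quoted/) ? 1 : 0;
print("pattern: $quoted\nmatched: $matched\n");
```

The quoted pattern is \$4\.50\ \(plus, which matches the text literally.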

The join Function

The join function has been used many times in this book. It takes the elements of a list and converts them into a single character string.

The syntax for the join function is


join (joinstr, list);

joinstr is the character string that is to be used to glue the elements of list together.

For example:


@list = ("Here", "is", "a", "list");

$newstr = join ("::", @list);

After join is called, the value stored in $newstr becomes the following string:


Here::is::a::list

The join string, :: in this case, appears between each pair of joined elements. The most common join string is a single blank space; however, you can use any value as the join string, including the value resulting from an expression.
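The following sketch shows two more uses of join, with sample data made up for this example: building a colon-separated record that split can take apart again, and using the value of an expression as the join string.

```perl
use strict;
use warnings;

my @fields = ("root", "x", "0", "0");
my $record = join(":", @fields);            # root:x:0:0
my @again  = split(/:/, $record);           # split reverses the join

my $width  = 3;
my $ruled  = join("-" x $width, "a", "b", "c");   # join string from an expression
print("$record\n$ruled\n");
```

The second call produces a---b---c, because the expression "-" x $width evaluates to three hyphens.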

The sprintf Function

The sprintf function behaves like the printf function defined on Day 11, "Formatting Your Output," except that the formatted string is returned by the function instead of being written to a file. This enables you to assign the string to another variable.

The syntax for the sprintf function is


sprintf (string, fields);

string is the format string, which contains the text to build along with any field specifiers, and fields is a list of values to substitute into the string.

Listing 13.15 is an example that uses sprintf to build a string.


Listing 13.15. A program that uses sprintf.

1:  #!/usr/local/bin/perl

2:  

3:  $num = 26;

4:  $outstr = sprintf("%d = %x hexadecimal or %o octal\n",

5:          $num, $num, $num);

6:  print ($outstr);



$ program13_15

26 = 1a hexadecimal or 32 octal

$

Lines 4 and 5 take three copies of the value stored in $num and include them as part of a string. The field specifiers %d, %x, and %o indicate how the values are to be formatted.

%d Indicates an integer displayed in the usual decimal (base-10) format

%x Indicates an integer displayed in hexadecimal (base-16) format

%o Indicates an integer displayed in octal (base-8) format

The created string is returned by sprintf. Once it has been created, it behaves just like any other Perl character string; in particular, it can be assigned to a scalar variable, as in this example. Here, the string containing the three copies of $num is assigned to the scalar variable $outstr. Line 6 then prints this string.

NOTE
For more information on field specifiers or on how printf works, refer to Day 11, which lists the field specifiers defined and provides a description of the syntax of printf.
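Field specifiers can also carry width and precision information, which makes sprintf handy for building aligned or padded strings. The sketch below uses standard printf-style specifiers; the values and variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my $label = sprintf("%-8s|", "total");      # left-justified in 8 columns
my $count = sprintf("%05d", 42);            # zero-padded to 5 digits
my $price = sprintf("%.2f", 3.14159);       # two digits after the decimal point
print("$label $count $price\n");
```

Here $count is 00042, $price is 3.14, and $label is the word total followed by three blanks and a vertical bar.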

Summary

Today, you learned about three types of built-in Perl functions: functions that handle process and program control, functions that perform mathematical operations, and functions that manipulate strings.

With the process- and program-control functions, you can start new processes, stop the current program or other processes, or temporarily halt the current program. You also can create a pipe that sends data from one of your created processes to another.

With the functions that perform mathematical operations, you can obtain the sine, cosine, and arctangent of a value. You also can calculate the natural logarithm and square root of a value, or use the value as an exponent of base e.

You also can generate random numbers and define the seed to use when generating the numbers.

Functions that search character strings include index, which searches for a substring starting from the left of a string, and rindex, which searches for a substring starting from the right of a string. You can retrieve the length of a character string using length. By using the translate operator tr in conjunction with the system variable $_, you can count the number of occurrences of a particular character or set of characters in a string. The pos function enables you to determine or set the current pattern-matching location in a string.

The function substr enables you to extract a substring from a string and use it in an expression or assignment statement. substr also can be used to replace a portion of a string or append to the front or back end of the string.

The lc and uc functions convert strings to lowercase or uppercase. To convert the first letter of a string to lowercase or uppercase, use lcfirst or ucfirst.

quotemeta places a backslash in front of every non-word character in a string.

You can create new character strings using join and sprintf. join creates a string by joining elements of a list, and sprintf builds a string using field specifiers that specify the string format.

Q&A

Q: How does Perl generate random numbers?
A: Basically, by performing arithmetic operations using very large numbers. If the numbers for these arithmetic operations are carefully chosen, a sequence of "pseudo-random" numbers can be generated by repeating the set of arithmetic operations and returning their results.
The random-number seed provided by srand supplies the initial value for one of the numbers used in the set of arithmetic operations. This ensures that the sequence of pseudo-random numbers starts with a different result each time.
Q: What programs can be called using system?
A: Any program that you can run from your terminal can be run using system.
Q: How many processes can a program create using fork?
A: Perl imposes no limit on how many processes can be created at a time. However, the performance of your system will be adversely affected if you generate too many processes at once. In particular, programs that call fork and wind up in an infinite loop are sometimes called fork bombs, because they generate thousands of processes and grind your machine to an effective halt. (Your system administrator will not be pleased with you if you do this!)
Q: How can I send signals to a process without killing it?
A: The kill function actually can send any signal supported by your machine to any running process (that you can access).
Refer to the UNIX system documentation for details on the signals you can send and what their names are.
Q: What is the difference between the %d and %ld format specifiers in sprintf?
A: %ld defines a "long integer." It refers to the largest number of bits that your local machine can use to store an integer. (This is often 32 bits.) %d, on the other hand, is equivalent to your machine's standard integer format. On some machines, %ld and %d are equivalent. If you are not sure how many bits your machine uses to store integers, or you know you are going to be dealing with large numbers, it's safer to use %ld. (The same holds true for all other integer formats, such as %lx and %lo.)
Q: What is the difference between the %c and %s format specifiers in sprintf?
A: %c undoes the effect of the ord function. It converts a scalar value into the equivalent ASCII character. (Its behavior is similar to that of the chr function in Pascal.) %s treats a scalar value as a character string and inserts it into the string at the place specified.
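The difference between %c and %s is easy to see in a short sketch. The variable names below are chosen for this example, and the value 65 assumes the ASCII character set.

```perl
use strict;
use warnings;

my $code = ord("A");                 # 65 in ASCII
my $as_c = sprintf("%c", $code);     # the character with that code
my $as_s = sprintf("%s", $code);     # the value itself, as a string
print("code $code: %c gives $as_c, %s gives $as_s\n");
```

With %c, the value 65 becomes the character A; with %s, it becomes the two-character string 65.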

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.

Quiz

  1. What do these functions do?
    a.    srand
    b.    pipe
    c.    atan2
    d.    sleep
    e.    gmtime
  2. Explain the differences between fork, system, and exec.
  3. Explain the differences between wait and waitpid.
  4. How can you obtain the value of π?
  5. How can you obtain the value of the mathematical constant e?
  6. What sprintf specifiers produce the following?
    a.    A hexadecimal number
    b.    An octal number
    c.    A floating-point number in exponential format
    d.    A floating-point number in standard (fixed) format
  7. If the scalar variable $string contains abcdefgh, what do the following calls return?
    a.    substr ($string, 0, 3);
    b.    substr ($string, 4);
    c.    substr ($string, -2, 2);
    d.    substr ($string, 2, 0);
  8. Assume $string contains the value abcdabcd. What value is returned by each of the following calls?
    a.    index ($string, "bc");
    b.    index ($string, "bcde");
    c.    index ($string, "bc", 1);
    d.    index ($string, "cd", 3);
    e.    rindex ($string, "bc");
  9. Assume $string contains the value abcdabcd\n (the last character being a trailing newline character). What is returned in $retval by the following?
    a.    $_ = $string; $retval = tr/ab/ab/;
    b.    $retval = length ($string);

Exercises

  1. Write a program that uses fork and waitpid to generate a total of three processes (including the program). Have each process print a line, and have the lines appear in a specified order.
  2. Write a program that reads input from a file named temp and writes it to the standard output file. Write another program that reads input from the standard input file, writes it to temp, and uses exec to call the first program.
  3. Write a program that prints the natural logarithm of the integers between 1 and 100.
  4. Write a program that computes the sum of the numbers from 1 to 10 ** n for values of n from 1 to 6. For each computed value, use times to calculate the amount of time each computation takes. Print these calculation times.
  5. Write a program that reads an integer value and prints the sine, cosine, and tangent of the value. Assume that the input value is in degrees.
  6. BUG BUSTER: What is wrong with the following program?
    #!/usr/local/bin/perl
    print ("Here is a line of output. ");
    system ("w");
    print ("Here is the rest of the line.\n");
  7. Write a program that uses index to print out the locations of the letters a, e, i, o, and u in an input line.
  8. Write a program that uses rindex to do the same thing as the one in Exercise 7.
  9. Write a program that uses substr to do the same thing as the one in Exercise 7. (Hint: This will require many calls to substr!)
  10. Write a program that uses tr to count all the occurrences of a, e, i, o, and u in an input line.
  11. Write a program that reads a number. If the number is a floating-point value, print it in exponential and fixed-point form. If the number is an integer, print it in decimal, octal, and hexadecimal form. (Hint: Recall that printf and sprintf use the same field specifiers.)
  12. BUG BUSTER: What is wrong with the following program?

    #!/usr/local/bin/perl

    $mystring = <STDIN>;
    $lastfound = length ($mystring);
    while ($lastfound != -1) {
            $lastfound = index($mystring, "xyz", $lastfound);
    }