The principle of abstraction means that information can be accessed in a way that isolates how data is stored from how it is accessed and used.
See also Classes, Encapsulation, Inheritance, and Polymorphism.
Alternation is the term used when a regular expression pattern chooses between two or more choices. For example, m/one|two|three/ will match if the string in $_ contains any one of the three character sequences: one, two, or three.
See also Regular Expression.
Perl has three distinctive types of quotes: single-quotes ('), double-quotes ("), and back-quotes (`). If you'd like to be a bit more explicit in quoting, you can use the alternates that are also provided: q( ) for single-quotes, qq( ) for double-quotes, and qx( ) for back-quotes. For example, q(This) is equivalent to 'This'. Perl also has an alternative mechanism that can be used to quote a lot of small single words. For example, qw(one, two, three) is equivalent to ('one', 'two', 'three').
See Pattern Anchor.
It is often very useful to create a function or variable without identifying names; these programming elements are called anonymous. You allude to them using references. For example, if you initialize $foo using $foo = { 'John' => 10, 'Karen' => 20}, then $foo becomes a reference to the anonymous hash. You access the hash entries by dereferencing $foo. For example, @{$foo}{'John'} is equal to 10.
See also Reference and Dereference.
ANSI refers to the American National Standards Institute. ANSI serves to administer and coordinate the U.S. private sector voluntary standardization system. Founded in 1918 by five engineering societies and three government agencies, the Institute remains a private, nonprofit, membership organization supported by a diverse constituency of private and public sector organizations. Its home page is http://www.ansi.org/home.html and you can find many references to the different standards there. The American National Standards Institute is located at 11 West 42nd Street, New York, NY 10036 (Telephone: (212) 642-4900; Telefax: (212) 398-0023).
See ASCII.
See Parameter.
An array is a collection of values stored as a unit. I think of an array in the same way that I think of a list because both are composed of many things. An array can be composed of numbers, strings, hashes, or even other arrays. A basic array assignment looks like this: @array = (1, 2, 'Three', 4);.
See Context (Array & Scalar).
A range is a shorthand method for generating consecutive elements of an array. For example, @array = (1..6) is equivalent to @array = (1, 2, 3, 4, 5, 6). You can also create letter ranges-Perl will automatically generate the missing letters. For example, @array = ('AA'..'AD') is equivalent to @array = ('AA', 'AB', 'AC', 'AD').
A slice is a shorthand method for specifying specific elements of an array. Instead of specifying one index inside the square brackets, you can specify multiple indexes. You can either assign the result of a slice of another array variable or assign new values to the specified elements. For example, @array[0, 6] refers to the 1st and 7th elements in the array. @array[0..4] refers to the elements from 0 to 4-five in all. Slice assignments look like this: @array[0..2] = @foo; or @array[0..2] = ('one', $two, 'three');.
A splice is a way to modify an array variable to add, delete, or replace elements. See the description of the splice() function in Appendix C, "Function List."
ASCII is a bit-mapped character set standard for interchanging text encoded with 7-bits in an 8-bit byte. The ASCII standard was created by the American National Standards Institute (ANSI). Each character maps directly to a number from 0 to 127. For example, the letter "A" is numbered 65 and "a" is numbered 97. Generally, these numbers are displayed in hexadecimal format. For example, the letter "A" is 0x41 and "a" is 0x61. While ASCII is satisfactory for displaying the English language, it is not considered adequate for non-English languages because the 128 character choice is too limiting. For instance, many European langugaes use accented characters which ASCII can't easily handle.
See also ANSI.
An assignment statement stores a value into a variable. For example, $foo = 4 stores the value of 4 into the $foo variable. The left side of the statement must be a value-something that ultimately will resolve to a memory location where the storage will take place.
An associative array-also called a hash-uses strings as indexes instead of numbers; for example, $hash{'david'} or $hash{'Larry Wall'}. Note that hashes use curly brackets around the index while arrays use square brackets.
Every Perl operator and function has a tendency to favor its left or right when looking for operands. If the operator looks left first-like the string concatenation operator-then it has left-associativity. If it looks right first-like the minus sign-then it has right-associativity.
awk is a UNIX-based utility that scans input lines for a specific pattern. Perl has most, if not all, of the abilities of awk.
See also Pattern.
Backtracking happens when the internal routines that perform pattern matching head down the wrong path when looking for a pattern. Since the current path-the set of characters being searched-is wrong, a new path needs to be found. Because this process is internal to Perl, you don't need to worry about the details. If you want to know more, please see the documentation that came with your Perl distribution.
See also Regular Expression.
When using files, you can use either binary mode or text mode. Binary mode means that Perl will not change your input or output in any way. By the way, this is my preferred mode of operation. Text mode-only available on some operating systems like Windows 95 and Windows NT-will convert newline/carriage return character pairs into a single newline. It will also interpret any byte that has a value of 26 as the end-of-file marker.
Bitwise operators view values at the bit level. Usually, Perl
looks at the entire value. However, bitwise operators will see
a value of 15 as a series of ones and zeros.
Chapter 4 "Operators," talks about bitwise operators
and how they can be used.
A block of code is a series of statements surrounded by curly braces. Code blocks can be viewed as single-pass loops. Using the my() function inside a code block will create a variable local to that block.
See also Scope.
Many functions need information before they can do their work. This information is given to functions in the form of parameters. For example, in the function call foo('one', 'two'), the strings 'one' and 'two' are parameters. When parameters are passed to the function by reference, the function can modify the parameters and the change can be seen by the calling function or program. For example, foo(\$result) passes a reference to the $result variable into the foo() function. Inside the function, the reference can be dereferenced to get at and modify the value of $result.
See also Call by Value.
Many functions need information before they can do their work. This information is given to functions in the form of parameters. For example, in the function call foo('one', 'two'), the strings 'one' and 'two' are parameters. When parameters are passed to the function by value, changes to the value inside the function are not seen outside the function.
See also Call by Reference.
A character class-used in pattern matching-defines a type of character. The character class [0123456789] defines the class of decimal digits. And [0-9a-f] defines the class of hexadecimal digits. Chapter 10, "Regular Expressions," discusses character class in detail.
See Regular Expression.
Some operating systems-such as UNIX-let your program create clones of itself using the fork() function. These clones are called child processes or subprocesses. Child processes are frequently used by server processes. For example, you might fork a process (create a child process) to handle multiple request on a single socket.
A class is a combination of variables and functions designed to emulate an object. An object can be anything you want it to be-a pen, an ATM machine, a car, whatever. The class's variables (also called properties) and functions (also called methods) are the computer's way of modeling the object. See Chapter 14, "What Are Objects?" for more information.
See also Encapsulation, Inheritance, and Polymorphism.
Client/Server is a buzzword that is past its prime. Use object-oriented or rad, instead. Seriously though, C/S refers to the concept of splitting the workload for a given task. Typically, the work is broken into user-interface tasks (like presenting information and inputting information) and back-end tasks (querying databases, printing reports, and sorting information). A standard C/S Internet application would use a web browser for the client and a cgi-enabled Web server as the server.
Perl has several options you can control when invoking your Perl script. They are called command-line options because you add them to the command that invokes Perl. For example, in the command perl -w test.pl, the -w is a command-line option which causes Perl to display messages about questionable code. Chapter 17, "Using Command-Line Options," has a description of all of the available options.
A compiler reads your program code and converts it into another form-typically, a language that your CPU can directly understand. The secondary form is sometimes written to disk in the form of an executable file; however, this is not always the case. In fact, Perl does not currently create executable files-although some people are researching this topic.
See also Interpreter.
The errors caught during the compilation phase are called compile-time errors. When the compiler converts a program to an internal format, it checks for syntax errors and, if the -w option is turned on, questionable coding practices.
Concatenation consists of taking two things and sticking them together. The operation is frequently used with strings. In fact, Perl has its own concatenation operator-the period; for example, 'one' . 'two' is equivalent to 'onetwo'.
In programming circles, a constant is a value that doesn't change. Constants are very similar to variables because both use a name to refer to a memory location that holds a value. The exception is that, with constants, that value can't change; with variables, it can. Normally, trying to change a constant would generate a compile-time error. Unfortunately, Perl does not have true constants, but you can emulate them by initializing a variable and then never assigning a second value to it. Some programmers like to emulate constants by using a function to return a value. This works, but it is very, very slow.
Classes use constructor functions to create an object. This is usually done by creating an anonymous hash and storing the classes properties inside the hash as entries. Most constructor functions are named new().
See also Classes and Deconstructor.
Sometimes you can control the type of value-either array or scalar-that
is returned from a function. If you place parentheses around the
function call, the return value will be placed in an array (of
course, it might only be a one-element array). Function calls
that are themselves parameters to another function are usually
evaluated in an array context also. You can use the scalar() function
to create a scalar context. This is valuable when determining
the size of an array. For example, scalar(@array) will return
the number of elements in @array.
Note |
Functions can use the wantarray() function to determine their own calling context. Appendix C, "Function List," has an example that uses the wantarray() function. |
Control characters are characters that control devices-like the display. For example, displaying the value 7 usually causes a beep to sound. The control values map directly onto the English alphabet. Therefore, a value of 7 is Control G-also written as Ctrl+G or ^G.
CR is the abbreviation for carriage return. A CR is represented by \r in strings. The carriage return can also be referred to as Ctrl+J, ^J, 0x0a, or as an ASCII value of 10.
See also ASCII and Control Characters.
A database is a grouping of related information. For example, your book collection might be one database and your stamp collection might be another. Each book or stamp would typically have its own record that contains information specific to that particular item. Records are broken into fields of information. For example, a book's title and the author's name might be fields in the records of the book collection.
The data type is simply the type of information that a variable holds. Perl has four main data types: scalars, arrays, associative arrays or hashes, and references.
See also Scalars, Arrays, Hashes.
Perl has a feature that lets you step line-by-line through your programs. This feature is called a debugger because it is generally used to find logic errors or bugs in your programs. Chapter 16, "Debugging Perl," shows how to use the debugger.
A declaration tells Perl that you want to use a variable. Most languages require you to declare the variables that you intend to use. This enables the compiler to perform some optimizations and perhaps see if you use a variable incorrectly. Perl does not require and does not have any declaration statement-the closest thing is the my() function.
Deconstructor functions are used by classes to clean up after you are done with an object. You might need to close a socket or file, or to write some log messages. All deconstructor functions are named DESTROY().
See also Classes and Constructor.
A defined variable is one that has been initialized with a value.
A delimiter is used to tell when one thing ends and another begins. Delimiters are widely used in text-based databases to separate one field from another. For example, in the string "one:two:three" the colon is the delimiter. You can break a string into components based on a delimiter using the split() function; you can put the string back together again using the join() function.
A reference is a scalar that points to a value. The act of dereferencing means to follow the link to arrive at the value. For example, you can create a reference with the following $foo = \10;. This makes $foo a reference to an anonymous literal value of 10. Printing $foo prints the value of the reference. To get at the value, you need to dereference $foo like this ${$foo}. The symbol in front of the curly brace depends on the type of reference. Use $ for scalars, @ for arrays, and % for hashes.
See also Reference.
You use detail lines to display information about individual items in reports. Reports can also have header, footer, subtotal, and total lines. Chapter 11, "Creating Reports," has examples of how to prepare reports.
The diamond operator (<>) is used to read a line of input from a file. Some operating systems, like UNIX, can use the diamond operator to read from sockets. Chapter 9 "Using Files," has examples that use the diamond operator.
Directories are used by operating systems and users to group files in a hierarchical or tree fashion. See your system's documentation for more information.
All Internet servers have an Internet Protocol (IP) address that consists of four numbers connected by dots. For example, 207.3.100.98 is the IP address of my personal server. Please don't try connecting to it though; my IP address changes every day.
Empty strings have no characters and have a length and value of zero. They are literally represented by "". Empty arrays have no elements and are literally represented by ( ). Empty hashes have no entries and are literally represented by { }. If you have a variable that contains a large string, you can free up or release memory by assigning the empty string to it. You can use the same technique to release memory used by arrays and hashes.
Encapsulation means that the information about an object (its properties) and functions that manipulate that information (its methods) are stored together.
See also Abstraction, Classes, Inheritance, and Polymorphism.
Encryption is the act of changing plain text into text which is not readable. Encryption enables you to store text while ensuring that it is safe from prying eyes.
See Infinite Loop.
Environment variables are stored by the operating system. You can change and/or add environment variables on a per-process basis. Any changes made to environment variables will be passed on to child processes, but, when your process ends, the changes go away.
EOF stands for end-of-file. UNIX uses a character value of 4 to represent the end-of-file, and DOS/Windows uses a value of 26. These end-of-file values are ignored in binary mode.
See also Binary Mode.
In Perl, some letters and characters can have more than one meaning depending on the situation in which they are used. The period could mean to match any character in a regular expression or it could simply be needed to represent a period. You can force Perl to use a literal context by placing a slash (\) in front of the character to create an escape sequence. For example, \. means that a regular period should be matched in a regular expression pattern. This simple definition is complicated by the fact that some escape sequences have meanings all their own. For example, \t indicates the tab character. See Table 2.1 in Chapter 2 "Numeric and String Literals," for a list of all of the special escape sequences.
An expression is one or more operands connected by one or more operators. The operands can be either literal values, variables, or functions. For example, $foo is an expression. $foo + (34 * bar()) is also an expression. Expressions can be arbitrarily complex.
See also Statement.
FF is the abbreviation for form feed or page eject. This character is typically sent to a printer to force a page ejection. An FF is represented by \f in strings. The form feed can also be referred to as Ctrl+L, ^L, 0x0b, or as an ASCII value of 12.
See also ASCII and Control Characters.
See Database.
You use a filehandle to let your program access files. It is essentially a pointer to an internal data structure maintained by the operating system. Perl naming conventions indicate that all filehandles should have names that use all capitals.
You use footer lines to display information at the bottom of the page in reports. Reports can also have header, detail-line, subtotal, and total lines. See Chapter 11, "Creating Reports," for more information.
You use formats to control a report's appearance. You can specify both the static text and the variables that will be displayed in the report. Chapter 11, "Creating Reports," shows you how to create reports.
ftp is an abbreviation for File Transfer Protocol. This protocol is used on the Internet to transfer files between two computers.
See Procedure.
You use globbing (what a funny word!) to expand a file specification into a list of matching files. For example, *.pl might be matched by test.pl and foo.pl. Use the glob() function to do your globbing.
Regular expressions are normally greedy-they try to find the longest sequence of characters that match a given pattern. For example, if you use "qqBqqBqqB" as your search space and /(qqB)+/ as your pattern, there are three matching possibilities. They are "qqB", "qqBqqB", and "qqBqqBqqB". Perl will find the longest matching string, so $& will be equal to "qqBqqBqqB". You can reverse this behavior by adding a ? to the pattern. For example, /(qqB)+?/ will match "qqB". Don't use the * meta-character with the ? meta-character because it will always match the empty string.
See also Regular Expression.
You use this utility to search files for patterns.
See Associative Array.
Header lines are used to display information at the top of a report's page. Reports can also have footer, detail-line, subtotal, and total lines. Chapter 11, "Creating Reports," shows you how to create headers for your reports.
You use a here document to specify input to a variable or function.
It is typically used with the print()
function. An example will explain better than words:
print <<"_END_"; This is the first line of output. The value of \$foo is $foo. This is the third line of output. _END_ print("This is the fourth line of output\n");
The syntax for here documents is both freeform and rigid. The ending label must be immediately to the right of the << symbol and must be enclosed in quotes. The ending label-after the document-must be by itself on a line and at the beginning of the line.
Here, documents are useful if you need to output a lot of lines at one time.
Hexadecimal refers to numbers using base 16.
See Endless Loop.
This is an object-oriented term that means that one object inherits properties and methods from another object in a parent-child relationship.
See also Abstraction, Classes, Encapsulation, and Polymorphism.
Initialization is the act of assigning a value to a variable for the first time or it can also be a series of actions taken to create a situation. For example, the initialization phase of a socket program would include getting the protocol and port number, determining the remote server's address, and creating and binding a socket.
Interpolation means the replacement of a variable name with its value. For example, if $foo equals "dinner" then "big $foo" is equal to "big dinner".
An interpreter executes your program without first creating an executable file. It interprets your program into the language of the CPU, on-the-fly. Compilers and interpreters do a lot of the same work. However, since interpreters can't create executable files, the source code must always be available to users.
See also Compiler.
You use inter-process communication, or IPC, when two or more processes need to communicate. The communication can take place using databases, shared memory, semaphores, or sockets.
I/O is an abbreviation for Input/Output.
See Inter-process Communication.
Each entry in a hash is a key-value pair. The key is used as the index to retrieve the value.
You use labels to mark locations in your program to which you need to return. Typically, you label the outer loop in a nested series of loops so that you can jump out of the inner loops if needed.
LF is the abbreviation for linefeed or newline. An LF is represented by \n in strings. The linefeed can also be referred to as Ctrl+M, ^M, 0x0d, or as an ASCII value of 13.
See also ASCII and Control Characters.
A library is a file that groups related functions together. Libraries are loaded into your program using the require compiler directive. Chapter 15, "Perl Modules," talks a little bit about libraries.
See Array.
A literal is a value that is represented "as is" in your source code. There are four types of Perl literals: Number, Strings, Arrays, and Hashes. Chapter 2 "Numeric and String Literals," shows many examples of literals.
A loop is a series of statements that are executed more than once. Each loop has a control mechanism to stop looping. Chapter 7 "Control Statements," discusses the different types of looping and controls that are used.
See also Endless Loop.
Meta characters are characters that have more than one meaning inside regular expressions. Chapter 10, "Regular Expressions," has an in-depth discussion of meta-characters.
See also Regular Expressions.
A module is a file that holds a related group of functions-such as a library. However, modules are a bit more complex. Modules can control which function and variable names get exported from the module namespace into the main namespace. See Chapter 15, "Perl Modules," for more information.
Namespaces are used to segregate function and variable names. Each data type has its own namespace. This means that you can use the same variable name for different data types. For example, $foo, @foo, and %foo are different data types with the same name. You can create your own namespace with the Package keyword. See Chapter 14, "What Are Objects?" for more information.
See Class.
Octal refers to numbers using base 8.
The operators in a computer language tell the computer what actions to perform. For example, the plus sign (+) is an operator.
Some functions need outside information before they can perform their tasks. The outside information is called a parameter. For example, the print() function needs to know what it should print and where.
Polymorphism is a term from the object-oriented world. It means that a child class can redefine a method already defined in the parent class. Chapter 14, "What Are Objects?" discusses polymorphism.
A port is the address of a socket on an Internet server. In addition to the server address, each socket also needs a port number. The port number is added to the end of the server address to create a full address. For example, www.locked.com:80 is a full Internet address that specifies a port number of 80.
Every Perl operator and function has an associated priority. This priority or precedence level tells Perl which operators should be evaluated first. Chapter 4 "Operators," lists all of the operators and their priorities.
Functions, procedures, routines, and subroutines are all basically the same thing-a set of statements that are grouped together for a common cause. If you like to be picky, functions are routines that return values while subroutines don't return values. Procedure is the generic name used to refer to both functions and subroutines.
A protocol is a set of agreed-upon commands and responses. The Internet has a plethora of protocols that you can use. See Chapter 22, "Internet Resources," for information about how to find more information.
See Array Range.
See Database.
A reference is a scalar value that points to a memory location that holds some type of data. See Chapter 8 "References," for more information.
A Regular Expression is used to find patterns in strings. See Chapter 10, "Regular Expressions," for more information.
All Perl functions return a value when they are finished. The return value is the value of the last executed statement or you can use the return() to explicitly state it. You may always choose to ignore the return value by not assigning the function call to a variable.
Run-time errors happen while your program is executing. Run-time errors are logic errors and therefore usually harder to track down than compile-time errors.
A scalar variable can hold one string or number value at a time. Chapter 3 "Variables," shows you how scalars can be used.
See Context (Array & Scalar).
Normal Perl variables can be used by any function and therefore are said to have a global visibility or scope. You can create variables that are local to a particular function or block of code with the my() function. These variables have a local scope.
The && and || operators are considered short-circuit operators because the second operand might not be evaluated. For example, in the statement 0 && die(); the die() function will not be executed. However, in the statement 0 || die(); the die() function will be executed.
A signal is a message sent to your program by the operating system. When a signal is received by your program, it interrupts the normal flow of execution. If you don't have a signal handler function defined, default internal functions will be called. See Chapter 13, "Handling Errors and Signals," for more information.
See Array Slice.
A socket is the end link of a connection between two computers. The first step to using any of the Internet protocols is to create a connection to another computer using the socket functions. Then, you can send and receive information over the sockets. See Chapter 18, "Using Internet Protocols," for more information.
See Array Splice.
A stack is a data structure that has the same properties as a stack of potato chips in a Pringles can. Only the top chip is accessible. And, therefore, two operations are possible: add a chip or remove a chip. A stack works exactly the same way. You can push a new item onto the stack or you can pop an item off the stack.
A statement is an expression with a semicolon at the end. The semicolon transforms an expression into an executable statement.
STDERR, STDIN, and STDOUT are predefined filehandles that every program can use. You use STDERR to display error messages, usually on the computer's monitor. You use STDIN to get input, usually from the keyboard. And you use STDOUT to display messages, usually on the computer's monitor.
See Procedure.
When using files, you can use either binary mode or text mode. Binary mode means that Perl will not change your input or output in any way. This is my preferred mode of operation, by the way. Text mode-only available on some operating systems like Windows 95 and Windows NT-will convert newline/carriage return character pairs into a single newline. It will also interpret any byte that has a value of 26 as the end-of-file marker.
The undefined value (undef) can be returned by functions to indicate an error condition. It is also the value returned when a nonexistent hash entry is accessed.
A variable is a changeable piece of information used in computer programs. Typically, variables have a name and a data type. Perl variables can be scalars, arrays, or hashes. Every variable has a life-cycle. It gets created, used, and is then destroyed. Regular Perl variables are created when they are initialized and destroyed when the program ends. The my() function can create a variable that only exists inside a function or code block.
Whitespace is a term that refers to space, tab, and newline characters. These characters create white space on a page when printed. You can use the \s symbolic character class in patterns to match whitespace characters.