Chapter 7, "Extending HTML's Capabilities with CGI," showed you how to use CGI to increase the effectiveness of a site and gave a detailed example of how to write a CGI script. The rest of this book shows how to use CGI scripts to accomplish tasks such as setting up a chat area, providing a bulletin board, or operating an online "store." But none of these useful functions is possible if CGI isn't working.
The preceding chapter also pointed out some security risks associated with CGI. Because of these risks, some service providers have flatly refused to allow users to put their own CGI scripts on the server. But things are changing. Many Internet Service Providers (ISPs) have decided to allow CGI scripting in order to stay competitive. Some are using CGIWrap. Others hand-check scripts before allowing them onto the server, and still others provide a cgi-bin directory for each user to increase accountability and maintain some semblance of control.
The net result of ISPs allowing CGI on their machines is that many ISPs who have heretofore had only a passing familiarity with CGI (and know only that "it's dangerous") are now enabling directories for CGI and helping programmers get their scripts set up. When problems occur, a round of fingerpointing starts during which the programmer and the service provider blame each other for the fact that the script isn't working.
As more and more ISPs accommodate CGI scripts on their servers, the number of frustrated CGI installers increases. One script archive with thoroughly debugged, well-documented scripts and a good Frequently Asked Questions list (FAQ) still gets over 300 messages a day, most of them complaining, "I can't get your script to run."
This section describes what happens when the server is misconfigured. The next section describes how scripts fail. The final section describes the symptoms for each kind of failure and gives a fault-isolation procedure, which identifies the problem and shows how to fix it.
When the server sees a GET request, it has no idea whether the entity requested is supposed to be a static file or a program. Suppose that an installer puts a Perl script somewhere in the tree of directories rooted at the server's document root. When the server finds the file, it treats it as an ordinary text file and serves up the source, as shown in Figure 8.1.
Figure 8.1: A Perl script "called" from a document directory.
The solution is to move the script from the document directory to the CGI directory. As the Webmaster, find out from the service provider the path to the CGI directory. Often it's called cgi-bin. On some machines there are cgi-bin directories set up for each virtual host.
Another way to locate the cgi-bin directory is to look in the server's srm.conf configuration file. Unless the Webmaster also happens to be the server maintainer, he or she won't be able to write to this file, but he or she can probably read it. The configuration files are located in different places depending upon the type of server and the choices of the installer. On the NCSA server and its cousin, Apache, start at /usr/local/etc/httpd/conf. Remember, don't change anything in these files. If your service provider has given you write access to them, it was probably by mistake. These files are the heart and soul of the server; once installed, they should be changed by authorized maintainers only.
Once you are in the conf directory, enter the following line from the UNIX command prompt:
grep -i cgi *.conf
This line looks for all occurrences of the word cgi in the configuration files. The -i switch makes the search case-independent, so both CGI and cgi match. Ignore any files that end in conf-dist. Those are from the original distribution set and are not used at runtime. Here's a sample of what you might see:
access.conf:<Directory /usr/local/etc/httpd/cgi-bin>
srm.conf:ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/
srm.conf:AddType application/x-httpd-cgi .cgi
The ScriptAlias directive in srm.conf tells you that files placed in /usr/local/etc/httpd/cgi-bin/ appear under the URL path /cgi-bin/ on your server. If you look in that directory, you might find a program called test-cgi. If so, point your browser at /cgi-bin/test-cgi on your server; you should see the list of environment variables that test-cgi outputs.
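If test-cgi isn't installed, a rough stand-in is easy to write. The following sketch (a hypothetical file name, printenv.cgi, and the usual /bin/sh are assumed; it is not part of the server distribution) dumps the environment the same way:

#!/bin/sh
# Minimal stand-in for test-cgi: emit a valid CGI header,
# then list every environment variable the server passed in.
echo "Content-type: text/plain"
echo ""
echo "CGI environment as seen by this script:"
echo ""
env

Install it in your cgi-bin directory, make it world-readable and world-executable (chmod 755), and call it from the browser just as you would call test-cgi.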
If there is more than one cgi-bin directory, you should be able to recognize one of them: the directory might have your server name or user ID in the path. Your service provider may also have set up a cgi directory for you inside the master cgi-bin directory. Change to the cgi-bin directory (for example, /usr/local/etc/httpd/cgi-bin/) and look at the contents. You might find a symbolic link xyz pointing to, say, /users/pages/xyz/cgi-bin. Make sure the directory is writeable by you. This directory is the place to put your CGI scripts.
Tip |
In UNIX, you can make a tiny file that "points to" the real file. These pointers are called "symbolic links," "soft links," or "symlinks." To make a symbolic link in the current directory named myFile to a file in another directory, type: ln -s /home/smith/aFile myFile. You can spot a symbolic link by doing an ls -l on the directory. Symbolic links have an 'l' in the initial position and show the aliasing in the last field, like this: lrwxrwxrwx 1 root system 29 Mar 19 19:40 wdb -> ... |
If you don't see a ScriptAlias directive that mentions a CGI directory, it is possible that your service provider is using access.conf or .htaccess to control where CGI scripts run. Check srm.conf for the following directive:
AddType application/x-httpd-cgi .cgi
This directive tells the server to treat any request for a file whose name ends in .cgi as a request to run that file as a program. The request is honored only if the program lives in a directory that has been enabled for CGI. The extension may not be .cgi on your server, and the service provider may have added other extensions, such as .pl (for Perl scripts) and .sh (for shell scripts). In any case, this directive says that, in order to run, the script's file name must end in the prescribed extension.
Look in access.conf for a <Directory ...> directive that mentions a directory at or above the root of your directory. For example, if your document directory is /users/pages/xyz/, the following directive includes you:
<Directory /users/pages> . . </Directory>
Somewhere between the opening <Directory ...> and the closing </Directory>, find the Options directive and make sure it includes ExecCGI. It might say either Options ExecCGI or Options All. In either case, your document directory has been enabled for CGI.
If you can't find your directory covered by access.conf, look in your home directory for a file named .htaccess. Note that the leading dot makes the file "invisible." Use the following to see the file:
ls -a .htaccess
Tip |
Most operating systems provide a way to make a file invisible or "hidden." In UNIX, a file is hidden if the first character in its name is a period. To see hidden files, request a directory listing with the -a option: ls -a. Hidden files are always visible to the root user. |
See Chapter 17, "How to Keep Portions of the Site Private," for a full discussion of the .htaccess file. If you have a .htaccess file, check it to see whether it contains an Options All or Options ExecCGI directive. If it does, the directory governed by that .htaccess file is a place where your scripts can run.
If you don't see the AddType directive in srm.conf, check again in your .htaccess file. The server administrator can use .htaccess to make the .cgi extension "magic" only when the requested document is in the right directory.
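A quick way to see whether any .htaccess file in your tree mentions CGI at all is a find pass like the following (the starting directory is illustrative; substitute your own document directory):

find /users/pages/xyz -name .htaccess -print
find /users/pages/xyz -name .htaccess -exec grep -i cgi {} \;

The first command lists every .htaccess file under the tree; the second prints any line in those files that mentions cgi, which catches both ExecCGI and the x-httpd-cgi type.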
Server configuration allows server maintainers to be very precise in expressing their wishes. With this power comes the ability to make a mistake and cause CGI scripts to fail.
See http://hoohoo.ncsa.uiuc.edu/docs/tutorials/cgi.html for a full discussion on configuring the NCSA server for CGI.
Look in access.conf for a <Directory ...> directive that mentions a directory at or above your cgi-bin directory. For example, if your cgi-bin directory is /usr/local/etc/httpd/cgi-bin/xyz/, the following directive includes you:
<Directory /usr/local/etc/httpd/cgi-bin> . . . </Directory>
Somewhere between the opening <Directory ...> and the closing </Directory>, find the Options directive and make sure it includes ExecCGI. It may say either Options ExecCGI or Options All. In either case, your cgi-bin directory has been enabled for CGI.
Once you find your cgi-bin directory, run a script from the browser to make sure everything is working. As mentioned earlier, test-cgi is commonly available and thoroughly debugged; it's a good place to start.
If test-cgi works but your scripts don't, it's time to roll up your sleeves and find out what's wrong.
If most of your experience has been on desktop computers like Macintoshes and PCs, you probably don't think about file permissions. After all, when you click a program in Windows, it runs. If you don't want someone running your programs, you don't let them on your machine.
UNIX is different. When UNIX was developed, no one had ever heard of a "personal computer." Computers were big and expensive; you shared them with lots of other people and wished you had a computer of your own. To meet this need to share the computer's resources, the UNIX designers set up the file system to give three different kinds of access to three different groups of people, nine levels of security in all. (Most newer versions of UNIX have more sophisticated mechanisms for access control, but they aren't relevant to most Web sites.)
The three levels of access are read, write, and execute. The three groups of people are the owner, the group, and others.
Take a look at a typical UNIX file. Enter
ls -l /etc/passwd
Despite its name, this is usually not where the encrypted passwords are stored. This name has historical significance only. A typical response to the above command is
-rw-rw-r--   1 root     security      389 Feb 16 16:25 /etc/passwd
The fields that control security are right up front. The first field (a dash) says that /etc/passwd is an ordinary file and not a directory, device, or something else. The next three positions describe the permissions of the owner. The owner of /etc/passwd is root, the system superuser; in this case, the owner has permission to read and write the file. The third of these positions determines whether the owner can execute the file as a program. /etc/passwd isn't a program, so the execute bit is turned off.
The next three permission bits apply to the file's group. In this case, the group is "security," and members of that group can read and write the file but not execute it. The third set of permissions applies to "others," sometimes called "the world." /etc/passwd is said to be world-readable because anyone can read it, although not everyone can write or execute it.
Another way to read these permission bits is as three octal (base-8) numbers. In a given set of three bits, the one on the right has a value of one, the one in the middle has a value of two, and the one on the left has a value of four. If all three bits are on, the number is 4+2+1=7. Seven is the highest number expressible in a single digit in base-8, just as nine is the highest single-digit number in base-10 (the decimal system).
Now read those permission bits on /etc/passwd again. The first set is 4+2+0=6. The second set is the same. The third set is 4+0+0=4. So a UNIX expert will say that /etc/passwd has permission 664.
Now issue the following commands:
cd                 # return to your home directory
touch foo.cgi      # make an empty file named foo.cgi
ls -l foo.cgi
This last command shows the default permissions that you have. They are controlled by your umask, which was set up by the system administrator when your account was established. A typical value of the umask is 133. If your account is set up with a umask of 133, the ls command returns:
-rw-r--r-- ..... foo.cgi
Note |
The UNIX umask gets its name from the fact that it inhibits or "masks" out permission bits that should be off by default. If you enter umask 000, no bits are inhibited, and all files created in the future will have permission 777. If you enter umask 777, you get the opposite effect: all bits are inhibited, and the new default permission is 000. A typical value is umask 026, which gives default permission bits of 751-the owner can do anything, the group can read and execute, and the rest of the world can just execute. The system administrator will usually put a umask command in one of the files, like /etc/profile, that all users execute when they log in. You can set your own umask in your own .profile file (or in .cshrc if you use the C shell). |
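To see or change your own mask interactively, use the umask command at the shell prompt (022 here is just another common choice, not a recommendation for your site):

umask          # print the current mask as an octal number
umask 022      # new files now default to 644, new directories to 755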
Now try to execute the script. Yes, it's an empty file, but that doesn't matter for now. Type
./foo.cgi
You will get a message that says Execute permission denied. That's not surprising. You saw earlier that the owner's execute permission bit was off. Now type
chmod +x foo.cgi
ls -l foo.cgi
The chmod command tells UNIX
to set the execute bits.
Tip |
The general syntax for chmod is chmod new-mode files, where new-mode may be expressed in "who, what, permissions" format. For example, to add execute access to the group permissions use g+x as the new mode. The full list is given in Table 8.1. |
Table 8.1  Symbolic modes for chmod

Category    | Symbol | Meaning
Who         | u      | User (owner) of the file
            | g      | Group
            | o      | Others
            | a      | All
What        | -      | Remove this permission
            | +      | Add this permission
            | =      | Set this permission exactly
Permissions | r      | Read access
            | w      | Write access
            | x      | Execute access
There are several other permission bits which can be set with chmod, but the ones in Table 8.1 are those most commonly used on Web sites. See the man page for chmod for more details.
You can also combine symbolic mode entries, like this:
chmod a+x,g+r files
Many experienced users find it faster to specify the permission bits in octal notation. Such a user might type
chmod 751 files
to set the permission on a file to
rwxr-x--x.

Returning to foo.cgi: the ls -l foo.cgi command you typed after the chmod +x now shows

-rwxr-xr-x ............... foo.cgi
Now the file is executable to everyone (owner, group, and others). Just for fun, execute it:
./foo.cgi
Nothing happens (how much did you expect an empty file to do?), but there's no error message. The file is now executable.
Sometimes you will see file permissions given in instructions as octal numbers. Type
chmod 644 foo.cgi
ls -l foo.cgi
and see that the file permissions go back to 644 (owner = read (4) + write (2), group and others = read (4) only).
Now type
ps -ef | grep httpd
On UNIX systems derived from the Berkeley distribution, you will need to type ps -aux | grep httpd. The ps part of this command says to list all the running processes on the machine. The output of the ps -ef command can go on for several pages. The grep httpd part says to show only those lines that mention httpd (the name used for NCSA servers and their kin). On most machines, you'll see a half-dozen or more lines that look like this:
root     11092     1   0   17:06:17   -   0:01 /usr/local/etc/apache/src/httpd
nobody   12444 11092   0   17:09:54   -   0:00 /usr/local/etc/apache/src/httpd
nobody   14496 11092   0   17:09:54   -   0:00 /usr/local/etc/apache/src/httpd
nobody   15518 11092   0   17:09:54   -   0:00 /usr/local/etc/apache/src/httpd
nobody   16040 11092   0   17:09:54   -   0:00 /usr/local/etc/apache/src/httpd
These lines say that there are five copies of the server running.
The first one (process ID 11092) was started by user root at 17:06:17. That copy started the four others (its process ID appears in the Parent Process ID field, the third column). If you had to use ps -aux, the columns will be a bit different, but in either case the column we're interested in is the first one. It says that the servers are running under the authority of user nobody. Not surprisingly, user nobody has almost no authority on the system. (Remember that these servers are going to be run by thousands of complete strangers. How much authority do you give a stranger?)
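You can also confirm the server's user ID without reading the process list: the NCSA and Apache servers name it in the User and Group directives of httpd.conf (the path below assumes the default layout discussed earlier):

grep -i '^User' /usr/local/etc/httpd/conf/httpd.conf
grep -i '^Group' /usr/local/etc/httpd/conf/httpd.conf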
Note |
A few service providers do not give users permission to Telnet into their account. If you are among those unfortunate few, you won't be able to run the exercise described in this section. But the file permissions discussion is still relevant to you. Make sure you are using a version of FTP that allows you to set the permission bits. Then, when you transfer the files into your cgi-bin directory, set the permissions to world-readable and world-executable (755), just as we described above. |
To execute a CGI script, the server (running with the authority of nobody) must be able to execute it. With all of this background, you're ready to do just that. Change the directory to your cgi-bin directory. For example, type
cd /usr/local/etc/httpd/cgi-bin/xyz
and look at the permissions of one of your scripts:
ls -l myScript.cgi
If it is not world-executable, change it with:
chmod +x myScript.cgi
or, if you prefer
chmod 755 myScript.cgi
Verify that the script is world-readable and world-executable. If it's not, the server will tell you that you don't have permission to execute that script when you try to access it.
Although it is less frequently a problem, note that the directory that contains the scripts must also be world-readable and world-executable. To see the permissions on a directory, change the current directory to that directory (using the cd command) and type:
ls -ld .
If the "others" bits on the permissions are not r-x, change them. If you do not have the authority to change them, contact your system administrator.
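If you do own the directory, one command fixes it (run it from within the cgi-bin directory itself):

chmod o+rx .      # give "others" read and execute access to this directory

If you also want the standard owner and group bits, chmod 755 . accomplishes the same thing in one step.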
Tip |
It is sometimes useful to be able to change the permissions of all, or nearly all, of the files in a directory tree. To change all of the files, use the chmod -R option (where -R stands for "recursive"). To change most of the files, build a set of tests for the find command and use find . tests -exec chmod new-mode {} \; |
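For example, a pass that touches only the CGI scripts in a tree might look like this (a sketch; adjust the name pattern and the mode to your site's conventions):

# make every .cgi file under the current directory world-readable
# and world-executable, leaving other files alone
find . -name '*.cgi' -exec chmod 755 {} \;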
Other scripts run in your cgi-bin directory. Your script is world-readable and world-executable. But when you run the script from the browser, you still get an error: typically a complaint about a malformed header, or the Internal Server Error message.
Your server is not broken, but your script probably is. To obey HTTP, the first thing the script should send is "Content-type: text/html" followed by an empty line. In Perl, this is done like this:
print "Content-type: text/html\n\n";
To troubleshoot this problem, Telnet in to your account and run the script from the command line. In most cases, you'll see a syntax error from Perl. Fix the Perl problem. Once the script runs, try it again from the browser. If it runs successfully from the command line but not from the browser, there is something wrong with the program logic: it is not sending the Content-type line. Later in this chapter, the section "Checking by Hand" describes how to set up the environment variables and completely mimic the actions of your browser.
Remember to check the error log of the server. If the script runs but produces an error, that error is written to the file handle STDERR. The server redirects that output to the error log. You can find your error log by examining the configuration files or by asking the system administrator.
Tip |
If the server has been configured with the default directories, the error log is at /usr/local/etc/httpd/logs/error-log. |
Here's a mistake that's easy to make and tough to spot. To understand this problem we need to understand the first line of a Perl script.
When you say to UNIX
./foo.cgi
you are saying, "Look for the file foo.cgi in my current directory, and execute it." If foo.cgi is a compiled binary, it is loaded into memory and run. If it is a shell script, it is turned over to the current shell (a command interpreter) and run. But if it's a Perl script, UNIX has no way of knowing what to do with it; it passes the file to the shell, which quickly complains that it can't make sense of the commands.
The solution comes from an arcane bit of UNIX lore. For a whimsical description of the story, see article 47.02 in UNIX Power Tools by Peek, O'Reilly, and Loukides. For a more serious look, see the man page for execve(2). If you start the very first line of a text file with #!, most popular versions of UNIX will look on that line for the name of a program to run and, optionally, a string to pass to that program. To find out where Perl lives on your system, type the following:
which perl
Expect a response like
/usr/bin/perl
or possibly
/usr/local/bin/perl
In fact, enter
ls -l /usr/bin/perl
to see how Perl has been installed. Don't be surprised if it is a symbolic link to /usr/local/bin/perl.
Now you know where Perl has been installed. On the very first line of your Perl script, starting with the very first character, type #! followed by the path to Perl. If the Perl installer took the defaults during installation, this line will be:
#!/usr/local/bin/perl
Be sure to type the line exactly as described. This line is read directly by the UNIX kernel, which is a most unforgiving reader.
A sure sign that the kernel is having a problem finding the Perl interpreter is when you run the program from the command line and it responds "not found." You can see the file in an ls listing, so you know it's there. You have specified ./myScript.cgi, so you know it's not a path problem. Look at the first line. The kernel is telling you that it tried to exec the interpreter you named on that line, but that interpreter wasn't where you said it was.
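Two quick checks usually settle the question (myScript.cgi is a hypothetical name):

head -1 myScript.cgi           # show the interpreter named on the #! line
ls -l /usr/local/bin/perl      # confirm that the interpreter is really there

If the second command reports that the file does not exist, fix the #! line so that it points at the path which perl reported.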
Here's a tricky little problem that can become troublesome. Many users produce CGI scripts on their desktop machine (a Mac or PC), then use FTP to send the file to their server. Sometimes this process will work fine for weeks and then one day a script is transferred up to the server and fails in bizarre ways.
To understand why this problem occurs, it's necessary to understand how various operating systems terminate lines in a text file. In UNIX, the end-of-line character is a new line, also known as a linefeed. On a Mac, the end-of-line character is a carriage return. Under DOS and Windows, the end-of-line is denoted by a carriage return and a linefeed.
The FTP program supports several types of transfer. The two most common are ASCII (sometimes called text) and binary (also called image). In ASCII transfers, each line of the text is converted to a standard representation called NVT ASCII. NVT ASCII ends each line with a carriage return/linefeed. So if you send from a Macintosh to a UNIX machine, the sending FTP converts from the Mac standard to NVT ASCII and sends. The receiving machine reads the NVT ASCII and saves the file using the UNIX convention, linefeeds only. Similarly, if you send from a PC, the file is sent as NVT ASCII, and the UNIX box converts to its native format.
ASCII transfer is the default; what happens, though, if the Webmaster inadvertently sets the transfer type to binary? (On some versions of FTP, the program attempts to "discover" whether the file is text or binary and may guess wrong.) In binary mode, no conversions are made, so the lines end up on the UNIX machine just as they started on the desktop machine. The most immediate symptom is that the file "looks funny" in most editors: it may appear to have blank lines between the text lines, or all of the text may be on one long line. The most serious symptom is that the program will fail to execute.
To check the end-of-line characters on a file named foo.cgi on the UNIX machine, type the following:
od -c foo.cgi | more
The first part of this command invokes a dump program named od and asks for the file to be interpreted as characters. The first few lines of typical output look like this:
0000000    #   !   /   u   s   r   /   l   o   c   a   l   /   b   i   n
0000020    /   p   e   r   l  \n  \n   #       n   a   m   e       o   f
0000040        f   i   l   e       w   h   i   c   h       c   o   n   t
0000060    a   i   n   s       t   h   e       o   r   d   e   r   e   d
0000100        l   i   s   t       o   f       p   a   g   e   s  \n   $
0000120    t   h   e   L   i   s   t   F   i   l   e       =       "   .
0000140    /   t   h   e   L   i   s   t   "   ;  \n  \n   #       n   a
0000160    m   e       o   f       s   t   r   i   n   g       w   h   i
0000200    c   h       n   a   m   e   s       t   h   e       P   r   e
Look closely at the characters at the end of each line. If the file is set up correctly for UNIX, they should be \n, which means newline in UNIX. If the lines are terminated with \r\n or just \r, the file won't run correctly.
The solution, of course, is to retransmit the file this time making sure that FTP is set to ASCII transfer. A workaround is to use the UNIX command tr to translate the characters to their correct format.
If the file comes from a Macintosh (each line ends in a return) type the following:
tr "\r" "\n" < foo.cgi > out.cgi
mv out.cgi foo.cgi
The tr command translates the characters in the first string (a return) to the characters in the second string (a newline). The tr command reads from standard input and writes to standard output. Be careful not to name the output file the same as the input file or the file will be emptied. The second line moves the file from the temporary name we gave it back to its original name.
If the file comes from an MS-DOS computer then each line will end with a carriage return followed by a newline (\r\n). Because UNIX wants the newline, all you need to do is delete the return:
tr -d "\r" < foo.cgi > out.cgi
mv out.cgi foo.cgi
The error codes and messages that the server returns can be useful in identifying the cause of a problem. Experienced Webmasters learn to associate common error codes with certain problems. Remember that the error message for a given code may vary somewhat from server to server, based on the configuration set up by the local administrator.
Recall that the 400 series of errors mean that the server thinks the client has made a mistake.
A 401 message means that the file is protected (typically by a .htaccess file) and the user did not send the proper authorization. Most browsers respond to a 401 by displaying a dialog box that prompts the user for a username and password.
The most likely explanation for a 403 error is that the file or directory permissions do not grant read or execute privileges to the server. If the server is running as an unprivileged user like nobody, the CGI files and directories must be world-readable and world-executable.
Another explanation is that the system administrator has not configured this directory for CGI. See the procedure earlier in this chapter for confirming that the server is properly configured.
A 404 message means what it sounds like: either the script is not where you thought it was, or, when the script ran, it tried to access another file that wasn't where you thought it was. If you're sure the request is reaching the script, put
print "Content-type: text/html\n\n" ;
near the top of the script, then load the script with print statements so you can see how far it's getting, and track down the reference to the file or URL that isn't there.
The 500-series of error codes means that the server thinks that it has made a mistake. The real culprit is almost always a script error.
An error 500 means that the header did not start with the "Content-type" line required by HTTP. Here are some things to check:
If the error log or message mentions execve, it is almost certain that the kernel cannot find Perl. Check the first line of the script again.
Tip |
When debugging, if the script runs from the command line but fails when run from the browser, the problem is most likely in your environment variables (or in STDIN, if you are using POST). The script assumes something about the environment that isn't true, and it throws an error. If this happens, temporarily switch the ACTION in your form to test-cgi and submit the form again. test-cgi will report all the environment variables. Now use the output of test-cgi to compare the actual values of the environment variables with the assumptions made by the code. |
An enhanced test-cgi is available from Chris Schanzle at http://speckle.ncsl.nist.gov/~chris/test-cgi. This version dumps STDIN if the method is POST.
The most frequent culprit here is that the directory is not enabled for CGI (or that the script is in the wrong directory). If you try to GET a script in such a directory, you get the source. With POST, the server knows you are trying to run the script, but it has no permission to run programs in that directory.
Remember that there are two ways for the system administrator to enable CGI. If your administrator has chosen to use the ExecCGI option (with the "magic" CGI type) your file names must conform to that naming convention. Usually the required extension is .cgi. If your script is named foo.pl, try renaming it to foo.cgi.
When a script fails to execute properly from the server, it is often necessary to "run it by hand," taking control from the browser (and sometimes from the server) in order to see the results of each step. This section shows three ways of doing this.
When troubleshooting CGI scripts, experienced developers often tell neophytes to "run it from the command line." By this, they mean use Telnet to log into the account on the server, change to the cgi-bin directory, and type the name of the program. If the PATH variable is not set up to look for programs in the current directory, the script name must be prefaced with "./" to tell the shell where the program is.
If the first line is set up to point to the Perl interpreter, Perl takes control and checks the syntax of the file. Because Perl checks the program at startup time, many kinds of errors are avoided at runtime (when the developer is not around, and the site visitor is alone with the script).
If Perl finds an error, it stops and prints the error. Sometimes one error will cause a cascade of others, so most programmers check the first error or two, then rerun the program.
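You can also ask Perl to check the syntax without running the program at all; the -c switch compiles the script, reports any errors, and exits:

perl -c myScript.cgi     # "myScript.cgi syntax OK" means it compiles cleanly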
Once the program is running, simple invocation becomes less useful. Most scripts ask early on:
if ($ENV{REQUEST_METHOD} eq "POST") ...
or
if ($ENV{REQUEST_METHOD} eq "GET") ...
Because simple invocation from the command line does not set the CGI environment variables, the script will fail. Depending on the program, it may just exit, crash, or politely respond with an HTML message that it was not started by the preferred method. As Chapter 7, "Extending HTML's Capabilities with CGI," showed (with formmail), it is possible to set up a script so that it handles either GET or POST requests.
To set environment variables, you have to know which shell you are running. If your prompt is a dollar sign, you are running the Bourne shell, the Korn shell, or possibly BASH. They all use the same command to set environment variables. To set environment variables in any of those shells, type:
export REQUEST_METHOD=GET
Be sure to type the string just as it is shown here. Putting spaces around the equals sign will cause an error. (If a very old Bourne shell rejects the combined form, set and export in two steps: REQUEST_METHOD=GET; export REQUEST_METHOD.)
If your prompt is a percent sign, you are running the C shell. To set environment variables in the C shell, type:
setenv REQUEST_METHOD GET
Look over the script and see what environment variables it requires. For GET, it almost certainly needs REQUEST_METHOD, because most well-written scripts check to see if the user is calling it by GET or POST; and QUERY_STRING, because that is how the information gets to the script. Remember to encode QUERY_STRING. If you don't need escaped characters, you can say something like this:
export QUERY_STRING="name=John+T.+Smith&address=1234+Jones+Street"
To see what your page is sending, look in the URL field at the top of the page after you have attempted to access the script.
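Putting the pieces together, a complete GET-style test session from a Bourne-compatible shell looks something like this (myScript.cgi is a placeholder for your own script; use setenv instead of export in the C shell):

export REQUEST_METHOD=GET
export QUERY_STRING="name=John+T.+Smith&address=1234+Jones+Street"
./myScript.cgi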
If your script expects to be run by POST, set it up this way:
export REQUEST_METHOD=POST
export CONTENT_LENGTH=1024
echo "name=John+T.+Smith&address=1234+Jones+Street" | ./myScript.cgi
Don't worry about making CONTENT_LENGTH the exact number of characters in STDIN. Just make it large enough to handle all the characters you send it. In the same way, don't worry about sending in all the fields from a form. Send in enough to check the basic processing. If you do decide to put in all the data, save it to a file so you can save time by typing:
export REQUEST_METHOD=POST
export CONTENT_LENGTH=1024
./myScript.cgi < myData
Note that you don't have to keep reentering the environment variables. Once set, they stay set until you leave that shell. If you are working in your login shell, they stay around until you explicitly change them or until you log out.
In this way, the basic environment of the script is set up and you can watch it run. Put print statements in the script to make sure it's following the path you think it is. Check the results from calls to functions to make sure they are succeeding as you expect. (It's not a bad idea to leave some of those checks in the scripts to handle the response explicitly.)
You can also check the scripts in the browser by printing a "Content-type" line early on. You may want to set up a standard set of HTML-related subroutines, like the following ones from a file named html.cgi.
# ===============================================================
# This subroutine takes a single input parameter and uses it as
# the <TITLE> and the first-level header.
# ===============================================================
sub html_header {
    $document_title = $_[0];
    print "Content-type: text/html\n\n";
    print "<HTML>\n";
    print "<HEAD>\n";
    print "<TITLE>$document_title</TITLE>\n";
    print "</HEAD>\n";
    print "<BODY bgcolor=\"#CCCC99\" TEXT=\"#000000\" LINK=\"#DD0000\" VLINK=\"#009966\">\n";
    print "<H2>$document_title</H2>\n";
    print "<P>\n";
}

sub html_trailer {
    print "</BODY>\n";
    print "</HTML>\n";
}

sub die {
    print "Content-type: text/html\n\n";
    print "<HTML>\n";
    print "<HEAD>\n";
    print "<TITLE>Error</TITLE>\n";
    print "</HEAD>\n";
    print "<BODY bgcolor=\"#CCCC99\" TEXT=\"#000000\" LINK=\"#DD0000\" VLINK=\"#009966\">\n";
    print "<H1>An Error has occurred</H1>\n";
    print "<P>\n";
    print @_;
    print "\n";
    print "</BODY>\n";
    print "</HTML>\n";
}

1;    # a required file must return a true value
Now to quickly get a script to print, put the following lines near the top:
require "html.cgi";
&html_header("Test");

and at a point just above where the script exits, add

&html_trailer;
For more complex problems, consider running the script from Telnet and bypassing the browser. Suppose your server is called www.xyz.com and the server is set up to expect messages on port 80. To troubleshoot the script at /cgi-bin/foo.cgi with a query string of "This is my query", type
telnet www.xyz.com 80
Wait for the connection to open, then type
GET /cgi-bin/foo.cgi?This+is+my+query HTTP/1.0
and press Return again to send the blank line that ends the request. The server runs the script, sends back the results, then closes the connection. This method has the advantage of showing the headers coming back.
To exercise a script with POST, type
telnet www.xyz.com 80
Wait for the connection to open, then type
POST /cgi-bin/foo.cgi HTTP/1.0
Content-type: text/plain
Content-length: 45

name=John+T.+Smith&address=1234+Jones+Street
The server runs the foo.cgi script, sends back the result, and closes the connection. If the script fails to send a Content-type line as the first line of its output, the server throws an error 500. The first line of the server's response shows the error code. For example:
HTTP/1.0 500 Server error
Date: Mon, 12 Feb 1996 03:22:14 GMT
Server: Apache/1.0.2
Content-type: text/html

<HEAD><TITLE>Server Error</TITLE></HEAD>
<BODY><H1>Server Error</H1>
The server encountered an internal error or misconfiguration and was unable to complete your request.<P>
Please contact the server administrator, morganm@dse.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.<P>
</BODY>
Connection closed.
Running scripts from the command line or from Telnet can give insight but can be time-consuming. Various tools are emerging that simplify the process. One such utility is CGItap, available from ScendTek Internet Corporation at http://scendtek.com/cgitap/. CGItap is a small Perl script that can run on any machine. It intercepts the dialog between the client and the server and reports what it sees in several sections.
The CGI Script Output is the raw output. Although the HTTP headers are stripped off, the remaining information will show the Content-type line if it is present.
Knowing how to run a script from the command line and from Telnet is essential for a Webmaster. For day-to-day work, a program like CGItap can be invaluable.
It is better to avoid the problems we've discussed in this chapter than to allow them to occur and then detect them. Here's a process that helps avoid most of them.
Configure the development machine to be as close to the live server as possible. Use the same domain names, the same configuration files and the same directory structures.
Start building CGI scripts from templates and libraries. This book emphasizes understanding the underlying mechanisms. Once you understand them, move on to an environment that does not require retyping code and permits reuse of existing designs.
During development, work from the command line. Develop shell scripts that exercise the code. Don't work too much on getting the output HTML right until the program logic is correct.
In 1976 Tom McCabe published a paper entitled "A Complexity Measure" in IEEE Transactions on Software Engineering (SE-2, No. 4, pp. 308-320) arguing that a program's complexity, as measured by its control flow, is a major factor in determining the quality of the program. The lower the complexity, the better, because developers can grasp the program and see their mistakes.
McCabe's Complexity Measure works like this: start with 1 for the routine's single straight-line path, then add 1 for each decision point (each if, while, for, and, or, and case) the routine contains; the total is the routine's complexity score.
If the routine scores five or below, it's probably simple enough.
If it scores between six and ten, think about ways to simplify it. If its complexity metric is above ten, consider rewriting it: it's almost doomed to be buggy, and it will probably cost less to rewrite it than to fix it.
Tip |
Numerous software utilities are available to compute metrics like McCabe's Complexity Measure. Check out http://www.swbs.idirect.com/, which describes C-DOC from Software Blacksmiths, Inc. |
Develop a set of regression tests for each routine that exercises each independent path of the program. Set up "scaffolding" scripts to test each routine separately. Put each such regression test in a shell script. Once you are satisfied with a certain level of performance, save the results in a "golden" file. From then on, always compare the output of the script with the output of the "golden" test run. For example, Listing 8.1 shows a high-level scaffolding file called test01.sh.
Note |
The term "scaffolding" comes from the building construction industry. Scaffolding is used during the construction process to allow workers to reach parts of the building that would otherwise be inaccessible. Software "scaffolding" can be built as low-level routines that temporarily substitute for the real routines so that overall logic and design can be tested; such low-level test routines are called stubs. High-level scaffolding is used to call low-level routines, in order to exercise them under controlled conditions while watching their inputs and outputs. High-level scaffolding is also known as a driver or sometimes as a test harness. In regression testing, a golden unit is one that has been checked by hand and is known to be correct. Future versions of the software are likely to be correct if they produce the same output as the golden unit (and are known to be incorrect if they produce different output). |
Listing 8.1  test01.sh-A Driver That Takes the Place of the Client and Server and Runs a CGI Script Directly
#!/bin/ksh
export REQUEST_METHOD=POST
export CONTENT_LENGTH=1024
/usr/local/etc/httpd/cgi-bin/xyz/myScript.cgi < test01.dat > test01.results
diff test01.golden test01.results
This script sets up and runs the script myScript.cgi in the xyz project directory. It uses POST to read its input from the data file test01.dat and writes its results to test01.results. Then it compares the results of this run with the results of the "golden run" and shows any differences.
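The golden file itself is created once, by hand: run the same command, inspect the output carefully, and only then bless it as the standard (paths and names as in Listing 8.1):

#!/bin/ksh
# one-time step: capture hand-checked output as the golden file
export REQUEST_METHOD=POST
export CONTENT_LENGTH=1024
/usr/local/etc/httpd/cgi-bin/xyz/myScript.cgi < test01.dat > test01.golden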
There should be a test for each path through the code. For example, every if statement generates two paths: one if the condition is true and one if it is false. Test at and near limits. If something special happens when a variable is exactly one, test with the variable set to zero, one and two.
To keep from having to test a huge number of cases, test each subroutine separately. Suppose the program runs in three steps (for example, read and decode the form data, process it, and write the reply page), and suppose that each module has a complexity of five. If you test the subroutines one at a time, this program can be tested with fifteen tests (or maybe a few more to cover special cases and limits). Tested as a whole, the program might have a complexity of 125 and might need between 130 and 150 tests to determine whether it's still functioning correctly.
Once a module is working, check it into the configuration control system, along with the test routines, golden files, and test inputs. Put a README in the directory to document the versions of any binaries the scripts depend on, like Perl. Make a rule that whenever a module is checked out, it is not checked back in on the main path without passing all regression tests. (Checking it back in on a branch is okay under certain circumstances.)
Once all the subroutines are working, integrate the whole program. Build regression tests for it, and put it, too, under configuration control.
Many of the projects described in this book require more than one script. Build regression tests for the whole system and put all of the scripts and their tests under configuration control.
The regression testing described in the preceding section is functional testing; its purpose is to make sure that the software works the way it's supposed to. Another kind of testing is stress testing: throwing input at the software that it was never explicitly specified to handle. Here are some ideas for stress testing:
Keep written records of the tests so that if there's ever a question about what the system was able to do on a given date, you can document the tests from the archives.
Third, engage in load testing. Set up your test server with two or three times the number of servers you allow and set them all to exercise the new software. Set up all the regression tests to run in a continuous loop. If the software has any common files it must read or write, load testing will shake out concurrency issues. Watch the system performance during load testing. Use UNIX tools, like vmstat, to see where the time is going. If your UNIX is derived from System V, use sar to examine the same topics. Look for hot spots in the code and think of ways to optimize them.
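A loop that runs the regression scripts continuously can be as simple as the following sketch (test01.sh is from Listing 8.1; test02.sh and any others stand in for whatever regression scripts you have built):

#!/bin/ksh
# run the regression suite over and over; stop it with Ctrl-C
while :
do
    ./test01.sh
    ./test02.sh
done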
Note |
When a program runs, the available time (sometimes known as wall time since it is measured by the clock on the wall, as opposed to CPU time) only goes five places:
Knowing where the time is going is the first step to speeding up a program. If most of the time is spent in the CPU, make the program more efficient, or run fewer programs, or get a faster computer. If the system is paging or swapping, consider adding real memory. If the system is bottlenecked on the disk or other I/O, consider adding more and faster resources in those areas. Most UNIX vendors have manuals and seminars on how to optimize programs running on their operating system. For general comments, look at System Performance Tuning (O'Reilly & Associates, Inc., 1990) by Mike Loukides. |
During testing, you will find "hotbeds" of defects. Track the defect density by subroutine and by program. When the defect density crosses some threshold, throw out the code and rewrite it. You will find it less expensive to trash bad code than to maintain it.
Once you've made all the changes you need to so that the software performs acceptably under stress and load, go back to the beginning and run a full regression suite. Once it passes all tests, check in all the test software along with the code under test, so you can re-create the test environment at any time.
Once the product seems to work and passes the developer's tests, give it to a friendly in-house test team. Depending upon the software, the testers can be administrative staff, family members, or friends. Ask them to interact with the software and try to break it. (Twelve-year-olds are an excellent resource for these alpha test teams. They can break anything.)
Keep written records of the defects found during alpha testing, as well as recommendations from the testers for improvements. Fix the problems and run a full set of regression tests to make sure nothing broke in one part while you were fixing another part.
After you and your alpha testers are satisfied that the product works, offer it to one client at a discounted rate. You are now beginning beta testing. Make it absolutely clear that this software is going out for its first test. Give the client Customer Trouble Reports (CTRs) and make sure they know how to fill them out. Consider putting the CTR online, so visitors can report problems. Analyze the error log daily to see whether the software is malfunctioning.
Does this business of testing and retesting sound like a lot of work? It is. But it's not nearly as much work as fixing defects after the software is released.
Think of software development this way: before the software is released, you can develop it during working hours, at your own pace, taking the time to make sure everything is right. It never seems like there's enough time; the deadline always looms large. But compare that environment to fixing fielded software. Once it breaks, the customer is hesitant to trust it again, and he wants it fixed now. While you're working on it, hundreds or perhaps thousands of people are using it, breaking it again, and getting frustrated. Your next project is languishing on the disk, slipping behind schedule because you're tied up fixing the last seven systems you shipped. Not a pretty picture. Clearly, to survive in this business, a Webmaster must assemble a team of software developers who share the responsibility for specifying, designing, coding, integrating, and testing CGI-based software systems. These teams must develop, document, and improve repeatable processes that result in shipping quality software products consistently.
In his book, Code Complete (Microsoft Press, 1993), Steve McConnell reports the results of a highly disciplined coding and testing process called "cleanroom development." He reports that "productivity for a fully checked out 80,000-line cleanroom project was 740 lines of code per work-month. The industry average rate for fully checked out code is closer to 150 lines per month." He quotes cleanroom pioneer Harlan Mills as saying that "after a team has completed three or four cleanroom development projects, it should be able to reduce the density of errors in its code by a factor of 100 and simultaneously increase its productivity by a factor of 10."
The finest programs in the world are worthless if they cannot be run. This chapter addresses problems that occur in the CGI script as well as problems that occur in the server configuration. It lists the error codes that can be returned by the server, and shows what kinds of problems cause each error.
This chapter also shows how to run CGI scripts by hand, bypassing the client and even the server so that the input and the output are both visible and controllable. The final section shows a step-by-step set of procedures that can reduce the defect rate in delivered code by a factor of ten.