This chapter offers solutions to some of the most common bugs you'll run across when debugging CGI applications. It is the result of many frustrating hours spent figuring out why an obviously simple CGI script would not work, and it should help you solve some of the problems you might face with your own CGI scripts. The material builds on topics covered in Chapter 20, "Introduction to Web Pages and CGI."
This chapter covers the points to keep in mind when you write CGI scripts that have to open files or pipes or keep temporary information in files, as well as issues about how to respond to client-side requests. It also covers some security issues about writing and placing Perl scripts in directory trees. The summary at the end of the chapter will serve as a good checklist for you to use when evaluating potential problems with your Perl CGI scripts.
The CGI script on the server side must be an executable file with execute permissions set. Incorrect permissions on the script or its directory can result in "file not found" errors being sent back to the client; a missing execute permission in particular results in other error messages, which vary from server to server.
All paths of execution in the CGI script must return a response that can be interpreted by the client. Test your CGI script to make sure that it does not exit with an unpredictable return code. If the CGI script relies on files on the server side, check that they are accessible. The CGI script runs under the same permissions as the Web server, and many servers are configured to run as the user called "nobody." Make sure that the data files accessed by the CGI script have the correct read/write permissions for that user.
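The permission check described above can be sketched in a few lines of Perl. This is only an illustration: the helper name access_problems is hypothetical, and the file tests reflect the user running the script, so the results are meaningful only when the script runs as the Web server's user.

```perl
use strict;
use warnings;

# Hypothetical helper: list the access problems for one data file.
# The file tests apply to the user running this script, so run it as a
# CGI script (or as the server's user) to see what the server would see.
sub access_problems {
    my ($path, $need_write) = @_;
    return ("$path does not exist") unless -e $path;

    my @problems;
    push @problems, "$path is not readable" unless -r $path;
    push @problems, "$path is not writable" if $need_write && !-w $path;
    return @problems;
}
```

An empty list means that no problem was found; anything else can be written to the server's error log before the script continues.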
Opening the file usually entails checking for errors before proceeding. Here's a usual call in a Perl script to open a file and read from it:
open(MFILE,"</tmp/bogus/data/file")
|| die "$!: Cannot open it!";
If the Perl script skips the error-checking portion and just goes on attempting to read from the unopened file handle MFILE, it will read no data. (Perl does not crash in this instance.) For example, the following lines are expected to return a GIF image back to the client:
print "Content-type:image/gif \n\n";
open(MFILE,"</tmp/bogus/data/image.file");
while (<MFILE>) {
print $_;
}
close MFILE;
This code would be called in a CGI script to return the contents of an image back to the browser whose request caused the script to be executed. If the GIF image being opened does not exist, then no data is sent back to the browser. Here are the ways to avoid such errors:
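One defensive version of the image example is sketched below. It is a minimal sketch, not a definitive implementation: the helper name send_gif and the wording of the error page are assumptions, and the optional second argument exists only so that the output handle can be replaced during testing.

```perl
use strict;
use warnings;

# Hypothetical helper: send a GIF to the browser, or an error page if the
# file cannot be opened. Returns 1 on success and 0 on failure.
sub send_gif {
    my ($path, $out) = @_;
    $out = \*STDOUT unless defined $out;

    my $fh;
    unless (open($fh, '<', $path)) {
        # Log the real reason on STDERR; send the browser a valid response.
        print STDERR "Cannot open $path: $!\n";
        print $out "Content-type: text/html\n\n";
        print $out "<HTML><BODY>The requested image is not available.</BODY></HTML>\n";
        return 0;
    }
    binmode($fh);     # image data is binary
    binmode($out);

    print $out "Content-type: image/gif\n\n";
    my $buffer;
    while (read($fh, $buffer, 8192)) {
        print $out $buffer;
    }
    close($fh);
    return 1;
}
```

A script would call send_gif("/tmp/bogus/data/image.file") and could check the return value to decide whether anything else needs to be logged.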
One important thing to look for is a possible infinite loop. Make sure your CGI script always returns. Remember that CGI scripts are run as the result of a client request coming to your server. It's quite possible to have several clients make the same request to start off these infinitely looping CGI scripts on your server. Too many of these scripts will cause the machine running the server to lock up!
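One common safeguard against a runaway script is a time limit built on Perl's alarm function. The following is a minimal sketch that assumes a UNIX-like system where alarm is available; the wrapper name run_with_timeout is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run a piece of work, but give up after $seconds.
# Returns the work's result, or nothing if the time limit was reached.
sub run_with_timeout {
    my ($seconds, $work) = @_;
    my $result = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);          # ask for SIGALRM after $seconds
        my $value = $work->();
        alarm(0);                 # work finished in time; cancel the alarm
        $value;
    };
    my $error = $@;
    alarm(0);                     # make sure no alarm is left pending
    return if $error eq "timeout\n";
    die $error if $error;         # pass any other failure on to the caller
    return $result;
}
```

Wrapping the main body of a CGI script in such a call means that a loop that never ends is cut off after a fixed time instead of piling up on the server.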
Another common mistake when writing CGI scripts is to rely on the value returned by open for a piped command. That value indicates only whether the pipe could be opened; it says nothing about whether the command on the other side of the pipe symbol (|) succeeded. For example, the following code will almost never execute the die statement because a pipe can almost always be opened on a given system:
open(MYFILE,"myprogram |") || die "Cannot run myprogram";
Instead, break this statement into two steps: open the pipe, and then close the file handle. After the close call, the variable $? holds the exit status of the command, so check it before proceeding with the rest of the code. If there are no errors, reopen the pipe and continue. This procedure sounds really hokey, but it works as long as two conditions hold: starting up myprogram is inexpensive, and each invocation of myprogram is totally unrelated to any previous invocation. If these conditions are met, you can structure the code like this:
open(MYFILE,"myprogram |");
close(MYFILE);
if ($?)
{
die "\n Cannot run myprogram";
}
open(MYFILE,"myprogram |");
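When the output of the command is needed anyway, a variation on this idea is to run the command once, read everything it prints, and only then check the result of close. This is a sketch of that alternative, not the procedure shown above; the helper name run_command is hypothetical, and the list form of a piped open assumes a reasonably recent Perl 5 on a UNIX-like system.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command, collect its output, and judge success
# by close() and $? rather than by open() alone.
# Returns (\@lines, undef) on success or (undef, $reason) on failure.
sub run_command {
    my (@command) = @_;

    my $pid = open(my $pipe, '-|', @command);
    return (undef, "cannot start $command[0]: $!") unless $pid;

    my @lines = <$pipe>;

    unless (close($pipe)) {
        # $! is nonzero for a system error; if the command merely exited
        # with a nonzero status, $! is 0 and $? holds the wait status.
        my $reason = $! ? "error closing pipe: $!"
                        : "command failed, wait status $?";
        return (undef, $reason);
    }
    return (\@lines, undef);
}
```

Because the command runs only once, this form does not depend on the command being cheap to start or on successive invocations being unrelated.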
One way to use CGI scripts is to send mail to a recipient as part of a response to a FORM handler. For example, the following statements send a mail message using mailx:
$sendTo = 'badguy@bad.code.edu';
#### NO, NO, NO, NOT this way....
open (MAILME,"| mailx -s 'A report' $sendTo");
print MAILME $mailMessage;
#
# some other code here.
#
close MAILME;
When sending mail this way, make sure you specify the absolute pathname to the mailer program. Also, close the pipe to the mail handler as soon as you have finished writing to it; otherwise you might forget to do so later in the code. Therefore, it's generally better to structure the previous lines as this:
$sendTo = 'myname@ikra.com';
open (MAILME,"| /usr/bin/mailx -s 'A report' $sendTo");
print MAILME $mailMessage;
close MAILME;
#
# some other code here.
#
Of course, you should use your system's own mail program, such as mh or elm, instead of mailx in the previous example. Obviously, the underlying mail system has to be up and running for this to work.
Most CGI applications run on UNIX systems that require the first line of the CGI script to be a "bang" line. For example, if your CGI script is a Perl script and your Perl interpreter is in /usr/local/bin, then the first line of the file should be #!/usr/local/bin/perl. If you don't use this bang line, the system's default shell is used instead, and it tries to run your CGI script as a shell script.
On NT systems, the Perl CGI script should not have the bang line as the first line in the script. The bang line is ignored on NT systems. The CGI script should run on the NT machine without any problems. However, when porting CGI scripts from an NT system to a UNIX system, make sure you add the #! line to the first line of every Perl CGI script.
Some Perl scripts rely on modules that load extensions dynamically at execution time (in Perl 5, the DynaLoader module loads compiled extensions, and the AutoLoader module loads subroutine definitions on demand). A Perl script will not run if the required extensions cannot be loaded. When porting such scripts to different systems, ensure that the extensions you have to load are available on each system you port your Perl script to. Some modules may not be available on the base system you port your CGI script to.
To avoid such problems at load time, you can either port the modules yourself, not use the module extensions, or statically link the extensions in. For example, most modules are statically linked into the NT version of Perl because the autoload module is not supported under NT. If you are certain that the modules you are loading are not dynamically linked executables and that all the functionality you need is in the .pm file, then you can simply use the use statement to load the .pm module file.
Avoid system calls as much as possible when writing CGI scripts that have to run on different systems. As its name suggests, a system call is very dependent on the type of system on which it's being called. Most versions of UNIX support the same system calls in a uniform way, so moving between them rarely causes problems. Other operating systems support different sets of system calls: a system call that works on a VMS system might not work on a UNIX system, and vice versa.
A common problem that results from CGI scripts is that malformed headers are sent back when a request for data arrives from a browser. Normally, a MIME header is sent from a server back to a browser. For example, to send an HTML document back, a header will be of the form Content-type: text/html \n\n, for a GIF image Content-type: image/gif \n\n, and so on. A script that has errors in it or simply does not run will not return this header to the browser.
Also, don't forget to send two newlines at the end of the header. The server expects a blank line following the MIME header, so print the header like this:
print "Content-type: image/gif \n\n";
The \n\n construct may not work under all conditions, especially those that require an explicit carriage-return/line-feed pair. In this case you should use the construct \r\n\r\n instead of \n\n.
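The two line endings can be wrapped in one small routine so that a script switches between them in a single place. This is a sketch only; the helper name content_type_header is hypothetical. The octal escapes \015\012 are used for the carriage-return/line-feed pair because the meaning of \r and \n can vary between platforms in Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Content-type header followed by a blank line.
# Pass a true second argument to use CR/LF pairs instead of bare newlines.
sub content_type_header {
    my ($type, $use_crlf) = @_;
    my $eol = $use_crlf ? "\015\012" : "\n";
    return "Content-type: $type$eol$eol";
}
```

A script would then start its output with print content_type_header("image/gif"); or, where an explicit pair is required, print content_type_header("image/gif", 1);.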
It's important to flush the output buffers used by CGI scripts immediately. The underlying operating system may keep output written to a file handle such as STDOUT for some time. This time may be longer than a browser expects to spend while waiting for a response. The simplest way to force immediate flushing is to select the output file handle and then set the $| variable to 1.
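The select-then-set idiom looks like this in practice. The HTML that is printed afterward is only a placeholder for illustration.

```perl
use strict;
use warnings;

# Make STDOUT unbuffered so each print is passed along right away.
my $previous = select(STDOUT);   # select() returns the handle selected before
$| = 1;                          # turn on autoflush for the selected handle
select($previous);               # restore the previously selected handle

print "Content-type: text/html\n\n";
print "<HTML><BODY>Working...</BODY></HTML>\n";
```

Saving and restoring the previously selected handle matters only when the script has selected some other handle as its default; in a typical CGI script the default is already STDOUT.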
A CGI script is the child process of the Web server running on a system. Being a child process, it cannot set its environment variables for a period longer than its own execution time. That is, any environment variables set using statements like the following will only set the value of the environment variable GEEPERS for the script while it's executing:
$ENV{'GEEPERS'} = "creepers";
The value of GEEPERS is not available to the parent (server) process, which invoked this CGI script in the first place. In fact, the next time the same CGI script is run, the value of the environment variable GEEPERS will be the value set in the server, not one set previously by a client.
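The same parent-and-child behavior can be observed from the command line. In this sketch the running script plays the part of the server and a second Perl process plays the part of the CGI script; the arrangement is an assumption made for illustration, not how a Web server is actually started.

```perl
use strict;
use warnings;

delete $ENV{GEEPERS};    # start with the variable unset in the parent

# The child sets GEEPERS in its own environment and exits.
# $^X holds the path of the Perl interpreter running this script.
system($^X, '-e', '$ENV{GEEPERS} = "creepers"; exit 0') == 0
    or die "child process failed: $?";

# Back in the parent, the variable is still unset.
my $seen = defined $ENV{GEEPERS} ? $ENV{GEEPERS} : '(not set)';
print "GEEPERS in parent: $seen\n";
```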
A possible way to track information between successive runs of a CGI script is to use an HTML FORM object to store variables. HTML FORM handling is covered in detail in Chapter 20. Basically, what you can do is store intermediate values in an invisible input field (an INPUT element of type HIDDEN rather than a visible TEXT box). Each call to the CGI script reads the value submitted with the form and writes the updated value into the field of the page it returns. Alternatively, you can save intermediate results to disk, at the cost of chewing up disk space.
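As a rough sketch of this technique, the routine below builds a form that carries a counter from one run to the next in an INPUT field of type HIDDEN, which is one way to keep the stored value out of sight. The helper names, the field name count, and the script path used in the usage note are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that matter inside an
# HTML attribute value.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: build a form that sends $count back on the next request.
sub counter_form {
    my ($action, $count) = @_;
    return join '',
        qq{<FORM METHOD="POST" ACTION="}, escape_html($action), qq{">\n},
        qq{<INPUT TYPE="HIDDEN" NAME="count" VALUE="}, escape_html($count), qq{">\n},
        qq{<INPUT TYPE="SUBMIT" VALUE="Next">\n},
        qq{</FORM>\n};
}
```

A script would print counter_form("/cgi-bin/count.pl", $count + 1) as part of its response. Keep in mind that a hidden field is only hidden from view: the user can read and change it, so the script should validate the value when it comes back.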
There are occasions when CGI scripts use temporary files to store information. Don't forget to delete these files after your script is done. After some time, such temporary files can accumulate and use up valuable disk space. It's a good idea to exit from one point in the code by calling a subroutine and to remove all temporary files in that subroutine before exiting.
Keeping temporary files on a server also poses the problem of synchronizing the temporary file with the process that created it. Normally, the name of the temporary file is derived from the process ID of the creating application. This, in turn, means that only the process that created the file knows the filename and when to delete the file. Even if a common prefix, such as CGI, is used for all temporary filenames, processes within the same process group should not arbitrarily delete all temporary files beginning with CGI. For one thing, other CGI applications might be using the temporary files when another process deletes them. Also, there might be other unrelated processes using CGI as the prefix for their filenames.
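The two ideas above, naming temporary files after the process ID and removing them from a single exit point, can be sketched as follows. The helper names, the /tmp directory, and the CGI prefix are assumptions for illustration.

```perl
use strict;
use warnings;

my @temp_files;    # every temporary file name handed out by this process

# Hypothetical helper: derive a file name from this process's ID ($$)
# and remember it for cleanup.
sub new_temp_file {
    my ($suffix) = @_;
    my $name = "/tmp/CGI$$.$suffix";
    push @temp_files, $name;
    return $name;
}

# Remove only the files that this process created.
sub remove_temp_files {
    unlink @temp_files;
    @temp_files = ();
    return;
}

# Single exit point: clean up, then exit with the given status.
sub clean_exit {
    my ($status) = @_;
    remove_temp_files();
    exit($status);
}
```

Every path through the script then ends with clean_exit(0) or clean_exit(1) instead of a bare exit. File names built from the process ID are predictable, so on a shared machine the standard File::Temp module is a safer way to create the files themselves.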
Another common problem with CGI scripts is that beginning Webmasters forget to make the path to these scripts visible to the Web server. Most servers look in the cgi-bin subdirectory as the top of the path for a CGI script to execute. If the named file in the path does not follow the rules for the server you happen to be running, the server will pick up the script and ship it back to the browser as a text file. In most cases, this is simply an annoyance. In some cases, looking at your CGI script may give away valuable directory information to the end user at the browser.
To avoid such problems, you should edit the configuration files for the server you are running. For the NCSA server, this entails editing the srm.conf file in the conf subdirectory of the directory where you installed the server distribution. The ScriptAlias directive in the srm.conf file controls which directories contain server scripts. The format for the ScriptAlias directive in the srm.conf file is
ScriptAlias fakename realname
For example, the following setting will make the /home/webserver/httpd/cgi-bin/ directory look like the /cgi-bin directory to the Web server:
ScriptAlias /cgi-bin/ /home/webserver/httpd/cgi-bin/
Also, if you want to execute files at locations other than those specified in the ScriptAlias path, you can specify what file extensions are allowed with the AddType directive. For example, the following directive allows executable scripts ending in .cgi to be executed (add a similar line for .pl if you want that extension treated the same way):
AddType application/x-httpd-cgi .cgi
In general, use absolute pathnames for all the files your CGI script accesses. A relative pathname is resolved starting from the directory named by the DocumentRoot directive in the srm.conf file, which is the base directory from which the server searches for files. The benefit of a single base directory is that an entire directory tree can be moved by simply moving the root of that tree, so you do not have the agony of resetting every pathname when the location of the tree changes. The downside is that a movable tree is susceptible to hackers, who can exploit relative pathnames to make your scripts and documents point to their own versions of your files.
Finally, the configuration file access.conf has a FollowSymLinks option. If this option is enabled, the server follows symbolic links when it resolves pathnames to find a document. If your CGI script accesses a file via a symbolic link, the script will not work unless this option is set to allow links to be followed. Unfortunately, enabling it opens up a security hole big enough to drive a virtual bus through. If someone symbolically links a document to the /bin or /sbin directory on your system, he or she has free run of the system.
Warning: Never put perl.exe in the httpd directories in the heat of debugging. It's a major mistake that will let users at the browser run anything on your system! Don't even symbolically link to an executable program such as perl, sh, or something similar that a user could run off the command line.
Almost all servers append /index.html or index.html to a given URL that references a directory. Therefore, the following URLs both become http://www.ikra.com/index.html:
http://www.ikra.com/
http://www.ikra.com
Guess what happens when there is no index.html file in the directory being referenced? The server typically returns an FTP-style listing of all files in the directory! This type of exposure of your directory subtree to the world might not be what you want.
CGI scripts can often return HTML pages as responses. One of the first things you should do is to check all URLs generated in these scripts that refer to your server. Make sure that there is an index.html file in all the directories that a URL generated at your server can refer to. It's a good idea for all URLs that you generate to be absolute pathnames instead of relative pathnames.
One very important directory to place an index.html file in is the logs subdirectory in the httpd tree. Not placing an index.html file in the logs subdirectory will expose all your Web server logs.
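Checking for missing index.html files by hand gets tedious on a large tree. The sketch below uses the standard File::Find module to list every directory that lacks one; the helper name dirs_without_index and the path in the usage note are hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every directory at or below $root that has
# no index.html file in it.
sub dirs_without_index {
    my ($root) = @_;
    my @missing;
    find(sub {
        # Inside this callback, $_ is the entry's name relative to the
        # directory that File::Find has changed into.
        return unless -d $_;
        push @missing, $File::Find::name unless -e "$_/index.html";
    }, $root);
    my @sorted = sort @missing;
    return @sorted;
}
```

Running print "$_\n" for dirs_without_index("/home/webserver/httpd"); would list the directories, including logs, that still need an index.html file.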
This chapter is a synopsis of some of the problems you can run into when coding CGI scripts using Perl. I cannot possibly enumerate all the problems you might run into when debugging CGI applications; however, this checklist will help you in debugging some common problems: