Chapter 22

Using HTML FORMs with Perl CGI Scripts




This chapter covers the use of Perl with HTML forms: collecting information from an HTML FORM and responding to the request. I cover the two ways of sending information from an HTML page to a script, the GET and POST methods. I also cover how a Common Gateway Interface (CGI) script acquires and parses the data so that it can send a response back to the browser. The information presented in this chapter could easily be expanded to fill a whole book. There are many different ways of handling CGI scripts and FORMs and of developing client/server applications, and just as many texts to cover them.


Input and Output with CGI

If you have used a Web browser, then you have come across HTML pages that let you query databases for information. Click a button and, voilà, you get the latest weather in Colorado. Enter a date and a destination, click a button, and you get the travel information you need. What's going on behind the page? Chances are very high that the information handler behind the Web page is a Perl script. Perl's power and its ease of string handling make it a good choice for writing the support code for Web pages.

Before I begin, remember that a CGI script does not have to be written in Perl, but Perl's convenient string handling makes it a very comfortable choice. Because this book is about Perl, it doesn't take a wild guess to figure out which language I cover in this chapter. However, you can certainly write CGI scripts in any language you like: Tcl/Tk, C, C++, or (gasp) even assembler.

I'll go over a few points about terminology before I get into the code. A browser fetches and displays an HTML page at the request of its user. The scripts and executables that handle information for that page run on the server to which the page's Uniform Resource Locator (URL) points. When the user selects the URL, the browser sends a request for action to the server. The server processes the request through the CGI and sends the results of the CGI executable back to the browser, which in turn displays them to the user. When describing the code that handles the requests, it's easy to write user instead of browser. As far as the CGI script on the server is concerned, however, it is simply sending results back to whoever or whatever called it. Both words refer to the entity that invoked the CGI script in the first place.

I introduced CGI briefly in Chapter 20, "An Introduction to Web Pages and CGI." In this chapter, I cover how the CGI methods are used with HTML forms. I use the test-cgi.pl script (presented earlier) as the basis for scripts that return data in response to a request. Listing 22.1 presents a Perl script that echoes the CGI environment variables.


Listing 22.1. Perl script to echo CGI environment variables.
#!/usr/bin/perl
#
# The sample script file to echo back ENV
# variables on call from an HTML document.
#
$|=1;            # Flush output immediately.
print "Content-Type: text/plain\n\n";   # header plus the required blank line
print "Yet Another CGI/1.0 Test Script\n\n";

$count = ($#ARGV + 1);
print "Argument Count: $count";
foreach $word (@ARGV) {
    print "\n $word";
}
print "\n";
#
print "SERVER_SOFTWARE = $ENV{'SERVER_SOFTWARE'}\n";
print "SERVER_NAME = $ENV{'SERVER_NAME'}\n";
print "GATEWAY_INTERFACE = $ENV{'GATEWAY_INTERFACE'}\n";
print "SERVER_PROTOCOL = $ENV{'SERVER_PROTOCOL'}\n";
print "SERVER_PORT = $ENV{'SERVER_PORT'}\n";
print "SERVER_ROOT = $ENV{'SERVER_ROOT'}\n";
print "REQUEST_METHOD = $ENV{'REQUEST_METHOD'}\n";
print "HTTP_ACCEPT = $ENV{'HTTP_ACCEPT'}\n";
print "PATH_INFO = $ENV{'PATH_INFO'}\n";
print "PATH = $ENV{'PATH'}\n";
print "PATH_TRANSLATED = $ENV{'PATH_TRANSLATED'}\n";
print "SCRIPT_NAME = $ENV{'SCRIPT_NAME'}\n";
print "QUERY_STRING = $ENV{'QUERY_STRING'}\n";
print "QUERY_STRING_UNESCAPED = $ENV{'QUERY_STRING_UNESCAPED'}\n";
print "REMOTE_HOST = $ENV{'REMOTE_HOST'}\n";
print "REMOTE_IDENT = $ENV{'REMOTE_IDENT'}\n";
print "REMOTE_ADDR = $ENV{'REMOTE_ADDR'}\n";
print "REMOTE_USER = $ENV{'REMOTE_USER'}\n";
print "AUTH_TYPE = $ENV{'AUTH_TYPE'}\n";
print "CONTENT_TYPE = $ENV{'CONTENT_TYPE'}\n";
print "CONTENT_LENGTH = $ENV{'CONTENT_LENGTH'}\n";
print "DOCUMENT_ROOT = $ENV{'DOCUMENT_ROOT'}\n";
print "DOCUMENT_URI = $ENV{'DOCUMENT_URI'}\n";
print "DOCUMENT_NAME = $ENV{'DOCUMENT_NAME'}\n";
print "DATE_LOCAL = $ENV{'DATE_LOCAL'}\n";
print "DATE_GMT = $ENV{'DATE_GMT'}\n";
print "LAST_MODIFIED = $ENV{'LAST_MODIFIED'}\n";

I'll examine only the Perl scripting features that apply to CGI. Basically, the server executes a CGI script when the user follows a URL that points to it from the HTML document being viewed. For example, the following anchor refers to this script:

<A
HREF="http://ikra.com/cgi-bin/test-cgi?Its+deja+vu+all+over+again" >
Click me for an echo.
</A>

The output from this script is as follows. I have truncated it to save space.

Yet Another CGI/1.0 Test Script

Argument Count: 6
 Its
 deja
 vu
 all
 over
 again
SERVER_SOFTWARE = NCSA/1.4.2
SERVER_NAME = pop.ikra.com
GATEWAY_INTERFACE = CGI/1.1
SERVER_PROTOCOL = HTTP/1.0
SERVER_PORT = 80
SERVER_ROOT =
REQUEST_METHOD = GET
HTTP_ACCEPT = */*, image/gif, image/x-xbitmap, image/jpeg
PATH_INFO =
PATH = /bin:/usr/bin:/usr/ucb:/usr/bsd:/usr/local/bin
PATH_TRANSLATED =
SCRIPT_NAME = /cgi-bin/test-cgi
QUERY_STRING = Its+deja+vu+all+over+again
QUERY_STRING_UNESCAPED =
REMOTE_HOST = pop.ikra.com
REMOTE_IDENT =

The script's first action is to tell the server (and, through it, the browser) that text is being sent back. This is done with the following statement:

print "Content-Type: text/plain\n\n";

Examine this test-cgi.pl Perl script and its associated URL in more detail. Notice how the arguments are passed into the Perl script. Okay, so I wrote Its instead of It's, because I did not want to escape the single quote (') between the t and the s.

HREF="http://ikra.com/cgi-bin/test-cgi?Its+deja+vu+all+over+again"

The script referred to in this URL is the test-cgi file in the cgi-bin directory of the Web server on the node ikra.com. The arguments being passed into this script appear after the question mark (?), and each argument is separated from the next by a plus sign (+).

The number of arguments, therefore, is six. The string is the now famous saying widely attributed to Yogi Berra, "It's déjà vu all over again." Now let's see how the script handles this quip.

The first line to look at is the one in which $| is set to 1. $| is a special variable in Perl: when it is set to a non-zero value, Perl flushes the currently selected output filehandle (normally STDOUT) after every print. When you are working with CGI applications, keep in mind that a quick response will win you praise. Don't let your output sit in a buffer, because the client's browser might wait for it so long that a timeout is triggered.
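Because $| affects only the currently selected output filehandle, it can help to see the common ways of turning on autoflush side by side. This is a minimal sketch, assuming only Perl's core IO::Handle module; the STDERR example is illustrative and not part of Listing 22.1:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;    # core module; adds the autoflush() method to filehandles

# 1. $| applies to the currently selected output handle (STDOUT by default).
$| = 1;

# 2. To autoflush another handle, select it, set $|, and restore the old one.
#    select() returns the previously selected handle.
my $previous = select(STDERR);
$| = 1;
select($previous);

# 3. IO::Handle's method does the same select/set/restore steps for you.
STDOUT->autoflush(1);

print "Content-Type: text/plain\n\n";
print "Autoflush on STDOUT is ", ($| ? "on" : "off"), "\n";
```

The select/restore idiom matters because setting $| without it would silently change only whichever handle happened to be selected.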

The next line is absolutely necessary and must be printed back to the browser no matter how the rest of the script runs. It tells the client what type of data you are sending back; in this example, plain text. This is done by sending back the MIME content type, followed by an empty line that marks the end of the header (hence the \n\n):

print "Content-Type: text/plain\n\n";

It's also handy to label the returned output; this line prints a title:

print "Yet Another CGI/1.0 Test Script\n\n";

Next, all the arguments are printed out back to the browser with the following lines:

$count = ($#ARGV + 1);
print "Argument Count: $count";
foreach $word (@ARGV) {
    print "\n $word";
}

The environment variable QUERY_STRING holds the arguments to this script in the form Its+deja+vu+all+over+again. To parse this string into individual arguments, you split the string wherever there is a plus sign. This is easily done with the following line (which is not in Listing 22.1):

@keywords = split(/\+/, $ENV{'QUERY_STRING'});  # \+ because split takes a pattern

Each element of the @keywords array holds one argument; that is, the array looks like this:

@keywords = ("Its", "deja", "vu", "all", "over", "again");

Now you can use these keywords to index into an external database and return an appropriate response.
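Before the keywords go anywhere near a database, any %XX escapes in them (such as %27 for an apostrophe) should also be decoded. The following is a minimal sketch; keywords_from_query is a hypothetical helper name, not part of the chapter's scripts:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn an ISINDEX-style query string such as
# "Its+deja+vu+all+over+again" into a list of decoded keywords.
sub keywords_from_query {
    my ($query) = @_;
    return () unless defined $query && length $query;

    # split's first argument is a pattern, so the plus sign must be escaped.
    my @keywords = split /\+/, $query;

    # Decode %XX escapes (for example %27 for an apostrophe) in each keyword.
    s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg for @keywords;
    return @keywords;
}

my @keywords = keywords_from_query($ENV{'QUERY_STRING'} || 'Its+deja+vu+all+over+again');
print scalar(@keywords), " keywords: @keywords\n";
```

With the apostrophe encoded as %27, the query string could carry It's after all.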

What Are GET and POST?

There are two HTTP methods for sending data from an HTML page to a CGI script: GET and POST. The main difference between them lies in how the query string reaches the CGI script. With the GET method, the query string is appended to the URL of the CGI program that handles the request, and within the CGI script it is available in the environment variable QUERY_STRING. With POST, the browser collects the information from a FORM and sends it in the body of the request; the server then presents the data to the CGI script on its standard input. The main advantage of POST over GET is that a POST request can carry much more data, because a GET query string is limited by the maximum URL length that browsers and servers will accept.

The GET method can be used without a FORM at all: all you have to do is append the query string to the calling program's URL and send the resulting request to the CGI program. For example, you could define an anchor tag like this:

<A HREF="/cgi-bin/summer.pl?name=Kamran%20Husain&y=3"> CGI Sample</A>

This anchor tag sends a GET request to the program summer.pl. The summer.pl program in turn gets the string "name=Kamran%20Husain&y=3" in its environment variable QUERY_STRING. Note that the question mark (?) in the constructed URL separates the path of the CGI script from the parameters to be passed in QUERY_STRING.

Note that the %20 in the name assignment represents a space, whose ASCII code is hex 20. Spaces and other special characters are not permitted in the query string, so they have to be converted to %XX escapes, where XX is the character's ASCII code in hexadecimal. Here's a Perl statement to convert a given string into an encoded query string:

$query =~ s/(\W)/sprintf("%%%02X", ord($1))/eg;

The substitution operator finds every character that is not a word character (a letter, digit, or underscore) with the \W construct. The parentheses around (\W) capture the match so that it can be referenced as $1 in the replacement. Each matched character is then replaced by evaluating the sprintf call, as specified by the /e modifier: sprintf produces a percent sign (written %% in the format) followed by the character's ordinal value in hexadecimal. The substitution is applied to the entire string because of the /g modifier.
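The encoding step has a natural inverse, which a CGI script needs when it reads the data back. This is a minimal sketch with two hypothetical helpers, url_encode and url_decode, assuming byte strings (characters 0 to 255); wider characters would need to be encoded to bytes first:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode every character that is not a word character as %XX.
# %02X guarantees two hex digits, so a newline becomes %0A rather than %a.
sub url_encode {
    my ($string) = @_;
    (my $encoded = $string) =~ s/(\W)/sprintf("%%%02X", ord($1))/eg;
    return $encoded;
}

# Reverse the process: plus signs become spaces, then %XX becomes the character.
sub url_decode {
    my ($string) = @_;
    (my $decoded = $string) =~ tr/+/ /;
    $decoded =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $decoded;
}

my $encoded = url_encode("Kamran Husain");
print "$encoded\n";
print url_decode($encoded), "\n";
```

The order inside url_decode matters: translating + to a space before expanding %XX means an encoded %2B still comes back as a literal plus sign.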

So what's the major difference in the way your Perl script handles the incoming data? When handling a GET request, you respond to the data in the QUERY_STRING environment variable. When handling a POST request, your Perl script has to read from STDIN, the default input filehandle. In a CGI script, the environment variable REQUEST_METHOD is set to either GET or POST, depending on how the FORM was defined. A FORM selects its method with the METHOD attribute of the <FORM> tag. To use the GET method for a CGI script handleIt.pl, you would use the following statement:

<FORM ACTION="/cgi-bin/handleIt.pl" METHOD="GET">

To use the POST method for the same CGI application and FORM, you would use the following statement:

<FORM ACTION="/cgi-bin/handleIt.pl" METHOD="POST">

The CGI application you specify in the ACTION attribute of a FORM is called whenever a button whose TYPE attribute is "submit" is pressed. To define a submit button on a FORM, you can use the following <INPUT> tag:

<INPUT TYPE="submit" VALUE="Just do it!">

This tag creates a button on the FORM whose caption is the string in the VALUE attribute. When the button is pressed, the browser collects the information from the FORM's fields, builds a query string using the method given in the FORM's METHOD attribute, and sends it to the CGI application named in the ACTION attribute.

Hardwiring a URL with question marks and plus signs to set up the input to a CGI script defeats the purpose of having a FORM in the first place. This is where the FORM earns its keep: the browser builds the input string for you from the FORM's fields, whether the METHOD is GET or POST.

Handling HTML FORMs with GET Methods

Data collected from an HTML form can also be sent for processing with the GET method by setting METHOD="GET" in the <FORM> tag. See the HTML page in Listing 22.2.


Listing 22.2. Simple FORM input.
<html><head> <title>Welcome.</title>
</head>

<body>
<center><h1>Test A Script</h1></center>
<hr>
<p>
<A HREF="http://ikra.com/cgi-bin/test-cgi?Its+deja+vu+all+over+again" >Click Me</A>
</p>
<p>
<FORM
METHOD="GET" ACTION="http://ikra.com/cgi-bin/test-cgi">
<INPUT TYPE="Submit" VALUE="Just Do It">
</FORM>

</body></html>

The rendering of this listing in Netscape is shown in Figure 22.1. Pressing the Just Do It button returns an argument count of 0, because the FORM has no input fields and therefore sends an empty query string.

Figure 22.1 : A simple form.

Next, accept some input from the user to get more information through the FORM. The modified form is shown in Figure 22.2, and Listing 22.3 shows how the text field was inserted.

Figure 22.2 : Using a simple form with a text area.

The <BR> tag causes a line break and forces the button onto the next line. Without the <BR> tag, the button would be on the same line as the text widget (space permitting). The following tag collects the input for the FORM:

Type something here: <INPUT SIZE=60 NAME="response">

The SIZE attribute makes the field 60 characters wide; it sets the visible width of the field, not a limit on how much the user can type (the MAXLENGTH attribute does that). The value sent to the script from this text widget is assigned to the keyword response. Let's see how the Perl script is called when the button is pressed.


Listing 22.3. Sample form with text input.
<html><head>
</head>

<body>
<center><h1>Test A Script</h1></center>
<hr>
<p>
<A HREF="http://ikra.com/cgi-bin/test-cgi?Its+deja+vu+all+over+again" >Click Me</A>
</p>
<p>
<FORM
 METHOD="GET" ACTION="http://ikra.com/cgi-bin/test-cgi">
 Type something here: <INPUT SIZE=60 NAME="response">
 <BR>
 <INPUT TYPE="Submit" VALUE="Just Do It">
 </FORM>

</body></html>

The output is shown in Figure 22.3. Look closely in the middle of the figure to see the line:

Figure 22.3 : The output of the request from the text area.

"response=Caution+Cape+does+not+NOT+enable+user+to+fly"

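Look at the value assigned to QUERY_STRING. I typed "not NOT" deliberately to catch your eye. As you can see, the string is not easy to read. Look at the title and location of the Netscape window in Figure 22.3: because this is a GET request, the query string is also appended to the URL. The value of QUERY_STRING is URL-encoded, which is the format the CGI script at the server expects: the field name and its value are joined by an equal sign, and spaces are replaced by plus signs.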

Handling an HTML FORM with POST Methods

Handling the POST method is different from handling the GET method. With POST, you read your input from STDIN. The length of the string being passed into the POST-handling script is given in the CONTENT_LENGTH environment variable.
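A single read() call can return fewer bytes than requested, so a careful POST handler keeps reading until it has CONTENT_LENGTH bytes. The following is a minimal sketch of that loop; read_post_body and the 64 KB $MAX_POST_BYTES cap are hypothetical choices for illustration, not values taken from the chapter:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical limit: refuse POST bodies larger than this many bytes.
my $MAX_POST_BYTES = 64 * 1024;

# Read exactly $length bytes from $fh (normally STDIN).
# read() may return fewer bytes than requested, so keep reading until
# the whole body has arrived or the input ends early.
sub read_post_body {
    my ($fh, $length) = @_;
    return undef unless defined $length && $length =~ /^\d+$/;
    return undef if $length > $MAX_POST_BYTES;

    binmode($fh);
    my $body = '';
    while (length($body) < $length) {
        my $got = read($fh, $body, $length - length($body), length($body));
        return undef unless defined $got;   # read error
        last if $got == 0;                  # end of input
    }
    return length($body) == $length ? $body : undef;
}

if (($ENV{'REQUEST_METHOD'} || '') eq 'POST') {
    my $form = read_post_body(\*STDIN, $ENV{'CONTENT_LENGTH'});
    print "Content-Type: text/plain\n\n";
    print defined $form ? "Received: $form\n" : "Bad or missing POST data\n";
}
```

The fourth argument to read() is an offset, which lets each call append to $body instead of overwriting it.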

To illustrate the use of CONTENT_LENGTH and POST methods, you'll work with a slightly more complicated input FORM. I'll construct the FORM shown in Figure 22.4. The HTML code for this page is shown in Listing 22.4. The Perl script behind the FORM is shown in Listing 22.5.

Figure 22.4 : Sample credit card application form.


Listing 22.4. A sample credit card application form.
<html><head> <TITLE>Sample Credit Form</TITLE>
</head>

<body>
<center><h1>Sample Credit Application Form</h1></center>
<hr>
<FORM METHOD="POST" ACTION="http://ikra.com/cgi-bin/credit.pl">
 First Name <INPUT SIZE=20 NAME="fname">
 Last Name  <INPUT SIZE=20 NAME="lname"> <BR>
 Social Security Number <INPUT SIZE=12 NAME="ssn">
 Mom's Maiden Name <INPUT SIZE=20 NAME="mname"> <BR>
<HR>
<H4>Type of Cards Desired</H4>
<INPUT TYPE="CheckBox" VALUE="VISA" NAME="visa">VISA
<INPUT TYPE="CheckBox" VALUE="MCRD" NAME="mastercard">Mastercard
<P>
<HR>
<H4>Number of Dependants</H4>
<SELECT NAME="dependants" SIZE="1">
<OPTION SELECTED>1
<OPTION>2
<OPTION>3
<OPTION>4
<OPTION>5
<OPTION>6
</SELECT>
<HR>
<H4>Yearly Income</H4>
<INPUT TYPE="Radio" VALUE="1" NAME="income">0-10K
<INPUT TYPE="Radio" VALUE="2" NAME="income">10-20K
<INPUT TYPE="Radio" VALUE="3" NAME="income">20-30K
<INPUT TYPE="Radio" VALUE="4" NAME="income">30-40K
<INPUT TYPE="Radio" VALUE="5" NAME="income">40-50K
<INPUT TYPE="Radio" VALUE="6" NAME="income">50K+
<HR>
 <INPUT TYPE="Reset" VALUE="Clear Form">
 <INPUT TYPE="Submit" VALUE="Submit">
 </FORM>

</body></html>

Here is the output from the credit.pl Perl script:

=============================================

SERVER_NAME = pop.ikra.com
REQUEST_METHOD = POST
SCRIPT_NAME = /cgi-bin/credit.pl
QUERY_STRING =
CONTENT_TYPE = application/x-www-form-urlencoded
CONTENT_LENGTH = 91
income is set to 5
ssn is set to 123-45-6789
lname is set to Doe
dependants is set to 4
mastercard is set to MCRD
mname is set to Jane Smith
fname is set to John

In this output from the POST request, the REQUEST_METHOD is POST and the query string is empty! So where did all the user's input go? The server has passed it to the Perl script on its standard input. You have to design your Perl script to pick up the input from either a POST or a GET request automatically. Listing 22.5 illustrates how to process both types of requests.


Listing 22.5. The Perl script to handle credit.html.
 1 #!/usr/bin/perl
 2 #
 3 # The sample script file to show difference in
 4 # handling POST and GET requests.
 5 #
 6 #
 7 $|=1;            # Flush immediately.
 8 print "Content-Type: text/plain\n\n";
 9
10
11 print "\n=============================================\n";
12 print "SERVER_NAME = $ENV{'SERVER_NAME'}\n";
13 print "REQUEST_METHOD = $ENV{'REQUEST_METHOD'}\n";
14 print "SCRIPT_NAME = $ENV{'SCRIPT_NAME'}\n";
15 print "QUERY_STRING = $ENV{'QUERY_STRING'}\n";
16 print "CONTENT_TYPE = $ENV{'CONTENT_TYPE'}\n";
17 print "CONTENT_LENGTH = $ENV{'CONTENT_LENGTH'}\n";
18
19 if ( $ENV{'REQUEST_METHOD'} eq "GET" &&
20      $ENV{'QUERY_STRING'} ne '') {
21     $form = $ENV{'QUERY_STRING'};
22     }
23 elsif ( $ENV{'REQUEST_METHOD'} eq "POST" ) {
24     read(STDIN,$form, $ENV{'CONTENT_LENGTH'});
25 } else {
26     print "\n At least fill something! I cannot work with empty strings";
27     exit;
28     }
29
30 #
31 # Now the variable $form has your input data.
32 # Create your associative array.
33 #
34     foreach $pair (split('&', $form)) {
35         if ($pair =~ /(.*)=(.*)/) {  # found key=value;
36         ($key,$value) = ($1,$2);     # get key, value.
37         $value =~ s/\+/ /g;  # substitute spaces for + signs.
 38         $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;  # decode %XX escapes.
39         $inputs{$key} = $value;   # Create Associative Array.
40         }
41     }
42 
43 foreach $item (keys(%inputs)) {
44     print "$item is set to $inputs{$item}\n";
45 }

Lines 19 through 28 determine where to pick up the input: from QUERY_STRING for a GET request, or from STDIN for a POST request.

Basically, this script handles the input for a GET request with non-empty input and for a POST request with any input. At the end of this conditional, $form holds the input string in URL-encoded form. Copying this logic into every script is not practical in a real-life scenario: the input has to be checked for POST or GET in so many scripts that it's a good idea to write a subroutine that handles both types of request. Once you have defined such a subroutine, you simply include it in the rest of your CGI scripts to extract the incoming parameters.
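Such a subroutine might look like the following minimal sketch. The name read_form and its filehandle argument are hypothetical; passing the handle in (instead of hard-coding STDIN) is a design choice that makes the routine easy to test:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reusable helper: collect form data from either a GET
# (QUERY_STRING) or a POST (body on $fh, normally STDIN) and return a
# hash of decoded name/value pairs. Returns an empty hash if there is none.
sub read_form {
    my ($fh) = @_;
    my $method = $ENV{'REQUEST_METHOD'} || '';
    my $raw    = '';

    if ($method eq 'GET') {
        $raw = $ENV{'QUERY_STRING'} || '';
    }
    elsif ($method eq 'POST') {
        my $length = $ENV{'CONTENT_LENGTH'} || 0;
        while ($length =~ /^\d+$/ && length($raw) < $length) {
            my $got = read($fh, $raw, $length - length($raw), length($raw));
            last unless $got;    # stop on error (undef) or end of input (0)
        }
    }

    my %inputs;
    foreach my $pair (split /&/, $raw) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $value;          # skip pairs without "="
        for ($key, $value) {
            tr/+/ /;                                   # plus signs are spaces
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;       # %XX escapes
        }
        $inputs{$key} = $value;
    }
    return %inputs;
}

my %inputs = read_form(\*STDIN);
print "Content-Type: text/plain\n\n";
print "$_ is set to $inputs{$_}\n" for sort keys %inputs;
```

Unlike Listing 22.5, this sketch also decodes the field names, and splitting each pair on the first = only keeps any encoded = signs inside the value intact. If a field name appears more than once, only its last value is kept.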

In either case, the output of the Perl script is what is sent back to the calling browser. In other words, everything written to STDOUT (the default when a print statement does not name a filehandle) is sent to the browser, and the $|=1 statement makes Perl flush that output after every print.

Tip
You have to handle any errors in the input. Imagine the type of input your script might receive if your user walks away from his desk and his three-year-old gets to do some typing! Always check the input your CGI script receives from the form. It's better to be safe than sorry.

In Listing 22.5, lines 34 through 41 parse the incoming parameters into an associative array called %inputs. Here is what a test run placed in %inputs from the values entered in the form:

$inputs{'income'} is set to 5
$inputs{'ssn'} is set to 123-45-6789
$inputs{'lname'} is set to Doe
$inputs{'dependants'} is set to 4
$inputs{'mastercard'} is set to MCRD
$inputs{'mname'} is set to Jane Smith
$inputs{'fname'} is set to John

If you look at the HTML file that invoked this script, you'll recognize the keys of the %inputs array: they are the NAME attributes of the FORM's fields, set in the HTML document and now passed into the Perl script for use. Note that there is no visa key, because the browser does not send an unchecked checkbox at all. The %inputs array now has all the values for you to work with.

Of course, you always have to check the incoming values to see if they make sense. There are several ways to check the input for your credit card application example. You could check whether the social security number has the right number of digits, whether all the fields were filled in, and so on. One possible way to check the input is shown in Listing 22.6. Note how each variable is tested for a range of values and to see if it's empty. In your HTML pages and CGI scripts, you must check for missing or inconsistent responses. Prepare for the worst-case scenario.

The tedious part is checking for all the possible responses that your user can type in. Checking for nonzero responses, empty strings, and out-of-range values takes time, both in execution and in setting up the tests. However, the time is well spent if the users of your page never see Server Error messages or, even worse, results computed from bad input without even a whimper of an error message. Such a response may lead the user to believe the erroneous results.
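One way to keep these tests manageable is to describe them as data. The following is a minimal sketch, assuming the field names from Listing 22.4; the @rules table and the validate subroutine are hypothetical, and the messages are borrowed from the chapter's scripts:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table of checks for the credit form in Listing 22.4.
# Each rule names a field, a test, and the message to show if it fails.
my @rules = (
    [ income => sub { defined $_[0] && $_[0] =~ /^[1-6]$/ },
      'Please specify your income range' ],
    [ ssn    => sub { defined $_[0] && $_[0] =~ /^(?:[0-9]{3}-[0-9]{2}-[0-9]{4}|[0-9]{9})$/ },
      'Enter the social security number in the form XXX-XX-XXXX' ],
    [ fname  => sub { defined $_[0] && $_[0] =~ /\S/ },
      'Please enter your first name' ],
    [ lname  => sub { defined $_[0] && $_[0] =~ /\S/ },
      'Please enter your last name' ],
    [ mname  => sub { defined $_[0] && $_[0] =~ /\S/ },
      "Your mother's maiden name is required" ],
);

# Return every error message, in rule order. The caller decides whether
# to show only the first one or all of them.
sub validate {
    my ($inputs) = @_;
    my @errors;
    foreach my $rule (@rules) {
        my ($field, $test, $message) = @$rule;
        push @errors, $message unless $test->($inputs->{$field});
    }
    # The card check involves two fields, so it is handled separately.
    push @errors, 'At least pick one card !'
        unless $inputs->{visa} || $inputs->{mastercard};
    return @errors;
}

my %inputs = (income => 5, ssn => '123-45-6789', fname => 'John',
              lname => 'Doe', mname => '', mastercard => 'MCRD');
my @errors = validate(\%inputs);
print @errors ? "First error: $errors[0]\n" : "No errors\n";
```

Adding a new check then means adding one row to the table rather than another if block.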


Listing 22.6. Checking for missing or inconsistent responses.
  1 #!/usr/bin/perl
  2 #
  3 # The sample script file to show difference in
  4 # handling POST and GET requests.
  5 #
  6 #
  7
  8 $|=1;            # Flush immediately.
  9 print "Content-Type: text/plain\n\n";
 10
 11 print "\n=============================================\n";
 12 print "SERVER_NAME = $ENV{'SERVER_NAME'}\n";
 13 print "REQUEST_METHOD = $ENV{'REQUEST_METHOD'}\n";
 14 print "SCRIPT_NAME = $ENV{'SCRIPT_NAME'}\n";
 15 print "QUERY_STRING = $ENV{'QUERY_STRING'}\n";
 16 print "CONTENT_TYPE = $ENV{'CONTENT_TYPE'}\n";
 17 print "CONTENT_LENGTH = $ENV{'CONTENT_LENGTH'}\n";
 18 print "SERVER_NAME = $ENV{'SERVER_NAME'}\n";
 19 print "\n=============================================\n";
 20
 21 $form = &FormArgs;
 22 if ($form eq "0" ) {
 23     print "\n At least fill something! I cannot work with empty strings";
 24     exit;
 25     }
 26 #
 27 # Now the variable $form has your input data.
 28 # Create your associative array.
 29 #
 30     foreach $pair (split('&', $form)) {
 31         if ($pair =~ /(.*)=(.*)/) {  # found key=value;
 32         ($key,$value) = ($1,$2);     # get key, value.
 33         $value =~ s/\+/ /g;  # substitute spaces for + signs.
 34         $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;  # decode %XX escapes.
 35         $inputs{$key} = $value;   # Create Associative Array.
 36         }
 37     }
 38
 39 $income = $inputs{"income"}; # Check if an income value was selected.
 40 if (($income < 1) || ($income > 6)) {
 41     print "\n Please specify your income range";
 42     exit;
 43     }
 44
 45 $ssn = $inputs{"ssn"};
 46
 47 if ( $ssn =~ /^[0-9]{3}-[0-9][0-9]-[0-9]{4}$/)
 48     {
 49     $snumber = $ssn;
 50     $snumber =~ s/\-//g;
 51     }
 52 elsif ( $ssn =~ /^[0-9]{9}$/) {
 53     $snumber = $ssn;
 54     }
 55 else    {
 56     print "\n Enter the social security number in the form XXX-XX-XXXX";
 57     exit;
 58     }
 59 $fname = $inputs{'fname'};
 60 if ($fname eq "") {
 61     print "\n Please enter your first name";
 62     exit;
 63     }
 64
 65 $lname = $inputs{'lname'};
 66 if ($lname eq "") {
 67     print "\n Please enter your last name";
 68     exit;
 69     }
 70
 71 $mname = $inputs{'mname'};
 72 if ($mname eq "") {
 73     print "\n Your mother's maiden name is required";
 74     exit;
 75     }
 76
 77 $dependants = $inputs{'dependants'};
 78 if ($dependants < 1) {
 79     print "\n Now, now, we have to be dependant on ourselves.";
 80     exit;
 81     }
 82
 83 #if ($dependants > 10) {
 84     #print "\n Would you like me to contact the IRS for you?";
 85     #exit;
 86     #}
 87 #
 88 #
 89 $visa = $inputs{'visa'};
 90 $mastercard = $inputs{'mastercard'};
 91
 92 if ($visa eq "" &&  $mastercard eq "") {
 93     print "\n At least pick one card ! ";
 94     exit;
 95     }
 96
 97 exit;
 98
 99 #------------------ explicitly bailout ----------------
100
101 #
102 # A simple subroutine to briefly test incoming input
103 #
104 sub FormArgs {
105
106     if ( $ENV{'REQUEST_METHOD'} eq "GET" &&
107          $ENV{'QUERY_STRING'} ne '') {
108     $form = $ENV{'QUERY_STRING'};
109     $form; # return value is true.
110     }
111     elsif ( $ENV{'REQUEST_METHOD'} eq "POST" &&
112          $ENV{'CONTENT_LENGTH'} ne '0') {
113     read(STDIN,$form, $ENV{'CONTENT_LENGTH'});
114     $form; # return value is true, continue
115 } else {
116     "0"; # Unable to process
117     }
118 
119 }

The code in line 21 calls the FormArgs function, which returns the raw input string from the GET or POST request, or the string "0" if there is no input to process. In that case, the code in line 22 bails the program out with an error message.

The loop defined in lines 30 through 37 splits the incoming string and places all the variable=value pairs into the %inputs array. Recall that the input string has the form var1=value1+subvalue1&var2=value2 and so on: spaces are converted to + signs, and the assignments are separated from each other by ampersands.

The code in line 30 splits the string at each ampersand and places each assignment in turn in the $pair variable for use in the foreach loop. In line 31, the contents of $pair are examined to see whether they have the form variable=value, that is, whether an equal sign separates a name from a value.

If an assignment is found, line 32 extracts the name of the variable into $key and its value into $value. The contents of $key are used to index into the %inputs array during the rest of the program. Line 33 replaces the plus (+) signs in the value with spaces, and line 34 converts each %XX escape back into the character it represents. Finally, line 35 stores the value in the %inputs array, using the $key value extracted in line 32 as the index.

The rest of the code (lines 39 through 95) is pretty straightforward in the way it checks for blank or incorrect input values. Of particular interest is how the social security number is interpreted (see lines 47 and 52). The user can enter the number as XXX-XX-XXXX (where X is a decimal digit from 0 to 9) or as a string of nine decimal digits, XXXXXXXXX. The two regular expressions take care of both cases.
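The same two-format check can be packaged as a reusable routine. This is a minimal sketch; normalize_ssn is a hypothetical name. The \A and \z anchors force the whole string to match, so input such as abc123-45-6789 or a ten-digit number is rejected rather than accepted because it merely contains a valid-looking number:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: accept XXX-XX-XXXX or XXXXXXXXX and return the
# nine digits, or undef for anything else. [0-9] is used instead of \d
# so that only ASCII digits are accepted.
sub normalize_ssn {
    my ($ssn) = @_;
    return undef unless defined $ssn;
    if ($ssn =~ /\A([0-9]{3})-([0-9]{2})-([0-9]{4})\z/
        or $ssn =~ /\A([0-9]{3})([0-9]{2})([0-9]{4})\z/) {
        return "$1$2$3";
    }
    return undef;
}

foreach my $input ('123-45-6789', '123456789', '12-345-6789', '1234567890') {
    my $digits = normalize_ssn($input);
    printf "%-12s => %s\n", $input, defined $digits ? $digits : 'rejected';
}
```

Returning the bare digits gives the rest of the script one canonical form to store, whichever format the user typed.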

A social security number is meaningless to someone who lives outside the United States. When designing pages that are user specific, or where the user's country matters, it's best to provide either a warning or an alternative page. How would you handle a phone number in this scenario? Phone numbers in the United States are formatted differently from phone numbers in other countries. When designing HTML pages, you have to keep these important internationalization factors in mind.

Returning HTML Pages

So far I have dealt only with returning messages in the form of plain text. The beauty of CGI is the ability to send back custom HTML pages in response to requests. Instead of sending back a content type of text/plain, you send back text/html. This is done with the following statement:

print "Content-type: text/html\n\n";

It's your responsibility to make sure that your script sends back a valid HTML page no matter how badly the input is messed up. The returned page should have the <HTML></HTML> tags and should be syntactically correct. Remember that the browser expects an HTML page as soon as it sees the text/html content type. Don't forget the empty line that ends the header; that's why the string ends with \n\n.
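Keeping the page valid gets harder once user-supplied text is echoed back, because characters such as < and & would be read as markup. This is a minimal sketch of escaping that text and wrapping a body in a complete page; html_escape and html_page are hypothetical helper names, and the caller is assumed to pass already-safe HTML as the body:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that have special meaning in HTML, so that text
# typed by the user cannot break the page's markup.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must come first, or the other entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: build a complete, well-formed page.
# $body is assumed to be HTML that is already safe to include as-is.
sub html_page {
    my ($title, $body) = @_;
    my $t = html_escape($title);
    return <<"PAGE";
<HTML><HEAD><TITLE>$t</TITLE></HEAD>
<BODY>
$body
</BODY></HTML>
PAGE
}

print "Content-type: text/html\n\n";    # header, then the required blank line
print html_page("Error!", "<P>" . html_escape("Bad input: <script>") . "</P>");
```

Building the whole page in one routine means every exit path, including error paths, produces matching <HTML> and </HTML> tags.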

Refer to the code in Listing 22.7 to see how the error message is built up, starting from an empty string. Only the very first error that occurs is reported (rather than flooding the user's screen with a page full of error messages). Naturally, this is a design decision you have to make as you design your HTML pages. Do you inform the user of only the first error, or do you tell him or her about every error in the input? Pouring on too many error messages will only annoy or confuse the user.


Listing 22.7. Send back an HTML page.
  1 #!/usr/bin/perl
  2 # The sample script file to show difference in
  3 # handling POST and GET requests.
  4 #
  5 #
  6
  7 $|=1;            # Flush immediately.
  8
  9 ##
 10 ## THE NEXT LINE IS DIFFERENT FROM THE PREVIOUS
 11 ## SCRIPTS:
 12 print "Content-type: text/html\n\n";
 13
 14 $form = &FormArgs;
 15 if ($from eq "0" ) {
 16     print "\n At least fill something! I cannot work with empty strings";
 17     goto BAILOUT;
 18     }
 19 #
 20 # Now the variable $form has your input data.
 21 # Create your associative array.
 22 #
 23     foreach $pair (split('&', $form)) {
 24         if ($pair =~ /(.*)=(.*)/) {  # found key=value;
 25         ($key,$value) = ($1,$2);     # get key, value.
 26         $value =~ s/\+/ /g;  # substitute spaces for + signs.
 27         $value =~ s/%(..)/pack('c',hex($1))/eg;
 28         $inputs{$key} = $value;   # Create Associative Array.
 29         }
 30     }
 31
 32 $error = "";  ## No errors to start with
 33
 34 $income = $inputs{"income"}; # Check if an income value was selected.
 35 if (($income < 1) || ($income > 6)) {
 36     $error = "Please specify your income range";
 37     }
 38
 39 $ssn = $inputs{"ssn"};
 40
 41 if (error eq "") {
 42 if ($ssn =~ /[0-9]{3}-[0-9][0-9]-[0-9]{4}/)
 43     {
 44     $snumber = $ssn;
 45     $snumber =~ s/\-//g;
 46     }
 47 elsif ( $ssn =~ /[0-9]{9}/) {
 48     $snumber = $ssn;
 49     }
 50 else    {
 51     $error = "Enter the social security number in the form XXX-XX-XXXX";
 52     }
 53 }
 54
 55 $fname = $inputs{'fname'};
 56 if ($fname eq ""  && error eq "") {
 57     $error =  "Please enter your first name";
 58     }
 59
 60 $lname = $inputs{'lname'};
 61 if ($lname eq "" && $error eq "") {
 62     $error =  "Please enter your last name";
 63     }
 64
 65 $mname = $inputs{'mname'};
 66 if ($mname eq "" && $error eq "") {
 67     $error = "Your mother's maiden name is required";
 68     }
 69
 70 $dependents = $inputs{'dependents'};
 71 if ($dependents < 1 && $error eq  "") {
 72     print "Now, now, we have to be dependent on ourselves.";
 73     exit;
 74     }
 75
 76 #
 77 #
 78 $visa = $inputs{'visa'};
 79 $mastercard = $inputs{'mastercard'};
 80
 81 if ($error eq "") {
 82 if ($visa eq "" &&  $mastercard eq "") {
 83     $error = "At least pick one card ! ";
 84     }
 85 }
 86
 87 print <<"HTMLHEAD";
 88 <HTML><TITLE>This is a test</TITLE>
 89 <BODY>
 90 <p>
 91 HTMLHEAD
 92
 93 if ($error eq "")
 94     {
 95     print "\n <H2>Congratulations!</H2> ";
 96     print "<P>Your application has been accepted";
 97     print "<P>We will be living off your interest payments shortly";
 98     }
 99 else
100     {
101     print "\n <H2>Error!</H2> ";
102     print "\n <P>$error<P>";
103     print "\n <P>Please correct the error and retry";
104     }
105
106 #
107 print <<"HTML";
108 <p>
109 <A HREF="http://www.ikra.com/credit.html">Restart</A>
110 <A HREF="http://www.ikra.com/index.html">Home Page</A>
111 <p>
112 </BODY></HTML>
113
114 HTML
115
116 exit;
117
118 #
119 # A simple subroutine to briefly test incoming input
120 #
121 sub FormArgs {
122
123     if ( $ENV{'REQUEST_METHOD'} eq "GET" &&
124          $ENV{'QUERY_STRING'} ne '') {
125     $form = $ENV{'QUERY_STRING'};
126     $form; # return value is true.
127     }
128     elsif ( $ENV{'REQUEST_METHOD'} eq "POST" &&
129          $ENV{'CONTENT_LENGTH'} ne '0') {
130     read(STDIN,$form, $ENV{'CONTENT_LENGTH'});
131     $form; # return value is true, continue
132 } else {
133     "0"; # Unable to process
134     }
135
136 }

The script first prints the HTML header and the start of the page body in lines 87 through 91. Line 87 tells Perl to print everything that follows, up to a line containing only the string HTMLHEAD. Line 88 starts a new HTML page and gives it a title with the <TITLE> and </TITLE> tag pair. Line 89 starts the body of the page with the <BODY> tag, and line 90 starts a new paragraph with the <P> tag.

print <<"HTMLHEAD";
<HTML><TITLE>This is a test</TITLE>
<BODY>
<p>
HTMLHEAD

Next, the script examines the $error string. If it is empty, no problems were found, so the script accepts the input and prints an acknowledgment (lines 93 through 98). Otherwise, the script prints the contents of $error, which holds the first problem the checks detected, and asks the user to correct it (lines 99 through 104).

At this point, the script can write out any HTML tags and text it needs to send the content of a page back to the browser. Whatever the outcome, you have to close the HTML output with the </BODY> and </HTML> tags, as lines 107 through 114 do. Then you are done. The response is sent back to the browser, and the script can safely exit.

Perl gives you, as a programmer, enormous flexibility and power to control how you handle responses and echo back messages. The construct print <<"HTML"; on line 107 is called a here-document. Perl prints every line that follows it to STDOUT, the standard output of the script, which the server returns to the browser. Printing continues until Perl reaches a line that contains only the word HTML. Because the terminator is in double quotes, any variables in the text are interpolated.
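The quoting of the terminator controls interpolation. The following stand-alone sketch shows the difference; the variable $error and the terminator name ERRORPAGE are illustrative and are not part of Listing 22.7. With a double-quoted terminator, such as the ones on lines 87 and 107, variables are replaced by their values. With a single-quoted terminator, the text is used exactly as written.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $error = "Please enter your last name";

# Double-quoted terminator: $error is replaced by its value.
my $page = <<"ERRORPAGE";
<H2>Error!</H2>
<P>$error<P>
ERRORPAGE

# Single-quoted terminator: the text is taken literally, no interpolation.
my $literal = <<'ERRORPAGE';
<P>$error<P>
ERRORPAGE

print $page;
print $literal;
```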

Using the Collected Data

So far, you've collected the incoming data from the user and verified that it is correct for the HTML FORM you are supporting. Now the question is: what can you do with the collected data? Basically, anything you want, because the data is now local to your script. Two of the most common actions are to archive the data to disk and to mail its contents as a message to someone.

Archiving User Responses

The archival process to store the incoming data can be done in many different ways. You can use the incoming name and other information to store values in a text string or a database. Using the techniques covered in Chapter 18, "Databases for Perl," you can construct your own database. At the very least, you can archive the responses in a plain text file by appending them to an existing file.

A simple solution is to append each set of responses to one plain text file. The following lines mark the start and end of every record with SOR and EOR:

# Open "applicants" for appending (>>), creating it if necessary.
open (MYARCHIVE, ">>applicants") || die "Cannot open archive: $!";
print MYARCHIVE "SOR ";
print MYARCHIVE "$inputs{'income'} ";
print MYARCHIVE "$inputs{'ssn'} ";
print MYARCHIVE "$inputs{'lname'} ";
print MYARCHIVE "$inputs{'dependents'} ";
print MYARCHIVE "$inputs{'visa'} ";
print MYARCHIVE "$inputs{'mastercard'} ";
print MYARCHIVE "$inputs{'mname'} ";
print MYARCHIVE "$inputs{'fname'} ";
print MYARCHIVE "EOR\n";
close MYARCHIVE;
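If you stay with a flat file, storing one record per line with a fixed field order makes the archive much easier to search later, for example with grep. The following sketch assumes the %inputs field names used in Listing 22.7. The subroutine names format_record and archive_record are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field order for every archived record (names as used in Listing 22.7).
my @FIELDS = qw(income ssn fname lname mname dependents visa mastercard);

# Build one tab-separated line from the parsed FORM values. Tabs and
# line breaks inside a value become spaces, so each applicant always
# occupies exactly one line of the archive file.
sub format_record {
    my ($inputs) = @_;
    my @values;
    for my $name (@FIELDS) {
        my $value = defined $inputs->{$name} ? $inputs->{$name} : '';
        $value =~ s/[\t\r\n]/ /g;
        push @values, $value;
    }
    return join("\t", @values) . "\n";
}

# Append one record to the archive file, creating the file if needed.
sub archive_record {
    my ($file, $inputs) = @_;
    open(my $fh, '>>', $file) or die "Cannot open archive $file: $!";
    print $fh format_record($inputs);
    close($fh) or die "Cannot close archive $file: $!";
}

# Example: show the line that would be written for one applicant.
my %sample = (income => 3, ssn => '123456789', fname => 'Ann', lname => 'Lee');
print format_record(\%sample);
```

In the CGI script, you would call archive_record("applicants", \%inputs) once validation has succeeded.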

A crude method like this might get you by if you have only a few applicants. However, the appended plain text file quickly becomes hard to manage and search, and the method does nothing to detect multiple applications from the same individual. What you probably want to do is store the information in some sort of database file. Look at Chapter 18 for information on using free databases generated by DBM (the database management utilities supplied with Perl). Chapter 18 also covers commercial database applications such as Oracle, Sybase, and Informix, which you can access from within a Perl script through the DBI (database interface).

For a commercial application, you're better off using an existing database from Oracle, dBASE, or some other commercial database management system. With a commercial system you're able to use the DBI to take advantage of particular features of that database.

Perl comes with several DBM modules, including GDBM_File, NDBM_File, and SDBM_File. For the purpose of illustration they are functionally the same, so I'll use GDBM_File. This keeps the focus on how to handle data from within a CGI script, rather than going off into a tutorial on databases.

In this script, you'll use the GDBM_File.pm module with the following line:

use GDBM_File;

All Perl modules end with the .pm extension, but the use statement does not require you to specify it. To see whether a module is installed on your system, look for the corresponding .pm file in the directories listed in Perl's @INC array (often under /usr/lib/perl5).
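Modules can be installed in several places, so asking Perl itself is more reliable than searching a directory by hand. The following sketch tries to load each DBM module named above. The helper name module_available is hypothetical. From the shell, perl -MGDBM_File -e 1 performs the same test for a single module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module can be loaded on this system, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Accept only plain package names before handing them to a string eval.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    return eval "require $module; 1" ? 1 : 0;
}

for my $module (qw(GDBM_File NDBM_File SDBM_File)) {
    printf "%-10s %s\n", $module,
        module_available($module) ? "installed" : "not installed";
}
```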

Next, you have to figure out how to store the users' responses in this database. An almost-unique key for this application is the user's $ssn field. Perhaps you can create the index by concatenation of the $ssn field with the last name ($lname).

$appIndex= $lname . $ssn;

Using this $appIndex variable, you can index into your sample database, which is called applicants.dbm. The dbmopen call creates this database if it does not already exist and associates it with the %applicants associative array. That way, if the applicant has already applied for credit, you can give him or her an error message or update his or her information. The action to take is really up to you. The following snippet of code shows how to use DBM to track applicants:

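dbmopen(%applicants, "applicants.dbm", 0666) || die "Cannot open applicants.dbm: $!";
if (exists $applicants{$appIndex}) {
    # This key is already on file: report it instead of storing it again.
    $error = "You have already applied for credit. We will get back to you";
}
else {
    $applicants{$appIndex} = $form;   # Save the raw query string.
}
dbmclose(%applicants);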

Basically, you are saving the query string in $form for future use. Any other script reading the applicants.dbm file will have to break this string apart to get the individual words, just like in the credit.pl script.
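Any script that later reads applicants.dbm has to split the stored string in the same way. The following sketch moves that step from lines 23 through 30 of Listing 22.7 into a reusable subroutine. The name parse_form is hypothetical. Unlike the listing, the sketch also decodes the field names and skips any piece that contains no = sign.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded string into a hash.
sub parse_form {
    my ($form) = @_;
    my %fields;
    foreach my $pair (split /&/, $form) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $value;                  # skip pieces without '='
        for ($key, $value) {
            tr/+/ /;                                 # '+' stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;     # %XX stands for one byte
        }
        $fields{$key} = $value;
    }
    return %fields;
}

my %applicant = parse_form('fname=Ann&lname=O%27Neil&mname=Van+Dyke&income=3');
print "$_ = $applicant{$_}\n" for sort keys %applicant;
```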

Forwarding User Responses

Another use for the incoming data is to mail the bulk of the information to another user. This feature is invaluable for firms that provide services on the Internet. For example, you could send a mail message to the sales representative for a mail-order firm when a FORM is filled out. You could also send the contents of a bug report FORM to a help desk representative. In general, whenever someone fills out your FORM, a mail message can tell the right person that a visitor wants more information.

Now you can incorporate the mail feature into the Perl script you've been working with. The mail can be sent just before the script exits, instead of updating your internal database. Append the following snippet of code to Listing 22.9 to add this "mail back" feature to your CGI script:

$subject = "Credit Application for " . $fname . " " . $lname;
$sendto  = 'credit_rep@big.rich.bank.com';   # single quotes: no @array interpolation
$fullname =  $fname . " " . $lname;
$cardDesired = $mastercard . " " . $visa ;
#
# Open a pipe to sendmail for the message to $sendto.

open (MAIL,"| /usr/bin/sendmail $sendto") || die " Mail does not work!";

select(MAIL);  # Now all the output will go to MAIL

$date = `date`;
chop $date;

print << "EMAIL";
Date: $date
From: user\@using.browser.com
To: $sendto
Subject: $subject

Name = $fullname
Card Desired = $cardDesired

Number of Dependents = $dependents
Income Level = $income
Date of application = $date

Please contact them at your convenience,
Sincerely
Your Dearly Beloved Web Server
EMAIL

#
# Do other processing here.
# Print out to MAIL if you have to.
#

close(MAIL);  # send mail

#
# ... continue processing if you have to or exit.
#
exit(0);

There are a few points to note concerning the script shown above. First, a UNIX pipe is opened to the sendmail program. The | character in the filename argument tells Perl to open a pipe, not a regular file. Refer to Chapter 14, "Signals, Pipes, FIFOs, and Perl," for more information. Now all the text sent to the MAIL handle will be sent to the program at the end of the pipe. In this case, the program at the end of the pipe is the sendmail program.
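A two-argument open on a string such as "| /usr/bin/sendmail $sendto" passes the whole string to the shell, so the shell interprets any special characters in $sendto. The following sketch uses the list form of a pipe open, which runs the program directly without a shell. The helper name send_via_pipe is hypothetical. The demonstration uses cat in place of sendmail, so nothing is mailed. In the credit script you might call send_via_pipe(['/usr/bin/sendmail', '-t', '-oi'], $message). That path comes from the code above, although on many systems sendmail lives in /usr/sbin instead. The -t flag takes recipients from the To: header, and -oi keeps a line containing a lone "." from ending the message early.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write $text to the standard input of an external program started with
# the list form of a pipe open: no shell is involved, so nothing in the
# arguments is treated as shell syntax. Returns true if the program
# exited with status 0.
sub send_via_pipe {
    my ($command, $text) = @_;
    open(my $pipe, '|-', @$command) or die "Cannot start $command->[0]: $!";
    print $pipe $text;
    return close($pipe);   # waits for the program; false on non-zero exit
}

# Demonstration with cat standing in for sendmail: the message is echoed.
send_via_pipe(['cat'], "To: credit_rep\@big.rich.bank.com\nSubject: test\n\nName = Ann Lee\n")
    or die "cat failed";
```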

The select(MAIL); statement makes MAIL the default file handle for all output. This is simply a convenience for me as a script writer. Without it, every print statement would have to take the form print MAIL. Any print statement that left out the MAIL handle would then send its text to STDOUT, back to the browser, instead of into the message. You can choose not to use this method.

The From field in the mail message is hardwired to From: user@using.browser.com. You can add a field to your HTML FORM that accepts a return mail address and collect its value in a variable called $returnAddress. That way, replies to the message go directly to the user. The header line then looks like this: From: $returnAddress.
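Because the return address comes from the user, check it before you place it in a header. A value that contains a line break could add extra headers, such as a Bcc: line, to the message. The following sketch assumes the address has already been taken from a FORM field. The subroutine name safe_return_address is hypothetical, and its pattern is deliberately simple; it is not a full address parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a cleaned-up return address, or undef if the value is unsafe or
# does not look like user@host. The pattern is intentionally simple.
sub safe_return_address {
    my ($addr) = @_;
    return undef unless defined $addr;
    $addr =~ s/^\s+//;                   # trim leading whitespace
    $addr =~ s/\s+\z//;                  # trim trailing whitespace
    return undef if $addr =~ /[\r\n]/;   # a line break would start a new header
    return undef unless $addr =~ /\A[^\s\@<>,;]+\@[A-Za-z0-9.-]+\z/;
    return $addr;
}

# Fall back to the hardwired address when the user's value is rejected.
my $returnAddress = safe_return_address(' ann@example.com ')
    || 'user@using.browser.com';
print "From: $returnAddress\n";
```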

It would be nice to show the date of the application relative to the server. (For all you know, the applicant could be on the other side of the world.) The backquoted date command returns the server's date followed by a newline. The chop command removes that trailing newline so the date fits on one header line.
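Perl can also format the server's time itself, which avoids running the external date command. The following sketch uses POSIX::strftime to produce a value in the usual mail-header form, Day, DD Mon YYYY HH:MM:SS +ZZZZ. It assumes a C or English locale for the day and month names, and a C library whose strftime supports %z (glibc's does). The name mail_date is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format a time (default: now) in the server's local time zone for a
# mail Date: header.
sub mail_date {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;
    return strftime("%a, %d %b %Y %H:%M:%S %z", localtime($epoch));
}

print "Date: ", mail_date(), "\n";
```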

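The close(MAIL) call ends the input to the sendmail program, which then sends the mail. At this point, you can call select(STDOUT) to make standard output the default handle again. You need to do this if the script still has to print its HTML response. Otherwise, you can simply exit.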

Summary

This chapter covered how to write HTML FORM pages and how to write CGI scripts to handle their input. An HTML FORM can send information to a script with one of two methods: GET or POST. The GET method passes the data collected from the FORM in the environment variable QUERY_STRING. The POST method sends the data to the script on its standard input (STDIN) and sets the length of that input in the CONTENT_LENGTH environment variable. In either case, the data arrives as name=value pairs separated by ampersands (&). Spaces within a value are encoded as plus signs (+), and other special characters are encoded as a percent sign followed by two hexadecimal digits.

The CGI script can process the input by verifying it for acceptable parameters and return replies in the form of HTML pages or plain text. The CGI script then can store the data away in a database, mail the results to someone else, or both.