Chapter 12 Perl/CGI Libraries and Databases

CONTENTS

Rules of Thumb for Smooth Client/Server Interaction
Stages of URL Encoding
The Standard Perl Library
The Relational Database Model

In this chapter, the concept of Perl libraries is explored and explained, with reference given to several online resources available to you.

Rules of Thumb for Smooth Client/Server Interaction

When designing any script, but especially when designing for CGI, you should follow some traditional advice when it comes to programming. Never forget that you are writing these scripts on a computer, and can take full advantage of what that means.

Always read the documentation. At the very least read the READ MEs. When scripting for different applications, you are always dealing with slight variations on a theme. You need to know what little details might trip you and cause you to stumble.
Test your script intensely. When you test a script, you are trying to make it crash if you can, so in the end it will be resilient to users' usage, however they may use it. Run other applications with it, input all manner of data, and have various system loads for starters.
Look around on the Internet for software tools that have been developed by others. You might even find a script that solves your problem, or gets you pretty close to solving it. These scripts have already gone through some testing (if they hadn't they wouldn't be available for download) so you don't have to reinvent the wheel. Some of the places you can look are the cgi-lib.pl library located at
http://www.bio.cam.ac.uk/web/form.hmtlor the CGI.pm library located at
http://www-genome.wi.mit.edu/WWW/tools/scripting/CGIperl
Get in touch with e-mail list and Usenet newsgroups that deal with your problem. Don't just look for the Perl NT group, but for others that deal with the Web browser you decide to use, the computer hardware you use, and so forth. There are also Frequently Asked Questions, or FAQs, listings that seem to cover just about every configuration going these days, so look for these, too.

Before going into how Perl libraries work, and how you can add them into your CGI scripts, let's discuss some of the stages that data must go through to be sent between client and server on the Internet.

Stages of URL Encoding

To pass data over the Internet using the CGI, it must be URL-encoded. The data is first transferred as name/value pairs. Then these pairs are divided by the "&" symbol. Each pair is identified with each other by the "=" symbol, with the default value entered as the value, unless the default value is undefined, and then the value will be empty. The name/value pair will still be sent. When spaces occur in the encoding, they are represented by the "+" symbol. Any reserved characters necessary must be encoded using their hexadecimal equivalent; a "%" symbol and a two-digit (hexadecimal) number. Those characters with special meaning must be encoded before being sent to the client.

Understanding these elements of URL encoding is a key to successful client/server relations.

The Standard Perl Library

There are several standard Perl library routines available online. The following scripts are examples of the kinds of useful scripts found in these libraries. These scripts come in the form of subroutines that can be placed at the end or beginning of your full script, and then invoked with a subroutine call where necessary in your script.

These scripts range from the simple MIME header script to a larger date creation script. Each of these scripts can be inserted "as is" in your scripts, or adapted to fit your script's needs. Some of these scripts are quite short, and it might seem less trouble to create a standard one- or two-line subroutine. When deciding whether or not to use the "three or greater" rule consider this: If you are going to use a function three or more times in a script, that function should have its own subroutine.

The MIME Header for HTML Documents

This script has been used in some of the other scripts in this book to designate the MIME header of the response going back to the client. The only specification that must be made is the value of $title, the title of the HTML document being returned.

# a subroutine that returns an HTML MIME header
     sub html_header {
          local($title) = @_;
          print "Content-type: text/html\n\n";
          print "<HTML><HEAD>\n";
          print "<TITLE>$title</TITLE>\n";
          print "</HEAD>\n<BODY>\n";
     }

Finding the Request Method Sent to Your Server

To determine what kind of request method, GET or POST, is being used to call your CGI scripts, you can use this short subroutine:

# Find the request method
     sub MethGet {
          return ($ENV{'REQUEST_METHOD'} eq "GET");
     }

This script returns a true value if the request method being used is GET, meaning that a false value would be POST.

Date Stamp Signature

To put a date stamp on a page that is in a different format than the available environmental variable, this subroutine can be adapted to whichever date format you wish. This version contains a copy line that states the creator of the page, and offers a link to the home page of that creator.

# a date stamp signature
     sub date_stamp{
          local($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime;
          local(@days) = ('Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday');
          local(@months) = ('January','February','March','April','May','June','July',
		  'August','September','October','November','December');
          print"<P>\nThis page created by <A HREF=\"http://www.my_server.com/index.html\">";
          print "<B>my_domain.com</B></A>";
          print " on ";
     printf("%Ø2d:%Ø2d:%Ø2d on %s %d, %d", $hour, $min, $sec, $months[$mon], $mday, 1900+$year);
          print "</BODY></HTML>\n";
     }

NOTE

When the year 2000 rolls around, the 1900 value needs to be changed

Automatic Return

This subroutine tacks on a return link to your default home page at the bottom of an HTML document generated by your server through the CGI. It also includes an image.

# return to the default home page
     sub home {
          local($gif,$text) = @_;
     # if nothing supplied, use default gif and text
     if ($#_ < Ø) {
          $gif = "default_return.gif";
          $text = "Return to our home page";
     }
     print "<HR>";
     print "<A HREF=\"http://my_server.com/index.html\">";
     print "<IMG src = \"http://my_server.com/$gif\">";
     print "$text </A>";
     print "<HR>\n";
     }

Displaying the Variables Sent to the CGI Script

To create an HTML display of the variables sent to your Perl scripts through the CGI, add this subroutine. This is handy if you want to return the data that a user has input into your form for a verification check.

# display the variables
     sub printvar {
          local(%in) = @_;
          local($old, $out, $outout);
          $old = $*;
          $* = 1;
          $output .= "<DL COMPACT>";
          foreach $key (sort keys(%in)) {
               ($out = $_) =~ s/\n/<BR>/g;
               $output .= "<DT><B>$key</B><DD><I>$out</I><BR>";
               }
          }
          $output .= "</DL>";
          $* = $old;
          return $output;
     }

This subroutine will create a list of the variables in the HTML document.

Displaying the Variables Sent to the CGI Script in Compact Form

This is similar to the previous list, except that it returns the list of variables as one line-not as a list.

# display the variables in compact form
     sub printvarcompact {
          local(%in) = @_;
          local($old, $out, $outout);
          $old = $*;
          $* = 1;
          foreach $key (sort keys(%in)) {
               ($out = $_) =~ s/\n/<BR>/g;
               $output .= "<B>$key</B> is <I>$out</I><BR>";
               }
          }
          $output .= "</DL>";
          $* = $old;
          return $output;
     }

Storing User Data Sent to Your Server

To store the data sent to your server from an HTML form using either the GET or POST method, this subroutine will store it in the associative array @in.

# storing user request data
     sub readparse {
          local(*in) = @_ if @_;
          local ($I, $loc, $key, $val);
          # read in text
          if ($ENV{REQUEST_METHOD'} eq "GET") {
               $in = $ENV{'QUERY_STRING'};
             } elsif ($ENV{'REQUEST_METHOD'} eq "POST"); {
                   read(STDIN,$in,$ENV{'CONTENT_LENGTH'});
                   }
               @in = split(/&/,$in);
               foreach $in(Ø..$#in); {
     # convert pluses into spaces
                $in[$i] =~ s/\+/ /g;
     # split into key and values on the first '='
               ($key,$val) = split(/=/.$in[$i],2);
     # convert %XX from hex to alphanumeric
               $key =~ s/%(..)/pack("c",hex($1))/ge;
               $val =~ s/%(..)/pack("c",hex($1))/ge;
     # associate the key with value
               $in{$key} .="\0" if (defined($in{$key}));
     # 0 is used as the separator of multiple values
               $in{$key} .= $val;
          }
          return 1;
     }

The Relational Database Model

The database model that most of you will be familiar with is the relational database. In this model the data is stored in a table format, with the columns used as fields and the rows used as records. The row/records relate to the column/fields, making a relational database.

To put data into a two-dimensional database like this, Perl has the split operator, which can be used on flat file records to prepare them for storage in a relational database. Perl also can search several databases by linking, or joining, common fields in each database.

Before you jump into designing Perl search scripts, it is important to discuss the databases themselves. A well-organized database speeds up searches, and is easily adaptable to your future needs. Creating and maintaining a good database takes a great deal of time, but this is time well-spent, so figure out how to work it into your schedule. The time you spend now creating a clear, straightforward database will repay you every time it is used.

Searching with Perl

The standard Perl library, which comes with every installation of Perl for Windows NT, contains the Perl script, or routine, look.pl. This script can be used to do binary searches on large text files, such as long HTML documents, or relational databases. A binary search does not go through each entry sequentially, looking for a match, but divides the file in half, and half again until a match is found. This speeds a search tenfold.

Binary Searches and look.pl

When the large text file in question is sorted with a key field, then look.pl will begin a binary search. This example provides an HTML form from which the user enters a last name for the search to match. The database structure for the search is based on the simple format shown in Table 12.1.

Table 12.1 Simple Personnel Database Format

Last Name Uppercase	User ID	Full Name
FAWKES	xfawkes	Xavier Fawkes
JOHNSON	ejohn	Ernie Johnson
OLIVIA	molivia	Magdalene Olivia
SMITH	lsmith	Leora Smith
WALTER	twalter	Tony Walter

This database file can be stored as name_file.dat, or by whatever file extension your database application uses.

The search through this file, using the example scripts and look.pl, produces the matching name, e-mail address, and the person's full name. Watch out for the embedded space between the first and last names in the third row. You must make sure that Perl splits the entries between the rows, and not between these embedded spaces. This is accounted for in the example script.

The example data file is also already sorted alphabetically by last name. With this simple search there is no provision for Perl to organize the various entries before making the search, although this feature can be added easily. The problem with adding this procedure is that it drastically increases the processing time Perl needs. A properly organized and maintained database does not need this procedure added.

The HTML form that asks for user input looks like that shown in Figure 12.1. The script asks for the last name of the person for whom the search is being conducted, as well as for a user selection of an upper limit to the number of matches that will be returned.

Figure 12.1 : A simple name search form.

<HTML>
<HEAD>
<TITLE>Simple Name Search</TITLE>
</HEAD>
<BODY>
<H2>Name Search</H2>
<OL>
<LI>Please type in the last name of the person you are looking for. Three letters will do.<BR>
<LI>Please select the appropriate match limit.
</OL>
<P>
<FORM method="POST" action+"http://my_server.com/cgi-bin/name_file.pl">
<B>Last Name</B>
<INPUT name="last_name">
<P>
<B>Match Limit</B>
<SELECT name="match-limit">
<OPTION> No Limit
<OPTION> 1ØØØ Matches
<OPTION> 5ØØ Matches
<OPTION> 25Ø Matches
<OPTION selected> 1ØØ Matches
<OPTION> 5Ø Matches
<OPTION> 25 Matches
<OPTION> 1Ø Matches
</SELECT><P>
Submission Choices: <INPUT type="submit" value="Begin Search">
Reset Search: <INPUT type="reset" value="Reset Match">
</FORM>
</HTML>

This form places a request to name_file.pl, which is the Perl script that will search the specified data file, name_file.dat.

#! /usr/bin/perl
     # name_file.pl
     require 'look.pl';
     require 'cgi-lib.pl' # another standard Perl script
     &html_header("Search Results"); # our subroutine
     # from earlier in this chapter
     &parse_request; # from cgi-lib.pl
     $title = "Name Search Results";
     $server = "\@my_server.com";
     if ($query{'debug'} =~ /no debug/) {}
     else {
          &debug_info;
     }
     if ($query{'last_name'} eq "") {
          print "<H2>Please enter the last name.</H2><P>";
          print "<A HREF=\"http//my_server.com/namser.html\">";
          print "Please enter the name again.</A>";
          exit 1;
          }
     $path = "database/path"; # where your database
     # files are keep
     $names = "$path/name_file.dat";
     &search_result();
     print"<HR>";
     $counter = Ø;
     open(NAME_LIST, $name_list)|| die "Unable to open $name_list data file.";
     &look(*NAME_LIST, $query{'last_name'},Ø,1);
     # this is the subroutine from look.pl that
     # performs the binary search. Without this
     # subroutine this search would go through the 
     # entire file one entry at a time...very slow!
     while (<NAME_LIST>) {
          last unless /^$query{'last_name'}/i;
          @line = split(/\s\s+/);
          if ($counter > $query{'match-limit'}) {
               $counter--;
               print "<I>Match limit of $counter reached...search ending.</I>";
               last; 
               }          
          print "<H2>Your Search Results</H2><P><HR><P>";
          print "<PRE>";
          printf(" %-2Øs   %-15s   %-3Øs", $line[Ø], $line[1]$server, $line[2]);
          print "</PRE>";
          }
     close(NAME_LIST) || die "Unable to close $name_list data file.";
     print "<HR>";
     print "Your search found $counter match(s).<P>";
     print "<A HREF=\"http://my_server.com/ namser.html\">";
     print "Another search?</A>";
     exit Ø;
     sub search_result{ # the search result display 
          $fhdr="<B>Last Name</B>";
          $chdr="<B>E-mail Address</B>";
          $shdr="<B>Full Name</B>"
          print "<PRE>";
          printf(" %-25s   %2Øs   %3Øs   ",$fhdr, $chdr, $shdr);
          print "</PRE>;
      }
     sub html_header { # our html header
          local($title) = @_;
          print "Content-type: text/html\n\n";
          print "<HTML><HEAD>\n";
          print "<TITLE>$title</TITLE>\n";
          print "</HEAD>\n<BODY>\n";
     }
     sub show_debug {         
          while (($key,$value) = each(%query)) {
               print "The value of $key is $value <BR>";
          }
     }
     exit Ø;

This produces a result like that shown in Figure 12.2.

Figure 12.2 : Simple search results.

This binary search is based on only one parameter given to the CGI script by the user. How do you ask for more than one parameter? That would be a binary search involving multiple user parameters.

Binary Searches Involving Multiple Parameters

To search a data file using more than one parameter from the user is very similar to making a search based on only one. Using the look.pl script again, this kind of search requires a larger HTML form. This is an HTML document that asks the user for parameters to search a database of Web links, and then return the matches to the user.

<HTML>
<HEAD>
<TITLE>
Search Our Links
</TITLE>
</HEAD>
<BODY>
<H2>
Search Our Links!
</H2>
<P>
<HR>
<P>
We have a huge data base of links. Please enter the following information to find
links that will interest you.
<P>
<OL>
<LI>Link Subject - What is the Site about?
<LI>Date Limit - How old/new is the Site?
<LI>Number of Matches - How long a list of Sites?
</OL>
<P>
<FORM method ="POST" action="http://my_server.com/cgi-bin/data_search.pl">
<B>Link Subject </B><INPUT name="subject" size=2Ø maxlength=3Ø>
<P>
<B>Date Limit</B>
<SELECT name="date">
<OPTION>Last Week
<OPTION>Last Two Weeks
<OPTION>Last Month
<OPTION>Last Six Months
<OPTION>Last Year
</SELECT>
<P>
<B>Number of Matches</B>
<SELECT name="matches">
<OPTION>2Ø
<OPTION>1Ø
<OPTION>5
<OPTION>1
</SELECT>
<P>
Begin Search?: <INPUT type="submit" value="Begin Search">
Reset Search: <INPUT type="reset" value="Reset Match">
</FORM>
<P>
<HR>
</BODY>
</HTML>

In this form, the user can supply the subject of the links for which he or she is searching, and the Perl script dbsearch.pl will examine your database file of links and return them to the user. The previous HTML document should appear to the user like that shown in Figure 12.3.

Figure 12.3 : An HTML formfor a multiparameter binary search.

The script that this form calls, dbsearch.pl, takes the user input and places it into dbsearch.pl.

#! /usr/bin/perl
     # dbsearch.pl
     sub match {
         require look.pl;
         require cgi-lib.pl;
     }
     &html_header;
     # read and split the name/value pairs
     read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
     @pairs = split(/&/,$buffer);
     foreach (@pairs) {
          ($key, $value);
          $value = $deweb{$value};
          $form{$key} = $value;
     # Error message if no subject entered
          if ($form{'subject'} eq "") {
               print "<H2>You have to submit a subject name.</H2><P>";
               print <A HREF=\"http://my_server.com/bisearch.htm\">";
               print "Make another search.</A>";
          }
     # this section determines how far back to 
     # run the search
     if ($form{'date'} eq "Last Week"){
          $matchdate = &main'jtod($julnum-7);
     } elsif ($form{'date'} eq "Last Two Weeks"){
               $matchdate = &main'jtod($julnum - 14);
     } elsif ($form{'date'} eq "Last Month"){
               $matchdate = &main'jtod($julnum - 3Ø);
     } elsif {
          ($form{'date'} eq "Last Six Months"){
               $matchdate = &main'jtod($julnum - 18Ø);
     } elsif {
          ($form{'date'} eq "Last Year"){
               $matchdate = &main'jtod($julnum - 36Ø);
     }
     ($nday, $nmon, $nyear) = split(/-/,$matchdate);
     if ($nmon < 1Ø) {
          $nmon = join('','Ø',$nmon);
     }
     if ($nday < 1Ø) {
          $nday = join('','Ø',$nday);
     }
     $matchdate = join('',$nyear, $nmon, $nday);
     $forms = "usr/bin/links.dat"; # the database 
     # searched
     $logpath = "web/logs/"; # a log path
     $logname = "dbsearch.log"; # a log to keep access
     # data in
     $logfile = ">>$logpath$logname";
     &aux_var; # for max_hit counter
     &column_names(); # match column field names
     print "<HR>";
     $counter = Ø; # set match counter to Ø
     open(LINKS, $form) || dies "Unable to open the input file.";
     &look(*LINKS, $form{'subject'},1,1); # the actual
     # search
     open(LOGFILE,$logfile) || die "Unable to open $logname";
     print LOGFILE "$date | $ENV{REMOTE_HOST} | $ENV{REMOTE_ADDR} | $ENV{HTTP_USER_AGENT} |
	 $form{'subject'} | $userform \n";
     # records each search in the log
     close LOGFILE || die "Unable to close $logname\n";
     while (<SUBJECT>) {
          last unless /^$form{'subject'}.*/i;
          @line = split(/\s\s+/);
          if (($line[1] =~ /$userform/i || $userform eq "ALL") && ($line[3] >= $matchdate)) {
               $subject = "<A HREF=ftp://my_server.com/$line[4]>$line[Ø]</A>";
               @date = split(//,$line[3]);
          $date = "$date[4]$date[5]-$date[6]$date[7]-$date[Ø]$date[1]$date[2]$date[3]";
     $counter++;
     }
     # this stops the search once the maximum limit 
     # has been reached
          if ($counter > $ulimit[1]) {
               $counter--;
               last; 
          }
          print "<PRE>";
          printf(" %s  %-1Øs  %-6s  %3Øs","$date,$line[1],$line[2],$subject);
          print "</PRE>";
     }
     close(SUBJECT) || die "Unable to close the file.";
     print "<HR>";
     $hnew = $counter;
     print " Your link search found $hnew match(s).<P>";
     print "<A HREF=\"http://my-server.com/link_search.html\">Search Again</A>";
     exit 1;
     }
     sub column_names {
          $fhdr="<B>Link Type</B>";
          $chdr="<B>Link Name</B>";
          $shdr="<B>Link URL</B>";
          $dhdr="<B>Date Verified Active</B>";
          print="<PRE>";
          printf("%-1Øs  %-1Øs  %-1Øs  %s",$dhdr,$fhdr,,,$chdr);
          print="</PRE>";
     }
     sub j_number {
     local ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =    localtime;
          $smon = $mon + 1;
          $syear = $year + 19ØØ; # this has to be updated
     # by the year 2ØØØ
          $timedate = join('-',$mday,$smon,$syear);
          &main'dtoj($timedate);
     }
     sub html_header {
          local($title) = @_;
          print "Content-type: text/html\n\n";
          print "<HTML><HEAD>\n";
          print "<TITLE>$title</TITLE>\n";
          print "</HEAD>\n<BODY>\n";
     }
     sub aux_var {
          @ulimit = split(/=/,$pairs[4]);
          $ulimit[1] =~ y/=/ /;
          $ulimit[1] =~ y/a-z/A-Z/;
          $ulimit[1] =~ y/=/ /;
          $ulimit[1] =~ y/a-z/A-Z/;
          if ($ulimit[1] =~ /^no/i) {
               $ulimit[1] = 99999; 
          }
     }
     eval '&match';
     exit Ø;

This script produces an HTML document with the output shown in Figure 12.4.

Figure 12.4 : The results of a multiparameter binary search.

More and more the Web is being used as a way to provide access to large, professional-size databases. These are often called relational database management systems, or RDBMS. The "serious" database systems, such as Sybase or Oracle, become even more powerful when you can provide Internet access to them via the Web. You might want to open this access to the general public, or restrict it to members only. Regardless of your choice, Perl has proven itself as an ideal language for dealing with RDBMS.

Binary searches are very good for dealing with flat files like those shown in the previous examples, but they are limited when dealing with the larger databases. Luckily, these larger databases come with their own search engines. Perl scripts can be written to create Web interfaces with these search engines but, to do this successfully, you must understand how the database behaves on the command line. Each database package has its own command line behavior, so be sure to check the documentation carefully.

To deal with these large kinds of databases effectively you should learn about Structured Query Language, or SQL, which is a standard in computer languages specifically for dealing with RDBMS. Perl is designed to integrate with SQL. One important aspect of this procedure that may be overlooked when dealing with databases is the database itself. Setting up a solid and well-organized database will save you and your users valuable time. Perl can be used to establish a gateway between your Web site and your relational database.

The details of SQL are too involved to be dealt with here, but for more information on SQL try these two sites:

http://www.jcc.com/sql_stnd.html

http://www.inquiry.com/techtips/thesqlpro/

Using SQL often involves understanding C, a language used with most industrial-sized relational databases.

Perl and Relational Databases Concerns

The uses of large, relational databases are many, and extending their utility to your Web pages has an obvious appeal. Perl can be used to create a search gateway between your Web site and database. It can also be used to create a gateway for an SQL-standard database.

Databases also can place large demands on your resources, inhibiting access to your Web site itself. If it is possible to use a separate computer as your database server, that will solve the drain on your resources.

To use your database to its fullest extent, make sure that you answer the following questions:

What are the environmental variables used by your RDBMS?
How does your RDBMS use SQL on the command line?
What is your database server, and what are its peculiarities?
Is there a limit to the number of users to your database stated on your license?

Each database has its own answers, so don't be afraid to check the accompanying documentation. If you have a database administrator, become his or her friend.

Some final things to remember when dealing with Perl and databases are as follows:

Use binary searches on flat files for speedy searches.
Understand the limit of a flat file search so you can switch to a full database search engine.
Both the printf and format operators work well to format output from a database.
Using subroutines properly can greatly reduce scripting time and increase a script's efficiency.
When using Perl with a database's search engine, always take its command line performance to heart.

Adding access to database information makes any Web site more valuable to the user, and is often worth the extra time and headache. Careful planning and implementation will minimize these pitfalls, and provide your Web site with a truly useful asset.