Chapter 11 The CGI and Networks

CONTENTS

More CGI Specifics
- MIME
- Other CGI Realizations
Server-Side Includes
- SSI Syntax
- SSI Commands
How CGI Works
CGI Summary

At this point, knowing what you know about CGI, you can begin to lay out what is important when deciding to work in a programming environment. The first consideration is whether the programming language you use allows you to read it, enhance it, and maintain it. Match your language to your needs. Some languages are more powerful, but take more time and RAM to do their processing. Keep storage concerns in mind. Every language has memory needs. There has to be sufficient disk space to store the language software itself, as well as room for the language to create temporary files "on the fly" that it may need to operate properly.

How fast can clients access each particular feature on your server, whether it is your Web pages or your FTP archive? When determining your access response time standard, the five-second rule is a good rule to follow. If it takes more than five seconds for the client to load whatever it has requested from your server, you are taking too long. The one big exception to the five-second rule is a search engine response, which may take longer.

The aptitude the language has for text manipulation is also important. Finally, figure out whether the language can "talk" easily with the various applications that will be accessing the Web server. The ability to "talk" means that the language you use should easily mesh with the applications you use on your server when it uses them. The best way to decide how well a language will "talk" with your applications is to determine which applications you will be using with the language (perhaps a database program) and then examine the potential CGI language based on the needs of your list of applications.

CGI scripts work best with well-organized data. There should be an overall method that you use to decide where each piece of data goes and to guide your file storage. Are you storing data by type or by category? Or are you storing data by kind of access or ease of display? This method can then be easily applied to your CGI scripts, to integrate them into your system's present organizational structure.

Make sure you know where everything is going before you get going. Establish a directory for final scripts, then don't clutter up that directory with test scripts. Create a way to check which version of your script is current, and where your test scripts are located, such as keeping test scripts in a directory labeled "TEST_SCRIPTS." Then rename and move a script to its proper directory, perhaps a script library folder, once it is ready for use.

Following accepted programming practices is very important. Document your code. Use separate directories for production and testing. Keep up-to-date maintenance logs with each new bug recorded in sufficient detail to be useful later.

Getting a handle on HTTP is essential for success with your CGI scripts. Keep up with any changes through any number of USENET newsgroups. Know the client/server cycle. Whether it is using CGI or not, the client/server dynamic should be well understood by anyone running an HTTP service. You should also have an understanding of how data is passed using <STDIN> and environmental variables, including the primary methods for doing this, that is, GET and POST in the HTML form format.
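The two ways of passing data mentioned above can be sketched in Perl. This is only a minimal sketch, assuming the standard CGI environment variables REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH. The subroutine name raw_form_data is hypothetical and is used purely for illustration.

```perl
#!/usr/bin/perl
# Sketch: fetch the raw, still URL-encoded form data for a CGI request.
# raw_form_data is a hypothetical helper name, not part of any library.
use strict;
use warnings;

sub raw_form_data {
    my ($env, $in) = @_;    # hash ref of environment values, input filehandle
    my $method = uc($env->{REQUEST_METHOD} || 'GET');

    if ($method eq 'POST') {
        # POST: the data arrives on standard input, and its length
        # is reported in CONTENT_LENGTH.
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data   = '';
        read($in, $data, $length) if $length > 0;
        return $data;
    }

    # GET: the data arrives in the QUERY_STRING environment variable.
    return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
}

# In a real script the arguments would be \%ENV and \*STDIN.
my $data = raw_form_data(\%ENV, \*STDIN);
print "Content-type: text/plain\n\n";
print "Raw form data: $data\n";
```

Passing the environment and the input handle as arguments, rather than reading %ENV and STDIN directly inside the subroutine, makes the routine easy to try out from the command line before it goes into the cgi-bin.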

More CGI Specifics

There are several protocols and specifications used by the CGI, like HTTP, FTP, and e-mail. MIME specifications are very important when using the CGI with a Web server. There are several languages that can be used with these CGI specifics, such as C/C++, REXX, Python, and of course, Perl.

Understanding their role is important in understanding where your Perl scripts will fit in without difficulty. Another important consideration is whether protocols or specifications need adapting, such as with MIME headers, which must be included at the beginning of any Perl script returning data from your server to a client.

MIME

For CGI to run smoothly on your server, it has to know what kind of data is coming in so it can figure out what to do with it. Using MIME specifications, your server tells the client what kind of file it is returning to fulfill its request, so the client knows what to do with that file. Some of the different file types handled with MIME specifications are HTML, JPEG, GIF, MOV, and so forth. A full list of MIME header file types can be found in Appendix B.

If a client's request comes in, and it contains a METHOD=GET (or POST) argument, then some kind of data will be written to standard output, which is then sent by the server to the client. The initial print statement must be output in a kind of form string. Its format would resemble

Content-type: type/subtype <line feed> <line feed>

and when this appears in Perl, it will look like this:

print "Content-type: text/html \n\n";

where \n is Perl's line feed escape.

Adhering to MIME specifications lets the client know that it is receiving a text file to be treated like HTML. Note that there are two line feed escapes given. The first ends the Content-type line and moves to line 2 of the output. The second produces a completely blank line. For a CGI script to run successfully, the second line of its output must be blank.

An example of a simple HTML document sent to satisfy a client's request might look something like Figure 11.1, where a simple greeting is sent back to the user. It is accomplished with the following script:

Figure 11.1 : A simple CGI greeting using Perl.

#!/usr/bin/perl
#kingsley.pl
print "Content-type: text/html \n\n";
print "<HTML> \n";
print "<HEAD><TITLE>Hey Now!</TITLE></HEAD> \n";
print "<BODY>Hey Now!</BODY></HTML> \n";
exit;
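The same two-line-feed rule applies to any MIME type, not just text/html. As a minimal sketch, the header line can be built by a small subroutine. The name mime_header is hypothetical and is not part of Perl or of any server package.

```perl
#!/usr/bin/perl
# Sketch: build the MIME header line for any type/subtype pair.
# mime_header is a hypothetical helper name used for illustration.
use strict;
use warnings;

sub mime_header {
    my ($type) = @_;
    # The first \n ends the header line; the second \n produces
    # the blank line that must follow it.
    return "Content-type: $type\n\n";
}

print mime_header('text/plain');
print "Hey Now!\n";
```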

Other CGI Realizations

Now, a quick look at other commonly used CGI languages is in order. These are the other CGI languages that you may have heard of, or even encountered, in written scripts. Among them are C, C++, REXX, and Python. Some quick definitions will help you to understand better what is being discussed when you come across references to these other CGI languages, and even how they compare with Perl. To do this, a quick review of Perl as a CGI language is helpful here.

Perl

The first reason for using Perl as a CGI language is due entirely to its popularity. There are so many people using it that the number of Perl libraries is huge, and these libraries are growing every day. Ready-to-use Perl routines are often included in many Web server packages. Often, new Internet applications include Perl gateway routines free of charge.

There are also numerous support areas; the most important of these are the newsgroups, comp.lang.perl.misc and comp.lang.perl.announce, where Perl creator, Larry Wall, has been known to hold court, as well as many of the other key players in Perl's development and evolution, such as Tom Christiansen and Randal Schwartz.

Finally, Perl routines can be viewed as mix and match modules. Often a Perl script can be used in a new configuration with an intent that was never meant for the original script. These smaller scripts can reside quietly in the library waiting to be used for many other tasks, thus making Perl even more efficient.

Perl behaves as an interpreted language, which means that it interacts with the operating system in a more complicated way than a compiled language like C. This creates a longer processing time than a compiled language. The trade-off for this loss in speed with Perl is the ease with which its code can be written and understood.

C/C++

C is one of the most widely accepted languages in the scripting world. Even though it is a very difficult language to learn, it is very powerful. Its difficulties and power both derive from the same source: it is a low-level language. Although this makes C seem positively archaic in its peculiarities, it works very close to an operating system when it is parsing and processing code, making it very fast, even when used to create large applications.

C works directly with the operating system itself, instead of the processes which often act as an intermediary between a language and an operating system. This can be seen best when C is run against interpreted scripts like Korn, Bourne, and Perl, especially when used in large database programs. The interpreted languages must operate out of a shell, whereas C does not. This can be very advantageous to a Web site designer, who does not want to expose his or her system's operating shell to the outside world.

Another benefit of C is that it has extensive libraries of procedures, lots of readily available sample code, debuggers, and support groups. As the primary language for UNIX development, C also has a secure future. Learning C is never a waste of time.

Another factor that makes C attractive is that it adds an extra level of security to a Web server. Because it is compiled, there is no need to have the source code on your server. The C language works by being compiled into machine language. It is this machine language executable that sits on your server, not the original C script. Any of the three people who are smart enough to figure out how to modify machine code are not likely to waste their time trying; they're too busy trying to invent a new organism, or artificial intelligence for their toasters.

But, as with most of C's features, there is a "very good" and "very bad" side to each. The tie to UNIX makes it a natural for UNIX boxes, and UNIX programmers. Those of us who are working in different environments, such as Windows NT, have special concerns that do not mesh easily with the UNIX world, some of which have already been mentioned in this book. With UNIX, the operating system works without the additional level of the graphical interface that Windows NT has, making UNIX closer to the actual operating system. This is ideal for C, which also works closely with the operating system. With Windows NT, the graphical interface level creates a layer of difficulty for C that has to be overcome.

The C language itself also has no built-in file I/O or OS interfacing. These, and many of the other features expected from a higher-level language (such as Perl), are available in C only through libraries of procedures. Each OS will have its own specifics when it comes to these libraries.

There is a great debate raging, as there always is, as to whether or not C or Perl is the superior CGI language. I have a sneaking suspicion that those people who learned to program with C will favor C, and those who learned with other languages will favor those languages. Perl is a language that excels when it deals with text manipulation. The CGI environment's biggest demand is the ability to deal with text smoothly, making Perl an obvious best choice.

C++ is a superset of C, meaning that its functions and operators are built on C, so the concerns that are raised with C also apply to C++ as far as the scope of this book is concerned.

REXX

REXX is a language that was developed at IBM in the late 1970s. Mike Cowlishaw developed REXX as a procedural language.

One problem with REXX is that there is still no formal standard. Although most REXX scripts are portable, there is no standard specification that applies to REXX, so there can be irregularities that need to be addressed when using REXX scripts from another operating system. It is also not as popular as Perl or C for CGI programming, so there is much less "out there" for REXX in the way of support and script libraries. REXX, however, has been ported to just about every platform, including OS/2, Macintosh, Amiga, AS/400, and mainframes such as VM and MVS.

This is not to make the claim that Perl is the be-all and end-all of CGI programming. The more a Web Master knows, the more tricks he or she has to solve problems. Complex sites will often contain some Perl and C scripts. In addition, the CGI languages mentioned here are only a few of those currently available. Because CGI is only a specification, not a language itself, there are no real restrictions on what can be used for it. For more information on REXX, read M.F. Cowlishaw's book, The REXX Language, A Practical Approach to Programming (Prentice-Hall, 1985). This is the closest thing there is to a formal REXX standard. You can also try these Web sites:

http://www.pvv.unit.no/RexxLA/index.html
http://www2.hursley.ibm.com/rexx/
ftp://rexx.uwaterloo.ca/pub/rexxfaq.txt

Python

Python was developed in Amsterdam, The Netherlands, by Guido van Rossum for a company called Stichting Mathematisch Centrum. It is a new language that goes even further in being readable than Perl.

Python, like Perl, is an interpreted language, so like Perl the Python interpreter must be installed on your server to use it. From the start Python was developed as an object-oriented language, so it is a strong, effective language for programming CGI.

Python's most interesting feature is its huge library of functions. It can use these to communicate over networks or to access system-specific functions. There is also a CGI library that is available to programmers. Some of the functions in the library include parsing, printing the defined environmental variables (to aid in debugging), and printing the contents of an HTML form. To attest to Python's usefulness it was used to create Infoseek's search engine, and all of its other programs. More information about Python can be found at

http://www.python.org/
comp.lang.python

Server-Side Includes

There is one other area of CGI that warrants attention before concluding this section. To provide greater usefulness to your users, you might want to have Server Side Includes as a service available on your Web server. It is important to note that although the NCSA and Netscape HTTP servers do support Server Side Includes, or SSIs, at the time this book goes to press, the CERN HTTP server, EMWAC HTTP server, and IIS do not provide such support. There are, however, several other HTTP servers available that also support SSIs. CERN has announced plans to make SSIs a part of their server in the future.

Although SSIs are not strictly in the realm of the CGI, they are included here because they may solve some problems you may have with some of your Web pages. Remember, you want to have as many problem-solving strategies as possible available because many problems require more than one strategy to be solved successfully.

SSIs denote the handling of special extensions to HTML tags. SSI files closely resemble HTML files, but differ from them in their use of a superset of the CGI environmental variables. This does not make SSI files particular to the CGI, because SSI files do not have to have a gateway to operate.

SSIs do not run automatically. You have to enable them on your server before they will work. Please check your HTTP server's documentation for the way in which to enable SSIs.

It is important to know that although SSIs add a lot to a Web site, they also place a greater demand on your server's resources. For SSIs to run, the server has to read, or parse, every line of the SSI file to find the special SSI commands. If you find your server is becoming overworked, one quick way to deal with the traffic overload is to disable the SSI capacity on your server. Be very careful not to identify every HTML file as an SSI file; otherwise your server will parse every HTML file that is accessed. In turn, this creates a huge drain on your server's resources and a time lag in satisfying the client's request. Typical SSI files have .shtml as their file extension.

Comparing SSIs with HTML will give you a clearer picture. Because SSIs are parsed by the server into HTML before they go to the browser, an SSI file looks very similar to an HTML file. This is file jazz.shtml.

<HTML>
<HEAD>
<TITLE>Jazz on the Web</TITLE>
</HEAD>
<BODY>
<H1>Jazz on the Web!</H1><BR>
<H2>
This site was last modified on <!--#echo var="LAST_MODIFIED" -->
</H2>
<HR>
<H1><A HREF="http://town.hall.org/Archives/radio/Kennedy/Taylor/">Jazz Styles</A></H1>
<H1><A HREF="http://www.yahoo.com/Entertainment/Music/Artists/By_Genre/Jazz/">Jazz Musicians</A></H1>
<H1><A HREF="http://www.yahoo.com/Entertainment/Music/Genres/Jazz/Labels/">Jazz Labels</A></H1><BR><BR>
<H2>The jazz quote of the day is:</H2><BR>
<!--#include virtual="quotes" file="jazzquotes.html" -->
</BODY>
</HTML>

This file looks like a regular HTML document, except for two lines. The first line shows a very common use of SSI, which is to keep a running update on when the page was last modified.

The server reads the line

This site was last modified on <!--#echo var="LAST_MODIFIED" -->

and sees that it has to resolve the variable LAST_MODIFIED, which it then echoes to the client in HTML. The second line not standard to HTML is

<!--#include virtual="quotes" file="jazzquotes.html" -->

which the server parses and then sees it has to add the file jazzquotes.html to the page before it sends the data to the browser. The end result looks something like Figure 11.2.

Figure 11.2 : An example SSI file parsed in HTML.

It is interesting to note that when the server returns the included file, jazzquotes.html, it does not simply attach the file to the new, parsed HTML file; it inserts it at the exact place where the SSI line of code appears in the unparsed .shtml file.

If you haven't already noticed, SSI commands are an adaptation of the HTML comment tags. This was intentional, so that if you move your HTML documents, which contain SSIs, to another server, they will look the same, regardless of whether that server supports SSIs or not. Inside the HTML comment form the SSI syntax looks like this:

<!--#command cmd_argument="argument_value" -->

where the command is a special SSI command, the cmd_argument is related to the SSI command, and the argument_value is based on the cmd_argument.

SSI Syntax

There are six commands for SSI. They are config, echo, exec, fsize, flastmod, and include. Their functions are listed in Table 11.1.

Table 11.1 SSI Syntax

Command	Function
config	This sets the format of the size, time, or error messages.
echo	This will place the value of the SSI variables into an HTML document.
exec	Will execute a CGI program or system command, and output the result into the HTML document.
fsize	Places the size of the file in the HTML document.
flastmod	Places the date of the last modification of an HTML document into that HTML document.
include	Places the contents of another HTML document, or a specified data file, into the HTML document.

When working with SSI commands, it is important to remember these rules:

SSI syntax is based on UNIX commands, which are case-sensitive. Config is not the same as config.
SSI requires the right file extension if it is to be recognized and parsed by the server. Make sure your SSIs are .shtml. You can also turn parsing on for all documents on your server, or set the file's attributes, like the execute or archive bit. You also must make sure that your server is aware of this file type by associating the extension using File Manager.
There are no spaces in the beginning of an SSI line of code. It should always be <!--#command.
There should always be one space after the argument_value, as in "jazzquotes.html" -->.
The argument_value should always be surrounded by double quotation marks.
SSI recognizes only path names that start either at the server root or are in a subdirectory of the directory where the SSI file is found. Do not use any backslashes in the path name.

SSI Commands

The command config, short for configuration, does not itself display anything in your HTML documents. It is used to change how the output of your other SSI commands appears in your HTML documents.

With the config command you can control the standard text output of any SSI command. For example, if you wanted to change how the date is sent back to the user from this format (Monday, May 11 10:32:43 EST 1996) to one more user-friendly, you would do this by using config to modify the output of the flastmod command. You can also modify the error message that is sent out and the way the file size is formatted.

If you want to change the date, you use the command_argument timefmt in the SSI command. For the argument_value you can use any of these:

%a-Abbreviates weekday name, based on present locale.
%A-Gives full weekday name, based on present locale.
%b-Abbreviates month name, based on present locale.
%B-Gives full month name, based on present locale.
%c-This is the preferred date and time display for the present locale.
%d-Decimal number from 01 to 31 that represents the day of the month.
%H-Decimal number from 00 to 23 that represents the hour of 24-hour measured time.
%I-Decimal number from 01 to 12 that represents the hour of 12-hour measured time.
%j-Decimal number from 001 to 366 that represents the day of the year.
%m-Decimal number from 01 to 12 that represents the month.
%M-Decimal number from 00 to 59 that represents the minute.
%p-Gives a.m. or p.m. based on the time value, or the corresponding strings for the present locale.
%S-Decimal number from 00 to 59 that represents the second.
%U-Decimal number from 00 to 53 that represents the number of the week of the current year. Week 1 begins with the first Sunday of the year.
%w-Decimal number from 0 to 6 that represents the day of the week, beginning with Sunday as 0.
%W-Decimal number from 00 to 53 that represents the number of the week of the current year. Week 1 begins with the first Monday of the year.
%x-This is the preferred date and time display for the present locale, minus the time.
%X-Gives the preferred time configuration based on the present locale, minus the date.
%y-Decimal number from 00 to 99 that represents the year, excluding the century.
%Y-Decimal number that represents the year, including the century (for example, 1996).
%Z-Gives the time zone, name, or abbreviation.
These command argument values could be used as such:

Today is <!--#config timefmt="%a" --> <!--#echo var="DATE_LOCAL" -->

You accessed this page on hour <!--#config timefmt="%H" --> <!--#echo var="DATE_LOCAL" -->

It is important to include the appropriate echo command for the server to return the desired response to the client.
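The timefmt codes listed above match those of the C library's strftime routine, which Perl exposes through its standard POSIX module. As a rough way to preview what a given code produces before putting it in an .shtml file, a short script such as the following may help. It is a sketch only: the preview subroutine is a hypothetical name, and a fixed moment (the UNIX epoch, which fell on a Thursday) is assumed so that the results are repeatable. Locale-dependent codes such as %a and %B are left out for the same reason.

```perl
#!/usr/bin/perl
# Sketch: preview a few timefmt-style codes using POSIX strftime.
# preview is a hypothetical helper name used for illustration.
use strict;
use warnings;
use POSIX qw(strftime);

# A fixed moment for a repeatable demonstration:
# 1 January 1970, 00:00:00 GMT.
my @when = gmtime(0);

sub preview {
    my ($fmt) = @_;
    return strftime($fmt, @when);
}

# Print each code next to the text it produces.
for my $fmt ('%d', '%m', '%j', '%H:%M:%S', '%w', '%U', '%W', '%y', '%Y') {
    printf "%-10s %s\n", $fmt, preview($fmt);
}
```

To preview the current local time instead, replace gmtime(0) with localtime().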

The include command is the most basic of the SSI commands. It is most commonly used to add files to HTML that are needed in a variety of places. This eliminates the need to cut and paste that data each time; the one-line SSI command is used instead.

There are two command arguments: file and virtual. File indicates any file in the current directory, or in a subdirectory of the current directory. Virtual indicates any file that is originated at the root directory. The argument values for each are the actual path and file names, like this:

<!--#include file="addresses/ad_mailing.html" -->

which automatically adds my mailing address to the HTML documents that need it.

The virtual command argument causes the server to look for the file in question starting from the root directory, as designated by the srm.conf file. When using the virtual command argument, the path name must begin with a slash (/), and then the entire path name must be included. This differs from the file command argument, which cannot start with a slash, because it can look only in the directory that the .shtml file is in, or in a subdirectory of it, and not above it.

The kinds of files you can include are not limited to text only or HTML only files, but can be other SSI parsed files, excluding those that include the include command argument.

The task of the flastmod command is to note when changes were last made to a file, hence the name f(ile)lastmod(ified). As with the include command, flastmod uses file and virtual as its command arguments. The same rules apply to these command arguments as to the include command. Flastmod is used to indicate to the user the last time a file, like a Web zine, was modified, so that users will know if the information is new to them.

The fsize command is concerned with the size of the file. This command is handy when dealing with thumbnails of images on a home page that lead to the larger versions. The fsize command can indicate the size of each image, so users can decide if they have the time to view it. This also helps with downloads.

The fsize command accepts both the file and virtual command arguments, like the flastmod and include commands, with the same rules.

With the echo command on, SSI works with five command arguments. Unlike previous command arguments, the items below are not case-sensitive:

DATE_LOCAL-This creates the current time and date based on the time zone indicated by the server and the server software. The output can be modified using the command config and the command argument timefmt.
DATE_GMT-This is the current time and date based on Greenwich Mean Time, the common time reference accepted on the Internet.
DOCUMENT_NAME-This is the file name of the main document.
DOCUMENT_URI-This is the path name and file name of the main document. A URI (Uniform Resource Identifier) can be considered the same as a URL.
LAST_MODIFIED-This is the time and date the main document was last modified based on the last time the document was saved, surprise, surprise.

The command argument used with the echo command is var. A typical use of the echo command might look like this:

<!--#echo var="document_URI" -->

where document_URI refers to the URI of the first document parsed by the server. Although there are technical differences, you can consider a URI the same as a URL. This variable refers to the URI/URL of the first file that sets the value for echo variables.

When you get to debugging SSI, and the echo command, the server will return the word (none), in brackets, when it cannot find the variable it is supposed to echo.
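To make the echo behavior concrete, here is a toy emulation in Perl of what happens to an echo line. It is not taken from any real server: the names expand_echo and lookup_var are hypothetical, the variable values are sample data, and the matching is deliberately simplified. It assumes, as described above, that variable names are matched without regard to case and that an unknown variable comes back as (none).

```perl
#!/usr/bin/perl
# Toy emulation of resolving <!--#echo var="..." --> in one line of text.
# This is NOT any real server's code; the sub names are hypothetical.
use strict;
use warnings;

# Look up one variable, ignoring case; unknown names yield "(none)".
sub lookup_var {
    my ($vars, $name) = @_;
    $name = uc $name;
    return exists $vars->{$name} ? $vars->{$name} : '(none)';
}

# Replace every echo command in the line with the variable's value.
sub expand_echo {
    my ($line, $vars) = @_;
    $line =~ s/<!--#echo\s+var="([^"]*)"\s+-->/lookup_var($vars, $1)/ge;
    return $line;
}

# Sample values only; a real server fills these in itself.
my %vars = (DOCUMENT_NAME => 'jazz.shtml', DOCUMENT_URI => '/jazz.shtml');

print expand_echo('This page is <!--#echo var="document_URI" -->', \%vars), "\n";
print expand_echo('Missing: <!--#echo var="NO_SUCH_VAR" -->', \%vars), "\n";
```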

The exec command deals with controlling the operating system from inside the SSI HTML. Most of the commands regularly available from the command line are also available to the exec command. This makes the exec command very powerful, so powerful that, just like SSIs, it may be turned on or off by the server.

Using the exec command, an SSI file can automatically access a shell or execute a CGI script. Client response is not necessary for this to happen. The various shell commands available to the exec command allow the SSI script to use any of the environmental variables discussed earlier.

The command argument for the exec command is "cmd." The argument variables available to exec are all the arguments available to the current shell. The many options available to you are best utilized when you have a greater understanding of UNIX and the shells it uses, like the Bourne or Korn shells. The most important shell to learn is the one you may have on your server. The exec command can also be used with CGI and Perl.

To use the exec command with CGI, the command argument "cgi" is used instead of "cmd." This allows you to execute a CGI script inside SSI. One drawback is that the SSI still needs the CGI script to create its own headers, so an NPH-CGI script (non-parsed header) will not work. This is why you should not use NPH-CGI scripts in any SSI files because the NPH-CGI script will not generate the necessary header for the SSI file. Without the header, the client will be unable to use, or view, the returned file.

This last tip about SSIs is related to speed. In the various descriptions of the commands you may have noticed the server starts looking in the immediate directory, then proceeds down from there. To speed things up you can place the included files in the same directory as the .shtml file that calls them, and not in a subdirectory.

How CGI Works

The basic model of how the CGI works is fairly straightforward. When a user's browser, called the client, contacts your server, it may ask for a special, non-HTML file to be accessed. The server then accesses this file and returns any results to the client.

A Demonstration

Remembering our HTML form document from Chapter 10, where each element of an HTML form was demonstrated, it would be nice if we could apply that to the CGI. In a very simple way we can. This next HTML document does not lead to another HTML document, but uses CGI to call the page from your CGI bin, or "cgi-bin" as you will come to know it in your scripts and directory trees. The HTML tags are self-explanatory, and the document uses the METHOD=GET command to pass data to your CGI script.

<HTML>
<HEAD>
<TITLE>The Submission Page</TITLE>
</HEAD>
<BODY>
<H2>Press this button and submit to me!</H2>
<FORM Method="GET" Action="/cgi-bin/submit.pl">
<INPUT type="submit" value="Total Submission">
</FORM>
<HR NOSHADE>
</BODY>
</HTML>

This produces something like the screen in Figure 11.3.

Figure 11.3 : The submission page, passing data to the CGI.

In submit.html, the user selects the submit button which tells the server to call the file in the cgi-bin named submit.pl. When the server looks for this file, this is what it finds:

#! /usr/bin/perl
# submit.pl
print "Content-type: text/html\n\n";
print <<'eop';
<HTML>
<HEAD>
<TITLE>Total Submission</TITLE>
</HEAD>
<BODY>
<H2>Thank-you for submitting!</H2>
We look forward to your future submissions.
</BODY>
</HTML>
eop

which looks like Figure 11.4 when it reaches your browser.

Figure 11.4 : A Web page created from a CGI script.

This is a good time to touch on some of the elements that you see in submit.pl. The first is the name itself: In Perl, the files are best named in lowercase, followed by the extension ".pl." You could use ".cgi" here as well, but the consensus among CGI programmers seems to be that if you're serious about your CGI programming, which of course you are, then it is better to signify what language you are using for your CGI script in the name of the file.

The next thing to discuss is the first line of the Perl script

#! /usr/bin/perl

This tells the system reading the file that this is a Perl script and where it can find Perl, so it can deal with this program. The "#!" is a special pair of characters that a UNIX system interprets as introducing the path of the executable that should run the script that follows. In NT this is not the case, but the convention is so deeply ingrained in Perl scripting that you will most likely see all Perl scripts with this opening line. The "#!" will be valid only in the first line of the script; after that, only the "#" symbol is necessary for marking comment lines in Perl, like the second line, # submit.pl. This is the name of the file, and it is good programming technique to always put the name of your file somewhere near the top of the script.

The next line uses the Perl command print. This is the standard tool for getting Perl to output data. The data that is output is the MIME header information, which tells the server to create the proper header for an HTML text file. The next line uses a programming trick known as a here-document, which makes use of Perl's << operator. It tells Perl to print every line that follows the << line until it encounters the tag named after the <<, in this case eop (which is short for "end of perl"), on a line by itself at the start of the line. Perl makes a lot of sense, doesn't it?
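The quotation marks around the tag after << matter. The short sketch below, which uses a made-up variable named $visitor, shows the difference: with single quotes, as in submit.pl, the text is printed exactly as written, while with double quotes Perl fills in any variables it finds.

```perl
#!/usr/bin/perl
# Sketch: how the quoting of the << tag affects the text that follows.
use strict;
use warnings;

my $visitor = 'Bobby';    # made-up sample value

# Single-quoted tag: the text is taken exactly as written.
my $literal = <<'eop';
Hello, $visitor!
eop

# Double-quoted tag: variables in the text are interpolated.
my $filled = <<"eop";
Hello, $visitor!
eop

print $literal;    # prints: Hello, $visitor!
print $filled;     # prints: Hello, Bobby!
```

In both cases the closing eop must sit at the very start of its own line, with nothing before or after it.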

Before you get too excited, maybe we should try to create an HTML form that actually takes user data and passes it to the CGI. The example above might as well have been a static HTML page for all the trouble it took.

Forms and CGI

One of the most common needs of a Web site is to have a form that gathers mailing list information. If you wanted to, you could create a form with a single textbox to get this information, or even easier, put in an e-mail tag and a request for the user to e-mail you his or her address. Although these are the easier ways to gather the data, they limit your options on the back end. It would be nice if this information could be sent into a database where a mailing list can be created based on city, or zip code. To do this, each piece of information needed must be input in its own field. An HTML form that asks for this might look like the following:

<HTML>
<HEAD>
<TITLE>Your Address Please</TITLE>
</HEAD>
<BODY>
<CENTER><H2>Your Address, Please!</H2></CENTER>
<HR NOSHADE>
<FORM Method=GET Action="/cgi-bin/nph-address.pl">
<CENTER>
<TABLE Border=0 width=60%>
<CAPTION Align=top>
<H2>What's Your Address?</H2></CAPTION>
<TH Align=left> First Name
<TH Align=left colspan=2> Last Name <TR><TD>
<INPUT type=text size=10 maxlength=20 name="first">
<TD colspan=2>
<INPUT type=text size=32 maxlength=40 name="last"><TR>
<TH Align=left colspan=3> Street Address <TD><TD><TR>
<TD colspan=3>
<INPUT type=text size=61 maxlength=61 name="street"><TR>
<TH Align=left> City
<TH Align=left> State
<TH Align=left> Zip Code <TR>
<TD><INPUT type=text size=20 maxlength=30 name="city">
<TD><INPUT type=text size=20 maxlength=30 name="state">
<TD><INPUT type=text size=7 maxlength=10 name="zip"><TR>
<TH Align=left colspan=3> Telephone Number <TR>
<TD colspan=3><INPUT type=text size=15 maxlength=15 name="phone" value="999.999.9999"> <TR>
<TD width=50%><INPUT type="submit" name="address" value="Send In Your Address">
<TD width=50%><INPUT type="reset" value="Reset this Form"><TR>
</TABLE>
</CENTER>
</FORM>
</BODY>
</HTML>

This gives you a page like that shown in Figure 11.5.

Figure 11.5 : An address form to gather user data for the CGI.

All of the data from the form is URL encoded into name/value pairs and attached to the end of the URL named in the form's Action attribute, as is regular procedure with the GET method. At the server end, this data is put into QUERY_STRING, the environment variable that handles this kind of data. The data itself is also referred to as the query string. A sample query string from this form might look like this:

QUERY_STRING first=Bobby&last=Hull&street=1063+Golden+Jet+Lane&city=Pointe+Anne&state=Ontario&zip=CHI+BLA&phone=610.555.1170&address=Send+In+Your+Address

This query string would appear right after a ? at the end of the URL given in the form's Action attribute, /cgi-bin/nph-address.pl.
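To make the name/value pairs concrete, here is a minimal sketch of how a Perl script might pull them back apart. It assumes only what the text states, that the server places the data in QUERY_STRING; the file name and the helper names url_decode and parse_query are invented for this illustration, and a production script would more likely rely on an existing CGI library.

```perl
#!/usr/bin/perl
# parse-query.pl - hypothetical file name, for illustration only
use strict;
use warnings;

# Decode one URL-encoded string: "+" means a space, and %XX is a
# character given as two hexadecimal digits.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split a query string into a hash of decoded name/value pairs.
sub parse_query {
    my ($query) = @_;
    my %form;
    foreach my $pair (split /&/, $query) {
        next unless length $pair;             # skip empty pieces
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;    # a name with no value
        $form{ url_decode($name) } = url_decode($value);
    }
    return %form;
}

# The server supplies QUERY_STRING. The fallback is part of the sample
# from the text, so the sketch can also be run from the command line.
my $query = $ENV{QUERY_STRING}
    || 'first=Bobby&last=Hull&street=1063+Golden+Jet+Lane&city=Pointe+Anne';
my %form = parse_query($query);

# Report what was received as plain text.
print "Content-type: text/plain\n\n";
foreach my $name (sort keys %form) {
    print "$name = $form{$name}\n";
}
```

Once the pairs are in a hash, each field (city, zip, and so on) can be handed to a database separately, which is the reason the form asks for each piece of information in its own field.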

Non-Parsed Headers

As has been discussed previously, memory management is an ongoing concern of any Web server administrator. Any opportunity to reduce the work your Web server has to do to fulfill client requests should be taken. With that in mind, non-parsed header (NPH) CGI scripts can help lessen your server's work load.

NPH scripts, as their name indicates, create headers that are not parsed by the server. Remember the address request HTML form? Its Action attribute called the program nph-address.pl through the CGI.

When an ordinary CGI script returns data, it first sends a short header that tells the server what kind of data is coming, and then sends the requested data itself.

The server then has to parse that header and create a complete response header to send to the browser. With an NPH script, you can skip this stage between the CGI and the server, because the script writes the complete response header itself and so tells the client directly what to do with the data. Keeping the long, ugly query string out of the URL that the user sees is a way to add some class to your Web page, and an NPH script that does that might look like this:

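#!/usr/bin/perl
# nph-address.pl
# An NPH script sends the complete response header itself.
# The blank line before eop marks the end of the header.
print <<"eop";
HTTP/1.0 204 No Content

eop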

The response header is given by the "HTTP/1.0 204 No Content" line, where the status code "204" informs the browser that there isn't any data to load with this response. Sometimes a confirmation HTML page is sent to the user instead, but this small script acknowledges the request just as well and saves time.

When the script is run, the browser is told to leave the current HTML document displayed. It is important when working with NPH that you use the nph- prefix in the file names, and no other variation, because the server relies on that prefix to recognize an NPH script; anything else will only cause you CGI grief.
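For comparison, the sketch below shows what an NPH script might look like when a confirmation page is wanted after all. Because the server does not parse or complete the header, the script has to write the status line and every header field itself. The file name, the helper name nph_response, and the wording of the page are assumptions made for this illustration.

```perl
#!/usr/bin/perl
# nph-confirm.pl - hypothetical file name; note the nph- prefix
use strict;
use warnings;

# nph_response is an assumed helper name. It builds a complete HTTP
# response: status line, header fields, a blank line, then the body.
sub nph_response {
    my ($body) = @_;
    my $length = length $body;    # byte count, assuming plain ASCII text
    return "HTTP/1.0 200 OK\r\n"
         . "Content-type: text/html\r\n"
         . "Content-length: $length\r\n"
         . "\r\n"
         . $body;
}

# The confirmation page, built with a here-document.
my $page = <<"eop";
<HTML>
<HEAD><TITLE>Thank You</TITLE></HEAD>
<BODY><H2>Your address has been received.</H2></BODY>
</HTML>
eop

$| = 1;    # send output immediately instead of buffering it
print nph_response($page);
```

The trade-off is the one described above: the server does less work, but the script takes on full responsibility for producing a correct header.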

CGI Summary

When dealing with the CGI, Perl is one of the premier languages to run data to and from the Web pages on your server. The client/server dynamic is moderated by MIME specifications that inform both client and server what kind of data is being passed between them. Both SSIs and NPH are features that can enhance the CGI, each with different results.

There is much more to the CGI than is covered in this book. For more information on the CGI, there is the e-mail mailing list CGI-L, the Common Gateway Interface list <CGI-L@VM.EGE.EDU.TR>, which can be subscribed to by sending the message "subscribe" to listserv@VM.EGE.EDU.TR. This list deals with the many issues involved with the CGI that are not covered in this book. More information can be found at this URL:

http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/CGI_Common_Gateway_Interface/

or the NCSA site

http://hoohoo.ncsa.uiuc.edu/cgi/

as well as the CGI library at

http://www.bio.cam.ac.uk/cgi-lib/

which has many Perl scripts to work the CGI. There is also this site

http://www.city.net/win-httpd/httpddoc/wincgi.htm

which specializes in Windows CGI concerns. Learning more about how your server uses the client/server model, and how each element of the CGI is regulated on your server, will help you to write better CGI scripts and to provide better access and service to the users of your Web sites.