Chapter 10

The Common Gateway Interface



With the HTTP protocol, web browsers have access to several Internet services, but not to all of them. On its own the browser is also limited in its ability to deal with anything more than static HTML files. One of the ways to bypass this limitation is to use a gateway. A gateway provides a client with an interface that makes files and extension services appear as readable HTML documents. This gives the user the ability both to access other services on your Web server, and to input data to the Web server through HTTP.

NOTE
The only kind of relationship that the CGI is interested in when dealing with Web sites and related structures is the client/server model of computer communications. The computer that makes requests is the client, and the computer that answers these requests is the server. Web sites are stored on server computers, or machines.

Gateways sit on the server where they take a client's user input and then output data to the client in a usable format, like an HTML document or URL. The gateway itself does not deal with satisfying the client's request, but finds the files, programs, or scripts on the server that can.

One of the ways to get more dynamic pages is to use Server Side Includes (SSIs) in your Web sites. These are different from the CGI, and will be explained in the next chapter. SSIs can work with, or without, a gateway.

The CGI Explored

The CGI is the specification for the way in which a server's gateway communicates with the Web server. When data comes in from a client's Web browser (discussed later in this chapter under HTTP Headers) that contains a query or an HTML form using the GET or POST request methods, then the Internet service, or "inets," starts up the http service, or https, to deal with the HTTP data that is arriving. The https then sends a message using the CGI specifications to the server's gateway program. The gateway receives this data from the browser either as standard input, or as environmental variables.

Using this data, the gateway initiates whichever response is necessary by parsing and processing the client data. Parsing is the procedure the computer puts the data through, figuring out all its syntax and storing the variables, if necessary, so it is ready to run.

This output goes back to the https as HTML, or some other data format that HTTP can handle; then the https sends this on to the client's Web browser. There may be no response from the gateway if the data it has received is only for storage or input to a database or file folder on the server.

CGI and Your Server

Before going into the interior of CGI (or, at least, the epidermis), a quick look at some of the ways your server organizes its information will shed some light on how CGI operates with your server. To work effectively, CGI has to know where things are. And to enable various functions, keep track of user access, and debug your CGI scripts, you have to know where things are. These are all organized in various common directories on your server. You are never restricted to the directory names used here, but these are the commonly used file names for directories performing the purpose outlined.

NOTE
One common confusion when dealing with a Windows NT-based server occurs because most resources dealing with CGI and other networking concerns are written from a generic UNIX background and use generic UNIX terms and concepts that do not translate easily (or sometimes at all) to a Windows NT format.
To make using your system easier, this book uses the proper Windows NT terminology where possible, as well as translating UNIX terms you will encounter in other networking books. For example, consider the term daemon, which is commonly used in most networking texts. Daemons are programs that deal with providing network services to clients. With Windows NT, daemons are simply called services, or sometimes, network services, and sometimes just servers. This should not be confused with the computer that hosts these services, also commonly called a server. Little details like these can cause you hours of grief when you are dealing with the Internet, so be sure to keep this terminology issue in mind.

The Directories

Your NT server works by relying on several things, one of which is a well-organized directory system. Proper data management is crucial to the swift operation of a server. If your services cannot find the data requested of them, they cannot work properly. A badly organized server that forces the various services to go through a series of directories and subdirectories before they can find what they are looking for is just as troublesome. The following directories are used by the server to operate the CGI programs. These are all directories to which you must have access for successful gateway programming. If you are not also a system administrator, you will need to discuss these issues with the person who is.

The Server Root

Many files that determine where CGI programs can operate and what they are allowed to do are stored in the server's root directory. That directory is usually found on the C: drive on the server's computer.

Inside the server's root there are two important areas: the Log directory and the Registry directory. If you don't have access to these directories you can still use Perl scripts, but you will have to consult the system administrator for any additional information, or special access, that you may need.

The Log directory is where all the log files are stored, including logs on errors, security, system, and applications. The most important of these files are the error logs, where all the errors involving your HTML documents, CGI programs, and SSIs (Server Side Includes) are noted. The error logs are necessary for debugging your system. You can view these logs in Windows NT using Event Viewer.

On the Windows NT server, the security log is the log that keeps track of who is using (and has used) what on your server. This log is very useful when tracking the hits your various Web pages receive. The security log employs two icons when used by Event Viewer to speed up your search. A key is used to symbolize a successful action, and a lock signifies an unsuccessful one.

There are two other logs that Windows NT runs as part of its server setup. These are the system log, which records all events that occur in the Windows NT system, and the application log, which records events that occur during application runs on Windows NT. Error calls affecting applications can be found in this log.

You can have logs that keep track of any kind of information. If you have a complicated site that makes use of a lot of the newer HTML tags created by Netscape (like <TABLE>), or you are using Java, you may want to have alternate HTML documents for browsers, such as Mosaic, that do not support the tags that Netscape does.

You can use a log, in conjunction with a Perl script, that counts the different kinds of browsers accessing your server. This data can be used to reorganize your site so that you can direct each different browser to the Web pages that were especially built for that browser.
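A counting script of this kind might be sketched as follows. This is only an illustration: it assumes a hypothetical log in which each line ends with the browser's User-Agent string as its last tab-separated field, and the subroutine name count_browsers is made up for the example. A real server log may be laid out differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each browser family appears in a list of log lines.
# Assumption: each line carries the browser's User-Agent string as its
# last tab-separated field; real server logs may be laid out differently.
sub count_browsers {
    my @lines = @_;
    my %count;
    foreach my $line (@lines) {
        chomp $line;
        my @fields = split /\t/, $line;
        my $agent  = $fields[-1];
        next unless defined $agent && length $agent;
        # Keep only the product name before the first "/" (e.g. "Mozilla").
        my ($browser) = $agent =~ m{^([^/\s]+)};
        $count{$browser}++ if defined $browser;
    }
    return %count;
}

# Example with made-up log lines.
my %seen = count_browsers(
    "10.0.0.1\t/index.htm\tMozilla/2.0 (Win95; I)",
    "10.0.0.2\t/index.htm\tNCSA_Mosaic/2.0 (Windows x86)",
    "10.0.0.3\t/news.htm\tMozilla/2.0 (Macintosh; I; PPC)",
);
foreach my $browser (sort keys %seen) {
    print "$browser: $seen{$browser}\n";
}
```

The totals in %seen could then be used to decide which browser-specific pages are worth building.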

The Registry directory houses configuration and initialization data from the Registry that is controlled by using the applications Registry Editor, Control Panel, User Manager, or File Manager. The data in the Registry sets up what can happen, like permissions, and how it happens, like using environmental variables, on your server. Permissions are controlled by using the "Permissions" options under the Security menu in File Manager. You can turn on SSIs, so that your Web pages can take advantage of their functions, and also inform your server about new file extensions not covered by MIME specifications, such as the x-parsed-html-type, by using the Associate option under the File menu in File Manager. It is important to keep your server up-to-date with new MIME specifications so that these different file types will be handled properly on your server.

Some of these new file extensions are used to determine which files your server will search for SSIs. Files with the extension .shtml will be parsed by the server as SSIs. Also, you must check what file types your server allows by examining the file name extensions recognized by your system. You can do this by using Registry Editor and looking in the HKEY_LOCAL_MACHINE / SOFTWARE subtree in the Classes directory folder. A full listing of the file names recognized by your system is listed by its name, that is, the text file name information is stored in a folder titled .txt.

Microsoft recommends that you use read-only when using Registry Editor, and then make any changes through the appropriate Control Panel application or through File Manager, each of which has the proper procedure to make changes to the Registry already built in. To add the Perl file name extension, .pl, you can use the Associate feature in File Manager, which is found under the File menu.

Conversely, you could have a very well laid out plan for your memory management that restricts the parsing of certain types, but your server might be set up to parse all documents, thus undoing all of your hard work. The commands that affect each directory tree are not part of an HTML document; they are grouped in a section of the server's configuration that opens with a tag of this form:

<DIRECTORY directory_path>

where directory_path names the directory tree, so that the commands apply to all the files and directories under it. The command set shown above is ended with the </DIRECTORY> tag.

To control the data in your log, use the tools that Event Viewer contains, like the sorting or filtering options. You can sort to list the order of the entries, or events, from oldest to newest or newest to oldest, by choosing one of the options under the View menu in Event Viewer. Oldest to newest is the recommended setting.

The size of event logs can become a problem very quickly on a busy server. Filtering helps to reduce this problem. When events are filtered, you determine the start and end of when events are listed and the types of events logged. The log can start at the first entry, or be assigned a specific date. The log can be listed until the last event, or have a specific end date. It is recommended that you choose a specific period of events to view because the entire list can become very long, very quickly.

The different types of events include Error, Warning, Information, Success Audit, and Failure Audit.

To look for specific events, Event Viewer uses the Find function under the View menu. The parameters here are very helpful for locating a specific type of log event, whether it be an error for which you are looking, or to determine whether an application launched successfully.

You can save, or archive, any of these event logs in Event Viewer, where they will be stored using the .evt file extension. Logs are stored with the following fields of data: Date, Time, Source, Type, Category, Event, User, Computer, and Description. These fields are stored as comma-delimited files, which means that you can import this data into most spreadsheet and database programs.
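Once a log has been saved in comma-delimited form, a few lines of Perl can pull the fields apart. The sketch below is a minimal illustration that uses the field names listed above; it assumes that only the final Description field may itself contain commas, and the sample entry and the name parse_log_line are invented for the example.

```perl
use strict;
use warnings;

# Field names as listed in the text, in the order they are stored.
my @FIELDS = qw(Date Time Source Type Category Event User Computer Description);

# Split one comma-delimited log line into a hash keyed by field name.
# Assumption: only the final Description field may itself contain commas,
# so the split is limited to nine pieces.
sub parse_log_line {
    my ($line) = @_;
    chomp $line;
    my @values = split /,/, $line, scalar @FIELDS;
    my %event;
    @event{@FIELDS} = @values;
    return \%event;
}

# Example with a made-up entry.
my $event = parse_log_line(
    "4/12/96,10:15:02 AM,Perl,Error,None,1000,N/A,WEBSERVER,Script failed, see error log\n"
);
print "$event->{Source} reported: $event->{Description}\n";
```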

Use the Registry Editor to control your configuration files. This is the NT application that edits the Registry, which is where the NT stores all of its configuration files. It is recommended that when you use the Registry Editor you convert it to a read-only format by selecting the Read Only command under the Options menu. This allows you to view all the data in the various configuration files without the fear of accidentally overwriting crucial data.

You can usually find what you are looking for in the HKEY_LOCAL_MACHINE subtree in the Registry. This is where your hardware, software, security, system, and related configuration files are kept. Another useful subtree is HKEY_CLASSES_ROOT, where the different file formats and data types are defined.

The Document Root

The directory tree you will likely find yourself in most of the time, once you've gotten most of your bugs out of the way, is the document root. This is where you keep all the HTML documents for a Web site available for client access. All the directories contained within the root directory are considered part of the document root.

The root directory for your Web site might be

c:/HTTP/bin/my_site

and the document root for the HTML file index.htm in my_site would then be

/HTTP/bin/my_site/

Principles of CGI Programming

The Common Gateway Interface is one way for a Web server, using HTTP, to "talk" to the operating system or the server's machine. It works using requests from the client that are either in standard input, <STDIN>, or environmental variables. Because of this the CGI can go further than the slower HTML link, which answers one client request at a time, leading to only one specific response at a time.

Instead, the CGI can permit the Web server to provide different documents based on the client's requests. More than this, the CGI permits totally new documents to be written "on-the-fly" so that customized client responses can be made. Typically, the user inputs his or her information via an HTML form. Before discussing that subject, however, a quick examination of HTTP headers is in order. A closer look at the headers will give us some clues as to how the CGI deals with data. MIME specifications for these headers are outlined in Appendix B.

HTTP Headers

The MIME specifications mentioned earlier, and explained in depth in Chapter 11, are used to create HTTP headers that let the client and server know what kind of data is being transferred between them. From the client, HTTP sends a request header based on the instructions found in the HTML file. The two basic methods to retrieve data from a server are "GET" and "POST."

The default method in HTTP is GET when a request method is not specified in the HTML document.

When GET is used, the information is sent to the server via the URL field. If POST is used, then the data is sent as a separate message once all the other HTTP request headers have been sent.
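The difference between the two methods is easiest to see in the text of the request itself. The following sketch builds a simplified HTTP/1.0 request for each method from the same encoded data. The subroutine name build_request is hypothetical, and a real browser sends more headers than are shown here.

```perl
use strict;
use warnings;

# Build the text of a simplified HTTP/1.0 request for already-encoded form data.
sub build_request {
    my ($method, $path, $data) = @_;
    if (uc($method) eq 'GET') {
        # GET: the data travels in the URL, after a question mark.
        return "GET $path?$data HTTP/1.0\r\n\r\n";
    }
    # POST: the data follows the headers as a separate message body.
    return "POST $path HTTP/1.0\r\n"
         . "Content-type: application/x-www-form-urlencoded\r\n"
         . "Content-length: " . length($data) . "\r\n"
         . "\r\n"
         . $data;
}

print build_request('GET',  '/cgi-bin/register.pl', 'first=Bobby&last=Hull'), "\n";
print build_request('POST', '/cgi-bin/register.pl', 'first=Bobby&last=Hull'), "\n";
```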

When the client has determined the method it will use to send the data, it builds an HTTP header to send to the CGI program on the server. This message is sent to the server, and there the CGI program in question is called up by the server. You are not restricted to sending only one header; you can also include other headers that contain additional information for the server or the CGI program.

The CGI program called up by the request then performs the task requested of it, taking commands from any form data present, and sends a message to the server concerning what kind of message should go back to the client. Between the two of them, the server and the CGI program, various HTTP response headers are created and sent to the client.

One of the ways the CGI program accomplishes this is by referring to itself as a non-parsed CGI program, or NP-CGI. This allows the response headers it creates to be sent straight through the server, simplifying and speeding things up by eliminating unneeded processing, or parsing, time. The other way a CGI program sends data is by creating only the minimum response headers necessary (usually a Content-type header) and sending them to the server, where they are parsed.

NOTE
Parsing is the term used to describe the process that your computer goes through when preparing a program file for execution. When a computer parses a file, it goes through the file line-by-line, examining the syntax and looking for useful instructions that will cause it to do some task when the program is run.

Parsing a file can cause problems in HTML files, which are not meant to be parsed. When your computer reads these files, it could find all manner of instructions not meant for execution that could cause your computer to act up, or even crash.

Once all this is finished, the server decides whether any additional headers need to be added to the response, and then sends it all to the client. The Content-type header mentioned above is a common header that identifies the type of data being sent between client and server.
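For the second, parsed approach, the least a CGI program has to produce is a Content-type header, a blank line, and then the document itself. Here is a minimal sketch; the subroutine name cgi_response is made up for illustration.

```perl
use strict;
use warnings;

# Return a complete parsed-header CGI response: one Content-type header,
# a blank line to end the headers, then the document body.
sub cgi_response {
    my ($type, $body) = @_;
    return "Content-type: $type\n\n" . $body;
}

print cgi_response('text/html',
    "<HTML><BODY><P>Hello from the gateway.</BODY></HTML>\n");
```

The blank line matters: it is how the server tells where the headers stop and the document begins.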

HTML Forms

You should already have a strong understanding of HTML form specifications, but because the HTML form is the main way in which users will be passing information to your server, we will go over the details involved. For a really in-depth tutorial on HTML forms, try

http://www.netscape.com/tutorials/forms.html

HTML forms start with the <FORM> tag. To handle specific data the <INPUT> tag defines how the data is gathered from the user on the page. The <SELECT> tag presents a choice to the user for data, like a multiple choice question on a test.

The <OPTION> tag is used to present each of the choices the user has. It is used with the <SELECT> tag. And, if the user is inputting text, the <TEXTAREA> tag creates a pane that will hold the user's data. This pane is scrollable. Each of these HTML elements is modified by its own attributes.

The HTML form itself sets up paired variables of name fields with value fields. The name variable is determined by the form, which is matched to the value variable supplied by the user. Once the user has supplied the information, the form has to understand what to do with the data. This is accomplished with the method and action attributes in the <FORM> tag.

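There are several attributes in the <FORM> tag that deal with how the form will handle the data. These include Method, which selects the GET or POST request method; Action, which gives the URL of the program that will receive the data; and Enctype, which names the encoding applied to the data and defaults to application/x-www-form-urlencoded.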

Name/value pairs are the way in which data values are passed to the CGI from a form. The "name" comes from the name assigned by the programmer in the tag requesting input. This is paired with the input from the "value" in the same tag, which is given by the user.

The name/value pairs will be included in the data set in the order in which they appear in the form. The name fields are separated from the value fields with an = symbol and the white space in both the name and value variables is replaced with a + symbol. They are sent to the server as name=value, with each pair of name/value pairs separated by an & symbol. The format looks like this

name1=value1&name2=value2&name3=value3

or, with a real example

first=Bobby&last=Hull&street=1063+Golden+Jet+Lane&city=Pointe+Anne&state=Ontario&zip=CHI+BLA&phone=610.555.1170&address=Send+In+Your+Address+

You may have noticed how long this can make the string being sent to the server. It is important to be aware of how your server handles long strings, so that information is not chopped off. The POST method has no limit, as the data is just a continuous string arriving on <STDIN>, like typing on a keyboard. If you type fast enough, your input gets stuck in a buffer. The GET method uses environmental variables, and it is limited to 255 characters.

These name/value pairs can be separated out using Perl. So, with name/value pairs, the name is how your server recognizes arriving data, while the value of that data is the value of the pair. The name/value system applies to all the types of data submission a user can make, from text entry to checkboxes to radio buttons.
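Separating the pairs takes only a few lines. The sketch below splits the string on the & and = symbols and turns each + back into a space; it does not yet handle the % codes described next. The name split_pairs is hypothetical, and when a name appears more than once (as with a group of checkboxes) this simple version keeps only the last value.

```perl
use strict;
use warnings;

# Split an encoded data string into a hash of name/value pairs.
# Only the "+" for space substitution is undone here.
sub split_pairs {
    my ($data) = @_;
    my %form;
    foreach my $pair (split /&/, $data) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        tr/+/ / for $name, $value;
        $form{$name} = $value;
    }
    return %form;
}

my %form = split_pairs('first=Bobby&last=Hull&city=Pointe+Anne');
print "$form{first} $form{last} lives in $form{city}\n";
```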

All nonalphanumeric characters are replaced by a % symbol followed by the two hexadecimal digits that represent their ASCII code equivalent. Line breaks are sent as a carriage return/line feed pair, %0D%0A. The nonalphanumeric characters most often used from your keyboard are listed with their decimal and hexadecimal equivalents in Table 10.1.

Table 10.1 Standard ASCII Characters and Their Decimal and Hexadecimal Equivalents

Character    Decimal    Hex
Tab          09         09
Space        32         20
"            34         22
(            40         28
)            41         29
,            44         2C
.            46         2E
;            59         3B
:            58         3A
<            60         3C
>            62         3E
@            64         40
[            91         5B
]            93         5D
\            92         5C
^            94         5E
{            123        7B
}            125        7D
|            124        7C
~            126        7E

There are other non-alphanumeric characters that can be encoded, as shown in Table 10.2.

Table 10.2 Non-Alphanumeric Character Encoding

Character    Encoding
?            %3F
&            %26
/            %2F
=            %3D
#            %23
%            %25

The specifics of MIME/URL encoding can be found in RFC 1552, section 3 at

http://ds.internic.net/ds/dspg1intdoc.html
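The encoding rule described above can be sketched in Perl as follows. This follows the chapter's rule literally (everything except letters, digits, and spaces becomes a % code); browsers in practice leave a few punctuation characters unencoded. The subroutine name url_encode is made up for the example, and the sketch assumes plain ASCII text.

```perl
use strict;
use warnings;

# Encode a string for use as a form value: letters and digits pass through,
# spaces become "+", and every other character becomes % plus two hex digits.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9 ])/sprintf('%%%02X', ord($1))/ge;
    $text =~ tr/ /+/;
    return $text;
}

print url_encode('Tom & Jerry <tj@my_server.com>'), "\n";
```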

The tag that uses the POST method might look like this:

<FORM Method="POST" Action="http://www.my_server.com/cgi-bin/register.pl">

To collect specific information from the user, the <INPUT> tag is used. The various attributes for this are found in Table 10.3.

Table 10.3 Form Tag Input Options

Attribute    Purpose
Align        Used when an image is employed in gathering the data. The choices are "top," "middle," and "bottom," which define the relationship the image has to the text following it.
Checked      Presets a checkbox to include a checkmark. If this attribute is not included, the checkbox will be blank.
Maxlength    Sets the maximum number of characters a user can input as text into a field. The default is unlimited, so you might want to restrict the length in every form, or risk your scripts becoming overwhelmed by a flood of text.
Name         The symbolic name used when transferring and identifying the output produced by your form.
Size         Defines the field width of the text box presented to the user. When the Size is less than the Maxlength, then the text field will be scrollable.
Src          If an image is used, this identifies the source of the image file.
Type         Defines the kind of input format the user sees. The choices are:
  checkbox   The user can make multiple choices for data values.
  hidden     The values are defined by the form, not by the user.
  image      The user selects an area of the image, then the x and y coordinates are sent with the name/value pairs.
  password   User supplies text that is hidden from view on the user's screen. Typically this appears as asterisks or dots.
  radio      User must choose one selection only from a list. This should not be confused with checkboxes where the user can select any and all of the choices presented.
  reset      Clears the form of all selections for re-entry by the user.
  submit     Used by the user to submit the form data to the server.
  text       Uses the Size and Maxlength provisions to create a single line field for user input. If this is the only area for user input, then a submit button is not required. Simply pressing the Enter or Return key on the user's keyboard will send the data on. If more than a single line is needed the <TEXTAREA> tag should be used.
Value        Used with the radio button, this sets the value for the selection available to the user.

Using these tags we can create an in-depth form that asks for user background data and tastes, which the CGI can then enter into your databases. Some of these tags were used to gather information in our guestbook script. After these tags are illustrated with examples, they are combined into a full form page that can be adapted to gather user data on your site.

<INPUT Type="hidden" Name="address" Value="new_user@my_server.com">
Your Name: <INPUT Type="text" Name="user-name" Size="20" Maxlength="30">
<H2>Guess the secret word contest</H2><BR>
Try and guess our secret word to win a prize!<INPUT Type="password" Name="word_guess">
Where did you hear about our site?:<INPUT Type="radio" Name="Internet" Value="online">The Internet
Please check all the mediums you use:<INPUT Type="checkbox" Name="television">
Where are you from?<INPUT Type="image" Src="http://www.my_server.com/images/map.gif" Name="user_location" Align="top">
Do you like our site?:<INPUT Type="radio" Name="site_feedback" Value="Yes" Checked>Yes
<INPUT Type="radio" Name="site_feedback" Value="No">If not, then what don't you like?
<INPUT Type="text" Name="user_suggestions" Size="60" Maxlength="100">
<INPUT Type="Submit" Value="Send it in!">
<INPUT Type="Reset" Value="Do it again!">

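To allow the user to choose from a list of options on a form, the <SELECT> tag is used. Although the default for selection is only one choice for the user, this can be modified using the Multiple attribute. The <OPTION> tag is used to define each choice available to the user. The attributes of <SELECT> work as follows: Name gives the symbolic name sent with the user's selection, Size sets how many choices are visible at once, and Multiple lets the user pick more than one choice.

The <OPTION> tag is used in tandem with the <SELECT> tag. It has two attributes: Selected, which presets a choice as already chosen, and Value, which sets the value sent to the server in place of the text of the choice.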

Typically these two tags might look like this:

Please choose one of our products as a gift:
<SELECT Name="product_gifts">
<OPTION>Lead Pencil 2000
<OPTION>Staple-O-Matic!
</SELECT>

The last tag available in creating forms is <TEXTAREA>, which is used to define the size of a text field for user input. This field is scrollable. The attributes that apply here are Name, Rows, and Cols.

This creates a better format for a text area in which to input user feedback than the previous example, which created only a text line. To change this, we can use a <TEXTAREA> tag that looks like this:

<INPUT Type="radio" Name="site_feedback" Value="No">If not, then what don't you like?<BR>
<TEXTAREA Name="user_suggestions" Rows="4" Cols="50">
</TEXTAREA>

When you combine all these tags, you get a better understanding of how the form works. This example combines the previous examples in a form that presents a sample of each of the elements. It is for collecting information about new users of your site. A screen representation of what the form would look like is shown in Figure 10.1.

Figure 10.1 : Sample of different input features on an HTML form.

<HTML>
<!-- Example of form elements and attributes -->
<HEAD>
<TITLE>
The New User Profile Form
</TITLE>
</HEAD>
<BODY>
<P>
We Want to Know More About You!<BR>
<HR>
<BR>
<FORM Method="POST" Action="http://www.my_server.com/cgi-bin/register.pl">
<P>
<INPUT Type="hidden" Name="address" Value="new_user@my_server.com">
Your Name: 
<INPUT Type="text" Name="user-name" Size="20" Maxlength="30">
<INPUT Type="hidden" Name="subject" Value="new_user_info"><BR><BR>
Where did you hear about our site?:
<INPUT Type="radio" Name="where" Value="online">The Internet
<INPUT Type="radio" Name="where" Value="television">On Television
<INPUT Type="radio" Name="where" Value="friend">A Friend<BR><BR>
Please check all the mediums you use:<BR>
<INPUT Type="checkbox" Name="television">Television<BR>
<INPUT Type="checkbox" Name="internet">The Internet
<INPUT Type="checkbox" Name="radio">Radio
<INPUT Type="checkbox" Name="print">Magazines and Newspapers
<BR><BR>
Please choose one of our products as a gift for filling out this form:
<SELECT Name="product_gifts">
     <OPTION>Lead Pencil 2000
     <OPTION>Staple-O-Matic!
     <OPTION>Glue Master Sticky Tape
     <OPTION>Log!
</SELECT><BR><BR>
Please choose your favourite Web browser:
<SELECT Name="browsers" Multiple Size="4">
     <OPTION Value="straight">AOL
     <OPTION Value="straight">Explorer
     <OPTION Value="hip">Navigator
     <OPTION Value="hip">Mosaic
     <OPTION Value="weird">Lynx
</SELECT><BR><BR>
<H3>Guess the secret word contest</H3><BR>
Try and guess our secret word to win a prize!<INPUT Type="password" Name="word_guess">
<BR><BR>
Do you like our site?:
<INPUT Type="radio" Name="site_feedback" Value="Yes" Checked>Yes
<INPUT Type="radio" Name="site_feedback" Value="No">If not, then what don't you like?
<BR><BR>
<TEXTAREA Name="user_suggest" Rows="4" Cols="50">
Thanks for your thoughts!
</TEXTAREA><BR><BR>
Where are you from?<INPUT Type="image" Src="http://www.my_server.com/images/map.gif" 
Name="user_location" Align="top"><BR>
Now that you're done, let us know by sending us your info.<BR><BR>
<INPUT Type="Submit" Value="Send it in!">
<INPUT Type="Reset" Value="Do it again!">
</FORM>
<P>
Thanks for registering!
<HR>
</BODY>
</HTML>

NOTE
To effectively use an image map like the one used in our new user form example, you must define your graphic, or image map, properly. You can do this by using a Perl script, like the one for defining image maps discussed in Chapter 6. Please see that script for details.

Environmental Variables

If you want to supply Web browsers with truly dynamic entities that use the CGI to do more than just retrieve other static HTML documents, like an educated gopher, then running executables from the server's side of things is a must. The CGI must also be able to accept a specific user's data as input for a specific task. Environmental variables are one of the ways to do this.

To understand how environmental variables differ from regular variables, it is important to know about scope. Scope refers to the extent to which a variable is understood. Most variables are redefined each time a program is run, most often only in a certain block of code within that program. This is the common, garden-variety file variable. When you get to environmental variables, however, their value stays the same throughout each script and application started within each CGI, or Perl window, or shell. Their value is based upon the first document opened by the browser.

This has various implications, not the least of which is the ability of different applications, or processes, to share the same environmental variables across the same shell.

To illustrate environmental variables we will use some Perl scripts. Before getting into the details of environmental variables, however, you should know that not all environmental variables are carried on every system. To check and see which environmental variables are supported by your NT server, you can use System Control Panel. The full list of System Environmental Variables is listed here, as well as two text boxes beneath, which can be used to create new environmental variables for your system.

You can also use this Perl script called env_var.pl to print out the environmental variables available to your CGI scripts, as in Figure 10.2.

Figure 10.2 : Environmental variables available to your server.

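#!/usr/bin/perl
# env_var.pl - list the environmental variables visible to a CGI script
push(@INC, "/cgi-bin");
require("cgi-lib.pl");
print &PrintHeader;    # cgi-lib.pl routine that returns the Content-type header
print "<HTML>\n";
print "<HEAD><TITLE>Environmental Variables Available to the CGI</TITLE></HEAD>\n";
print "<BODY>\n";
# The closing "eop" of each here-document must start in the first column.
print <<"eop";
<CENTER>
<TABLE border=1 cellpadding=12 cellspacing=12>
<TR><TH align=left><H2>Environmental Variable</H2>
<TH align=left><H2>Contains</H2></TR>
eop
# One table row per variable: its name, then its value.
foreach $var (sort keys(%ENV)) {
    print "<TR><TD>$var<TD>$ENV{$var}</TR>\n";
}
print <<"eop";
</TABLE>
</CENTER>
</BODY>
</HTML>
eop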

This next script also can be used for determining what environmental variables are on your server. Instead of just displaying them on your browser, it can be more productive to have a text list of them. You can do this by e-mailing this list to yourself using a Perl script.

There are two environmental variables that are useful for collecting user input: QUERY_STRING and PATH_INFO. To get data into these variables there are two methods available. Information can be added directly to an HTML link by the programmer, as with

<A HREF=http://www.my_server.com/cgi-bin/name.pl?data-request>Click here to read the member's names.</A>

where all that follows the question mark is placed in the QUERY_STRING variable. This is true for all data that follows the first question mark in the URL of an <A HREF> tag. Data can be input into the PATH_INFO variable in a similar way:

<A HREF=http://www.my_server.com/cards.pl/bet=100/cards=5>Click here to start your game with $100.00</A>

CGI will start up the program cards.pl and place everything that follows the program name in the URL into PATH_INFO.
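Inside cards.pl, the extra path data still has to be taken apart. A minimal sketch follows, assuming the /name=value/name=value layout used in the link above; the subroutine name parse_path_info is hypothetical, and a sample string stands in for the value a server would supply in $ENV{PATH_INFO}.

```perl
use strict;
use warnings;

# Break extra path data such as "/bet=100/cards=5" into name/value pairs.
sub parse_path_info {
    my ($path_info) = @_;
    my %args;
    foreach my $part (split m{/}, $path_info) {
        next unless $part =~ /=/;          # skip empty or unnamed segments
        my ($name, $value) = split /=/, $part, 2;
        $args{$name} = $value;
    }
    return %args;
}

# A sample value stands in for $ENV{PATH_INFO} here.
my %args = parse_path_info('/bet=100/cards=5');
print "Bet: $args{bet}, cards: $args{cards}\n";
```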

Both of these variables can be modified by using different methods inside a <FORM> tag. A form with METHOD=GET will place data into the QUERY_STRING variable. An example of this might look like this:

<FORM METHOD=GET ACTION="http://www.my_server.com/cards.pl">
First Card<INPUT TYPE="text" NAME="First Card"><BR>
Second Card<INPUT TYPE="text" NAME="Second Card"><BR>
<INPUT TYPE=SUBMIT VALUE="Submit">
</FORM>

All the information the user inputs into the "First Card" and "Second Card" prompts will be placed in the QUERY_STRING variable. To double-check the entries, the CGI may echo back the data with a script like this:

#!/usr/bin/perl
# cards.pl - echo the user's raw form data back to the browser
print "Content-type: text/html\n\n";
print "You picked \"$ENV{QUERY_STRING}\" as your cards. Good choice!\n\n";
exit;

This script will not print plain converted numbers, because they will still be encoded with spaces as "+", and so forth. You might try a regular expression to carefully remove these extra symbols.
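One way to do that clean-up is sketched below: a translation turns each + back into a space, and a regular expression replaces each % code with the character it stands for. The subroutine name url_decode is made up for the example, and a sample string stands in for $ENV{QUERY_STRING}.

```perl
use strict;
use warnings;

# Turn an encoded string back into plain text.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;                               # "+" stands for a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX is a hex character code
    return $text;
}

# A sample value stands in for $ENV{QUERY_STRING} here.
my $query = 'First+Card=Ten+of+Hearts&Second+Card=Ace+of+Spades';
foreach my $pair (split /&/, $query) {
    my ($name, $value) = map { url_decode($_) } split /=/, $pair, 2;
    print "$name: $value\n";
}
```

The pairs are split apart before decoding, so that an encoded & or = inside a value is not mistaken for a separator.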

In a script like this, the users' choices are shown back to them in their Web browser. Their input has been appended to the URL that the browser requests:

http://www.my_server.com/cards.pl?First+Card=Ten+of+Hearts&Second+Card=Ace+of+Spades

where the user had entered "Ten of Hearts" as her first choice and "Ace of Spades" as her second. The script has taken this input and echoed it on a new Web page. Please note that the browser encodes the input before sending it, so that certain characters, such as spaces, are translated (here into "+") before they reach the gateway script, which must decode them again.

"PATH" is another important environmental variable. It lets your CGI programs find the other programs they may need. When a Perl CGI program runs an external command by name, the directories listed in the PATH environmental variable are searched for that command. PATH is used in the same way by your server's operating system to find programs outside of the CGI program it is running. Making sure that PATH is properly defined is very important.

PATH is also a good place to check if you are having problems with CGI scripts that depend on other programs for successful execution. The different directories listed in PATH are separated by colons. An example of a defined PATH environmental variable may look like this:

PATH=/usr/bin/:/cgi-bin/perl/:/usr/local/public/:/bin:/perl/usr/local:.

Whatever makes use of PATH starts with the first directory on the left and then proceeds down the list. To speed up operations, list the directories judiciously, in order from most used to least used. The period at the end of the PATH value does not terminate the list; it is an entry that tells the search to also look in the current directory where the CGI program is located.
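The left-to-right search can be imitated in a few lines of Perl, which is handy when you want to see which copy of a program a CGI script would actually find. This is only a sketch of the idea, not the code your system uses, and the subroutine name find_in_path is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk a colon-separated PATH value from left to right and return the
# first executable file with the given name, or nothing if none is found.
sub find_in_path {
    my ($program, $path) = @_;
    foreach my $dir (split /:/, $path) {
        # An empty entry is treated like "." (the current directory).
        my $where     = $dir eq '' ? '.' : $dir;
        my $candidate = "$where/$program";
        return $candidate if -f $candidate && -x $candidate;
    }
    return;
}

# Only act when the Web server runs this file as a CGI program.
if ($ENV{GATEWAY_INTERFACE}) {
    my $path  = defined $ENV{PATH} ? $ENV{PATH} : '';
    my $found = find_in_path('perl', $path);
    print "Content-type: text/plain\n\n";
    print "PATH is: $path\n";
    print defined $found ? "perl was found at $found\n"
                         : "perl was not found in PATH\n";
}
```

Because the search stops at the first match, a directory listed early in PATH hides any program of the same name in a later directory.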

The main problem with environmental variables is that the gateway program could be handed extremely long strings through a shell that has built-in limits on string lengths. You might encounter this as the "running out of environment space" error. To avoid this you can pass your data through standard input, or <STDIN>.

Standard Input <STDIN>

Recalling our METHOD=GET means of transferring data to the gateway, the same form can instead send its data through standard input by using METHOD=POST. The Perl script is then modified to read from <STDIN>:

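#!/usr/bin/perl
     # cards.pl: echo the data that arrives on standard input
     # read() returns the number of bytes read; the data itself is
     # stored in its second argument, $user_input.
     read(STDIN, $user_input, $ENV{CONTENT_LENGTH});
     print "Content-type: text/html\n\n";
     print "You picked \"$user_input\" as your cards. Good choice!\n\n";
     exit;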

where the output to the user would be the same as in the previous example, except that the URL would not display the encoded QUERY_STRING after the script name, as

http://www.my_server.com/cards.pl

Although METHOD=GET is useful, METHOD=POST is even more so, because the data is delivered through standard input rather than through an environmental variable, so it is not subject to the same restriction on the amount of information that can be passed to the gateway program. An example of using METHOD=POST follows:

<FORM METHOD=POST ACTION="http://www.my.machine/cards.pl/screen=subscribe">

where the user's information from the form fields will go into <STDIN>, and the extra path information, /screen=subscribe, will go into the PATH_INFO variable.
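A gateway script does not have to be tied to one method. The sketch below checks REQUEST_METHOD, takes the encoded data from <STDIN> for POST or from QUERY_STRING otherwise, and splits it into name and value pairs. The subroutine names are hypothetical, and the sketch assumes the usual form encoding, in which pairs are joined with "&" and each name is joined to its value with "=".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one URL-encoded string ("+" is a space, %XX is a hex code).
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Return the raw, still-encoded form data for the current request.
# $in is the input filehandle, normally \*STDIN.
sub read_raw_input {
    my ($in) = @_;
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    if ($method eq 'POST') {
        my $data = '';
        read($in, $data, $ENV{CONTENT_LENGTH} || 0);
        return $data;
    }
    return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
}

# Split "name=value&name=value" data into a hash of decoded pairs.
sub parse_form {
    my ($raw) = @_;
    my %form;
    foreach my $pair (split /&/, $raw) {
        next if $pair eq '';
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $form{ url_decode($name) } = url_decode($value);
    }
    return %form;
}

# Only act when the Web server runs this file as a CGI program.
if ($ENV{GATEWAY_INTERFACE}) {
    my %form = parse_form(read_raw_input(\*STDIN));
    print "Content-type: text/plain\n\n";
    foreach my $name (sort keys %form) {
        print "$name: $form{$name}\n";
    }
}
```

Note that this sketch keeps only the last value when a name appears more than once, which is enough for simple forms like the card example.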

Overall there are several ways to get data to the gateway program, which gives you several problem-solving strategies to add to your bag of tricks. To make the most of the CGI, a full understanding of the set of available environmental variables is valuable. Environmental variables fall into two distinct categories. The first is server meta-information, where the variable is independent of the client request and keeps the identical value regardless of what the client asks for. The second is client-specific, where the value depends on the client request.

It should be noted that some client-specific environmental variables can be defined by the server to which the client's request was sent.

Server Meta-Information Environmental Variables

These environmental variables are set by the server itself, and do not rely on the CGI to define them. They are always accessible by the CGI. The list of meta-information environmental variables is found in Table 10.4.

Table 10.4 Meta-Information Environmental Variables

Environmental Variable    Value
SERVER_ADMIN    The e-mail address of the person responsible for all the Web-related concerns on your server, which is probably yourself.
SERVER_SOFTWARE    Identifies the name and version of the Web server. Its output comes in the form name/version.
SERVER_NAME    Signifies the server's hostname, DNS alias, or the IP address.
GATEWAY_INTERFACE    The server CGI type and the revision level. It is output as CGI/revision.

Client-Specific Environmental Variables

These environmental variables are also known as request-header dependent, because they rely on the requests from the client to give them a value. The client-specific environmental variables are listed in Table 10.5.

Table 10.5 Client-Specific Environmental Variables

Environmental Variable    Value
AUTH_TYPE    Used to show the protocol-specific authentication method for validating user access. This is used only if the server supports user authentication.
CONTENT_LENGTH    The length, in bytes, of the content buffer as announced by the client in its request. It is used by the CGI program to know how much data to read from the input buffer.
CONTENT_TYPE    The MIME type of the data attached to the client's request, as with POST and PUT requests.
HTTP_REFERER/REFERER_URL    The URL of the document from which the script was invoked.
HTTP_REQUEST_METHOD    Simply the HTTP method request header remade into an environmental variable. The values here can range from the familiar GET and POST methods, to HEAD, PUT, DELETE, LINK, and UNLINK.
HTTP_USER_AGENT    Identifies the Web browser that the client uses to send its request. Its output is software/version library/version.
METHOD=GET (POST) ACTION=http://machine/path/programname/extra-path-info    This was explained earlier. The supplementary data (extra-path-info) is put into PATH_INFO.
PATH_INFO    The extra path information that follows the program name in the URL, whether the form uses METHOD=GET or METHOD=POST.
PATH_TRANSLATED    Where the server takes the virtual path found in PATH_INFO and translates it into a physical path.
QUERY_STRING    The client data that follows the ? in a URL that refers to this particular script.
REMOTE_ADDR    Identifies the IP address of the client.
REMOTE_HOST    The client's hostname. If the server cannot determine the hostname, it should set REMOTE_ADDR and leave this variable unset.
REMOTE_IDENT    Used for logging. It holds the remote user's name.
REMOTE_USER    The user's authenticated user name.
REQUEST_METHOD    The method used for the request, as in METHOD=GET or METHOD=POST.
SCRIPT_FILENAME    The value of the full path to the CGI script.
SCRIPT_NAME    Used to reference the virtual path to the executable script. Handy for self-referencing URLs like ISINDEX queries.
SERVER_PORT    Identifies the port to which the client request was sent.
SERVER_PROTOCOL    Takes the protocol that the client is using to make its request and outputs it as protocol/revision.
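To see how a selection of the variables from Tables 10.4 and 10.5 are set on your own server, a short script can print them side by side. The following is a sketch only. The subroutine name describe_vars is hypothetical, the two lists are just a sample of the tables, and your server may leave some of these variables unset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A sample of the variables from Tables 10.4 and 10.5.
my @SERVER_VARS = qw(SERVER_ADMIN SERVER_SOFTWARE SERVER_NAME GATEWAY_INTERFACE);
my @CLIENT_VARS = qw(REQUEST_METHOD CONTENT_TYPE CONTENT_LENGTH QUERY_STRING
                     PATH_INFO REMOTE_ADDR REMOTE_HOST HTTP_USER_AGENT);

# Return one "NAME = value" line for each requested name,
# marking variables that the server did not supply.
sub describe_vars {
    my ($env, @names) = @_;
    my $text = '';
    foreach my $name (@names) {
        my $value = defined $env->{$name} ? $env->{$name} : '(not set)';
        $text .= "$name = $value\n";
    }
    return $text;
}

# Only act when the Web server runs this file as a CGI program.
if ($ENV{GATEWAY_INTERFACE}) {
    print "Content-type: text/plain\n\n";
    print "Server meta-information:\n", describe_vars(\%ENV, @SERVER_VARS), "\n";
    print "Client-specific:\n",         describe_vars(\%ENV, @CLIENT_VARS);
}
```

Requesting the script twice, once through a plain link and once through a form, is a simple way to watch the client-specific values change while the server meta-information stays the same.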

Conclusion

Looking into the CGI has led us to how the CGI handles data from your Web pages using the HTTP protocol, MIME headers, and Perl scripts. These early explorations provoke even more questions about the CGI and how it works, which are presented in Chapter 11.