Chapter 19 Getting the Most Out of HTML with CGI

How CGI Works
Choosing a CGI Programming Language
CGI Environment Variables
Understanding Input/Output with CGI
A Trivial CGI Example
A CGI Example Using HTML and C
Testing CGI Systems
CGI Toolkits and Applications
Summary

The Common Gateway Interface (CGI) is a standard that governs how external applications are interfaced with Web servers. The reasoning behind the invention of CGI is simple: without it, the HTTP specification and all Web servers would have become a patchwork of ad hoc extensions.

CGI provides a way to write programs that will run on the server when they are invoked by the client Web browser through HTML code. These programs can be written in the C language, but C is just one possibility. For a discussion of other options, see the section later titled "Choosing a CGI Programming Language."

At this point, the astute reader might have noticed that there are no fewer than four areas of programming prowess needed to get this dog to hunt: CGI, HTTP, HTML, and C (or some other programming language). And just for good measure, you might want to throw in the Win32 API and SQL, depending on what your Web program will do after you finish laying the necessary foundation.

And as if that is not enough, you'll want to consider writing your CGI program using the newer ISAPI (Internet Services Application Programming Interface) standard for better performance. The fundamentals of ISAPI are very similar to CGI, except that ISAPI programs are compiled as DLLs rather than EXEs, and they use pointers to memory blocks instead of stdin/stdout. This book does not substantially cover ISAPI; however, everything you learn about CGI can be applied to ISAPI.

The reason you will want to run this challenging gauntlet is that CGI and ISAPI open the door to great new opportunities. CGI/ISAPI programs are often associated with Web forms. When the user finishes filling out an HTML form and submits it, the data stream that is returned to the server is called the form data. Keep in mind that just because you send a blank HTML form to the client Web browser, nothing is going to happen with the form data when it is submitted unless you, the Webmaster, make it happen. The form data would just land in the bit bucket if not for CGI or ISAPI.

CGI/ISAPI is a necessity if you want to save the form data into a database on the server, for example. Or perhaps the form data should be e-mailed to the Webmaster or some other party. Maybe the intent of the form is to have some data faxed or e-mailed back to the client. Or the form could be used to obtain a database query from the user, which is then sent to a database engine before the formatted results are finally returned to the client as an HTML file. These are just some of the possibilities available to anyone brave enough to master the details of client/server Web programming with CGI/ISAPI. (There are tools that make it possible to do much of this without programming; I will tell you about several of them later.) Although all these things can be accomplished with traditional programming, doing it on the Web makes applications platform-independent, distributed, easier to develop, and easier to update.

The purpose of this chapter is to give you the fundamentals of CGI, and show you two simple CGI examples and one sophisticated and practical example. Because all the source code is on the CD-ROM, programming knowledge is not required, but let's not kid ourselves: it would be very helpful. If you don't yet know about programming, you might just want to skim this chapter to get a glimpse of the possibilities. On the other hand, if you want to utilize CGI on your Intranet, this chapter will be a guiding light.

Note

CGI programs are also called CGI scripts or applications. The reason they are called scripts is that they can be written in Perl, or at the command shell, in which case they are interpreted rather than compiled. When C or Visual Basic is used for CGI, the terms CGI program or CGI application are preferred to the term CGI script because those languages are not interpreted in the traditional sense of script files. Even shorter, some people just refer to all such things as CGIs.

CGI scripts are not to be confused with a new product from Microsoft named Visual Basic Script or a new product from Sun and Netscape named JavaScript. Neither JavaScript nor VBScript is necessarily associated with the Common Gateway Interface.

How CGI Works

Figure 19.1 shows a high-level overview of how CGI forms-processing works. There are many other details of HTTP and TCP/IP than what are shown here, but I omit those in order to concentrate on the basic concepts of CGI.

Figure 19.1 : How CGI processes Web forms.

The annotated steps corresponding to Figure 19.1 follow. (I assume you are familiar with the way an HTML file gets created and displayed in the Web browser, which is the point at which step 1 begins.)

1. After the user has entered the form data in the Web browser, he chooses the Submit button, which is coded between the <FORM> and </FORM> tags in the HTML file. The Submit button is a link to a CGI (or ISAPI) application on the server. For more information about HTML forms, please review Chapter 5, "What You Need to Know About HTML."
2. The browser uses the POST method of the HTTP protocol to send the form data to the server. The GET method could also be used, but POST is preferred for form data.
3. The data travels through the Intranet or the Internet and arrives at the server, which then passes the data to the CGI application.
4. In addition to parsing the form data and processing it as desired, the CGI application must write the HTML response that will be sent back to the client. The CGI specification says that the Web server should read the stdout device of the CGI application.
5. The server adds appropriate HTTP header information and sends the output of the CGI application back through the network as an HTML response file, which the Web browser receives in memory.
6. The browser interprets the HTML code and displays the results on-screen for the user. At a minimum, this file should usually contain some notification that the data was processed by the server, followed by a hyperlink to take the user back to the HTML page he was on before choosing the link to the form page. In other words, the file puts the client back where he was before he came to Step 1 of this list.

Caution

Allowing any person with a Web browser to execute applications on your server is a security concern. Ensure that all the CGI applications are isolated to one directory and that no one else has access to that directory. With Microsoft IIS, all CGI applications are kept in a directory called scripts under the server root (by default). Also, be careful about using public-domain CGI applications that have not been tested over time to be secure.

Choosing a CGI Programming Language

Nearly all Web servers conform to the CGI 1.1 standard, which is a protocol agreement between your application and the Web server. With most Web servers, CGI applications must be console-mode programs located within the HTTP data directory tree. By saying console-mode, I mean that CGI applications cannot be Windows API programs or GUI programs. However, Microsoft IIS is one of several HTTP servers that takes advantage of ISAPI. ISAPI permits you to write Windows DLLs for your CGI applications, and therefore you do have access to the full Win32 API, including ODBC functionality.

Of course, it is very unlikely that you would want to write a GUI CGI or ISAPI application, because that would imply that you (as the Webmaster) were going to sit at the Web server waiting to interact with every client that sent data to the server. Remember, the client never sees the CGI or ISAPI program; the client sees only the HTML output of the program. Nearly all CGI and ISAPI applications process the form data as background tasks because there could be hundreds of transactions per minute (depending on how popular your Web server becomes).

Another major advantage of ISAPI over CGI is performance. ISAPI passes memory blocks between the server and the application. CGI relies on launching a new program to process the form data from every client, and it uses environment variables and the stdin and stdout streams to pass data back and forth.

In UNIX, which is where the Web got its start, CGI applications are frequently written in C, Perl, or the UNIX shell command language. In Windows NT, you can use C/C++ or Perl with most servers and Visual Basic with some. (Well, you can use Visual Basic with any CGI Web server. I'll show you how in the next chapter.) Many Windows NT Webmasters run a public-domain Perl 4 interpreter for CGI and Web site statistics. Perl 5, which includes some nice object-oriented extensions, has recently arrived on the scene, and you should definitely give it consideration as a CGI tool on your Intranet. (See Chapter 29 for more information about Perl.)

Both Perl and C have their advocacy camps. Perl offers great file and string handling, and the code is fairly easy to write and modify. On the other hand, because C is a compiled language, it offers better efficiency, both from the optimization of the compiled code and the fact that the interpreter is not launched for every client submission of form data. In addition, many claim that compiled programs provide better security than scripts because hackers can more easily modify the text of a script just before its execution.

In this chapter, I use some DOS command language, some C/C++, and of course, some HTML. The first example uses a very simple DOS batch file. The second example is written in C. The third example, presented in the next chapter, is a practical application that shows you how to put C++ and Visual Basic together to build an HTML form that can save data into an ODBC database on the server.

It's okay if you don't plan to learn programming. Most of the examples are already compiled on the CD-ROM and will run without your knowing how to program. In this chapter and the next, I have used Visual C++ 4.0 and Visual Basic 4.0.

CGI Environment Variables

The server uses environment variables to pass information to the CGI application. The environment variables are set after the HTTP GET or POST request is received by the Web server (see the next section) and before the server executes the CGI application. Most environment variables are fairly standard from server to server, but be aware that some differences exist. Nothing stops the vendor of a Web server from adding nonstandard environment variables for use by their customers.

The CGI standard specifies certain environment variables that are used for conveying information to a CGI script. The following subset of those environment variables is supported by most HTTP servers. If this list seems confusing, don't despair; most CGI programs don't need to use all these environment variables:

CONTENT_LENGTH-The length of the content as given by the client.

CONTENT_TYPE-For queries that have attached information, such as POST and PUT, this is the content type of the data.

GATEWAY_INTERFACE-The revision of the CGI specification to which this server complies. The format for this variable is CGI/revision.

HTTP_ACCEPT-The MIME types that the client will accept. The format for this variable is type/subtype.

PATH_INFO-The extra path information, as given by the client. This variable enables scripts to be accessed by their virtual pathname.

QUERY_STRING-The information that follows the ? in the URL that referenced this script. This is the query information.

REMOTE_ADDR-The IP address of the remote host making the request.

REQUEST_METHOD-The method with which the request was made, such as GET, HEAD, and POST.

SCRIPT_NAME-A virtual path to the script being executed.

SERVER_NAME-The server's hostname, DNS alias, or IP address.

SERVER_PORT-The port number to which the request was sent.

SERVER_PROTOCOL-The name and revision of the information protocol this request came in with. The format for this variable is protocol/revision.

SERVER_SOFTWARE-The name and version of the server software answering the request. The format for this variable is name/version.

Other HTTP headers received from the client are available in environment variables of the form HTTP_*. For instance, the User-Agent header value is available in HTTP_USER_AGENT. Note that because a dash is not allowed in environment variable names on many systems, - (dash) in the header names is replaced by _ (underscore) in the corresponding environment variable names, and the letters are converted to uppercase. An understanding of the HTTP specification is probably a prerequisite to a full comprehension of the purpose of some of these environment variables.
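The header-to-variable naming rule can be sketched in a few lines of C. This is only an illustration: the function name header_to_env_name is hypothetical, and the sketch assumes the usual convention of uppercasing the header name, replacing each dash with an underscore, and adding the HTTP_ prefix.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Build the CGI environment variable name for an HTTP request header:
 * prefix "HTTP_", uppercase every letter, and turn '-' into '_'.
 * header_to_env_name is a hypothetical helper, not part of any library.
 * Returns 0 on success, -1 if the result does not fit in out[size]. */
int header_to_env_name(const char *header, char *out, size_t size)
{
    static const char prefix[] = "HTTP_";
    size_t plen = sizeof(prefix) - 1;
    size_t hlen = strlen(header);
    size_t i;

    if (plen + hlen + 1 > size)
        return -1;

    memcpy(out, prefix, plen);
    for (i = 0; i < hlen; i++) {
        unsigned char c = (unsigned char)header[i];
        out[plen + i] = (c == '-') ? '_' : (char)toupper(c);
    }
    out[plen + hlen] = '\0';
    return 0;
}
```

A CGI program could pass the resulting name straight to getenv to look up the value of any header the server chose to forward.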

Understanding Input/Output with CGI

The CGI application accesses information about how it was invoked through the environment variables initialized by the Web server; it reads any information supplied by the client (in a POST request) through stdin and sends output to the client through stdout. This process is pretty simple to understand, once you get the hang of it. (Isn't that how everything works?)

`GET` Versus `POST`

GET and POST are two HTTP methods of sending form data to the Web server. When you write a form in HTML, you should specify which HTTP method the browser will use when the form data is sent back to the server.

Listing 19.1 is a short block of HTML code that comprises a complete form. The line numbers are not a part of the HTML code. Note in line 2 that the form is using Method="POST". You could just as easily change this line to "GET". The main difference between GET and POST is that the CGI application will receive the POST data by reading the stdin device, whereas GET data would be received on the command line and in the QUERY_STRING environment variable.

Listing 19.1. A short and sweet HTML form.

1. <HTML><HEAD><TITLE>Simple Form</TITLE></HEAD><BODY>
2. <FORM Method="POST"
3. Action="http://domain/cgi-bin/prog.exe">
4. Your Name: <INPUT Name="user" SIZE="30"><P>
5. <INPUT Type=submit Value="Click here to send">
6. </FORM></BODY></HTML>

Usually, your forms will be much more complex than the one in Listing 19.1, which only contains one input field. Because many operating systems impose some limit on the length of the command line, it is usually best to use POST. On the other hand, if you know your form data is small, you can use GET.

CGI Command Lines

In the case of a GET request (or ISINDEX), the form data will be on the command line and in the QUERY_STRING environment variable. The command line will contain a question mark after the application name as the delimiter that marks the beginning of the form data. Suppose you change the HTML code in Listing 19.1 to use Method="GET", and the user types in the string User's Name in the text field named user.

The command line of the CGI application would look like the following:

\cgi-bin\prog.exe?user=User%27s+Name

The QUERY_STRING environment variable would look like the following:

user=User%27s+Name

Your first observation is naturally going to be that this stuff looks somewhat strange. Your second observation is, hopefully, that the QUERY_STRING data appears somewhat more friendly looking than the command line data. To figure out what's going on with all those funny characters, recall from line 4 of Listing 19.1 that the input field was named user. Now that label is being sent back to you as the first word of QUERY_STRING. Everything after the equals sign in the QUERY_STRING represents the data that the user typed into that particular field. Because more than one field could be used, each one must be named uniquely in the HTML form and in the QUERY_STRING data that is sent back to the CGI application.

Remember that the example assumes that the user typed User's Name with no period on the end. (If he had typed a period, that would be another story; more about that later.) Looking at the preceding QUERY_STRING, notice that you have almost exactly what the user typed, except for the %27, which replaces the apostrophe, and the plus sign, which replaces the space character. These translations are needed because certain characters are reserved or not permitted in URLs. Form data travels by the same mechanism that HTTP uses to pass URLs, so characters in the data must be distinguishable from characters that have a special meaning in a URL.

The percent sign is a hex escape character, and the two digits that follow it give the ASCII code of the character it replaces. The apostrophe has a hexadecimal code of 27. If the user typed a period and it were encoded, it would appear as %2E. Not all browsers encode the same set of characters, so your program should accept these characters in either form. For example, the apostrophe and the period are often sent unencoded. The plus sign is simply the convention for encoding space characters. Another common translation, which applies to HTTP header names rather than to form data, is the dash character converted to an underscore.

Finally, if there were other input fields in the HTML form, they would follow the data of the user field. Each name=value pair would be separated by an ampersand (&) character.
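As a rough sketch of how a program might pull one field out of such a string, the following C code walks the name=value pairs and decodes the value it finds. The names hex_value, decode_span, and find_field are hypothetical, and the sketch assumes that field names themselves contain no encoded characters; the functions in Listing 19.4 take a different approach.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode len bytes of URL-encoded text from src into out[size].
 * '+' becomes a space and %XX becomes the byte with that hex code.
 * Returns 0 on success, -1 if out is too small. */
static int decode_span(const char *src, size_t len, char *out, size_t size)
{
    size_t i = 0, o = 0;

    if (size == 0)
        return -1;
    while (i < len) {
        char c = src[i];

        if (o + 1 >= size)
            return -1;
        if (c == '+') {
            c = ' ';
            i++;
        } else if (c == '%' && i + 2 < len &&
                   isxdigit((unsigned char)src[i + 1]) &&
                   isxdigit((unsigned char)src[i + 2])) {
            c = (char)(hex_value((unsigned char)src[i + 1]) * 16 +
                       hex_value((unsigned char)src[i + 2]));
            i += 3;
        } else {
            i++;
        }
        out[o++] = c;
    }
    out[o] = '\0';
    return 0;
}

/* Look up the field called name in a query such as "a=1&b=2" and copy
 * its decoded value into out[size]. Returns 0 if found, -1 otherwise. */
int find_field(const char *query, const char *name, char *out, size_t size)
{
    size_t nlen = strlen(name);
    const char *p = query;

    while (*p != '\0') {
        const char *end = strchr(p, '&');      /* end of this pair */
        size_t plen = end ? (size_t)(end - p) : strlen(p);

        if (plen > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=')
            return decode_span(p + nlen + 1, plen - nlen - 1, out, size);

        p += plen;
        if (*p == '&')
            p++;
    }
    return -1;
}
```

Because the query string is never modified, the same buffer can be searched again for each field the form defines.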

Summary of Seven Funny Characters

Table 19.1 is a quick review of the special characters you will come across in CGI. Some of these conventions make up what is known as URL-encoding.

Table 19.1. Special characters in CGI.

Special Character	Description
`+` (plus sign)	Used in place of space characters in user input.
`=` (equals sign)	Used to separate the field name from the field value.
`?` (question mark)	Used to mark the beginning of the form data on the command line.
`_` (underscore)	Used to replace dash characters in HTTP header names when they become environment variable names.
`%` (percent sign)	Used to encode reserved ASCII characters, followed by two hex digits.
`&` (ampersand)	Used as the boundary between name/value pairs for each field in the HTML form.
`#` (number sign)	Used in URLs to indicate a section within an HTML document, sort of like a bookmark. This character is not strictly related to CGI; it can be used in any URL to an HTML document that contains an `<A>` tag with a `Name` attribute (called an anchor).
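To see the encoding side of these conventions, here is a minimal sketch of a URL-encoder in C. The function name url_encode is hypothetical, and the choice of which characters to leave alone (letters, digits, dash, underscore, and period) is an assumption; real browsers differ somewhat in the set they encode.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* URL-encode src into out[size]: spaces become '+', letters, digits,
 * '-', '_' and '.' are copied, and every other byte becomes %XX.
 * The set of characters copied as-is is an assumption of this sketch.
 * Returns 0 on success, -1 if out is too small. */
int url_encode(const char *src, char *out, size_t size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    if (size == 0)
        return -1;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (isalnum(c) || strchr("-_.", c) != NULL) {
            if (o + 1 >= size)
                return -1;
            out[o++] = (char)c;
        } else if (c == ' ') {
            if (o + 1 >= size)
                return -1;
            out[o++] = '+';
        } else {
            if (o + 3 >= size)      /* three bytes plus the terminator */
                return -1;
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 15];
        }
    }
    out[o] = '\0';
    return 0;
}
```

A routine like this is handy when a CGI program needs to build a test query by hand or to generate a link that carries form data back to itself.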

Reading from stdin

Recall that QUERY_STRING is not used for the POST method. Because POST is probably more typical, you need to understand how to read stdin to retrieve form data. (This is another reason why CGI programs are console-mode rather than GUI: GUI programs don't have a concept of stdin.)

First, the server will set the CONTENT_LENGTH environment variable to tell how many bytes to read from stdin. You must not read more than that amount. Then the POST-invoked program will read and parse the form data from the stdin device instead of the QUERY_STRING environment variable.

Whether you use POST or GET, have some standard routines in C or Perl to help you perform standard decoding. The C programs in this chapter include several useful functions for that purpose. Feel free to customize them and use them in your own programs. They are public-domain.
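The rule about never reading past CONTENT_LENGTH can be sketched as follows. The function name read_form_data is hypothetical; it takes the stream and the text of CONTENT_LENGTH as parameters (rather than calling getenv itself) so that it can be exercised without a Web server.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read the form data of a POST request from in.
 * content_length is the text of the CONTENT_LENGTH variable (may be NULL).
 * At most size - 1 bytes are stored, and the result is NUL-terminated.
 * read_form_data is a hypothetical helper written for this sketch.
 * Returns the number of bytes actually stored in buf. */
size_t read_form_data(FILE *in, const char *content_length,
                      char *buf, size_t size)
{
    long want = 0;
    size_t n;

    if (size == 0)
        return 0;
    if (content_length != NULL)
        want = strtol(content_length, NULL, 10);
    if (want < 0)                           /* ignore nonsense values */
        want = 0;
    if ((unsigned long)want > size - 1)     /* never overrun the buffer */
        want = (long)(size - 1);

    n = fread(buf, 1, (size_t)want, in);
    buf[n] = '\0';
    return n;
}
```

In a real CGI program the call would look something like read_form_data(stdin, getenv("CONTENT_LENGTH"), buffer, sizeof buffer).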

Writing to stdout

When the CGI application is done parsing and processing the input data, it must send a reply to the server. The server will forward the reply to the client after applying a header as per the rules of the HyperText Transfer Protocol.

The server will be listening to the stdout device of the CGI application while the latter is executing. The CGI program can either compose an HTML document on-the-fly or refer the server to another document, reachable through HTTP, FTP, or Gopher anywhere on the Web, that it would like to have sent instead. See the later section titled "A CGI Example Using HTML and C" for all the details about composing an HTML response document from within the CGI application.

If you want the server to send another document that already exists, you can use the Location header. In C, you would execute a printf statement that looks something like the following:

printf("Location: ftp://FQDN/dir/filename.txt\n\n");

Because you must follow the header information with a blank line, the example has two newline characters.

Tip

It is very important that your CGI program prints out an extra blank line after the HTTP header and before the contents of the document that follows the header. A missing blank line is a common source of trouble when trying to debug CGI systems.
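One way to make the blank line hard to forget is to write every response through a single routine. The following is a minimal sketch; the function name write_response is hypothetical, and it takes the output stream as a parameter so that the same code can write to stdout in production or to a file during testing.

```c
#include <stdio.h>

/* Write a CGI response to out: one Content-type header line, the
 * mandatory blank line, and then the document body.
 * write_response is a hypothetical helper written for this sketch.
 * Returns 0 on success, -1 if a write fails. */
int write_response(FILE *out, const char *content_type, const char *body)
{
    /* The second \n produces the blank line that ends the header. */
    if (fprintf(out, "Content-type: %s\n\n", content_type) < 0)
        return -1;
    if (fputs(body, out) == EOF)
        return -1;
    return 0;
}
```

A CGI program would call it as write_response(stdout, "text/html", html), where html is the document it has composed.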

How to Learn More About CGI

The granddaddy of all CGI information centers on the Internet is NCSA, the National Center for Supercomputing Applications at the University of Illinois. Full details of how to write CGI scripts are given in the CGI specification, which can be found online at http://hoohoo.ncsa.uiuc.edu/cgi/. You will find that NCSA has CGI material at all levels from beginning to advanced, as well as a CGI test suite where you can try the programs and see the code. At the time I am writing this, Version 1.1 is the latest CGI specification. It is not available as a single document, but consists of several hyperlinked pages maintained at NCSA.

For further information about CGI, check out these other resources:

One of the best CGI and HTML documents available anywhere on the Internet is written by Michael Grobe at the University of Kansas. You'll find "An Instantaneous Introduction to CGI Scripts and HTML Forms" at the following URL:
http://kufacts.cc.ukans.edu/info/forms/forms-intro.html
For an introduction to HTML forms and CGI, see this URL (case-sensitive):
http://www.utirc.utoronto.ca/HTMLdocs/NewHTML/htmlindex.html
David Robinson has written an independent and detailed version of the CGI specification. Unlike the NCSA specification, his version exists as a single document (which makes it much easier to print), and it gives a description of all CGI environment variables. See this URL:
http://www.ast.cam.ac.uk/%7Edrtr/
Whether you want to post a question about a CGI roadblock you need help with or just pick up tips by reading the threads of others, the CGI newsgroup is definitely the place to be:
comp.infosystems.www.authoring.cgi.
Last but not least, don't forget to visit www.yahoo.com. Select Computers/WWW/CGI and browse the many resources available.

A Trivial CGI Example

There is a standard MIME type for plain ASCII text, Content-type: text/plain. This MIME type is useful in a trivial but interesting example of CGI, which is often used as proof that the Web server and CGI are installed and running properly. The idea is to invoke a DOS batch file that echoes the values of the CGI environment variables on the server back to the Web browser. All you need to do is save the following text into a file named trivial.cmd (or copy it from the CD-ROM) in your cgi-bin or scripts directory (as configured in the Web server):

@echo off
echo content-type: text/plain
echo.
set

The set statement in this simple program prints the values of all the environment variables, including the CGI variables set by the server. The output is directed back to the Web browser. Now just write a line in your home page that links to trivial.cmd or create a new HTML file such as the following one, which is named trivial.htm on the CD-ROM:

<HTML>
<HEAD>
<TITLE>Trivial CGI Test</TITLE>
</HEAD>
<BODY>
<form action="trivial.cmd" method="POST">
<H2>Press the button to run the trivial CGI test.</H2>
<input type="submit" value="Go">
</FORM>
</BODY>
</HTML>
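The same check can be written in C if you prefer a compiled program to a batch file. The following sketch prints a fixed list of CGI variables as plain text; the function name print_cgi_vars is hypothetical, and the list of names is simply the subset described earlier in this chapter rather than everything a given server sets.

```c
#include <stdio.h>
#include <stdlib.h>

/* Print a plain-text report of common CGI environment variables to out.
 * print_cgi_vars is a hypothetical helper; the list of names below is
 * the subset described in this chapter, not an exhaustive one. */
void print_cgi_vars(FILE *out)
{
    static const char *const names[] = {
        "GATEWAY_INTERFACE", "SERVER_SOFTWARE", "SERVER_NAME",
        "SERVER_PORT", "SERVER_PROTOCOL", "REQUEST_METHOD",
        "SCRIPT_NAME", "PATH_INFO", "QUERY_STRING", "CONTENT_TYPE",
        "CONTENT_LENGTH", "REMOTE_ADDR", "HTTP_ACCEPT", "HTTP_USER_AGENT"
    };
    size_t i;

    /* Header, then the blank line that separates it from the body. */
    fputs("Content-type: text/plain\n\n", out);

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        const char *value = getenv(names[i]);
        fprintf(out, "%s=%s\n", names[i], value ? value : "(not set)");
    }
}
```

A main function that does nothing but call print_cgi_vars(stdout) would give you a compiled counterpart to trivial.cmd.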

A CGI Example Using HTML and C

This section covers a complete CGI transaction, from server to client, back to server, and back to client. This example serves as a template from which you could build a more sophisticated CGI application. The CGI system you are going to build here starts with an HTML file that contains a form. When the user submits the form data, the server will determine that the Action attribute for the form refers to a CGI application. The server will start the application and send it the form data on stdin. Then the server will listen for stdout from the CGI application.

The CGI program is written in C. The program will show you how to retrieve the form data, parse it, and send back an HTML document. The HTML response is constructed within the CGI application because you should embed part of the form data in your response. You don't always have to create HTML on-the-fly from inside the CGI program, but doing so will make your Web pages more dynamic.

The Data Entry Form

To demonstrate CGI, you need to start with an HTML page that contains a URL pointing to a CGI application. Figure 19.2 shows how the data entry form appears in the Web browser as the user is filling it out.

Figure 19.2 : The data entry form that the user fills out.

Listing 19.2 shows the HTML code that gets the ball rolling with the sample CGI program. This file and the following C program are available on the CD-ROM if you want to experiment. Note that you will want to change the URL in the FORM ACTION variable to refer to your site (or use localhost).

Listing 19.2. The HTML code that creates the form.

<HTML>
<HEAD>
<TITLE>CGI Application Example</TITLE>
</HEAD>
<BODY>
<H1>CGI Application Example</H1>
<hr>
This is an example of a simple CGI application 
handling the data from an HTML form.
<BR>
<FORM ACTION="http://www.hqz.com/scripts/cgisamp.exe" METHOD="Post">
Please enter your name: <INPUT NAME="name" TYPE="text"><p>
<input type=submit value="When done, click here!">
</FORM>
</BODY>
</HTML>

The C Code

Before getting to the C program that will process the form data, consider the output of the C program. Listing 19.3 is the HTML code that is sent back to the client after the server obtains it from stdout of the CGI application.

Listing 19.3. The HTML code that is written to stdout by cgisamp.c.

<HEAD><TITLE>Submitted OK</TITLE></HEAD>
<BODY><h2>The information you supplied has been accepted.
<br> Thank You Scott</h2>
<h3><A href="http://www.hqz.com/cgisamp.htm">
[Return]</a></h3></BODY>

Figure 19.3 shows the browser on the client side after the CGI application has finished processing the form data. Note in Figure 19.3 (and Listing 19.3) that the HTML response sent by the CGI application is customized for each set of form data; it includes the name that the user supplied.

Figure 19.3 : The result of the CGI application as seen by the client.
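Because the response embeds text that the user typed, it is worth considering what happens if that text contains characters that are special in HTML, such as < or &. The following is a minimal sketch of an escaping routine that could be applied to the name before it is printed. The function name html_escape is hypothetical, and this step is not part of cgisamp.c on the CD-ROM.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into out[size], replacing the characters &, <, > and "
 * with their HTML entities so the text is displayed, not interpreted.
 * html_escape is a hypothetical helper written for this sketch.
 * Returns 0 on success, -1 if out is too small. */
int html_escape(const char *src, char *out, size_t size)
{
    size_t o = 0;

    if (size == 0)
        return -1;
    for (; *src != '\0'; src++) {
        const char *rep = NULL;     /* replacement entity, if any */
        size_t len = 1;

        switch (*src) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:                  break;
        }
        if (rep != NULL)
            len = strlen(rep);
        if (o + len >= size)        /* leave room for the terminator */
            return -1;
        if (rep != NULL)
            memcpy(out + o, rep, len);
        else
            out[o] = *src;
        o += len;
    }
    out[o] = '\0';
    return 0;
}
```

With a routine like this in place, a user who types HTML tags into the name field sees them echoed back as text instead of having the browser act on them.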

Listing 19.4 shows the complete C program, called cgisamp.exe, which is executed by the server when the client submits the form data. The following is a quick list of the five functions in cgisamp:

strcvrt-Converts all occurrences of one character to another within a given string.
TwoHex2Int-Called when a percent character marks an escape code.
urlDecode-Expands all the escape codes by calling TwoHex2Int.
StoreField-Retrieves field/value pairs from the form data.
main-Reads the form data from stdin and writes the HTML response to stdout.

Listing 19.4. The CGI application written in C language (cgisamp.c on the CD-ROM).

/***************************************************************************
 *  File: cgisamp.c
 *
 *  Use: CGI Example Script.
 *
 *  Notes: Assumes it is invoked from a form and that REQUEST_METHOD is POST.
 *  Ensure that you compile this script as a console mode app.
 *
 *  This script is a modified version of the script that comes with EMWAC
 *     HTTPS.
 *
 *  Date: 8/21/95
 *  Christopher L. T. Brown  clbrown@netcom.com
 *
 ***************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <io.h>

char InputBuffer[4096];
static char * field;
static char * name;

/* Convert all cOld characters   */
/* in cStr into cNew characters. */
void strcvrt(char *cStr, char cOld, char cNew)
{
       int i = 0;

       while(cStr[i])
       {
              if(cStr[i] == cOld)
                     cStr[i] = cNew;
              i++;
       }
}

/* The string starts with two hex */
/* characters.  Return an integer */
/* formed from them.              */
static int TwoHex2Int(char *pC)
{
       int Hi, Lo, Result;

       Hi = pC[0];
       if('0' <= Hi && Hi <= '9')
              Hi -= '0';
       else if('a' <= Hi && Hi <= 'f')
              Hi -= ('a' - 10);
       else if('A' <= Hi && Hi <= 'F')
              Hi -= ('A' - 10);

       Lo = pC[1];
       if('0' <= Lo && Lo <= '9')
              Lo -= '0';
       else if('a' <= Lo && Lo <= 'f')
              Lo -= ('a' - 10);
       else if('A' <= Lo && Lo <= 'F')
              Lo -= ('A' - 10);

       Result = Lo + 16 * Hi;
       return(Result);
}

/* Decode the given string in-place */
/* by expanding %XX escapes.        */
void urlDecode(char *p)
{
       char *pD = p;

       while(*p)
       {
              if (*p == '%')       /* Escape: next 2 chars are hex           */
              {                    /* representation of the actual character.*/
                     p++;
                     if(isxdigit(p[0]) && isxdigit(p[1]))
                     {
                            *pD++ = (char)TwoHex2Int(p);
                            p += 2;
                     }
              }
              else
                     *pD++ = *p++;
       }
       *pD = '\0';
}

/* Parse out and store one field=value item.       */
/* main is using strtok, so don't use it in here!  */
void StoreField(char *f, char *Item)
{
       char *p;

       p = strchr(Item, '=');
       if(p == NULL)               /* Malformed item: no '=' present. */
              return;
       *p++ = '\0';
       strcvrt(p, '+', ' ');       /* Turn +'s into spaces before decoding, */
       urlDecode(Item);            /* so that an escaped %2B stays a plus.  */
       urlDecode(p);
       strcvrt(p, '\n', ' ');
       field = f;                  /* Hold on to the field just in case. */
       name = p;                   /* Hold on to the name to print. */
}

int main(void)
{
       int ContentLength, x, i;
       char *p,
              *pRequestMethod,
              *URL,
              *f = NULL;           /* Not used yet; keep it defined. */

       /* Turn buffering off for stdin.*/
       setvbuf(stdin, NULL, _IONBF, 0);

       /* Tell the client what we're going to send */
       printf("Content-type: text/html\n\n");

       /* What method were we invoked through? */
       pRequestMethod = getenv("REQUEST_METHOD");

       /* Get the data from the client      */
       /* according to the requested method.*/
       if(pRequestMethod != NULL && strcmp(pRequestMethod,"POST") == 0)
       {
              /* Read in the data from the client. */
              p = getenv("CONTENT_LENGTH");
              if(p != NULL)
                     ContentLength = atoi(p);
              else
                     ContentLength = 0;
              if(ContentLength > sizeof(InputBuffer) -1)
                     ContentLength = sizeof(InputBuffer) -1;

              i = 0;
              while(i < ContentLength)
              {
                     x = fgetc(stdin);
                     if(x == EOF)
                            break;
                     InputBuffer[i++] = x;
              }
              InputBuffer[i] = '\0';
              ContentLength = i;

              p = getenv("CONTENT_TYPE");
              if(p == NULL)
                     return(0);

              if(strcmp(p, "application/x-www-form-urlencoded") == 0)
               {
                     p = strtok(InputBuffer, "&");       /* Parse the data */
                     while(p != NULL)
                     {
                            StoreField(f, p);
                            p = strtok(NULL, "&");
                     }
               }
         }

      URL = getenv("HTTP_REFERER");             /* What url called me.*/
      if(URL == NULL)                           /* The header is optional. */
             URL = "";
      printf("<HEAD><TITLE>Submitted OK</TITLE></HEAD>\n");
      printf("<BODY><h2>The information you supplied has been accepted.");
      printf("<br> Thank You %s</h2>\n", name ? name : "");
      printf("<h3><A href=\"%s\">[Return]</a></h3></BODY>\n", URL);

      return(0);
}

Notice the calls in the main routine to the C library function getenv. That is how the program can determine if the REQUEST_METHOD is equal to POST and how many bytes it should read by checking CONTENT_LENGTH.

Another very important point to make about the main function is that it must output a partial HTTP header to go with the HTML document that it creates. This line appears near the top of the function:

printf("Content-type: text/html\n\n");

You might want to add error handling later, in which case you would probably create an alternative HTML response document. The HTTP header would need to be printed in any case. The CGI convention requires that the header be followed by a blank line before the HTML code that is sent. That is why the printf statement includes two newlines at the end. Please forgive my frequent reminders, but this point is important.

The content type is a MIME type that tells the client browser that the data stream to follow is HTML text. There are many standard MIME types, such as text/plain and image/gif. See the CGI specification for further information.
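As a minimal sketch of the error-handling idea mentioned above, the helper below writes the header, the blank line, and a small alternative HTML document in one place, so that no code path can forget the header. The function name SendErrorPage and the wording of the page are illustrative assumptions; they are not part of the listing.

```c
#include <stdio.h>

/* Hypothetical helper: not part of the chapter's listing.           */
/* Writes the partial HTTP header, the required blank line, and a    */
/* short HTML error document to the given stream (normally stdout).  */
/* Returns 0 on success, or -1 if an argument is NULL or a write     */
/* fails.                                                            */
int SendErrorPage(FILE *Out, const char *Message)
{
       if(Out == NULL || Message == NULL)
              return(-1);

       /* The two newlines end the header and supply the blank line. */
       if(fprintf(Out, "Content-type: text/html\n\n") < 0)
              return(-1);
       if(fprintf(Out, "<HEAD><TITLE>Error</TITLE></HEAD>\n") < 0)
              return(-1);
       if(fprintf(Out, "<BODY><h2>%s</h2></BODY>\n", Message) < 0)
              return(-1);
       return(0);
}
```

A call such as SendErrorPage(stdout, "No form data was received.") could then replace a silent exit. The message is written as-is, so it should be a fixed string rather than text supplied by the user.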

Testing CGI Systems

Getting CGI systems to work properly obviously requires the ability to integrate several sophisticated tools. And what should a good software engineer do when faced with the challenge of building a complex system? One proven approach is to establish clear milestones to reach the overall goal, build the software one piece at a time (preferably as black boxes with as few interfaces as possible), and test each module separately as you go to prove that the milestones are met successfully.

For example, test the HTML form independently from the CGI program. You might even take the time to build a test environment for the CGI application so that you can verify its input and output completely independently of the Web server. Doing this could yield a great payback when it comes time to debug or enhance the system, especially if it is a large application or if it interfaces with a database. The goal is to reduce the edit/compile/link/test cycle to as tight a loop as possible. A test environment that doesn't involve running the server, launching the browser, and filling out the form will yield significant time savings over the long run.
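One way to build such a test environment, sketched here under the assumption that the input-reading logic is moved out of main into its own function, is to have that function take a FILE pointer instead of reading stdin directly. A test driver can then feed it a temporary file in place of the Web server. The names ReadFormData and CountFields are hypothetical and do not appear in the earlier listing.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most Length bytes of form data from */
/* In into Buffer, never more than Size - 1, and terminates it.      */
/* Returns the number of bytes actually stored.                      */
size_t ReadFormData(FILE *In, size_t Length, char *Buffer, size_t Size)
{
       size_t i = 0;
       int x;

       if(Size == 0)
              return(0);
       if(Length > Size - 1)
              Length = Size - 1;
       while(i < Length)
       {
              x = fgetc(In);
              if(x == EOF)
                     break;
              Buffer[i++] = (char)x;
       }
       Buffer[i] = '\0';
       return(i);
}

/* Hypothetical helper: counts the "&"-separated fields in Data.     */
/* Like the listing, it uses strtok, so Data is modified in place.   */
int CountFields(char *Data)
{
       int n = 0;
       char *p = strtok(Data, "&");

       while(p != NULL)
       {
              n++;
              p = strtok(NULL, "&");
       }
       return(n);
}
```

In the CGI program itself, the call would be ReadFormData(stdin, ContentLength, InputBuffer, sizeof(InputBuffer)). In a test driver, the first argument can be a file opened with tmpfile or fopen that holds a canned request body such as name=Bob&city=Oslo, so neither the server nor the browser is involved.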

CGI Toolkits and Applications

Before trying to write your own CGI applications, consider letting someone else do it for you. This section discusses several CGI toolkits that are available on the Internet or the CD-ROM. Whether you are just counting visitors at your site, tabulating more advanced statistics, or running a customer support form, there is bound to be something here that will help you make your Intranet come to life.

CGI PerForm

CGI PerForm was designed to work with both Windows NT and Windows 95 and provide all the basic CGI functionality needed by a WWW site, without requiring C or Perl. With a simple command file, template file, and HTML form, you can create an e-mail feedback form, guest book, or even a ballot box, or perform all three of those operations at the same time and as many times as you want. For more information about CGI PerForm, visit this URL:

http://www.rtis.com/nat/software/

How CGI PerForm Works

You can break down an interactive WWW page into three pieces:

The HTML form through which the data is typed in and submitted
The Common Gateway Interface (CGI) application that receives and processes the submitted data
The end result

CGI PerForm is one example of a CGI application that handles the incoming data and creates the result. A result can be a combination of more than one task or command. PerForm commands are discussed thoroughly in the online documentation that accompanies the product.

CGI PerForm uses a command file you create to determine what tasks it needs to perform on the data. A different command file is created for every interactive application needed. Each command requires certain key values in order for it to perform its task. A majority of the key values are filenames. Some of these files must already exist, such as a template file or a column file. Others are created by the command, such as a data file or the output file.

CGI PerForm takes all the incoming data supplied by the HTML form and stores it into a memory block. An HTML form supplies data in name=value pairs, for example, lastname=Smith. You can supplement the data supplied by the HTML form by plugging in hard-coded name=value pairs in the command file. These values go into the same memory block as the submitted data. You can hard-code values in your command file to hide them or to set defaults.

The next step is to use the data. You can save the data to a data file or a database or combine the results with a template file to create a confirmation message or a form letter to be mailed. The command can be performed as often as necessary with different key values. For example, you could save data submitted by a user into three different data files. These data files can share some of the same data, or two of them could be identical. You can also pass variables between command blocks to create unique files in which to store data at the user's request.

CGI2Shell 2.0

If you find yourself using a lot of CGI scripts, you'll like this little utility package from Richard Graessler of Germany. See this link for more information (and other utilities):

http://rick.wzl.rwth-aachen.de/rickg/index.html

The CGI2Shell applications are intended for Windows Web servers that do not support the execution of scripts without a corresponding shell in the command line of a <FORM ACTION=> or <A HREF=> tag.

The CGI2Shell Gateway is a set of programs that enable PATH_INFO to specify the name of a CGI script that will be executed with either the POST or GET method. Currently, the shells Perl.exe and Sh.exe, and the Windows NT command interpreter cmd.exe, are supported.

Using CGI2Shell

The CGI2Shell Gateway includes three programs, one for each shell it supports:

CGI2Sh.exe for Sh.exe
CGI2Perl.exe for Perl.exe
CGI2Cmd.exe for Cmd.exe (Windows NT only)

All you need to do is include the script with its path in the PATH_INFO portion of the URL. For example:

http://host.domain/progpath/CGI2xxx.exe/scriptpath/script.ext

The shell programs must reside in the path or the same directory as CGI2xxx.exe.
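With the example URL above, the server would pass /scriptpath/script.ext to the gateway program in the PATH_INFO environment variable. The following is only a rough sketch of how a gateway of this kind might turn that value into a command line; it is not CGI2Shell's actual code. The function name BuildCommand and its rejection rules are assumptions made for illustration, and a real gateway would also have to map the script path onto the server's directory tree.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, not CGI2Shell's real code: combines a shell  */
/* name with the script named in PATH_INFO to form a command line.   */
/* Returns 0 on success, or -1 if an argument is missing, no script  */
/* is named, the path contains "..", or the result will not fit.     */
int BuildCommand(const char *Shell, const char *PathInfo,
                 char *Out, size_t Size)
{
       if(Shell == NULL || PathInfo == NULL || Out == NULL)
              return(-1);
       if(PathInfo[0] != '/' || PathInfo[1] == '\0')
              return(-1);             /* No script was named.        */
       if(strstr(PathInfo, "..") != NULL)
              return(-1);             /* Refuse parent directories.  */
       if(strlen(Shell) + 1 + strlen(PathInfo) + 1 > Size)
              return(-1);             /* Result would not fit.       */

       sprintf(Out, "%s %s", Shell, PathInfo);
       return(0);
}
```

In a gateway program, the second argument would come from getenv("PATH_INFO"), which returns NULL when the URL carries no extra path. The sketch treats that case as an error.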

CGIC

CGIC is a library of functions for CGI development with ANSI standard C written by Thomas Boutell. You can find more information about it at the following URL:

http://sunsite.unc.edu/boutell/cgic/cgic.html

EIT's CGI Library

Enterprise Integration Technologies (EIT) has created LIBCGI to assist programmers who are writing CGI systems in the C language. The library consists of about 15 functions, and it is freeware. Originally written for UNIX as part of their Webmaster's Starter Kit, it has been ported to several other popular platforms, including Windows. As with several URLs mentioned in this book, I have not tried this product and cannot endorse it other than to suggest that you visit their site and have a look for yourself: http://wsk.eit.com.

Web Developers Warehouse

If you program with Borland C++ and don't mind paying for CGI and HTML tools, you should definitely drop by http://htechno.com/wdw/index.htm. A company called Specialized Technologies has developed a suite of products they call the Web Developers Warehouse. It includes three components: TCgi, HTML Objects, and Web Wizard. TCgi is a set of C++ classes for WinCGI, which works with the O'Reilly WebSite server and the FolkWeb server.

Visit their home page for more information. You can also try the demonstration programs, pay for the software electronically, and download it pronto.

Summary

The next chapter picks up where this one leaves off by showing you how to develop a functional CGI database application. The source code in HTML, Visual C++, and Visual Basic is ready to run from the CD-ROM, but the material is not for the faint of heart when it comes to programming. If you're ready to take a big step beyond static Web pages, read on.
