In Chapter 2, "The Basics," you saw a simple CGI program, hello.cgi, which printed Hello, World! on your Web browser. Displaying something on your Web browser, if you recall, is simply a matter of printing to the standard output (stdout).
You might find sending messages to a Web browser a necessary but not particularly useful capability of CGI programs; after all, HTML files fulfill the equivalent role. However, HTML files are static and unchanging, whereas CGI enables you to create dynamic pages. For example, you might want to have a different graphic on your home page every time someone accesses it. Perhaps you want a page that displays the current time. Or you might want a counter that displays the number of times a page has been accessed. You can do all of this with CGI provided you understand how CGI output works.
None of these examples require user input from the Web browser, and each can be written without any understanding of how CGI input works. (CGI input is discussed in Chapter 5, "Input.") This chapter describes how to properly format and send output to the Web browser. Along the way, you'll see several examples of CGI programs that send data to the Web browser. You'll also learn some features of the World Wide Web that are related to CGI output, including embedding CGI output or other HTML files within another file and how to not send anything back to the browser.
The best way to understand how to format information for output
is to review how the browser, server, and CGI program communicate
and to dissect the server response. Figure 4.1
shows how the server usually sends information to the browser.
The Web browser begins the process by asking the server to retrieve some file or to run some CGI program. If asked to retrieve a file, the server looks to see if the file exists. If it does, the server then determines what kind of file it is (for example: Is it HTML? Is it an image in GIF format? and so on). The server then tells the browser whether it was successful in retrieving the information, what kind of data it's about to send, and then sends the data itself. For now, you're primarily interested in how the server speaks to the browser.
Here's an example that demonstrates exactly how this process works. On my fictitious Web server myserver.org, I have the file hello.html, listed in Listing 4.1.
Listing 4.1. The hello.html file.
<html> <head>
<title>Hello, World!</title>
</head>
<body>
<h1>Hello, World!</h1>
</body> </html>
When you enter the URL http://myserver.org/hello.html, your Web browser sends a request to my server asking it to retrieve the file hello.html. My server looks to see if that file exists and whether it can retrieve it. It then determines that hello.html is an HTML file by looking at its extension. After my server has ascertained all this information, it sends the following back to the Web browser:
HTTP/1.0 200 OK
Date: Sun, 04 Feb 1996 01:51:49 GMT
Server: Apache/1.0.2
Content-type: text/html
Content-length: 98
Last-modified: Thu, 01 Feb 1996 02:22:09 GMT

<html> <head>
<title>Hello, World!</title>
</head>
<body>
<h1>Hello, World!</h1>
</body> </html>
Tip |
If you are using UNIX, you can see the raw information sent by the server using the telnet program. By default, Web servers use port 80 to communicate. To get the root index file of the Web site hcs.harvard.edu, type the following at your prompt: % telnet hcs.harvard.edu 80 Once you are connected, type GET / HTTP/1.0 and make sure that you press Enter twice after it. GET and HTTP/1.0 must be uppercase. The server will return the unparsed response so you can see what the Web browser typically receives. You can use this method to test servers and CGI scripts to make sure they return what you want them to return. |
There are two separate entities separated by a blank line: the header (called an HTTP header) and the body (the actual data). The entire header was added by the Web server before sending the information to the Web browser. The header provides information about the data while the body contains the actual data.
Before looking at how the server responds to a request to run a CGI program, first dissect the HTTP header. The first line of the server response is always a status message and is called the status header. It tells you whether or not the server was successful finding the file. The format of this line is
HTTP/1.0 number message
where number is the status number and message is the status message. A status number of 200 means the retrieval was successful. You learn a few other useful status numbers later in this chapter. HTTP/1.0 is the protocol type and version. The current version of HTTP (which essentially all Web servers use) is version 1.0. If you were connecting to an older Web server, you might receive an HTTP/0.9 status message instead, indicating the older version of the protocol used to communicate between client and server.
Skipping down a few lines, notice the line Content-length: 98. This tells the browser how large the data following the headers is in bytes. If you count the number of characters in Listing 4.1, including the invisible newline characters as a character, you will find that there are 98 of them.
Note |
If you look at your Web server access logs, you might see a line like this: hcs.harvard.edu - - [03/Feb/1996:20:53:39 -0500] The last two fields of the line, 200 2396, represent the status code and content-length, respectively, the same information that is returned by the server to the browser in the HTTP header. |
The Date header returns the date of the request according to the server. The Last-Modified header returns the date and time the particular information (in this case, hello.html) being returned was last modified. The Server header gives the name and version number of the server.
Finally, the Content-Type header tells the browser what kind of information to expect. text/html is the Multipurpose Internet Mail Extensions (MIME) type of HTML files. MIME is a way of describing the type of data. You'll learn about MIME in more detail shortly.
Carriage Returns and Newlines |
When you press Enter on your computer, you are probably used to seeing the cursor move to the first position on the next line. Although this behavior requires only one keystroke, in reality, two things are happening: the cursor is moving down one line and then moving to the beginning of the line. Old typewriters required two actions for the same effect. You pressed the Return key, which shifted the paper up a line, and pushed the cylinder (or carriage) back to the left, returning the carriage. HTTP, and indeed most Internet protocols, require both a carriage return (CR) and a line feed (LF) character at the end of each header. In C, these are represented as \r and \n. So when the server returns a document, each line of the header ends with both a CR and an LF. When you program CGI applications, you should end each header line and the blank line separating the header and body with a CRLF (\r\n). For example, in order to print your de facto Content-Type header in C, you would use printf("Content-Type: text/html\r\n"); rather than printf("Content-Type: text/html\n"); Most servers remove anything at the end of the headers from the CGI program and add the proper CRLF combination themselves, so ending your headers with only a \n will normally work. However, there is no guarantee that all servers will work in this manner, so you should make it a habit to use both \r and \n at the end of each header line. |
Caution |
On UNIX systems, printing \r\n does the same thing from Perl as it does in C: it prints a CRLF combination. However, on DOS/Windows systems, Perl writes to standard output in text mode by default, so each \n is already translated into a CRLF pair and an explicit \r adds an extra carriage return. Therefore, if a Perl program prints a Content-Type header followed by an Expires header and ends each one with \r\n, as in print "Content-Type: text/html\r\n"; a blank line can appear between the Content-Type and Expires headers. This is clearly undesirable, because the server will believe that the Expires header is actually part of the body of the message rather than the header. In order to get the proper effect from Perl on a DOS/Windows platform, you need to use the binmode function. binmode STDOUT; Now, \r\n is interpreted properly. Because of this quirky behavior, the Perl examples in this book use \n rather than \r\n. Although this is technically incorrect, it will generally work across all platforms: in text mode on DOS/Windows, \n alone already produces both a carriage return and a line feed. |
In the preceding example, the server retrieved the HTML file and added the entire HTTP header itself. Web servers work almost exactly the same with CGI programs, except the server will provide only some of the headers; it expects the CGI program to provide the other important headers.
At the minimum, CGI programs need to return only one header: either a Content-Type or Location header. The former is the more common of the two. As discussed earlier, Content-Type tells the browser what sort of information to expect. Location offers an alternative location for certain data; this is useful for redirection and other types of requests. CGI programs can optionally send other HTTP headers, as well.
The Web browser needs to know what kind of information it's receiving so it knows how to properly display it. If it receives a GIF image, it needs to either display it inline or open an external application that will display it. If it receives an HTML file, it needs to interpret and render the HTML.
The only way the browser can know the type of information it's receiving is to ask the server. How does the server know? The most common way is by associating filename extensions with file types. The server finds the file, determines the type by looking at the extension, and tells the browser what kind of file it thinks it's returning.
Note |
For servers that determine file types by filename extensions, what you name your files is important. For example, if you call a GIF file picture.html, the server will think it's an HTML file, not a GIF file, and will tell the browser that. Consequently, you'll end up with garbage on your Web browser. |
Tip |
Many UNIX servers are configured to think that only files ending with .html are HTML files. Many Windows servers treat .htm files as HTML files. If you're moving .htm files from Windows to a UNIX server, you could either rename all the files to .html or reconfigure your server to associate both .html and .htm files as HTML files. To do this using either the NCSA or Apache server, change the following line in the conf/mime.types file from text/html html to text/html html htm |
The server uses the MIME format to identify the file type to the browser. MIME was originally designed to extend the Internet mail protocol to use multimedia rather than just plain text. The MIME format as it applies to multimedia mail is not important for CGI programming (although MIME has some interesting historical roots in the early development of the Web); for more information on MIME, look at RFC1521.
What you should be concerned with here is the MIME Content-Type header. Data format content-types are specified as follows:
type/subtype
The type specifies the general type of data. There are seven types: text, image, audio, video, application, multipart, and message. You will never see the message content-type used in CGI programming, but the other six have different roles. Text, image, audio, and video are self-explanatory. The application type specifies application-specific or binary data (for example, a Microsoft Word document). Multipart designates a document with several different content-types embedded within one document; this type has some special uses for CGI programming, such as server-side push, which is discussed in Chapter 14, "Proprietary Extensions."
The subtype gives more exact information about the specific type. Text files can be plain text, rich text format (RTF), or HTML. Images can be GIF, JPEG, or countless other formats.
In order to maintain a standard and consistent set of MIME content-types, the Internet Assigned Numbers Authority (IANA) maintains a central registry of registered MIME types. Most of the common MIME types (such as GIF and HTML) have been registered. Other less common or proprietary formats (such as Microsoft's WAV audio format) have not been registered, and yet these formats are commonly distributed over the Web. For MIME subtypes not yet registered at IANA, you precede the proposed subtype with x-. For example, the WAV audio format's type and subtype combination is audio/x-wav. Common MIME types are listed in Table 4.1.
Type/Subtype | Description |
text/plain | Plain text. By default, if the server doesn't recognize the file extension, it will assume the file is plain text. |
text/html | HTML files. |
text/richtext | Rich Text Format. Most word processors understand rich text format, so it can be a good portable format to use if you want people to read it from their word processor. |
image/gif | GIF images, a common, compressed graphics format specifically designed for exchanging images across different platforms. Almost all graphical browsers will display GIF images inline (using the <img> tag). |
image/jpeg | JPEG is another popular image compression format. Although it is a fairly common format, JPEG is not supported internally by as many browsers as is GIF. |
image/x-xbitmap | X bitmap is a very simple pixel-by-pixel description of images. Because it is simple and because most graphical browsers support it, it can be useful for creating small, dynamic images such as counters. Generally, X bitmap files have the extension .xbm. |
audio/basic | Basic 8-bit, ulaw compressed audio files. Usually ends with the extension .au. |
audio/x-wav | Microsoft Windows audio format. |
video/mpeg | MPEG compressed video. |
video/quicktime | Quicktime video. |
video/x-msvideo | Microsoft Video. Usually ends with the extension .avi. |
application/octet-stream | Any general, binary format that the server doesn't recognize usually uses this MIME type. Upon receiving this type, most browsers will give you the option of saving the data to a file. You can use this MIME type to force the user's browser to download and save a file rather than display it. |
application/postscript | Postscript files. |
Note |
You can define parameters for MIME types by appending a semicolon (;) and the parameters after type/subtype. Most CGI applications will never require it, but this is useful for specifying alternative HTML character sets (for example, foreign languages) or performing special tasks such as server-side push (as discussed in Chapter 14). |
Your CGI programs can deliver an existing file from your file system or from another Web server by using the Location header. To use it, you specify either a file location relative to the root of your Web directory tree or a URL.
Location: absoluteURI
For example, suppose that you want the CGI program to return the hello.html document in Listing 4.1. Listings 4.2 and 4.3 list the Perl and C code for this CGI.
Listing 4.2. index.cgi in Perl.
#!/usr/local/bin/perl
print "Location: /hello.html\n\n";
Listing 4.3. index.cgi in C.
#include <stdio.h>

int main(void)
{
    /* Location header followed by the blank line that ends the headers */
    printf("Location: /hello.html\r\n\r\n");
    return 0;
}
Note the blank line following the Location header. Even though there is no data following this header, the blank line is necessary so that the server knows there are no more headers.
When the browser requests one of these index.cgi programs, the server will return the same headers and the body as it does when the browser requests the hello.html file directly, something that looks like the following:
HTTP/1.0 200 OK
Date: Sun, 04 Feb 1996 01:51:49 GMT
Server: Apache/1.0.2
Content-type: text/html
Content-length: 98
Last-modified: Thu, 01 Feb 1996 02:22:09 GMT

<html> <head>
<title>Hello, World!</title>
</head>
<body>
<h1>Hello, World!</h1>
</body> </html>
Note that the Location header is nowhere in sight. When the CGI program sends a Location header followed by a file location, the server retrieves and returns the file as if that were the original request from the start.
However, if you specify a URL following the Location header, the server interprets the CGI header differently. Suppose, for example, that you want index.cgi to return the White House home page (URL:http://www.whitehouse.gov/). Your Perl code might look like that in Listing 4.4.
Listing 4.4. index.cgi returns the White House home page.
#!/usr/local/bin/perl
print "Location: http://www.whitehouse.gov/\n\n";
Now, the server returns something like this:
HTTP/1.0 302 Found
Date: Mon, 12 Feb 1996 08:38:20 GMT
Server: Apache/1.0.2
Location: http://www.whitehouse.gov/
Content-type: text/html

<HEAD><TITLE>Document moved</TITLE></HEAD>
<BODY><H1>Document moved</H1>
The document has moved <A HREF="http://www.whitehouse.gov/">here</A>.<P>
</BODY>
The server does not treat the request as if it were just a request for another file as it did in the previous example. Instead, it sends an appropriate message back with a different status number, 302, and a Location header specifying the new URL. It is now the client's responsibility to make a new request to the White House Web server.
Both of these uses of the Location header can be useful in different kinds of CGI applications. Specifying a file in a Location header will allow you to keep HTML output separate from the CGI code, rather than hard coding the output within the CGI application. For example, suppose that I had a CGI program that processed a comment and returned a thank-you note. After it has finished processing the comment, the CGI program might return something like the following:
<html> <head>
<title>Thanks!</title>
</head>
<body>
<h1>Thanks!</h1>
</body> </html>
You could simply have your CGI program output this message using several print messages. However, because the message is static, you could also save this message in an HTML file and call it from your CGI program using the Location header. This way, if you want to modify the message in any way, you edit the HTML file rather than the CGI code.
You can redirect a Web browser to another Web site using a Location header followed by a URI. In Chapter 10, I describe two applications that rely on the Location header using both methods described earlier: a content-negotiation application, a program that sets expiration dates on files, and a redirection manager.
The first line of every server response contains a code that tells the browser the status of the transaction. The Web browser will react to certain status codes in certain ways. For example, the most common code is 200, which designates a successful transaction. Upon receiving this code, the browser normally just displays the accompanying data. On the other hand, if the browser receives 204 (No Content), then the browser knows that the server response contains no content. Upon receipt of code 204, the browser will simply remain at the current page rather than display a blank page.
Tip |
Often, when you're creating an imagemap, discussed in detail in Chapter 15, "Imagemaps," you want nothing to happen when the user clicks on an undefined area. People will sometimes try to circumvent this problem by specifying the page containing the imagemap as the default URL. For example, the page at http://myserver.org/map.html might contain an imagemap with the following map file: default http://myserver.org/map.html This solution is inefficient because whenever an undefined area (default) is selected by the user, the server sends the entire map.html over again and often, the browser wastes time refreshing the page. When the default area is selected you want the browser to do nothing at all. You can accomplish this by using a CGI program that uses the status code 204. The following C code, donothing.c, will achieve your goal: #include <stdio.h> Compile and install in your cgi-bin directory, and then set the default in your map configuration file to default /cgi-bin/donothing Now, when the user clicks on an undefined area, the browser will request the donothing CGI program that will tell the browser to do exactly that: nothing. |
Other useful HTTP status codes to know for CGI programming are listed in Table 4.2. In order to set the status code of the server response from your CGI program, use the Status header followed by the status code and an optional status message. For example, to send a status code of 204, print the following line:
Status: 204 No Content
The server will translate this CGI header into the equivalent HTTP server response:
HTTP/1.0 204 No Content
Status Code | Status Message | Meaning |
200 | OK | The document was successfully found and sent. |
204 | No Content | The transaction was successful, but there is no data to send. The Web browser should just remain on its current page. |
301 | Moved Permanently | The page is now located at a new URI, specified by the Location header. |
302 | Moved Temporarily | The page is temporarily located at a new URI, specified by the Location header. |
401 | Unauthorized | The document you're trying to access is protected by some authentication scheme. Browsers should typically prompt the user for a username and password upon receiving this status code. |
You can use the Status header to control how the server interprets the Location header. By default, if you specify a URI following the Location header, the server assumes a status of 302. Upon receiving this status code, the browser will assume that the page has been temporarily moved and that it will be back at the original location at a later time.
If you had moved a page permanently to a new location, then it is useful to send the 301 status code rather than 302. This tells the browser to access the URI specified by the Location header every time a request is made to the original URI.
You can manipulate the various defined HTTP status codes for your own purposes, or even invent some of your own. These status codes are discussed in more detail in Chapter 8, "Client/Server Issues."
Two HTTP headers you might find relevant when programming CGI are the Expires and Pragma headers. Both headers are used to prevent Web browsers from caching documents. When a browser caches a document, it stores a copy locally to save itself from having to download the document again if the page is revisited. For example, suppose that you provided a news service over the Web that periodically provided up-to-date news articles. The document located at a URL might have changed several times over a period of time. Many browsers will cache a document the first time it accesses it and then reload that cached document when the user tries to go back to it by using the browser's Back button.
You can prevent the Web browser from caching the document by using either the Expires or the Pragma header. Expires enables you to declare a date and time when the document should expire; once that time has come, the browser should access the file from the server rather than from its cache. The date following the Expires header should be in Greenwich Mean Time (GMT), also known as Universal Time, and in Internet time format as specified by RFC1123 ("Requirements for Internet Hosts -- Application and Support," which can be found at http://andrew2.andrew.cmu.edu/rfc/rfc1123.html). A date in this format looks like the following:
Sun, 06 Nov 1994 08:49:37 GMT
Unfortunately, this is not the only time format you might encounter. Although the preceding is the most correct way to report the time and date, there are some older formats that some programs might still use. These time formats are discussed in RFC850, "Standard for Interchange of USENET Messages," which can be found at http://andrew2.andrew.cmu.edu/rfc/rfc850.html.
Note |
Greenwich Mean Time (GMT) is five hours ahead of Eastern Standard Time (EST) and eight hours ahead of Pacific Standard Time (PST). |
If you never want a document cached, you could either set the
expire date to be the same time you sent the document, or you
could use the Pragma header.
The Pragma header enables
you to send customized directives to each receiving client. Web
communication is not always simply between two parties. Many times,
there are intermediate parties communicating with the browser
and the server, as shown in Figure 4.2.
Figure 4.2 : Intermediates in browser-server communication.
In Figure 4.2, the Web pages are accessed from the server by a proxy server, which then sends the pages to the browser. Each receiver (the proxy server and the browser) receives the Pragma header. One Pragma header that is understood by some browsers is the no-cache directive. When a browser receives the following header, it should understand this to mean not to cache the following document:
Pragma: no-cache
CGI output will enable you to create dynamic documents, whether they are HTML or graphics. You'll learn how to apply the information provided in the previous sections to create all sorts of dynamic documents.
Delivering an HTML document by using a CGI program is relatively simple. However, remember to be careful to specify proper pathnames in your HTML tags. For example, Listing 4.5 contains something you might see after filling out a comments page over the Web.
Listing 4.5. Output of CGI comments program (thanks.html).
<html> <head>
<title>Thanks!</title>
</head>
<body>
<h1 align=center>
<img src="image/smiley.gif" alt="" align=middle>
Thanks!
</h1>
<p>Thanks for submitting your comments. You can now return to our
<a href="index.html">home page.</a></p>
</body> </html>
Suppose you wanted to write a CGI program that printed the HTML in Listing 4.5. You might be inclined to write something like the code in Listing 4.6.
Listing 4.6. thanks.cgi.
#!/usr/local/bin/perl
print "Content-Type: text/html\n\n";
&print_thank_you;
sub print_thank_you {
print<<EOM;
<html> <head>
<title>Thanks!</title>
</head>
<body>
<h1 align=center>
<img src="image/smiley.gif" alt="" align=middle>
Thanks!
</h1>
<p>Thanks for submitting your comments. You can now return to our
<a href="index.html">home page.</a></p>
</body> </html>
EOM
}
Assuming thanks.cgi is in the /cgi-bin directory, smiley.gif is in a directory called /image, and index.html is located at document root, thanks.cgi would produce the output in Figure 4.3.
Figure 4.3 : HTML sent by thanks.cgi gives a broken image and a broken link.
You get a broken image, and if you try to follow the link to the home page you will also get an error message. The problem is that the image and the link are specified with relative paths, which the browser resolves against the location of the CGI program. The base URL is /cgi-bin, but it should be /. The code in Listing 4.7 solves this problem by using full pathnames. You could also solve this problem by using the <base> tag discussed in Chapter 3, "HTML and Forms."
Listing 4.7. thanks2.cgi.
#!/usr/local/bin/perl
print "Content-Type: text/html\n\n";
&print_thank_you;
sub print_thank_you {
print<<EOM;
<html> <head>
<title>Thanks!</title>
</head>
<body>
<h1 align=center>
<img src="/image/smiley.gif" alt="" align=middle>
Thanks!
</h1>
<p>Thanks for submitting your comments. You can now return to our
<a href="/index.html">home page.</a></p>
</body> </html>
EOM
}
This CGI is returning a static page, so you might find it easier to create an HTML page and have the CGI load that page. Assume that your HTML file (thanks.html, as listed in Listing 4.5) does not contain full pathnames for the image and link and that thanks.html is located in your document root directory.
There are two ways of returning this file. You could either load the file directly, or you could use the Location header. Implementations of both of these solutions in Perl are shown in Listings 4.8 and 4.9.
Listing 4.8. thanks3.cgi (loads the file directly).
#!/usr/local/bin/perl
# document root defined here
$root = '/usr/local/etc/httpd/htdocs/';
open(FILE,"$root/thanks.html") || die "Cannot open file.\n";
print "Content-Type: text/html\n\n";
while (<FILE>) {
print;
}
Listing 4.9. thanks4.cgi (uses the Location header).
#!/usr/local/bin/perl
print "Location: /thanks.html\n\n";
Both Listings 4.8 (thanks3.cgi) and 4.9 (thanks4.cgi) accomplish essentially the same thing, but there are notable differences in both the code and the result. thanks3.cgi is more complex but is probably more efficient because it opens the file and returns its contents in one process. However, it will return a broken image because full pathnames are not used in the HTML file (the same problem you had in Listing 4.6, the original thanks.cgi). thanks4.cgi is the simpler and, in this case, the better solution. The HTML returned is correct because the base URL is the document root, not the /cgi-bin directory. However, thanks4.cgi is not as efficient as thanks3.cgi because upon receiving and parsing the Location header, the server must make another call to access the file. Thus, it is a two-step process rather than a one-step process like thanks3.cgi. The extra efficiency achieved in Listing 4.8 is almost trivial, however, and not worth the extra code complexity.
The examples so far have focused on CGIs with static output. Now that you understand the issues involved with CGI output in general, you are ready to start programming CGIs that create dynamic documents.
Coding CGI output often involves printing HTML tags and text. This can be a repetitive process that can be simplified by writing wrapper functions. For example, the header of almost all HTML documents will consist of an <html> tag, <head> tags, <title> tags, and perhaps the opening <body> tag. In C, printing the header of an HTML document might look like this:
printf("<html> <head>\n");
printf("<title>This is the title</title>\n");
printf("</head> <body>\n");
Instead of using several printf() calls to print the headers, you could create a header function like the following:
void html_header(char *title)
{
printf("<html> <head>\n");
printf("<title>%s</title>\n",title);
printf("</head> <body>\n");
}
Now, every time you need to output an HTML file, you can start with the following function rather than several printf() statements:
html_header("This is the title");
Many CGI programming libraries create such wrapper functions that
you might or might not want to use. For example, the cgi-lib.pl
package declares the output functions in Table 4.3.
PrintHeader | Prints Content-Type: text/html\n\n header. |
HtmlTop($title) | Prints <html>, <head>, and opening <body> tags. Also prints $title surrounded by <title> tags. |
HtmlBot | Prints the closing </body> and </html> tags. |
cgihtml declares the wrapper
functions for C programs, as listed in Table 4.4.
void html_header() | Prints a Content-Type: text/html\r\n\r\n header. |
void mime_header(char *mime) | Prints a Content-Type header with MIME type mime. |
void nph_header(char *status) | Prints a No-Parse header with status code and message status. |
void show_html_page(char *loc) | Prints a Location header with location loc. |
void status(char *status) | Prints a Status header with status code and a message defined by status. |
void pragma(char *msg) | Sends a Pragma header with directive msg. |
void html_begin(char *title) | An HTML header wrapper function. Equivalent to cgi-lib.pl HtmlTop. |
void html_end() | Complement to html_begin(). Prints the closing </body> and </html> tags. |
void h1(char *str) .. void h6(char *str) | Wrapper for headline tags <h1> through <h6>. Surrounds str by the appropriate tags. |
void hidden(char *name, char *value) | Defines a hidden input type with name name and value value. Useful for maintaining state; see Chapter 13, "Multipart Forms and Maintaining State." |
If you're a Perl 5 user, Lincoln Stein's CGI.pm library offers a powerful way of creating forms using object-oriented programming. For more information on CGI.pm, see URL:http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html.
First, I'll describe a CGI, date.cgi, that displays the local date and time. date.cgi does two things:
The most challenging part of the program will be to figure out how to calculate the time, so I'll begin there. In Perl, calculating the date and time is easy. Listing 4.10 lists the Perl code for calculating and printing the date.
Listing 4.10. Calculating the date using date.pl.
#!/usr/local/bin/perl
print $time = (localtime),"\n";
The localtime function converts the value returned by the system's time() function into familiar time components such as day, month, year, and so on.
In C, calculating the time depends on the platform you are using. On UNIX systems, you can use date.c in Listing 4.11.
Listing 4.11. Calculating formatted date/time using date.c.
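#include <sys/time.h> /* gettimeofday(); on some systems, this is <time.h> */
#include <time.h> /* localtime() and asctime() */
#include <stdio.h>
int main()
{
    struct timeval time_val;
    struct timezone time_zone;
    time_t seconds;
    gettimeofday(&time_val,&time_zone);
    seconds = time_val.tv_sec; /* copy in case tv_sec is not declared as time_t */
    printf("%s",asctime(localtime(&seconds)));
    return 0;
}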
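If gettimeofday() is not available on your platform, much the same result can be had with standard C alone. The following is a minimal sketch rather than one of the original listings: the function name format_local_time and the format string are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <time.h>

/* Format the current local time into buf using only standard C.
   Returns the number of characters written, or 0 on failure. */
size_t format_local_time(char *buf, size_t size)
{
    time_t now = time(NULL);            /* seconds since the epoch */
    struct tm *local = localtime(&now); /* broken-down local time */

    if (local == NULL)
        return 0;
    /* weekday, month, day, time, year; the layout is an arbitrary choice */
    return strftime(buf, size, "%a %b %d %H:%M:%S %Y", local);
}
```

A main() function would declare a buffer of, say, 64 characters, call format_local_time(buffer, sizeof(buffer)), and print the buffer with printf().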
Now that you know how to calculate and print the current local time, you want to incorporate this into a CGI program. Instead of printing the date to the screen, you want to print it to the Web browser. Listings 4.12 and 4.13 offer solutions in both Perl and C.
Listing 4.12. Calculating the date in Perl using date.cgi.
#!/usr/local/bin/perl
# print CGI header
print "Content-Type: text/html\n\n";
# now print the time
print "<html> <head>\n";
print "<title>Current Date and Time</title>\n";
print "</head> <body>\n";
print "<h1>The date is now</h1>\n";
print "<p>", scalar(localtime), "</p>\n";
print "</body>\n</html>\n";
Listing 4.13. Source code of C version of date.cgi, date.cgi.c.
#include <sys/time.h> /* gettimeofday(); on some systems, this is <time.h> */
#include <time.h> /* localtime() and asctime() */
#include <stdio.h>
int main()
{
    struct timeval time_val;
    struct timezone time_zone;
    time_t seconds;
    /* print CGI header */
    printf("Content-Type: text/html\r\n\r\n");
    /* calculate and print the time */
    printf("<html> <head>\n");
    printf("<title>Current Date and Time</title>\n");
    printf("</head> <body>\n");
    printf("<h1>The date is now</h1>\n");
    gettimeofday(&time_val,&time_zone);
    seconds = time_val.tv_sec; /* copy in case tv_sec is not declared as time_t */
    printf("<p>%s</p>\n",asctime(localtime(&seconds)));
    printf("</body> </html>\n");
    return 0;
}
You should have found this example fairly simple. Fortunately, almost all CGI programs are as simple as date.cgi. The complexity will usually lie in the program design and the main algorithms of the program rather than in the CGI parsing or output routines. After figuring out how to calculate the current date and time, printing it to the Web browser is simple.
Dynamic pages are obviously useful for CGI programs that return some output based on some input, perhaps submitted by a user via a form. For example, in a database search, the CGI program will need to construct the HTML page as it finds items in the database.
However, writing a CGI program for every dynamic document can be a waste. For example, if you wanted the current date displayed on every HTML document on your server, one way would be to write a CGI program for every document on the server. This is not a practical solution.
What would be ideal is if you could mark certain spots in your HTML documents by using a special tag, and then have either a CGI program or the server parse the document for that tag and replace it with the current date. I write a CGI program that accomplishes this in Chapter 10, "Basic Applications." However, for many servers, such a CGI program is not necessary because the server will parse the documents for you.
Servers that will preparse HTML documents for special tags have a feature called server-side includes (or SSI for short). Implementations of SSI vary among different servers, but the idea is the same. The server will normally reserve a few special commands, usually surrounded by the HTML comment tag <!-- -->. When the server accesses a document, it will parse it for these tags or server-side includes. Upon reading one of these server-side include tags, the server will replace it with the appropriate text. This could be anything from the output of a CGI program to another HTML document.
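To make the idea concrete, here is a toy sketch in C of the kind of substitution a preparsing server performs. It is not how the NCSA server is implemented. The names expand_date_tag and SSI_TAG are hypothetical, and the sketch recognizes only one exact spelling of one tag.

```c
#include <stdio.h>
#include <string.h>

/* The one tag this sketch understands (an assumption for illustration). */
#define SSI_TAG "<!--#echo var=\"DATE_LOCAL\" -->"

/* Copy page into out, replacing each SSI_TAG with date.
   Returns 0 on success, -1 if out is too small. */
int expand_date_tag(const char *page, const char *date, char *out, size_t outsize)
{
    size_t taglen = strlen(SSI_TAG), datelen = strlen(date), used = 0;
    const char *hit;

    while ((hit = strstr(page, SSI_TAG)) != NULL) {
        size_t before = (size_t)(hit - page);
        if (used + before + datelen >= outsize)
            return -1;
        memcpy(out + used, page, before);           /* text preceding the tag */
        memcpy(out + used + before, date, datelen); /* the substituted value */
        used += before + datelen;
        page = hit + taglen;                        /* continue after the tag */
    }
    if (used + strlen(page) >= outsize)
        return -1;
    strcpy(out + used, page);                       /* remaining text and the null */
    return 0;
}
```

A real server does considerably more work than this (it must recognize several commands and their parameters), which is one reason parsed documents cost more to serve than plain ones.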
Although server-side includes are a nifty feature, they have several disadvantages. First, your server response time is less efficient because the server must parse each document it accesses before returning any output to the browser. Second, server-side include implementations that enable you to return the output of CGI programs inherit the same security risks inherent in CGI. Third, some implementations enable you to include the output of system commands. This can weaken the security of your system in many ways, and I highly recommend that you disable this feature if you have it. Security issues are discussed in greater detail in Chapter 9, "CGI Security."
Here, I discuss a few server-side include commands for the NCSA server and how you can combine them with CGI applications. A complete reference to NCSA server-side include commands is in Appendix C, "Server-Side Includes."
Configuring NCSA for Server-Side Includes |
By default, the NCSA server disables server-side includes. To enable them, you need to take two steps. First, you want your server to recognize certain files as parsed HTML rather than regular HTML. Add the following line to your srm.conf file:
AddType text/x-server-parsed-html .shtml
The server will now preparse any file with the extension .shtml for server-side includes. If you want the server to preparse all HTML files (warning: this will significantly increase the load on the Web server), add this line instead:
AddType text/x-server-parsed-html .html
Next, you need to enable server-side includes. Add the Includes option to the Options line in the access.conf file. It should look something like the following:
Options Indexes Includes FollowSymLinks
Note that Includes enables you to include the output of both CGI programs and system programs. The latter is undesirable. You might instead use IncludesNOEXEC to allow included files, but not the output of executables. |
The general format for NCSA server-side includes is
<!--#command tag1="value1" tag2="value2" -->
command is one of six items: config, include, echo, fsize, flastmod, and exec. For now, only echo and exec are important. Tag and value pairs are the parameters for the command. There can be any number of parameters depending on the command.
If the server-side include is printed from a CGI program, then
it inherits all of the CGI environment variables. Additionally,
all preparsed pages have several other environment variables defined,
as listed in Table 4.5.
DOCUMENT_NAME | The name of the document the server returns. |
DOCUMENT_URI | The URI of the document. Note: this is a virtual, and not absolute, URI. |
QUERY_STRING_UNESCAPED | The unescaped QUERY_STRING, if one is included. |
DATE_LOCAL | The local date. |
DATE_GMT | The date in GMT. |
LAST_MODIFIED | The date the document was last modified. |
You can include the value of any of these environment variables by using the echo command. For example, if you wanted to include the current time in each of your documents, you would use the following tag:
<!--#echo var="DATE_LOCAL" -->
The date.shtml in Listing 4.14 does the same thing as date.cgi in Listing 4.12.
Listing 4.14. Displaying the date with date.shtml.
<html> <head>
<title>Current Date and Time</title>
</head>
<body>
<h1>Current Date and Time</h1>
<!--#echo var="DATE_LOCAL" -->
</body> </html>
No CGI programming was necessary to accomplish this task because the server takes care of the work of calculating the date for you. Many servers will automatically calculate other values for you as well, so that you can include these values without having to write a line of code. For example, some servers have built-in commands for querying databases while others have built-in counters.
Fortunately, you don't need to rely on the server providing these variables to accomplish the same effect. You can use your /cgi-bin/date.cgi program to accomplish the same thing. To do this, you use the exec command
<!--#exec cgi="program.cgi" -->
where program.cgi is the name of the CGI program. You need to specify the path for the CGI program, as well. Using the date.cgi program in Listing 4.12, you accomplish the same thing by embedding the following line in your .shtml document:
<!--#exec cgi="/cgi-bin/date.cgi" -->
You need to keep a few things in mind when you are including the output of your CGI programs in your HTML documents. First, the server does not check to make sure the CGI program returns HTML. If your CGI program returns an image, you will see junk on your screen. Second, you cannot pass parameters to your CGI program over the command line. (Passing parameters is discussed in Chapter 5, "Input.") Finally, notice that including CGI output is extremely inefficient. First, the server accesses the document, which it then parses. Next, the server runs a CGI program. Finally, it sends the output of the whole thing to the browser. On a heavily loaded server, you are not going to get good performance if you include CGI output in your HTML documents.
"Dynamic documents" doesn't imply only text. Your CGI programs can send dynamically created graphics, sounds, or any other type of media for that matter. You simply print the appropriate MIME header followed by a blank line and then the raw data.
image.cgi in Listing 4.15 will load a GIF image located on your file system and send it to the browser.
Listing 4.15. image.cgi.
#!/usr/bin/perl
$file = '/usr/local/etc/httpd/htdocs/images/picture.gif';
print "Content-Type: image/gif\n\n";
open(GIF,"<$file") || die "Can't open GIF\n";
while (read(GIF,$buffer,16384)) {
print $buffer;
}
image.cgi first sends a Content-Type header, and then loads the file in $file and prints it to stdout. You could easily modify this program to send audio or video files, as well, simply by editing the $file variable and the Content-Type header.
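The same technique carries over to C. The following is a sketch under a few assumptions: the function name send_file is hypothetical, and the caller supplies the output stream, the path, and the MIME type rather than hard-coding them as image.cgi does.

```c
#include <stdio.h>

/* Write a Content-Type header for mime, a blank line, and then the raw
   bytes of the file at path to out. Returns 0 on success, -1 on error. */
int send_file(FILE *out, const char *path, const char *mime)
{
    char buffer[16384];
    size_t n;
    FILE *in = fopen(path, "rb"); /* binary mode matters on non-UNIX systems */

    if (in == NULL)
        return -1;
    fprintf(out, "Content-Type: %s\r\n\r\n", mime);
    /* copy the file through in chunks, as the Perl version does */
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n) {
            fclose(in);
            return -1;
        }
    }
    fclose(in);
    return 0;
}
```

A CGI program would call send_file(stdout, path, "image/gif") from main(). Because the header is printed only after the file has been opened successfully, a failure can still be reported with a different header.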
How would you include an inline image generated by a CGI program? You cannot use server-side includes as noted previously. Fortunately, the <img> tag is smart enough to interpret CGI programs (because the <img> tag forces another HTTP GET command and the server is forced to interpret the new header), so you could have displayed the output of image.cgi by using the following tag:
<img src="/cgi-bin/image.cgi">
image.cgi, like some of the earlier examples, is a somewhat useless program in its current form. If you wanted to display picture.gif inline in a document, the proper way to do it would be with the following tag:
<img src="/images/picture.gif">
However, you can extend image.cgi to do other things. For example, image.cgi could randomly select an image from several files and display it. This way, you could have a different image every time your document was accessed.
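Selecting the image at random takes only a few lines. In this sketch the file names in the images array are invented for illustration, and pick_random_image is a hypothetical helper, so substitute files that actually exist on your server.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical image paths, assumed for this example. */
static const char *images[] = {
    "/usr/local/etc/httpd/htdocs/images/picture1.gif",
    "/usr/local/etc/httpd/htdocs/images/picture2.gif",
    "/usr/local/etc/httpd/htdocs/images/picture3.gif"
};
#define NUM_IMAGES (sizeof images / sizeof images[0])

/* Return the path of one image chosen at random. */
const char *pick_random_image(void)
{
    static int seeded = 0;

    if (!seeded) {
        /* Seeding from the clock alone means two requests in the same
           second pick the same image, which is acceptable for a sketch. */
        srand((unsigned) time(NULL));
        seeded = 1;
    }
    return images[rand() % NUM_IMAGES];
}
```

The chosen path can then be opened and sent to the browser exactly as image.cgi sends picture.gif.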
Counter programs often take advantage of this property of the <img> tag, particularly so that counters can be used on servers that don't enable parsed HTML and includes. This will also lower server load because the HTML file does not need to be parsed by the server.
Additionally, instead of simply loading premade graphics from the file system, you can actually generate images on the fly. For example, you could design a CGI program that generates a custom-designed map with the parameters and details defined by the user. Or you could design a coloring book over the Web. Once again, the complexity of such applications lies in generating these graphics, not in outputting the graphics to the Web browser.
Tip |
Thomas Boutell's gd library is an excellent tool for generating GIF images on the fly. It is written in C, and there are Perl and Tcl interfaces available for it. You can get more information on gd from URL:http://www.boutell.com/gd/. |
You can now use your knowledge of CGI output to write a CGI counter. Many people like to include a nifty-looking counter that displays the number of times a page has been accessed. There are two main pieces to a counter program:
This time, your challenge lies not only in the main algorithm (reading and updating the access count) but also in displaying the number. You can either display the number as text or as a graphic. Here, you'll develop both a text and graphical counter. In Chapter 5, "Input," you'll combine the two into one intelligent counter.
In order to keep track of accesses, I'll store the number of accesses in a text file. The text file will simply contain an integer representing the number of accesses.
Normally, you could divide this process into three steps:
However, remember that this program is being run in a multiuser environment. More than one person can access the counter at once, in which case several different programs will be trying to write to the same file. If you do not make sure only one process can write to a file at a time, you risk losing or garbling the information.
Caution |
When your CGI program writes to a file, remember to lock the file first so that nothing else can write to it while your CGI program writes to it. |
To prevent this, you need to implement file-locking. File locking prevents more than one program from writing to a file at once. Various platforms have different system-specific functions for file locking. Although these functions can be useful for many reasons, they are not generally portable.
A simple and portable method of file locking is to create a temporary file called a lock file. Before writing to a file, a program should check for the existence of the lock file. If the lock file exists, the program should wait until the lock file goes away. After the data file is unlocked, the program creates a new lock file, writes the new data, and then deletes the lock file.
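One weakness of checking for the lock file and then creating it in two separate steps is that another process can slip in between them. On UNIX systems, a common way to close that gap is to let open() do both at once with the O_CREAT and O_EXCL flags. The following sketch assumes a POSIX system; try_lock, acquire_lock, and release_lock are hypothetical helper names, not part of the listings in this chapter.

```c
#include <fcntl.h>
#include <unistd.h>

/* Try to create the lock file. With O_CREAT | O_EXCL, open() fails if the
   file already exists, so the check and the creation are a single step.
   Returns 0 if the lock was acquired, -1 if it is held or on error. */
int try_lock(const char *lockpath)
{
    int fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Attempt to take the lock up to max_tries times, one second apart.
   Returns 0 once the lock is held, -1 if every attempt failed. */
int acquire_lock(const char *lockpath, int max_tries)
{
    int i;

    for (i = 0; i < max_tries; i++) {
        if (try_lock(lockpath) == 0)
            return 0;
        sleep(1);
    }
    return -1;
}

/* Remove the lock file so that other processes can proceed. */
void release_lock(const char *lockpath)
{
    unlink(lockpath);
}
```

Limiting the number of attempts is a design choice: if a program exits without removing its lock file, later requests give up after a while instead of waiting forever.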
Your algorithm for keeping track of accesses now looks like this:
Listings 4.16 and 4.17 contain Perl and C code for keeping track of accesses.
Listing 4.16. Perl code for counting accesses.
$data = '/usr/local/etc/httpd/htdocs/counter.data';
$lockfile = '/usr/local/etc/httpd/htdocs/counter.LOCK';
sub increment {
# read the data
open(DATA,$data) || die "Can't open data file.\n";
$accesses = <DATA>;
$accesses++;
close(DATA);
# check for lock file
while (-e $lockfile) {
sleep 2; # wait 2 seconds
}
# create lockfile
open(LOCK,">$lockfile") || die "Can't create lockfile.\n";
close(LOCK);
# write new value
open(DATA,">$data");
print DATA "$accesses\n";
close(DATA);
# delete lockfile
unlink($lockfile);
}
Listing 4.17. C code for counting accesses.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> /* sleep() and unlink() */
#define datafile "/usr/local/etc/httpd/htdocs/counter.data"
#define lockfile "/usr/local/etc/httpd/htdocs/counter.LOCK"
int increment()
{
    FILE *data, *lock;
    char number_string[12]; /* room for 10 digits, a newline, and the terminating null */
    int number;
    /* read data */
    data = fopen(datafile,"r");
    if (data == NULL) {
        fprintf(stderr,"Can't open data file.\n");
        exit(1);
    }
    if (fgets(number_string,sizeof(number_string),data) == NULL)
        number_string[0] = '\0'; /* treat an empty file as zero */
    fclose(data);
    number = atoi(number_string);
    number++;
    /* check for lockfile */
    while ((lock = fopen(lockfile,"r")) != NULL) {
        fclose(lock);
        sleep(2);
    }
    /* create lockfile */
    lock = fopen(lockfile,"w");
    if (lock != NULL)
        fclose(lock);
    /* write new value */
    data = fopen(datafile,"w");
    if (data != NULL) {
        fprintf(data,"%d\n",number);
        fclose(data);
    }
    /* delete lockfile */
    unlink(lockfile);
    return number;
}
Caution |
In UNIX, you must make sure that your CGI program has permission to write the access count to the data file. This means that the CGI program must have permission to both write to the file and read and write in the appropriate directory. |
You are ready to display the number. Note that this increment function-and consequently the entire counter application-has limited functionality. The counter data file is hard coded in the program and can consequently be used only with one corresponding HTML document. A more sophisticated counter program would enable you to define where the counter data is located. The counter program in Chapter 5 will do this. Additionally, the increment function is not extremely robust because it assumes that the counter.data file contains only one number. It would be nice (but not necessary) to add some error routines.
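As an example of the kind of error routine meant here, the text read from the data file can be validated before it is used. This is a sketch only; parse_count is a hypothetical helper, and rejecting negative values is an assumption about what a sensible counter file contains.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse the text read from the counter file. Returns the count, or -1 if
   the text is not a single non-negative integer that fits in a long. */
long parse_count(const char *text)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || errno == ERANGE || value < 0)
        return -1; /* no digits, out of range, or negative */
    /* allow trailing whitespace such as the newline the counter writes */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;
    return (*end == '\0') ? value : -1;
}
```

Unlike atoi(), which silently returns 0 for text it cannot convert, this lets the program tell an empty or damaged file apart from a genuine count of zero.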
A textual counter simply needs to print the number to the stdout. The completed textual counters in both Perl and C are in Listings 4.18 and 4.19.
Listing 4.18. text-counter.cgi (in Perl).
#!/usr/local/bin/perl
$data = '/usr/local/etc/httpd/htdocs/counter.data';
$lockfile = '/usr/local/etc/httpd/htdocs/counter.LOCK';
# main routine
&increment;
print "Content-Type: text/html\n\n";
print $accesses;
sub increment {
# read the data
open(DATA,$data) || die "Can't open data file.\n";
$accesses = <DATA>;
$accesses++;
close(DATA);
# check for lock file
while (-e $lockfile) {
sleep 2; # wait 2 seconds
}
# create lockfile
open(LOCK,">$lockfile") || die "Can't create lockfile.\n";
close(LOCK);
# write new value
open(DATA,">$data");
print DATA "$accesses\n";
close(DATA);
# delete lockfile
unlink($lockfile);
}
Listing 4.19. text-counter.cgi.c.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> /* sleep() and unlink() */
#define datafile "/usr/local/etc/httpd/htdocs/counter.data"
#define lockfile "/usr/local/etc/httpd/htdocs/counter.LOCK"
int increment()
{
    FILE *data, *lock;
    char number_string[12]; /* room for 10 digits, a newline, and the terminating null */
    int number;
    /* read data */
    data = fopen(datafile,"r");
    if (data == NULL) {
        fprintf(stderr,"Can't open data file.\n");
        exit(1);
    }
    if (fgets(number_string,sizeof(number_string),data) == NULL)
        number_string[0] = '\0'; /* treat an empty file as zero */
    fclose(data);
    number = atoi(number_string);
    number++;
    /* check for lockfile */
    while ((lock = fopen(lockfile,"r")) != NULL) {
        fclose(lock);
        sleep(2);
    }
    /* create lockfile */
    lock = fopen(lockfile,"w");
    if (lock != NULL)
        fclose(lock);
    /* write new value */
    data = fopen(datafile,"w");
    if (data != NULL) {
        fprintf(data,"%d\n",number);
        fclose(data);
    }
    /* delete lockfile */
    unlink(lockfile);
    return number;
}
int main()
{
    int accesses = increment();
    printf("Content-Type: text/html\r\n\r\n");
    printf("%d",accesses);
    return 0;
}
Once again, the difficulty of this program was in keeping track of the accesses. After I wrote that function, the text output function consisted of no more than a few lines. However, the graphical counter is a slightly more complex task because counter.cgi must now generate graphical numbers.
How can you embed this information in an HTML document? Because the counter works only for one specific document, the best solution would be to have the CGI program output the entire document in addition to the counter information. However, because I generalize counter.cgi in the next chapter, for now I use a server-side include to display the results. The HTML document might look like the one in Listing 4.20.
Listing 4.20. count.html.
<html> <head>
<title>Home Page</title>
</head>
<body>
<h1>Home Page</h1>
<p>You are the
<!--#exec cgi="/cgi-bin/text-counter.cgi" -->
person to access this page.</p>
</body> </html>
If you wanted to embellish your text counter a little, you could have it comma-separate the numbers. Or you could write out the number in words rather than in digits. You can add these embellishments simply by adding another function; drastic code change is not necessary.
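For instance, comma separation can be handled by one small function that the counter calls instead of printing the number directly. This is a sketch; format_with_commas is a hypothetical name, and the caller is assumed to supply a buffer of at least 27 characters.

```c
#include <stdio.h>
#include <string.h>

/* Write n into out with a comma between each group of three digits,
   so that 1234567 becomes "1,234,567". out must hold at least 27 bytes. */
void format_with_commas(unsigned long n, char *out)
{
    char digits[21]; /* up to 20 digits plus the terminating null */
    int len, i, pos = 0;

    len = sprintf(digits, "%lu", n);
    for (i = 0; i < len; i++) {
        out[pos++] = digits[i];
        /* add a comma when a multiple of three digits remains */
        if ((len - 1 - i) % 3 == 0 && i != len - 1)
            out[pos++] = ',';
    }
    out[pos] = '\0';
}
```

The counter's main() would then print the formatted string with printf("%s", ...) rather than printing the integer with %d.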
Displaying a graphical counter requires a few more steps:
You can use any number of methods to generate the graphic. You could have pregenerated digit graphics (that either you have drawn yourself or that you have downloaded from someone else), or you could generate the graphic from scratch or by using a library (such as the gd library).
Tip |
A good source for GIF images of counter digits is the Digit Mania home page at URL:http://www.comptons.com/digits/digits.htm. |
I will use the XBM format, a fairly simple way of describing images (see the sidebar titled "The X Bitmap (XBM) Image Format" for a description of the XBM format). I have a predefined set of XBM digits. Listing 4.21 defines ten digits in XBM format.
Listing 4.21. Digits in XBM format.
/* taken from http://www.math.unh.edu/~black/resource/bitmapCounter.h */
#define digit_width 8
#define digit_height 12
static char *digits[10][12] = {
{"0x7e", "0x7e", "0x66", "0x66", "0x66", "0x66",
"0x66", "0x66", "0x66", "0x66", "0x7e", "0x7e"},
{"0x18", "0x1e", "0x1e", "0x18", "0x18", "0x18",
"0x18", "0x18", "0x18", "0x18", "0x7e", "0x7e"},
{"0x3c", "0x7e", "0x66", "0x60", "0x70", "0x38",
"0x1c", "0x0c", "0x06", "0x06", "0x7e", "0x7e"},
{"0x3c", "0x7e", "0x66", "0x60", "0x70", "0x38",
"0x38", "0x70", "0x60", "0x66", "0x7e", "0x3c"},
{"0x60", "0x66", "0x66", "0x66", "0x66", "0x66",
"0x7e", "0x7e", "0x60", "0x60", "0x60", "0x60"},
{"0x7e", "0x7e", "0x02", "0x02", "0x7e", "0x7e",
"0x60", "0x60", "0x60", "0x66", "0x7e", "0x7e"},
{"0x7e", "0x7e", "0x66", "0x06", "0x06", "0x7e",
"0x7e", "0x66", "0x66", "0x66", "0x7e", "0x7e"},
{"0x7e", "0x7e", "0x60", "0x60", "0x60", "0x60",
"0x60", "0x60", "0x60", "0x60", "0x60", "0x60"},
{"0x7e", "0x7e", "0x66", "0x66", "0x7e", "0x7e",
"0x66", "0x66", "0x66", "0x66", "0x7e", "0x7e"},
{"0x7e", "0x7e", "0x66", "0x66", "0x7e", "0x7e",
"0x60", "0x60", "0x60", "0x66", "0x7e", "0x7e"},
};
Each list of items in this array represents the bitmap values for an 8×12 digit. In order to access the values of the digit n, you would access digits[n].
For the image counter, instead of printing the numbers, I print the appropriate values of the digits array. This will send an image rather than text. The complete program in C is in Listing 4.22.
Listing 4.22. image-counter.cgi.c.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> /* sleep() and unlink() */
#define datafile "/usr/local/etc/httpd/htdocs/counter.data"
#define lockfile "/usr/local/etc/httpd/htdocs/counter.LOCK"
#define counter_width 7
#define digit_width 8
#define digit_height 12
static char *digits[10][12] = {
{"0x7e", "0x7e", "0x66", "0x66", "0x66", "0x66",
"0x66", "0x66", "0x66", "0x66", "0x7e", "0x7e"},
{"0x18", "0x1e", "0x1e", "0x18", "0x18", "0x18",
"0x18", "0x18", "0x18", "0x18", "0x7e", "0x7e"},
{"0x3c", "0x7e", "0x66", "0x60", "0x70", "0x38",
"0x1c", "0x0c", "0x06", "0x06", "0x7e", "0x7e"},
{"0x3c", "0x7e", "0x66", "0x60", "0x70", "0x38",
"0x38", "0x70", "0x60", "0x66", "0x7e", "0x3c"},
{"0x60", "0x66", "0x66", "0x66", "0x66", "0x66",
"0x7e", "0x7e", "0x60", "0x60", "0x60", "0x60"},
{"0x7e", "0x7e", "0x02", "0x02", "0x7e", "0x7e",
"0x60", "0x60", "0x60", "0x66", "0x7e", "0x7e"},
{"0x7e", "0x7e", "0x66", "0x06", "0x06", "0x7e",
"0x7e", "0x66", "0x66", "0x66", "0x7e", "0x7e"},
{"0x7e", "0x7e", "0x60", "0x60", "0x60", "0x60",
"0x60", "0x60", "0x60", "0x60", "0x60", "0x60"},
{"0x7e", "0x7e", "0x66", "0x66", "0x7e", "0x7e",
"0x66", "0x66", "0x66", "0x66", "0x7e", "0x7e"},
{"0x7e", "0x7e", "0x66", "0x66", "0x7e", "0x7e",
"0x60", "0x60", "0x60", "0x66", "0x7e", "0x7e"},
};
int increment()
{
    FILE *data, *lock;
    char number_string[12]; /* room for 10 digits, a newline, and the terminating null */
    int number;
    /* read data */
    data = fopen(datafile,"r");
    if (data == NULL) {
        fprintf(stderr,"Can't open data file.\n");
        exit(1);
    }
    if (fgets(number_string,sizeof(number_string),data) == NULL)
        number_string[0] = '\0'; /* treat an empty file as zero */
    fclose(data);
    number = atoi(number_string);
    number++;
    /* check for lockfile */
    while ((lock = fopen(lockfile,"r")) != NULL) {
        fclose(lock);
        sleep(2);
    }
    /* create lockfile */
    lock = fopen(lockfile,"w");
    if (lock != NULL)
        fclose(lock);
    /* write new value */
    data = fopen(datafile,"w");
    if (data != NULL) {
        fprintf(data,"%d\n",number);
        fclose(data);
    }
    /* delete lockfile */
    unlink(lockfile);
    return number;
}
int main()
{
    int number = increment();
    int i,j,numbers[counter_width];
    /* convert number to numbers[], with the last digit in the last slot */
    for (i = 0; i < counter_width; i++) {
        numbers[counter_width - 1 - i] = number % 10;
        number = number / 10;
    }
    /* print the CGI header */
    printf("Content-Type: image/x-xbitmap\r\n\r\n");
    /* print the width and height values */
    printf("#define counter_width %d\n",counter_width * digit_width);
    printf("#define counter_height %d\n",digit_height);
    /* now print the bitmap, one row of all the digits at a time */
    printf("static char counter_bits[] = {\n");
    for (j = 0; j < digit_height; j++) {
        for (i = 0; i < counter_width; i++) {
            printf("%s",digits[numbers[i]][j]);
            if ((i < counter_width - 1) || (j < digit_height - 1))
                printf(", ");
        }
        printf("\n");
    }
    printf("};\n");
    return 0;
}
The X Bitmap (XBM) Image Format |
The X Window system (available on most UNIX platforms) uses a default image format called X Bitmap (or XBM). This is a simple bitmap format for defining black-and-white images. XBMs are in a C-style format:
#define imagename_width 8
imagename is the name of the image. The two macros (imagename_width and imagename_height) define the width and height of the graphic. The array of bits (imagename_bits[]) represents the bitmap in a hexadecimal format.
For example, you can depict any black-and-white bitmap on a grid of Ons and Offs. Represent On with a one and Off with a zero. So an eight-pixel-by-one-pixel image might look like a list of eight digits of either one or zero, such as 00111100. This eight-digit number is the binary representation of 60. In hexadecimal notation, you can express this number as 3C or 0x3c in C notation.
In order to determine the bits of your image, draw your image on a grid. The width of your grid must be a multiple of 8. If your image's width is not a multiple of 8, the extra information will just be ignored. For example:
[Grid figure: 16 columns, labeled 1 through 8 twice.]
This is a bitmap of a triangle pointing east. The first and second 8 bits of line 1 are 00000000, or in hexadecimal, 0x00. The first 8 bits of line 2 are 01100000, which is 0x60. So the first 3 values of this bitmap would be
static char arrow_bits[] = { 0x00, 0x00, 0x60
and so on, for the rest of the values. You will have 22 values total.
X Bitmaps have been traditionally supported by graphical Web browsers because many of the original Web browsers were developed for X Window. You can safely use this simple image format to generate dynamic inline images for graphical browsers. The MIME type for XBM is image/x-xbitmap. |
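If you convert grid rows to bytes by program rather than by hand, one detail is worth checking against your own images: the X11 bitmap format stores the leftmost pixel of each group of eight in the least significant bit. A symmetric row such as 00111100 is 0x3c either way, but an asymmetric row is mirrored if the bits are read in the other order. The following sketch shows the packing; xbm_byte is a hypothetical helper, and representing a row as a string of '1' and '0' characters is an assumption made for illustration.

```c
#include <stdio.h>

/* Pack eight pixels, given left to right as '1' (on) or '0' (off)
   characters, into one XBM byte. The leftmost pixel goes into the
   least significant bit. */
unsigned char xbm_byte(const char *pixels)
{
    unsigned char value = 0;
    int i;

    for (i = 0; i < 8 && pixels[i] != '\0'; i++) {
        if (pixels[i] == '1')
            value |= (unsigned char)(1u << i);
    }
    return value;
}
```

With this ordering, a row whose second and third pixels from the left are on packs to 0x06, and a row whose sixth and seventh pixels are on packs to 0x60, which is how the right-hand stem of the digit 7 in Listing 4.21 is encoded.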
In order to display the counter, you need to embed the CGI program within an <img> tag. Listing 4.23 shows an example.
Listing 4.23. Example of using counter.cgi in g-count.html.
<html> <head>
<title>Home Page</title>
</head>
<body>
<h1>Home Page</h1>
<p>This page has been accessed:
<img src="/cgi-bin/image-counter.cgi">
times.</p>
</body> </html>
image-counter.cgi in its current state has the additional flaw of being useless with a text browser. It would be nice to combine the two counters and to improve it overall. Nevertheless, with no knowledge other than how CGI output works, you have managed to create a fairly sophisticated CGI application.
Normally, the server will parse CGI output and perform extra tasks before sending it to the Web browser. For example, if you use the Location header with a file on the file system, the server will locate that file and send it with normal document headers. If you don't specify a Status header, the server will determine one for you. If you do specify a status code and message, the server will reformat it into the appropriate HTTP header and send it to the client. For efficiency, the server will normally buffer the output and send it in chunks rather than in individual bytes.
Sometimes, you would rather have your CGI program communicate directly with the client than through the server. For example, you might want to avoid the extra cost of preparsing the CGI headers, or you might not want the output to be buffered (as in server-side push applications, described in Chapter 14, "Proprietary Extensions").
Your CGI programs can communicate directly with the client using no parse headers (nph). An nph CGI program is responsible for sending all of the appropriate HTTP headers. For example, the first header of every nph CGI program must be the status of the transaction:
HTTP/1.0 200 Transaction ok
At minimum, an nph CGI must send an HTTP Status header and a Location or Content-Type header.
No parse header programs are normally specified by preceding the CGI program name with nph-.
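A minimal nph program might look like the following sketch. The function name nph_hello and the page content are assumptions for illustration; what matters is that the program itself writes the status line before any other header.

```c
#include <stdio.h>

/* Write a complete HTTP response to out, as an nph program must. */
void nph_hello(FILE *out)
{
    fprintf(out, "HTTP/1.0 200 OK\r\n");         /* the status line comes first */
    fprintf(out, "Content-Type: text/html\r\n");
    fprintf(out, "\r\n");                        /* a blank line ends the headers */
    fprintf(out, "<html><body><p>Sent directly to the client.</p></body></html>\n");
    fflush(out);                                 /* don't leave output in the stdio buffer */
}
```

A main() function that calls nph_hello(stdout), compiled and installed under a name such as nph-hello.cgi, would be treated by the server as a no-parse-header program.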
CGI output requires printing headers and the data, separated by a blank line, to the standard output. After the server receives these headers, it will normally parse and process the headers before passing the information to the client. The exception to this is when your program is an nph CGI, in which case it communicates directly with the Web client.
The two most common CGI headers are the Location and Content-Type headers. You will find that almost all of your CGI programs use the Content-Type header.
As you have seen from the examples in this chapter, you can write fairly sophisticated applications simply by understanding how CGI output works. In the next chapter, I finish my discussion of the CGI protocol by discussing how CGI input works.