This appendix provides a reference for the CGI protocol and related variables, including MIME types, environment variables, and hexadecimal encoding for nonalphanumeric characters.
To output something from a CGI application, print to stdout. You format output as follows:
headers
body/data
Headers consist of the HTTP header's name followed by a colon, a space, and the value. Each header should end with a carriage return and a line feed (\r\n), including the blank line following the headers.
Header name: header value
A CGI header must contain at least one of the following headers:
Location: URI
Content-Type: MIME type/subtype
Status: code message
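The rules above can be sketched in a few lines of Python. This is only an illustration, not code from this book: the function name build_cgi_response and its argument layout are assumptions made for the example. It joins the header lines with \r\n, adds the blank line, and refuses to build a response that has none of the three headers listed above.

```python
import sys

# At least one of these headers must be present (names compared in lowercase).
REQUIRED = {"location", "content-type", "status"}


def build_cgi_response(headers, body=""):
    """Return CGI output: header lines, a blank line, then the body.

    `headers` is a list of (name, value) pairs. This helper is hypothetical
    and exists only to illustrate the output format.
    """
    names = {name.lower() for name, _ in headers}
    if not names & REQUIRED:
        raise ValueError("need at least one of Location, Content-Type, or Status")
    lines = [f"{name}: {value}" for name, value in headers]
    # Every header line, and the blank line after the headers, ends with CRLF.
    return "\r\n".join(lines) + "\r\n\r\n" + body


if __name__ == "__main__":
    # A CGI program sends its response by printing to stdout.
    sys.stdout.write(
        build_cgi_response(
            [("Content-Type", "text/html")],
            "<html><body><h1>Hello</h1></body></html>\n",
        )
    )
```

A redirect would use the same helper with a Location header and no body, for example build_cgi_response([("Location", "/new/page.html")]), where the path is a made-up value.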
You can include additional headers, including any HTTP-specific headers (such as Expires or Server) and any custom headers. See Chapter 4, "Output," for a discussion of the Location header. Table A.1 lists the status codes, which tell the client whether the transaction was successful and what to do next. See Chapter 8, "Client/Server Issues," for more about status codes.
Status Code | Meaning |
200 OK | The request was successful and a proper response has been sent. |
201 Created | If a resource or file has been created by the server, it sends a 201 status code and the location of the new resource. Of the methods GET, HEAD, and POST, only POST is capable of creating new resources (for example, file uploading). |
202 Accepted | The request has been accepted although it might not have been processed yet. For example, if the user requested a long database search, you could start the search, respond with a 202 message, and inform the user that the results will be e-mailed later. |
204 No Content | The request was successful but there is no content to return. |
301 Moved Permanently | The requested document has a new, permanent URL. The new location should be specified in the Location header. |
302 Moved Temporarily | The requested document is temporarily located at a different location, specified in the Location header. |
304 Not Modified | If the client requests a conditional GET (that is, it only wants to get the file if it has been modified after a certain date) and the file has not been modified, the server responds with a 304 status code and doesn't bother resending the file. |
400 Bad Request | The request was malformed and the server could not understand it. You should never receive this error if your browser was written properly. |
401 Unauthorized | The client has requested a file that requires user authentication. |
403 Forbidden | The server understands the request but refuses to fulfill it, most likely because either the server or the client does not have permission to access that file. |
404 Not Found | The requested file is not found. |
500 Internal Server Error | The server experienced some internal error and cannot fulfill the request. You often will see this error if your CGI program has some error or sends a bad header that the server cannot parse. |
501 Not Implemented | The command requested has not been implemented by the server. |
502 Bad Gateway | While the server was acting as a proxy server or gateway, it received an invalid response from the other server. |
503 Service Unavailable | The server is too busy to handle any further requests. |
MIME headers look like the following:
type/subtype
where type is one of the top-level content types that appear in Table A.2: text, image, audio, video, application, or multipart.
The subtype provides specific information about the data format in use. A subtype preceded by an x- indicates an experimental subtype that has not yet been registered. Table A.2 contains several MIME type/subtypes. A complete list of registered MIME types is available at URL: ftp://ftp.isi.edu/in-notes/iana/assignments/media-types.
Type/Subtype | Function |
text/plain | Plain text. By default, if the server doesn't recognize the file extension, it assumes that the file is plain text. |
text/html | HTML files. |
text/richtext | Rich Text Format. Most word processors understand rich text format, so it can be a good portable format to use if you want people to read it from their word processors. |
text/enriched | The text enriched format is a method of formatting similar to HTML, meant for e-mail and news messages. It has a minimal markup set and uses multiple carriage returns and line feeds as separators. |
text/tab-separated-values | Text tab delimited format is the simplest common format for databases and spreadsheets. |
text/sgml | Standard General Markup Language. |
image/gif | GIF images, a common, compressed graphics format specifically designed for exchanging images across different platforms. Almost all graphical browsers display GIF images inline (using the <img> tag). |
image/jpeg | JPEG is another popular image compression format. Although a fairly common format, JPEG is not supported internally by as many browsers as GIF is. |
image/x-xbitmap | X bitmap is a very simple pixel-by-pixel description of images. Because it is simple and because most graphical browsers support it, it can be useful for creating small, dynamic images such as counters. Generally, X bitmap files have the extension .xbm. |
image/x-pict | Macintosh PICT format. |
image/tiff | TIFF format. |
audio/basic | Basic 8-bit, ulaw compressed audio files. Filenames usually end with the extension .au. |
audio/x-wav | Microsoft Windows audio format. |
video/mpeg | MPEG compressed video. |
video/quicktime | QuickTime video. |
video/x-msvideo | Microsoft Video. Filenames usually end with the extension .avi. |
application/octet-stream | Any general, binary format that the server doesn't recognize usually uses this MIME type. Upon receiving this type, most browsers give you the option of saving the data to a file. You can use this MIME type to force a user's browser to download and save a file rather than display it. |
application/postscript | PostScript files. |
application/atomicmail | |
application/andrew-inset | |
application/rtf | Rich Text Format (see text/richtext above). |
application/applefile | |
application/mac-binhex40 | |
application/news-message-id | |
application/news-transmission | |
application/wordperfect5.1 | WordPerfect 5.1 word processor files. |
application/pdf | Adobe's Portable Document Format for the Acrobat reader. |
application/zip | The Zip compression format. |
application/macwriteii | Macintosh MacWrite II word processor files. |
application/msword | Microsoft Word word processor files. |
application/mathematica | |
application/cybercash | |
application/sgml | Standard General Markup Language. |
application/x-www-form-urlencoded | Default encoding for HTML forms. |
multipart/mixed | Contains several pieces of many different types. |
multipart/x-mixed-replace | Similar to multipart/mixed except that each part replaces the preceding part. Used by Netscape for server-side push CGI applications. |
multipart/form-data | Contains form name/value pairs. Encoding scheme used for HTTP File Upload. |
As an example, the header you'd use to denote HTML content to follow would be
Content-Type: text/html
No-Parse Header (nph) CGI programs communicate directly with the Web browser. The CGI headers are not parsed by the server (hence the name No-Parse Header), and buffering is usually turned off. Because the CGI program communicates directly with the browser, it must contain a valid HTTP response header. The first header must be
HTTP/1.0 nnn message
where nnn is the three-digit status code and message is the status message. Any headers that follow are standard HTTP headers such as Content-Type.
You generally specify NPH programs by preceding the name of the program with nph-.
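As a rough sketch of what an NPH program has to produce, the following Python function assembles a complete HTTP response, starting with the status line described above. The function name build_nph_response is an assumption made for this example. As in any HTTP response, a blank line separates the headers from the body.

```python
def build_nph_response(code, message, headers, body=""):
    """Return a full HTTP/1.0 response suitable for an nph- program.

    `code` is the three-digit status code, `message` the status message,
    and `headers` a list of (name, value) pairs. Hypothetical helper,
    shown only to illustrate the format.
    """
    # The first line must be the HTTP status line, not a CGI header.
    lines = [f"HTTP/1.0 {code:03d} {message}"]
    # Standard HTTP headers such as Content-Type follow.
    lines += [f"{name}: {value}" for name, value in headers]
    # A blank line ends the headers; the body comes after it.
    return "\r\n".join(lines) + "\r\n\r\n" + body
```

Because the server does not parse or complete this output, the program itself is responsible for every header the browser needs.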
Note that HTTP is at version 1.0 currently, but 1.1 is being worked on as this book is being written, and some features and headers from 1.1 have already been implemented in some browsers and servers.
CGI applications obtain input using one or a combination of three methods: environment variables, standard input, and the command line.
ISINDEX enables you to enter keywords. The keywords are appended to the end of the URL following a question mark (?) and separated by plus signs (+). CGI programs can access ISINDEX values either by checking the environment variable QUERY_STRING or by reading the command-line arguments, one keyword per argument.
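A minimal sketch of both ways of reading ISINDEX keywords follows, written in Python for illustration. The function name isindex_keywords is an assumption; the example also assumes that any percent-encoded characters inside a keyword should be decoded.

```python
import os
import sys
from urllib.parse import unquote


def isindex_keywords(query_string):
    """Split an ISINDEX query string into its keywords.

    Keywords are separated by plus signs; percent-encoded characters
    inside each keyword are decoded. Hypothetical helper.
    """
    return [unquote(word) for word in query_string.split("+") if word]


if __name__ == "__main__":
    # Method 1: read the keywords from the QUERY_STRING environment variable.
    from_env = isindex_keywords(os.environ.get("QUERY_STRING", ""))
    # Method 2: read the command-line arguments, one keyword per argument.
    from_argv = sys.argv[1:]
    print(from_env, from_argv)
```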
CGI environment variables provide information about the server, the client, the CGI program itself, and sometimes the data sent to the server. Tables A.3 and A.4 list some common environment variables.
Environment Variable | Description |
GATEWAY_INTERFACE | Describes the version of CGI protocol. Set to CGI/1.1. |
SERVER_PROTOCOL | Describes the version of HTTP protocol. Usually set to HTTP/1.0. |
REQUEST_METHOD | Either GET or POST, depending on the method used to send data to the CGI program. |
PATH_INFO | Data appended to a URL after a slash. Typically used to describe some path relative to the document root. |
PATH_TRANSLATED | The value of PATH_INFO translated by the server into a complete file system path. |
QUERY_STRING | Contains input data if using the GET method. Always contains the data appended to the URL after the question mark (?). |
CONTENT_TYPE | Describes how the data is being encoded. Typically application/x-www-form-urlencoded. For HTTP File Upload, it is set to multipart/form-data. |
CONTENT_LENGTH | Stores the length of the input if you are using the POST method. |
SERVER_SOFTWARE | Name and version of the server software. |
SERVER_NAME | Host name of the machine running the server. |
SERVER_ADMIN | E-mail address of the Web server administrator. |
SERVER_PORT | Port on which the server is running, usually 80. |
SCRIPT_NAME | The name of the CGI program. |
DOCUMENT_ROOT | The value of the document root on the server. |
REMOTE_HOST | Name of the client machine requesting or sending information. |
REMOTE_ADDR | IP address of the client machine connected to the server. |
REMOTE_USER | The username if the user has authenticated himself or herself. |
REMOTE_GROUP | The group name if the user belonging to that group has authenticated himself or herself. |
AUTH_TYPE | Defines the authorization scheme being used, if any; usually Basic. |
REMOTE_IDENT | Displays the username of the person running the client connected to the server. Works only if the client machine is running IDENTD as specified by RFC 931. |
Environment Variable | Description |
HTTP_ACCEPT | Contains a comma-delimited list of MIME types the browser is capable of interpreting. |
HTTP_USER_AGENT | The browser name, version, and usually its platform. |
HTTP_REFERER | Stores the URL of the page that referred you to the current URL. |
HTTP_ACCEPT_LANGUAGE | Languages supported by the Web browser; en is English. |
HTTP_COOKIE | Contains cookie values if the browser supports HTTP cookies and currently has stored cookie values. A cookie value is a variable that the server tells the browser to remember to tell back to the server later. |
A full list of HTTP 1.0 headers can be found at the following location:
http://www.w3.org/hypertext/WWW/protocols/HTTP/1.0/spec.html
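The variables in Tables A.3 and A.4 are ordinary environment variables, so a CGI program can read them with whatever facility its language provides. The following Python sketch gathers a few of them into a dictionary. The function name request_summary and the choice of fields are assumptions for illustration; a variable the server did not set simply comes back empty.

```python
import os


def request_summary(environ=os.environ):
    """Collect a few common CGI environment variables into a dictionary.

    `environ` defaults to the real environment; a plain dict can be
    passed instead for testing. Hypothetical helper.
    """
    accept = environ.get("HTTP_ACCEPT", "")
    return {
        "method": environ.get("REQUEST_METHOD", ""),
        "script": environ.get("SCRIPT_NAME", ""),
        "path_info": environ.get("PATH_INFO", ""),
        "query": environ.get("QUERY_STRING", ""),
        # Fall back to the IP address when the host name is not available.
        "client": environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR", ""),
        # HTTP_ACCEPT is a comma-delimited list of MIME types.
        "accepts": [t.strip() for t in accept.split(",") if t.strip()],
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
    }


if __name__ == "__main__":
    for key, value in request_summary().items():
        print(f"{key}: {value}")
```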
Input from forms is sent to the CGI application using one of two methods: GET or POST. Both methods by default encode the data using URL encoding. Names and their associated values are separated by equal signs (=), name/value pairs are separated by ampersands (&), and spaces are replaced with plus signs (+), as follows:
name1=value1&name2=value2a+value2b&name3=value3
Every other nonalphanumeric character is URL encoded. This means that the character is replaced by a percent sign (%) followed by its two-digit hexadecimal equivalent. Table A.5 contains a list of nonalphanumeric characters and their hexadecimal values.
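The encoding rule can be written out by hand, as in the following Python sketch. The function names url_encode and encode_pairs are assumptions for this example, and the sketch assumes that characters outside ASCII are encoded as UTF-8 bytes, which the text above does not specify. It follows the rule exactly as stated here; library encoders such as Python's urllib.parse.quote_plus leave a few punctuation characters (for example, the hyphen and the period) unencoded.

```python
import string

# Byte values of the ASCII letters and digits, which pass through unchanged.
ALPHANUMERIC = set((string.ascii_letters + string.digits).encode("ascii"))


def url_encode(text):
    """URL encode one name or value following the rule described above."""
    pieces = []
    for byte in text.encode("utf-8"):  # assumption: non-ASCII text is UTF-8
        if byte in ALPHANUMERIC:
            pieces.append(chr(byte))
        elif byte == 0x20:
            pieces.append("+")  # spaces become plus signs
        else:
            pieces.append(f"%{byte:02X}")  # percent sign plus two hex digits
    return "".join(pieces)


def encode_pairs(pairs):
    """Join (name, value) pairs with = and & after encoding each part."""
    return "&".join(f"{url_encode(n)}={url_encode(v)}" for n, v in pairs)
```

Encoding the pairs name1/value1, name2/"value2a value2b", and name3/value3 with encode_pairs reproduces the sample string shown earlier.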
The GET method passes the encoded input string to the environment variable QUERY_STRING. The POST method passes the length of the input string to the environment variable CONTENT_LENGTH, and the input string itself is passed to the standard input.
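The following Python sketch shows one way a program might read form input under either method. The function names read_raw_input and read_form are assumptions for illustration. The sketch assumes URL-encoded input, which is plain ASCII, so reading CONTENT_LENGTH characters from a text stream is equivalent to reading that many bytes. Decoding is delegated to the standard library's urllib.parse.parse_qs.

```python
import os
import sys
from urllib.parse import parse_qs


def read_raw_input(environ, stdin):
    """Return the still-encoded input string for a GET or POST request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the length is in CONTENT_LENGTH, the data is on standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    # GET: the data is in QUERY_STRING.
    return environ.get("QUERY_STRING", "")


def read_form(environ=os.environ, stdin=sys.stdin):
    """Decode the input into a dictionary mapping each name to a list of values."""
    return parse_qs(read_raw_input(environ, stdin), keep_blank_values=True)
```

Reading exactly CONTENT_LENGTH characters matters because the server is not guaranteed to send an end-of-file after the data.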