HTML 4.0 Sourcebook
(Publisher: John Wiley & Sons, Inc.)
Author(s): Ian S. Graham
ISBN: 0471257249
Publication Date: 04/01/98
Chapter 10 Data Processing on an HTTP Server
Having an HTTP server to deliver documents is all well and good, but the true power of the Web is only unleashed when you add dynamic content and user interaction. This means that the server must do more than just deliver data: It must be able to dynamically process and deliver content and respond to complex data sent to the server by a user.
The HTTP protocol, through the GET, POST, and PUT methods, provides several mechanisms for sending user-selected data to the server. But what should be done with the data when they arrive? As mentioned earlier, an HTTP server generally does not itself process these data; in fact, it would be impossible to write a server that came prepared to do all the special processing everyone would want. Instead, servers come with generic tools that let local server administrators add data processing functionality in a locally customizable way. The traditional method is the Common Gateway Interface (CGI), a mechanism that links a running HTTP server with completely separate programs, known as gateway programs, that do this second level of processing. This is still the most commonly used mechanism and is the main topic of this chapter. Many modern servers also support server programming interfaces, which allow special processing modules to be compiled and linked into the server. This is a bit like adding CGI right into the server, eliminating the separation between server and gateway processes. The comparative advantages and disadvantages of this alternative approach are also discussed in this chapter.
The Common Gateway Interface
The Common Gateway Interface (CGI) is the specified Web standard for communication between an HTTP server and server-side gateway programs. When a URL is accessed that references a gateway program, the server launches this gateway program as a separate running process and passes to it any ISINDEX, FORM, or other data sent by the client. When the gateway program finishes processing the data, it sends the results back to the server, which in turn forwards these data to the client that made the initial request. The CGI specifications define how these data are passed from the server to the gateway program, and vice versa. This data flow is schematically illustrated in Figure 10.1.
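As a rough illustration of this flow, the following is a minimal sketch of a gateway program, written here in Python rather than the languages discussed in this chapter. It assumes the standard CGI conventions: the server places GET data in the QUERY_STRING environment variable, supplies POST data on standard input with its size in CONTENT_LENGTH, and reads the response from standard output. The function name run_gateway and the form field "name" are hypothetical, chosen only for the example.

```python
import os
import sys
from urllib.parse import parse_qs


def run_gateway(environ, stdin, stdout):
    """Handle one CGI-style request and write the response to stdout."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        # POST: the request body arrives on standard input, and
        # CONTENT_LENGTH gives its size. (This sketch reads text and
        # so assumes a plain ASCII body.)
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET: the data arrive in the QUERY_STRING environment variable.
        raw = environ.get("QUERY_STRING", "")

    # Decode the URL-encoded name=value pairs sent by the client.
    fields = parse_qs(raw)
    name = fields.get("name", ["world"])[0]  # "name" is a hypothetical field

    # The response is a header block, a blank line, and then the body.
    stdout.write("Content-Type: text/plain\n\n")
    stdout.write(f"Hello, {name}!\n")


if __name__ == "__main__":
    # When launched by a server, the real environment and streams are used.
    run_gateway(os.environ, sys.stdin, sys.stdout)
```

Passing the environment and the two streams as arguments is a design choice made for this sketch, so that the logic can be exercised without a running server; a real gateway program would normally use the process environment and standard streams directly.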
Server Applications Programming Interfaces
Gateway programs are ideal for many problems, as they can be easily added without modifying the HTTP server software. However, this flexibility comes at the expense of speed and scalability: Starting up a gateway program involves significant operating system overhead, which can slow server response, particularly when the demand for a CGI program rises and the system tries to run a number of CGI programs in parallel. Most modern servers support linked-in modules, written in C or other compiled languages, to incorporate gateway-like processing right into the server. In a similar vein, certain Netscape HTTP servers support compiled Java modules, through a special Java interface incorporated into the server, while Microsoft servers support server-side ActiveX components.
In all cases, these modules are written using a special server applications programming interface, or API, which is the software interface that links the modules to the underlying server. Unfortunately, the APIs used by each server vendor (Netscape, Microsoft, Apache, or others) are different and incompatible, so that modules written for one server do not work with another. However, this is the approach to use if you want fast server response for things like transaction processing or if you have generic and commonly used CGI functionality (such as imagemapping) that can be easily incorporated into the server. This chapter does not discuss module programming and design; the References section lists online documentation on these topics.
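The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's API: launch_gateway_process stands in for a server starting a gateway program as a separate process, call_linked_module stands in for a handler linked into the server, and both names, along with the tiny gateway script held in GATEWAY_SOURCE, are hypothetical.

```python
import os
import subprocess
import sys

# A tiny stand-in gateway program (hypothetical), run as its own process.
GATEWAY_SOURCE = (
    "import os\n"
    "print('Content-Type: text/plain')\n"
    "print()\n"
    "print('query=' + os.environ['QUERY_STRING'])\n"
)


def launch_gateway_process(query_string):
    """CGI style: start a new process for the request and collect its output."""
    # The request data are handed over through environment variables.
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", GATEWAY_SOURCE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def call_linked_module(query_string):
    """Module style: the same processing as a function call inside the server."""
    return "Content-Type: text/plain\n\nquery=" + query_string + "\n"
```

Both functions produce the same response for the same query string, but the first pays the cost of creating a process (and here, starting an interpreter) on every request, while the second is an ordinary function call. Timing the two with time.perf_counter should make the overhead described above visible, although the exact figures depend on the system.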
Figure 10.1 Schematic diagram illustrating the data flow between a client, an HTTP server, and a server-side CGI program.
Gateway Programming Languages
Gateway programs can be compiled programs written in languages such as C, C++, or Pascal, or they can be executable scripts written in languages such as Perl, Tcl, and the various shell languages. In fact, many gateway programs are Perl scripts, since these are easy to write and modify and are easily transportable from machine to machine. In addition, execution speed is often not an important factor with gateway programs, since the slowest component is often the resource the gateway connects to and not the gateway program itself. After all, if a database takes many seconds to complete a query, it does not matter if the gateway program takes an extra millisecond to start up. Even when this is not the case, speed is usually not an issue, since most CGI programs are actually quite small and very fast to start.
This chapter first reviews how data are communicated between a client and server (using the HTTP protocol) and then discusses how data are communicated between a server and a gateway program (the CGI mechanisms). This is followed by five examples that explore the details of the CGI mechanisms for the relevant HTTP methods, namely GET and POST, and for different HTML user input tools, namely ISINDEX and FORM elements. Lastly, there are brief discussions of how data sent by a client are decoded in gateway programs and of security issues you should be aware of when writing gateway programs.
Chapter 11 follows up this overview with several CGI programming examples and also provides a list of CGI utility programs and libraries available over the Internet.
Communication with Gateway Programs
The CGI mechanisms describe what data are passed from a server to a server-side gateway program, and vice versa, and how they are passed. In general, all data that a client sends to an HTTP server are made available, using three CGI mechanisms, to a referenced gateway. In turn, a gateway program has two CGI mechanisms for returning data to the server and from there to the client. These mechanisms are discussed below.