- 52 -

CGI Scripts

by Tim Parker

If you do any work with the World Wide Web, you will come across the term CGI, or Common Gateway Interface. Although we can't hope to cover all you need to know about CGI in a chapter, we can look at what CGI is and does, why you want to use it, and how to go about using it.

If you get involved in doing more than simple Web page design (we look at HTML and Java in the next couple of chapters), you will eventually end up using CGI in some manner, usually to provide extra functionality to your Web pages. For that reason, and so that you will know just what the term means, we need to look at CGI in a bit of detail.

What Is CGI?

You now know that CGI stands for Common Gateway Interface, but the name alone doesn't tell you much about what CGI does, and it is a little misleading. Essentially, CGI comes into play whenever a program takes its starting commands from a Web page. For example, you might have a button on your Web page that launches a program to display statistics about how many people have visited your Web site. When the button is clicked, the browser sends a request to the Web server, which starts the program that performs the calculation for you. CGI is the interface between the Web server handling that request and the application, and it allows you to send information back and forth between the Web page and other applications that aren't necessarily part of the Web page.

CGI does more than that, but it is usually involved in applications that interface between a Web page and a non-Web program. CGI programs don't have to be started from a Web page, but they usually are, because a CGI program expects a particular environment (variables and input supplied by the Web server) that is otherwise hard to simulate.

What does that mean? When a browser requests a CGI program, the Web server sets up a group of environment variables before it starts the program. These variables describe the request and pass information to the program. When a person clicks a button on your Web page to launch an external application, those environment variables carry parameters to the program (such as which machine made the request or what data was submitted). When the application sends information back to the Web server, it writes that information to its standard output, and the server relays it to the browser.

So when we talk about CGI programming, we really mean writing programs that involve an interface between HTML and some other program. CGI deals with the interface between the Web server and the application (hence the "interface" in the name).
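
To make the mechanism concrete, here is a minimal sketch of a CGI program. It is written in Python purely for illustration and is not taken from any particular server's documentation. REQUEST_METHOD and REMOTE_ADDR are standard CGI environment variables; the function name build_response is our own invention.

```python
#!/usr/bin/env python3
"""Minimal CGI sketch: read request details from the environment, reply on stdout."""
import os
import sys


def build_response(environ):
    """Return a complete CGI response (header, blank line, body) as one string."""
    # The Web server sets these variables before it starts the program.
    method = environ.get("REQUEST_METHOD", "unknown")
    client = environ.get("REMOTE_ADDR", "unknown")
    body = f"Request method: {method}\nClient address: {client}\n"
    # A CGI response begins with a Content-Type header followed by a blank line.
    return "Content-Type: text/plain\n\n" + body


if __name__ == "__main__":
    # Whatever the program writes to standard output goes back to the server.
    sys.stdout.write(build_response(os.environ))
```

Run outside a Web server, the script simply reports "unknown" for both values, which shows how much a CGI program depends on the environment the server prepares for it.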

What's so exciting about this? In reality, the number of behaviors you can code on a Web page in HTML alone is somewhat limited. CGI lets you push past those barriers to code just about anything you want, and have it interact properly with the Web page. So if you need to run custom statistics on your Web page based on a client's data, you can do it through CGI. To take a simple example, CGI can pass the information to the number-crunching application and then pass the results back for display on the Web page. In fact, there are a great many things you can do on even the simplest Web page when you start using CGI, and that is why it is so popular.

CGI support is usually built into the Web server, although not every Web server is required to provide it. Luckily, almost every server on the market (except the very early servers and a few stripped-down ones) contains the CGI code. The latest versions of the Web servers from NCSA, Netscape, CERN, Apache, and many others all have CGI built in.

CGI and HTML

To run a CGI application from a Web page, you make a request to the Web server to run the CGI application. This request is made through a particular method that is responsible for invoking CGI programs. (In HTTP terms, a method is the type of request the browser sends to the server.) Several methods are built into HTTP (HyperText Transfer Protocol, the protocol used by the World Wide Web); the method used to call the CGI application depends on the type of information you want to transfer. We'll come back to methods in a moment, after we look at how the call to the CGI program is embedded in the HTML for the Web page.

As you will see in the next chapter, HTML is built from tags. To call a CGI program, you use a tag that gives the name of the program, as well as the text that will appear on the Web page when a browser displays the HTML code. For example, the HTML tag

<a href="crunch_numbers"> Click here to display statistics </a>

displays the message Click here to display statistics on the Web page. When the user clicks there, the browser requests crunch_numbers from the server, and the program is run. (The <a> and </a> HTML tags are "anchor" tags that indicate a link to something else. Where the tag is positioned in the rest of the HTML code dictates exactly where the link appears on the page in a Web browser.)

As you will see when we look at HTML in the next chapter, you can even use hyperlinks to call a program on another machine by supplying the domain name. For example, the HTML tag

<a href="http://www.tpci.com/stats.cgi"> Display Statistics </a>

displays the message Display Statistics on whatever Web page the code runs on. When it is selected by the user, the program stats.cgi on the Web server www.tpci.com is located and run. This server could be across the country--it doesn't matter to either HTML or CGI, as long as the reference can be resolved.

Three kinds of methods are normally used to call a CGI application: the GET, HEAD, and POST methods (all are part of HTTP). They differ slightly in when you use them. We will look at each method briefly so that you know what each does and when it is used.

A GET method is used when the CGI application is to receive data in an environment variable called QUERY_STRING. The application reads this variable and decodes it, interpreting what it needs in order to perform its actions. The GET method is usually used when the CGI application has to take information but doesn't change anything.
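
As a rough sketch of the decoding step, the following Python function reads QUERY_STRING and turns it into a dictionary of field names and values. The function name read_get_fields and the sample field names are hypothetical; parse_qs is part of Python's standard urllib.parse module.

```python
from urllib.parse import parse_qs


def read_get_fields(environ):
    """Decode QUERY_STRING into a dict mapping each field name to its first value."""
    query = environ.get("QUERY_STRING", "")
    # parse_qs undoes URL encoding: '+' becomes a space, %XX becomes a character.
    parsed = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}
```

A real script would pass os.environ to this function. A request ending in ?year=1997&site=Linux+Unleashed would give the program a year field of 1997 and a site field of Linux Unleashed.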

The HEAD method is much the same as the GET method, except that the server transmits only the HTTP headers to the client and leaves out the body of the response. This method can be useful when you need to check that a resource exists, or whether it has changed, without transferring its contents.
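
One way a CGI program might honor HEAD is to check REQUEST_METHOD and skip the body. The Python sketch below assumes a plain-text response; the function name respond is our own.

```python
def respond(environ, body):
    """Return the CGI response text, leaving out the body for HEAD requests."""
    headers = "Content-Type: text/plain\n\n"
    # For HEAD, only the headers (and the blank line ending them) are sent.
    if environ.get("REQUEST_METHOD", "GET").upper() == "HEAD":
        return headers
    return headers + body
```

Many servers discard the body of a HEAD response anyway, so the check mainly saves the program from doing work nobody will see.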

The POST method is much more flexible and uses stdin (standard input) to receive data. A variable called CONTENT_LENGTH tells the application how many bytes of data to read from standard input, so that it knows when all the data has arrived. The POST method was developed for requests that change something on the server, but many programmers use POST for almost every task to avoid the truncation of long URLs that can occur with GET.
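
The role of CONTENT_LENGTH can be sketched in Python as follows. This assumes the form data arrives URL-encoded, as it does for an ordinary HTML form; the function name read_post_fields is hypothetical.

```python
from urllib.parse import parse_qs


def read_post_fields(environ, stream):
    """Read exactly CONTENT_LENGTH bytes from stream and decode them as form fields."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0  # treat a malformed length as "no data"
    # Read only as many bytes as the server says were sent, never more.
    raw = stream.read(length) if length > 0 else b""
    parsed = parse_qs(raw.decode("utf-8", errors="replace"), keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}
```

In a real script, the call would be read_post_fields(os.environ, sys.stdin.buffer), so that the data is read as raw bytes from standard input.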

Various environment variables are used by CGI, most of which are covered in much more detail in any CGI programming book. Describing all the variables here without showing you how to use them would be a bit of a waste.

CGI and Perl

If you do get into CGI programming, you will probably find that most of it is done in the Perl programming language (which we looked at in Chapter 29, "Perl"). CGI programming can be done in any language (and many Web page designers like C, C++, or Visual Basic because they are more familiar with those languages), but Perl seems to have become a favorite among UNIX Web programmers. Shell scripts are also popular under UNIX (and hence Linux), but they are not portable to other operating systems.

Perl's popularity is easy to understand when you know the language: It's powerful, simple, and easy to work with. Perl is also portable, which means you can develop CGI programs on one machine and move them without change to another platform.

Many Perl CGI scripts can be found on the Web. A quick look with a search engine such as AltaVista will usually reveal hundreds of examples that can be downloaded and studied. For example, one of the most commonly used Perl scripts is called GuestBook. Its role is to allow users of your Web site to sign into a guest book and leave a comment about your Web pages. Usually, the guest book records the user's name and e-mail address, her location (normally a city and state or province), and any comments she wants to make. Guest books are a good way to get feedback on your Web pages, and they also make those pages a little more friendly.

When run, the GuestBook CGI program displays a form that the user can fill in, and it then updates your server's database for you. Various versions of GuestBook can be found around the Web; a sample browser display of one GuestBook Perl CGI script is shown in Figure 52.1.

Each GuestBook Perl script looks slightly different, but the one shown in Figure 52.1 is typical. The information entered by the user is stored in the server's database for the administrator there to read.
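
The storage step of a guest book can be sketched in a few lines. The Python below is not the GuestBook script itself (which is written in Perl); it assumes that a plain text file stands in for the server's database and that the form uses fields named name, email, location, and comments, all of which are our own choices for illustration.

```python
def format_entry(fields):
    """Turn submitted form fields into one tab-separated guest book line."""
    columns = []
    for key in ("name", "email", "location", "comments"):
        value = fields.get(key, "")
        # Collapse tabs and newlines so each entry stays on a single line.
        columns.append(" ".join(value.split()))
    return "\t".join(columns) + "\n"


def append_entry(path, fields):
    """Append one entry to the guest book file at path."""
    with open(path, "a", encoding="utf-8") as book:
        book.write(format_entry(fields))
```

The administrator can then read the file with any text editor, one visitor per line.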

FIGURE 52.1. A sample GuestBook Perl script requesting information about the user.

Figure 52.2 shows another Web page with several sample CGI programs launched from a menu. Selecting the domain-name lookup shown in Figure 52.2 causes the CGI application to make a few standard HTTP requests to the server and client and display the results shown in Figure 52.3. As you can see, the output in Figure 52.3 uses the standard font and character size, and no real attempt has been made to produce fancy formatting. This is often adequate for simple CGI applications.

FIGURE 52.2. A Web page with some sample CGI applications, a mix of Perl and C, with the domain-name CGI sample ready to launch.

The Perl CGI scripts are not complicated. The top example (Who Are You?) in the demonstration page shown in Figure 52.2 looks up your information through an HTTP request. The Perl code for this is shown in Figure 52.4, displayed through Netscape. As you can see, only a few lines of code are involved. Any Perl programmer can write this type of CGI application quickly.

FIGURE 52.3. The domain-name lookup Perl CGI script results in this screen for the author's machine.

FIGURE 52.4. The Perl source code for the Who Are You? application shown in Figure 52.2.

Summary

CGI programming is easy to do, especially with Perl, and adds a great deal of flexibility to your applications. When you feel comfortable writing HTML code and developing your own Web pages (which we look at in the next chapter), you should try your hand at CGI programming and really put some zing into your Web site.