The World Wide Web
Why use CGI?
The Common Gateway Interface, or CGI, is a standard for communication between Web documents and CGI scripts you write. CGI scripting, or programming, is the act of creating a program that adheres to this standard of communication. A CGI script is simply a program that in some way communicates with your Web documents. Web documents are any kind of file used on the Web. They can be HTML documents, text files, image files, or any number of other file formats. The existence of this gateway between programs you write and your Web documents allows you to create much more dynamic and interactive Web pages than you could with HTML alone.
This chapter will help you understand the role of CGI scripting within the World Wide Web and will show why you would want to use it. First, you will be introduced to some of the key elements and terminology of the Web, such as HTTP, URLs, HTML, and CGI. Then you will learn some of the advantages of CGI scripts.
The World Wide Web
Many people have heard of the World Wide Web, but not everyone knows what it is. Even people who use it may have trouble defining it precisely. The World Wide Web is a global collection of interconnected documents on the Internet. Because the World Wide Web has grown explosively and has been advertised so extensively, many people think it is the same thing as the Internet. However, the World Wide Web is only a part of the Internet.
The Internet has been around for over three decades. It began as a Department of Defense program for enabling computers to communicate over great distances without requiring a central server to route the communications traffic. Since those early days, the Internet has grown substantially. Early on it was adopted by the academic community, and more recently it has been commercialized. The federal government no longer funds the Internet directly, leaving private and public telecommunications companies in charge of the major backbones (the major network connections of the Internet). The telecommunications companies charge Internet service providers for connections to the backbone, and Internet service providers in turn charge companies and individuals for their access to the Internet. The Internet itself is nothing more than an enormous number of networked computers all over the world. Like any computer network, the Internet has various software programs running on it, such as e-mail, newsgroups, FTP, gopher, and the World Wide Web.
The World Wide Web, or Web, was born in 1989 at CERN (the European Laboratory for Particle Physics). Since then, it has grown at a phenomenal rate. Today, Web traffic accounts for somewhere between one third and one half of the total traffic on the Internet. Because the Internet consists of many other sources of traffic, many of which have been around for decades, this is an impressive feat.
So, what is the Web? In simple terms, the Web is a part of the Internet that uses the Hypertext Transfer Protocol (HTTP) to display hypertext and images in a graphical environment. Hypertext refers to the ability to present text documents that are interlinked. You might click on a portion of the text in a document and be taken to another section of text in a different document. The Web is based on the concept of hypermedia, which is a superset of hypertext. Think of hypermedia as various forms of media (text, graphics, sound files, and so on) that are interlinked. For example, you could click on a text link in one document and display a graphic image. Figure 1.1 illustrates both a text link and an image link. Clicking on the word "resume" would take you to a page with the actor's résumé, and clicking on the picture itself would take you to a larger version of the same image.
In the early days of the Web, text links always had a different color of underlined text, and graphic links were always enclosed within a colored box. Now, however, the current shape of the mouse pointer gives you a better indication of what is and isn't a link. If the mouse pointer changes into a hand with the index finger extended, as shown below the "resume" link in Figure 1.1, the object being pointed to is a link to another document. Documents on the Web are interlinked so you can navigate between them by selecting links. The name World Wide Web alludes to the Web's spiderweb-like nature.
Figure 1.1: An example of a link
Clients and Servers
To understand the World Wide Web and CGI programming, you must understand the division between Web clients and Web servers and how HTTP facilitates the interaction between the two. Simply put, a server handles requests from various clients. For example, suppose you are using a word processing program to edit files on another computer. Your computer would be the client because it is requesting the file from another computer. The other computer would be the server because it is handling your computer's request. With networked computers, clients and servers are very common. A server typically runs on a different machine than the client, although this is not always the case. The interaction between the two usually begins on the client side. The client software requests an object or transaction from the server software, which either handles the request or denies it. If the request is handled, the object is sent back to the client software.
On the World Wide Web, servers are known as Web servers, and clients are known as Web browsers. Web browsers request documents from Web servers, allowing you to view documents on the World Wide Web. There's a good chance that you have already used a Web browser. Some of the most common browsers are Netscape's Navigator, Microsoft's Internet Explorer, and NCSA's Mosaic. Like most software companies that distribute Web browsers, these companies also distribute Web server software.
The process of viewing a document on the Web starts when a Web browser sends a request to a Web server. The Web browser sends details about itself and the file it is requesting to the Web server in HTTP request headers. The Web server receives and reviews the HTTP request headers for any relevant information, such as the name of the file being requested, and sends back the file with HTTP response headers. The Web browser then uses the HTTP response headers to determine how to display the file or data being returned by the Web server.
(There's more information on these headers in Chapter 2.)
Note: This discussion barely scratches the surface of what is actually happening, but it is enough for our study of CGI scripting. If you want more details on HTTP headers, check Chapter 2 as well as the "Useful Web Pages" section of the Appendix.
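To make the exchange of request and response headers more concrete, here is a minimal Python sketch that builds one request and one response by hand and splits each into its start line, headers, and body. Nothing here touches a network. The file name, the browser name, the header values, and the helper parse_message are all assumptions invented for illustration, and real browsers and servers send many more headers than these.

```python
# A hand-written HTTP/1.0 exchange, parsed with plain string handling.
# All names and values below are hypothetical examples.

# What a browser might send when asking for /index.html.
REQUEST = (
    "GET /index.html HTTP/1.0\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)

# What a server might send back: a status line, headers, a blank line, the file.
BODY = "<html><body>Hi</body></html>"
RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    f"Content-Length: {len(BODY)}\r\n"
    "\r\n"
    + BODY
)


def parse_message(raw):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # Headers end at the first blank line; everything after it is the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


# Server side: find out which file the browser wants and who is asking.
request_line, request_headers, _ = parse_message(REQUEST)
method, path, version = request_line.split(" ")
print(method, path, request_headers["user-agent"])

# Browser side: use the response headers to decide how to display the data.
status_line, response_headers, body = parse_message(RESPONSE)
print(status_line, response_headers["content-type"], len(body))
```

The Content-Type header is the piece the browser relies on most when deciding how to display what it receives: the same bytes would be shown as markup for text/html and as plain characters for text/plain.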
When a Web browser requests a CGI script from a Web server, the server starts the CGI script and passes the HTTP request headers to it. The information stored in the request headers is available for your script to use. Normally, when a CGI script is finished executing, the output is passed back to the Web server, which formats an HTTP response header and sends the information to the Web browser. It is possible, however, for your CGI script to format the HTTP response header and send the data directly to the Web browser. You can use this approach to reduce the workload of your Web server.
Whether the Web browser is requesting a file or a CGI script, the browser has to know the location of the Web server and the name of the file in order to make the request. With the millions of documents on the Web, you might wonder how the Web browser knows exactly where to look for the file you want to see. You probably also realize that many files on the Web have the exact same name. So how do the Web browsers get the correct document? Each file on the Web has a unique identifier that not only sets it apart from other documents but also describes where it is located. These unique identifiers are called uniform resource locators, or URLs.
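Returning for a moment to the CGI request described above, the following is a minimal sketch of what such a script might look like in Python. It assumes the usual CGI arrangement, in which the server hands request information to the script through environment variables such as REQUEST_METHOD and HTTP_USER_AGENT, and the script writes a Content-Type line, a blank line, and then the document to standard output. The function name build_response and the page it produces are hypothetical, and this is the common case where the server completes the response headers rather than the script sending everything itself.

```python
#!/usr/bin/env python3
# A minimal CGI script sketch. Assumes the Web server supplies request
# details as environment variables and reads the reply from standard output.
import html
import os
import sys


def build_response(environ):
    """Return the text a CGI script would print for this request."""
    # Request information arrives in the environment; use defaults if absent.
    method = environ.get("REQUEST_METHOD", "GET")
    agent = environ.get("HTTP_USER_AGENT", "unknown")

    # Escape values that came from the browser before placing them in HTML.
    body = (
        "<html><body>"
        f"<p>Request method: {html.escape(method)}</p>"
        f"<p>Your browser: {html.escape(agent)}</p>"
        "</body></html>\n"
    )

    # The script supplies only the content type; a blank line ends the headers.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

Keeping the page-building logic in a function that accepts the environment as an argument makes it possible to try the script from a command line, without a Web server, by passing in a small dictionary of made-up values.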
Uniform Resource Locators
The uniform resource locator (URL) is like an address for Web documents. Every document on the Web has a unique URL, and each part of the URL pro