HTML 4.0 Sourcebook: Introduction

HTML 4.0 Sourcebook
(Publisher: John Wiley & Sons, Inc.)
Author(s): Ian S. Graham
ISBN: 0471257249
Publication Date: 04/01/98

Introduction and Book Outline

In 1995, in the introduction to the first edition of this book, I stated that the World Wide Web had “taken the Internet by storm.” At the time I hoped this was appropriate, but felt in my heart that I was probably overstating things. In retrospect, of course, it was a gross understatement. Over the past three years, and three editions of this book, the Web has grown far beyond everyone’s expectations (well, perhaps not those of Marc Andreessen), to become one of the core technologies of the 1990s. Hundreds of thousands of companies now offer products and services via the Web, and the trickle of Web-related products available in late 1994 has grown, by late 1997, into a torrent. Purely Web-based companies such as Netscape Communications and Yahoo! are now market-valued in the billions of dollars, and these companies did not exist three years ago! Meanwhile, traditional software companies such as Oracle, Sun, and Microsoft have totally redesigned their product and business models, while noncomputer-related businesses, from news and entertainment to financial services, high technology, and manufacturing, are adopting these new technologies as a new paradigm for internal operations, and as a new way of communicating with clients and customers. This is not the simple swell of “a storm,” but a tidal wave that threatens, and promises, to change our society in ways that we cannot yet imagine.

This is because the World Wide Web model makes distributing and accessing any form of digital data easy and inexpensive for anyone—company or consumer—with profound implications for business, culture, and society. Thus, it is no surprise that seemingly “everyone” is now buying or downloading the latest in Web tools and is madly learning how to build pages so that they too can join this new electronic world. Indeed, this is probably why you have picked up this book—to learn about the tools, and how to build Web pages!

The Web Model

A tool may be easy to use, but usually requires skill and training to be used well. This is certainly true of the tools involved in preparing and distributing information via hypertext documents and Internet Web servers. Just as designing a book or magazine requires experience and knowledge in the tools of design and typography, preparing well-designed, useful, and reliable Web resources requires an in-depth understanding of how the tools that deliver these resources work, and how to use them well. The intention of this book, as with the first three editions, is to help you develop this understanding. Given a basic feeling for what the Internet is—simply a system, rather like a courier service, for communicating digital information from one place to another—there are four essential concepts that you need to understand:

• Uniform Resource Locators, or URLs. These are the means by which Internet resources are addressed in the World Wide Web. If you want to specify a resource on the Internet, you specify its URL.

• The HyperText Markup Language, or HTML. This is the markup language with which World Wide Web hypertext documents are written and is what allows you to create hypertext links, fill-in forms, etcetera. Writing good HTML documents involves both technical issues (proper construction of the document) and design issues (ensuring the information content is clearly presented to the user).

• The HyperText Transfer Protocol (HTTP) and HTTP client-server interactions. HTTP servers are designed specifically to distribute hypertext documents, and you must know how the underlying HTTP protocol works if you are to take advantage of its powerful features.

• Server-side resource processing. This lets a user with a Web browser interact with resources lying on an HTTP server, by providing a tunnel through the server to these resources. This can be either through the so-called Common Gateway Interface, or through special modules built into the server.

The goal of this book and its companion Web site (www.wiley.com/compbooks/graham) is to explain these main concepts, and give you the tools you need to develop your own high-quality World Wide Web products. The remainder of this introduction looks briefly at these components and explains their basic features, and outlines the organization of the book. A figurative summary of these different components and the relationships between them is found in Figure I.1.

Uniform Resource Locators

Uniform Resource Locators, or URLs, are a naming scheme for specifying how and where to find any Internet server resource, such as those available from HTTP, FTP, or WAIS servers. For example, the URL that references the important file bunny_hop.zip in the directory /pub/web/browsers on the FTP server ftp.banzai.net is simply:

<ftp://ftp.banzai.net/pub/web/browsers/bunny_hop.zip>

World Wide Web hypertext documents use URLs to reference other hypertext resources.
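
As a rough illustration of how the pieces of the FTP URL above map onto "how and where" a resource is found, the following Python sketch splits that URL into its parts using the standard urllib.parse module. The variable names are chosen for this example only and do not come from the book.

```python
from urllib.parse import urlsplit

# The example URL from the text above.
url = "ftp://ftp.banzai.net/pub/web/browsers/bunny_hop.zip"
parts = urlsplit(url)

scheme = parts.scheme      # the protocol used to retrieve the resource
host = parts.hostname      # the server that holds the resource

# Split the path into the directory and the file name.
directory, _, filename = parts.path.rpartition("/")

print(scheme, host, directory, filename)
```

The scheme tells a browser which protocol to speak, the host tells it which server to contact, and the path tells that server which resource is wanted.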

Figure I.1. Schematic diagram illustrating the essential components of the World Wide Web. The user’s tool is the browser, or user agent—the program that understands and displays HTML documents. The browser can interpret URLs to determine where a resource is, and can use the URL-specified protocol to retrieve the resource. One of the most important protocols is HTTP—most WWW servers use this protocol and are called HTTP or Web servers. Using a Web server’s CGI or Common Gateway Interface (or other, similar mechanisms), users can access other resources on the Web server, such as databases.

The HyperText Transfer Protocol

The HyperText Transfer Protocol, or HTTP, is an Internet communications protocol designed expressly for the rapid distribution of hypertext documents. Like other Internet tools such as FTP, WAIS, and Gopher, HTTP is a client-server protocol. In the client-server model a client program, running on the user’s machine, sends a message requesting service to a server program running on another machine on the Internet. The server responds to the request by sending a message back to the client. In exchanging these messages, the client and server use a well-understood protocol. FTP, WAIS, and Gopher are other examples of Internet client-server protocols, all of which are accessible to a World Wide Web browser. However, the HTTP protocol was designed expressly for hypertext document delivery. Today, almost all Web services are delivered via HTTP servers.
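
The request and response messages described above are plain text. The Python sketch below is a simplified illustration, not a full HTTP implementation: it composes a minimal GET request and takes apart a canned response without touching the network. The host name www.example.org, the path /index.html, the sample response, and the function names are all assumptions made for this example.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Compose a minimal HTTP/1.1 GET request message."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, version
        f"Host: {host}",          # HTTP/1.1 requires a Host header
        "Connection: close",
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a response into status code, reason, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# Hypothetical host and path, used only for illustration.
request = build_get_request("www.example.org", "/index.html")

# A canned server reply standing in for a real network exchange.
sample_reply = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML><BODY>Hello</BODY></HTML>"
)
code, reason, headers, body = parse_response(sample_reply)
```

A real client would send the request bytes over a TCP connection to the server and read the reply from the same connection.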

Server-Side Resource Processing

At the simplest level, HTTP servers simply “serve up” files when clients request them. However, HTTP servers support additional important features:

• The ability to return to the client information generated by other programs running on the server.

• The ability to take data sent from the client and pass this information on to other programs on the server for further processing.

The special server-side utilities that implement these features are often called gateway programs, as they usually act as a gateway between the HTTP server and other resources accessible to the Web server, such as databases. Just as a server can access many files, an HTTP server can access many different gateway programs; in both cases you specify which (file or program) resource you want through a URL.

The interaction between the server and these gateway programs is governed by the Common Gateway Interface (CGI) specifications. Using the CGI specifications, a programmer can easily write simple programs or scripts to process user queries, interrogate databases, make images that respond to mouse clicks, and so on.

Many servers also let you program gateway-like functionality directly into the server, for increased speed and performance.
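
To make the gateway idea more concrete, here is a minimal sketch of a CGI-style program in Python. Under CGI, the server passes the query portion of the URL to the program in the QUERY_STRING environment variable, and the program writes a header block, a blank line, and then the document to standard output. The function name run_gateway and the query parameter "name" are hypothetical choices for this example.

```python
import html
import os
from urllib.parse import parse_qs


def run_gateway(environ) -> str:
    """Build a CGI response from the environment passed in by the server."""
    # Data sent by the client arrives in QUERY_STRING, e.g. "name=Ian".
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical parameter; fall back to a default value.
    name = query.get("name", ["world"])[0]
    # Escape the user's data before placing it in the HTML document.
    body = f"<HTML><BODY><P>Hello, {html.escape(name)}!</P></BODY></HTML>"
    # Headers, then a blank line, then the generated document.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # When run by a server as a CGI program, the response goes to stdout.
    print(run_gateway(os.environ), end="")
```

Keeping the logic in a function that takes the environment as an argument makes the program easy to try without a server, for instance by calling run_gateway({"QUERY_STRING": "name=Ian"}).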

The HyperText Markup Language

The HyperText Markup Language, or HTML, is the language used to prepare Web hypertext documents. These are the documents you distribute on the World Wide Web and are what your human clients actually see. HTML contains commands, called elements or tags, to mark text as headings, paragraphs, lists, quotations, and so on. It also has tags for including images within the documents, for including fill-in forms that accept user input, and, most importantly, for including hypertext links connecting the document being read to other documents or Internet resources such as WAIS databases or anonymous FTP sites. It is this last feature that allows the user to click on a string of highlighted text and access a new document, an image, or a movie file from a computer thousands of miles away. And how does the HTML document specify where this other document is? Through a URL, which is included in the HTML markup instructions and which is used by the user’s browser to find the designated resource.

What resources can URLs point to? They can be other HTML documents, pictures, sound files, movie files, or even database search engines. They can be downloadable programs in Java or other languages. They can be located on the user’s computer or anywhere on the Internet. They can be accessed from HTTP servers or from FTP, Gopher, WAIS or other servers. The URL is an immensely flexible scheme, and in combination with HTML, yields an incredibly powerful package for preparing a web of hypertext documents linked to each other and to Internet resources around the world. This image of interlinked resources is in fact the vision that gave rise to the name, World Wide Web.
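
The way URLs sit inside HTML markup can be seen by pulling them back out again. The Python sketch below feeds a small, made-up HTML document to the standard html.parser module and collects the HREF value of each anchor, roughly as a browser must do before it can follow a link. The sample document and the class name LinkCollector are assumptions for this example. The two URLs are the ones that appear elsewhere in this introduction.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the HREF value of every A (anchor) element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A small illustrative document with one HTTP link and one FTP link.
document = """
<HTML><BODY>
<P>Visit the <A HREF="http://www.wiley.com/compbooks/graham/">companion site</A>
or fetch <A HREF="ftp://ftp.banzai.net/pub/web/browsers/bunny_hop.zip">a file by FTP</A>.</P>
</BODY></HTML>
"""

collector = LinkCollector()
collector.feed(document)
print(collector.links)
```

Each collected URL names both the protocol to use and the location of the resource, which is all a browser needs to retrieve it.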

Overview of the Book

This book is an introduction to HTML, URLs, HTTP, and the CGI interface and to the design and preparation of resources for delivery via the World Wide Web. It begins with the HTML language. Almost every resource you prepare will be presented through an HTML document, so that your HTML presentation is your “face” to the world. It is crucial that you know how to write accurate HTML, and that you understand the design issues involved in creating attractive, useful documents, if you are to make a lasting impression on your audience and present your information clearly and concisely. It won’t matter if your Internet resources are the best in the world if your presentation of them is badly designed, frustratingly slow to access, or difficult to use.

HTML is also an obvious place to start. You can write simple HTML documents and view them with a Web browser such as Internet Explorer, Netscape, Mosaic, or lynx without having to worry about CGI programs, HTTP servers, or other advanced features. You can also easily include, in your documents, URLs pointing to server resources around the world, and get used to how the system works: Browsers understand HTML hypertext anchors and the URLs they contain and have built-in software to talk to Internet servers using the proper protocols. You can accomplish a lot just by creating a few pages of HTML.

Chapter 1 is an elementary introduction to HTML and to the design issues involved in preparing HTML documents. This nontechnical chapter combines a brief overview of HTML with a discussion of some aspects of document design. The details of the HTML language and more sophisticated client-server issues are left to later chapters. Design issues are very important in developing good World Wide Web presentations. HTML documents are not like text documents, nor are they like traditional hypertext presentations, since they are limited by the varied capabilities of browsers and by the speed with which documents can be transported across the Internet. Chapter 2 discusses what this means in practice and gives guidelines for avoiding major HTML authoring mistakes. In most cases this is done using examples, with the important issues being presented in point form, so that you can easily extract the main points on first reading.

The issue of images and graphics also comes up often in Web page design: Images are an important addition to any Web page, either as simple images or as clickable imagemaps. However, they must be carefully processed to make them Web-friendly: The image files must be small, in the right format, and of the right “style” for display by computer. These and other image-related issues are discussed in Chapter 3.

At the same time, designing an HTML document collection is more than just writing pages—the design of a collection is critically important, and involves design issues that are not always apparent from the point of view of a single page. Chapter 4 looks in detail at the issues surrounding document collection design, and will help you through the process of designing a real document “web.”

Chapter 5 takes a more practical look at Web design issues, and describes how to go about planning and implementing a site (determining why you are building a site, defining your audience, planning the site layout, etc.), cost analysis (how to estimate the costs of different site components), and maintenance (how to maintain the site, and how to estimate the costs of this process). This chapter helps to connect the theory of Chapters 1 through 4 with the practical realities of designing, building, and maintaining a Web collection.

One point that is emphasized throughout the book is the importance of using correct HTML markup constructions when you create your HTML documents. Although HTML is a relatively straightforward language, there are many important rules specifying where tags can be placed. Ensuring that your documents obey these rules is the only way you can guarantee that they will be properly displayed on the many different browsers your site visitors may use. All too often, writers prepare documents that look wonderful on one browser but end up looking horrible, or even unviewable, on others.

Although some general rules for constructing valid HTML are included in Chapters 1 and 5, Chapters 6 and 7 and the references therein should be used as detailed guides to correct HTML. In particular, Chapter 6 presents a detailed exposition of the current “definitive” version of HTML, known as HTML 4, and of the allowed nesting of the different HTML markup instructions. Chapter 7 continues along this line, but looks at more advanced features, such as framed documents, advanced HTML forms and tables, proprietary HTML extensions by browser vendors, document scripting (JavaScript), cascading style sheets, font embedding, and experimental HTML features that are not yet formally part of the “standard” HTML language, or that are not yet widely supported. You can use Chapter 6 as a guide for writing universally viewable HTML documents, and Chapter 7 as a guide to advanced features, and as a preview of coming attractions.

Of course HTML is only a beginning. To truly take advantage of the Web you need to understand the interaction between browsers and HTTP servers, and be able to write server-side gateway programs that take advantage of this interaction. These topics are covered in Chapters 8 through 11. Chapter 8 describes the URL syntax in detail, while Chapter 9 delves into the specifics of the HTTP protocol used to communicate with HTTP servers, and discusses the basics of HTTP server operation. Chapter 10 then describes the details of the Common Gateway Interface (CGI) specification for writing server-side programs that interface with an HTTP server. Chapter 11 gives several concrete and clearly explained examples of real-world CGI programs, to show how the issues from Chapters 8 through 10 affect gateway program design. This chapter also contains a detailed reference list of resources useful in developing CGI or other server-based applications—many of these resources are available right over the Web, just waiting for you to go and get them.

Chapters 6 through 11 are the technical core of this book, and will be useful reference material when you are writing HTML documents, JavaScript scripts, or CGI programs.

Book Notation

In this book, HTML element names are generally given in boldface capital letters, for example, DIV. Similarly, the names of URL schemes are given in a boldface lowercase type, as in the phrase http URLs. A monospace font is generally used for explicit examples of HTML or other code, as in <DIV CLASS="foo"> to denote a specific DIV element tag. Also, JavaScript and Cascading Style Sheet code, as well as system environment variable names, are given in a Courier font. Program, directory, and file names are often given in italics, to make them stand out from the text and to reduce confusion. However, this is not always the case, and in many situations the names are given in a regular, non-italicized font, to make the text easier to read.

URL references are written using the standard text font. However, to make the text shorter and easier to read, the http:// portion has been omitted from all http URLs. Most browsers (in particular, Netscape Navigator and Internet Explorer) assume that strings typed into the Location (Netscape) or Address (Microsoft) windows are http URLs, if no other protocol is specified. With some other browsers you will need to explicitly add the http:// portion. And, of course, you always need to add the http:// when you use a URL as an HREF value in an HTML document!
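
Since the http:// prefix is omitted in the text but required in an HREF value, a small helper can restore it. The Python sketch below is one possible approach under a simple assumption: any string that does not already begin with a scheme followed by :// is treated as an http URL. The function name as_href is hypothetical, and the check deliberately ignores schemes written without slashes, such as mailto.

```python
import re

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def as_href(url: str) -> str:
    """Return a URL suitable for an HREF, adding http:// when it is missing."""
    return url if _SCHEME.match(url) else "http://" + url


print(as_href("www.wiley.com/compbooks/graham"))
print(as_href("ftp://ftp.banzai.net/pub/web/browsers/bunny_hop.zip"))
```

The first call gains the http:// prefix, while the second is returned unchanged because it already names its protocol.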

The Companion Web Site

www.utoronto.ca/ian/books/

www.wiley.com/compbooks/graham/

For those of you familiar with the previous editions, this fourth edition has been both significantly expanded and brought up-to-date. Indeed, there was so much new material that not everything made it into print! Instead, the companion Web site, available at either of the URLs listed at the beginning of this section, has been used as an “adjunct” to the book, containing the more time-sensitive material (such as lists of software resources and descriptions, or lists of defined MIME types), plus additional content that simply didn’t fit, or that simply “worked” better on the Web. For more information on what the site contains, go to the “About the Web Site” section at the end of the book.