HTML 4.0 Sourcebook
(Publisher: John Wiley & Sons, Inc.)
Author(s): Ian S. Graham
ISBN: 0471257249
Publication Date: 04/01/98


Chapter 6
HTML in Detail

This chapter and Chapter 7 provide a detailed exposition of the HyperText Markup Language, or HTML. These chapters are written from a document developer’s point of view and are designed to help authors create well-designed, valid HTML documents. The two chapters present detailed descriptions of every HTML element and of the allowed hierarchical relationships amongst these elements. Although relatively straightforward, the material is most easily followed by those who already have a basic understanding of HTML, at the level outlined in Chapters 1 and 2 of this book. If things seem confusing, you should probably go back to these earlier chapters and review the examples given there.

This chapter is divided into 14 sections. The first explains the structural rules and design principles behind HTML and outlines specific features of the HTML markup model. The second section defines the terminology used, in this chapter and Chapter 7, for explaining the details of the HTML element specifications. The subsequent 12 sections break the elements into the following categories:

  Basic Structure (HTML, HEAD, and BODY)
  Head Meta-Information Elements (BASE, ISINDEX, LINK, META, SCRIPT, STYLE, TITLE)
  Body Text Block and Heading Elements (ADDRESS, BLOCKQUOTE, CENTER, P, etc.)
  Fill-In Forms (FORM and related elements)
  Lists and List-Related Elements (DL, UL, OL, DIR, MENU, DT, DD, LI)
  Tables and Tabular Structures (TABLE and related elements)
  Inclusion Elements (APPLET, IMG, OBJECT, etc.)
  Hypertext Relationship Elements (A, LINK)
  Inline Text-Formatting Elements (STRONG, CITE, B, I, etc.)
  Character-Like Elements (BR)
  Phrase-Level Meta-Information Elements (BASEFONT, MAP, AREA, SCRIPT)
  Special Elements (DEL, INS, NOSCRIPT)

Note that Figure 6.3 provides an alphabetical element-specific Table of Contents for Chapters 6 and 7, as a complement to the index at the back of the book.

This chapter focuses on the markup elements of HTML 4, which is the current, definitive version of HTML, and is the language version you should use to design universally viewable documents. However, several extensions to this version of HTML are in common use—some proprietary (Netscape- or Microsoft-specific, for the most part) and some part of the ongoing HTML standardization process. These elements—EMBED, FRAMEs, SPACER, BLINK, MARQUEE, and so on (there are a lot of them)—are described in Chapter 7. Similarly, some aspects of the HTML 4 specification (for example, Internationalized HTML, and new elements for FORM input mechanisms) are not yet widely supported. Discussion of these features is also deferred to Chapter 7. Chapter 7 also outlines other Web technologies that are strongly related to HTML, such as scripting languages (e.g., JavaScript), style sheets, and markup languages for mathematics. Note, however, that many of these more advanced/experimental features will not function on many of the browsers currently in use. This is an important consideration if you want your documents to be accessible to the widest possible audience.

The Basics of Markup Languages

As mentioned in Chapter 1, the HyperText Markup Language is designed to specify the logical organization and formatting of text documents, with extensions to include inline images, fill-in forms, embedded objects or programs, and hypertext links to other Internet resources. The goal of this approach has been a language that:

  Is not bound to a particular hardware or software environment
  Represents the logical structure of a document and not its presentation

This approach reflects the fact that, in a distributed environment like the Web, individuals viewing a document may use many different “browser” programs with very different formatting capabilities. For example, it is not terribly useful to specify that a portion of text must be presented in a 14-point Times Roman font if the person “viewing” the document is using a Braille reader. For this reason, HTML does not specify the details of document typesetting. Instead, it marks text according to its logical meaning, such as heading, list, or paragraph. The details of presenting these elements are left to the browser, which uses the logical description of the document to present the material in the best possible way. Thus, a well-designed HTML document can be intelligibly presented by graphical and nongraphical browsers, or even by nonvisual browsers, such as one that presents the content via Braille.
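To make the idea concrete, here is a minimal sketch, in Python, of a hypothetical text-only “browser” that receives the same logical markup a graphical browser would and chooses its own presentation: H1 and H2 headings become underlined lines, list items become bullets, and paragraphs become plain blocks. The names TextRenderer and render_text, and every presentation choice in it, are illustrative assumptions rather than anything defined by HTML. The point is only that the markup says what each piece of text is, and the program decides how to show it.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Render a few logical HTML elements as plain text.

    The presentation rules (underlines, bullets) are this sketch's own
    choices; the HTML only says which text is a heading, a list item, etc.
    Assumes every block element has an explicit end tag.
    """

    BLOCKS = {"h1", "h2", "p", "li"}

    def __init__(self):
        super().__init__()
        self.lines = []      # finished output lines
        self.current = None  # (tag, list of text parts) for the open block

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKS:
            self.current = (tag, [])

    def handle_data(self, data):
        # Text outside a recognized block (e.g. newlines between tags) is ignored.
        if self.current is not None:
            self.current[1].append(data)

    def handle_endtag(self, tag):
        if self.current is None or tag != self.current[0]:
            return
        # Collapse runs of whitespace, as browsers do for ordinary text.
        text = " ".join("".join(self.current[1]).split())
        if tag == "h1":
            self.lines += [text, "=" * len(text)]
        elif tag == "h2":
            self.lines += [text, "-" * len(text)]
        elif tag == "li":
            self.lines.append("* " + text)
        else:  # p
            self.lines.append(text)
        self.current = None


def render_text(html):
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    return "\n".join(renderer.lines)


doc = """<h1>Fruit</h1>
<p>Some   fruits are listed below.</p>
<ul><li>Apple</li><li>Pear</li></ul>"""
print(render_text(doc))
```

A graphical browser given the same doc string would instead pick fonts and indentation; nothing in the markup needs to change for either program.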

Defining HTML: The Document Type Definition

The rules of HTML are defined using the Standard Generalized Markup Language, or SGML. SGML, an international standard specified by the International Organization for Standardization (ISO), is an extremely sophisticated tool for defining markup languages that describe structured documents; HTML is just one of many languages defined using SGML. The details of SGML are complex and, fortunately, not critical to an HTML document developer. One component that is useful, however, is the definition of the HTML syntax, contained in a special SGML document called a Document Type Definition, or DTD. This is a simple text file, often having an imaginative name like html.dtd. The HTML DTD can be used, in combination with SGML parsing programs such as sgmls, to validate the syntax of any HTML document; that is, the parser checks the document for HTML markup errors and tells the author where they are. The “References” section at the end of this chapter suggests places where you can obtain the official DTD file for HTML, while “Web Management and Maintenance Tools” on the companion Web site discusses document validation using sgmls.
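Among other things, the DTD records which elements may contain which. The following Python sketch is only a toy stand-in for real validation with sgmls and the official DTD: it hard-codes a tiny, hypothetical subset of content rules (the ALLOWED table is an illustrative assumption; each pairing it lists is legal in HTML 4, but the real DTD permits far more and also enforces required elements, ordering, and omitted end tags, none of which are modelled here). It reports any element found inside a parent that the table does not permit.

```python
from html.parser import HTMLParser

# Hypothetical, drastically reduced content rules: parent -> children this
# toy checker accepts. Parents missing from the table are not checked.
ALLOWED = {
    "html": {"head", "body"},
    "head": {"title"},
    "body": {"h1", "p", "ul", "ol"},
    "ul": {"li"},
    "ol": {"li"},
    "li": {"p", "em", "ul", "ol"},
    "p": {"em", "strong", "a"},
}


class NestingChecker(HTMLParser):
    """Report elements whose parent does not allow them (toy rules only).

    Assumes every element has an explicit end tag; empty elements such
    as BR and SGML's omitted-end-tag rules are not handled.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, outermost first
        self.errors = []  # (line number, message) pairs

    def handle_starttag(self, tag, attrs):
        if self.stack:
            parent = self.stack[-1]
            allowed = ALLOWED.get(parent)
            if allowed is not None and tag not in allowed:
                line, _ = self.getpos()
                self.errors.append((line, f"<{tag}> not allowed inside <{parent}>"))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to (and including) the matching start tag, if one is open.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def check(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


doc = "<ul>\n<p>Not a list item</p>\n<li>Fine</li>\n</ul>"
print(check(doc))  # [(2, '<p> not allowed inside <ul>')]
```

Like a real validator, the sketch reports where the problem is (here, line 2) rather than silently guessing what the author meant, which is what a forgiving browser would do.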

This book is a guide to authoring HTML documents, and although it provides a fairly complete description of HTML, it should not be considered the definitive reference for the language. For comprehensive details, you should read the Internet Engineering Task Force (IETF) and World Wide Web Consortium (W3C) documents listed in the “References” section at the end of this chapter and at the end of Chapter 7.

Typing Characters in HTML Documents

As discussed in Chapter 1, an HTML document is just a text document containing printable characters and can be created and edited with any text editor. Of course, when a text editor creates a file, it stores the characters as binary codes; the relationship between the typed characters and the binary codes is called the character encoding. On current computers, the most common character encoding is known as ISO Latin-1 (also known as ISO 8859-1; see Appendix A on the companion Web site for more information about ISO Latin-1 characters, character encodings, and character sets in general). Latin-1 is an 8-bit character encoding that encodes 256 different characters ($2^8 = 256$). These characters, listed in Table A.1, consist of the 128 ($2^7 = 128$) characters defined in the 7-bit US-ASCII character encoding (also known as ISO 646) plus 128 additional characters that use the eighth bit. The ASCII characters are essentially those found on standard US keyboards, while the additional 128 consist of accented and other characters common in western European languages.
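The byte-level picture described above can be checked with a few lines of Python, whose built-in "latin-1" codec maps each byte value 0 to 255 to exactly one character. This is a small illustration using the standard library only; the sample strings are arbitrary.

```python
# Python's latin-1 codec maps every byte value 0-255 to one character,
# so decoding all 256 possible bytes yields exactly 256 characters.
all_bytes = bytes(range(256))
latin1_chars = all_bytes.decode("latin-1")
print(len(latin1_chars))  # 256

# The first 128 positions are the 7-bit US-ASCII characters, so plain
# ASCII text produces identical bytes in either encoding.
text = "HTML 4"
print(text.encode("ascii") == text.encode("latin-1"))  # True

# Accented western European letters need the eighth bit (values 128-255).
for ch in "éüß":
    code = ch.encode("latin-1")[0]
    print(ch, code, code >= 128)
```

The loop prints é 233 True, ü 252 True, and ß 223 True: each accented letter occupies one byte, but only by using the upper half of the 8-bit range that ASCII leaves empty.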

There are many other character encodings, optimized for different languages or writing systems (e.g., Cyrillic, Arabic, Japanese, Chinese, Korean, etc.). However, Latin-1 is by far the most common encoding in Web documents—indeed, until recently, it was the only encoding formally supported by the Web protocols. For these reasons, when receiving an HTML document, most Web browsers assume the document to be encoded using Latin-1, unless explicitly told otherwise. Appendix A on the companion Web site describes how a server (or a document) can indicate that a document was created using a different encoding system.
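A server usually signals a different encoding through the charset parameter of the HTTP Content-Type header, for example text/html; charset=UTF-8. The following Python sketch mimics the default behavior described above: use the declared charset if there is one, otherwise fall back to Latin-1. The function names charset_from_content_type and decode_body are hypothetical, and real browsers also consult other signals, such as a META element inside the document, which this sketch ignores. It relies on the standard library's email.message.Message.get_content_charset, which returns the charset parameter in lower case, or failobj when none is present.

```python
from email.message import Message

# Charset assumed when none is declared, matching the Latin-1 default
# described above.
DEFAULT_CHARSET = "iso-8859-1"


def charset_from_content_type(content_type):
    """Return the (lower-cased) charset parameter of a Content-Type value,
    or the Latin-1 default if no charset parameter is present."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=DEFAULT_CHARSET)


def decode_body(body, content_type):
    """Decode raw document bytes using the declared or default charset."""
    return body.decode(charset_from_content_type(content_type))


body = "café".encode("utf-8")  # bytes written by an editor using UTF-8
print(decode_body(body, "text/html; charset=UTF-8"))  # café
print(decode_body(body, "text/html"))                 # cafÃ©
```

The second call shows why the declaration matters: without it, the two UTF-8 bytes for é are read as two separate Latin-1 characters.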

