HTML 4.0 Sourcebook: HTML in Detail

HTML 4.0 Sourcebook
(Publisher: John Wiley & Sons, Inc.)
Author(s): Ian S. Graham
ISBN: 0471257249
Publication Date: 04/01/98

Attribute Values as Literal Strings

Formally, HTML has two main mechanisms for handling values assigned to attributes: literal strings and name tokens. A literal string is just that: a string of characters to be accepted literally as typed by the author, including the preservation of case. Literal strings must be surrounded by quotation marks (conventionally double quotation marks, although HTML also permits single quotation marks), since otherwise the string may be prematurely ended at a space or other character. Literal strings can contain any sequence of printable characters, including HTML character and entity references, which a browser turns back into the desired characters. You must use a character or entity reference (e.g., &#34; or &quot;) to include a double quotation mark inside a double-quoted attribute value, since this character would otherwise be interpreted as marking the end of the string.

Most attributes that can be assigned arbitrary, author-defined strings, such as HREF and SRC (uniform resource locators), ALT (IMG elements), and NAME (fragment identifiers for anchor elements), are handled as literal strings.
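
As a brief illustration of the points above, the following fragment sketches a few literal-string attribute values. The file names and text are hypothetical, chosen only for this example.

```html
<!-- Hypothetical file names and text, for illustration only -->

<!-- Case is preserved in a literal string: "Photo.GIF" and "photo.gif" are different strings -->
<IMG SRC="images/Photo.GIF" ALT="A photo of the &quot;old&quot; library">

<!-- These values contain spaces or punctuation, so they must be quoted -->
<A HREF="notes.html#part2" TITLE="Notes, part two">Part two of the notes</A>

<!-- &#34; is the numeric character reference for the double quotation mark -->
<IMG SRC="images/sign.gif" ALT="Sign reading &#34;Closed&#34;">
```

A browser turns the references back into characters, so the first ALT text would be presented as: A photo of the "old" library.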

Attribute Values as Name Tokens

Name tokens are restricted character strings that can only contain the letters a–z or A–Z, the digits 0–9, periods (.), and hyphens (-), and must begin with a letter. Unlike literal strings, name tokens are case insensitive, so that the token abba is equivalent to ABbA. Because they are simple, name token attribute values do not need to be surrounded by quotation marks, as in the string “text” in the assignment TYPE=text. However, since it is never an error to include quotation marks, it is safest to leave them in.

Name tokens are used for values defined as part of HTML, such as the value “text” in the element <INPUT TYPE="text"...>.

The HTML DTD specifies whether an attribute value is a literal string or a name token and defines the allowed values for those attributes that take name tokens. For example, the HTML 4 DTD states that the ALIGN attribute of an H1 element can take the values “left,” “right,” “center,” and “justify,” but no others. Consequently, HTML validators (tools that check the validity of an HTML document) can check for incorrect values. This is not true of literal strings: a validator has no way of knowing whether a specified literal string is valid or not.

In this book, attribute values are, in general, placed inside quotation marks, since name tokens are always valid inside quotation marks, and the added quotation marks help them stand out from the regular text.
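
The following fragment is a small sketch of how name token values behave in practice. The field name and heading text are hypothetical; the value “middle” is used only as an example of a value that the DTD does not allow for ALIGN on H1.

```html
<!-- These two tags assign the same TYPE value: quotation marks are optional
     for name tokens, and case does not matter -->
<INPUT TYPE=text NAME="surname">
<INPUT TYPE="TEXT" NAME="surname">

<!-- "center" is one of the values the DTD allows for ALIGN on H1 -->
<H1 ALIGN="center">A Centered Heading</H1>

<!-- "middle" is not an allowed value here, so a validator can report it -->
<H1 ALIGN="middle">A Heading a Validator Would Flag</H1>
```

Note that NAME="surname" is a literal string, so a validator cannot judge whether that value is sensible, only that it is properly quoted.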

Browser Handling of HTML Errors, Unknown Elements, and Unknown Attributes

On the World Wide Web, browsers are generous in their handling of HTML documents. Thus, even if a document is badly constructed, for example with missing or misplaced tags, a browser will do the best it can to present the content. Often the document looks odd, due to the resulting formatting decisions, but from a user’s point of view this is infinitely better than seeing nothing at all. This emphasizes the importance, on the author’s side, of designing valid HTML documents that can be properly viewed by any browser.

At the same time, HTML is an evolving language, and new elements and attributes are constantly being added, either as part of the formal language development process (HTML 2.0, HTML 3.2, HTML 4, etc.) or as customized extensions introduced by browser designers. For such an evolution to work, there must be ways for browsers to handle HTML elements, or element attributes, that they do not understand.

In general, browsers ignore elements, or element attributes, that they do not understand. For example, the BLINK element is Netscape-specific: on other browsers, the <BLINK> ... </BLINK> tags are usually ignored, and the enclosed text is rendered as regular text (subject to whatever other elements enclose it). Similarly, in HTML 4, paragraphs can be centered using the ALIGN attribute, that is, <P ALIGN="center">. If a browser does not understand this attribute or the value assigned to it, it simply ignores the attribute and uses its own preferred paragraph alignment.

However, many elements will not produce readable documents if the browser does not understand the relevant tags. Particular examples are the FRAME (discussed in the next chapter) and TABLE elements. In general, you can assume that any new element that implies both logical and physical structure will be poorly displayed by a browser that does not understand the element.
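
The cases described above can be sketched with a short fragment. The text and table contents are hypothetical, and the comments describe the likely behavior of a browser that does not recognize the markup, not a guaranteed result.

```html
<!-- BLINK is a Netscape extension: other browsers usually ignore the tags
     and show the enclosed text as ordinary paragraph text -->
<P>This sale ends <BLINK>today</BLINK>.</P>

<!-- A browser that does not understand ALIGN ignores the attribute
     and uses its own default paragraph alignment -->
<P ALIGN="center">A paragraph the author would like centered.</P>

<!-- A browser that does not understand TABLE ignores the table tags,
     so the row and column structure is lost -->
<TABLE>
  <TR><TD>Apples</TD><TD>12</TD></TR>
  <TR><TD>Pears</TD><TD>7</TD></TR>
</TABLE>
```

In the first two cases the content stays readable. In the third, a table-unaware browser would probably run the cell contents together as plain text, which is why elements that carry both logical and physical structure degrade poorly.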

Basic HTML Document Structure

Every HTML document has two main parts: a head, which contains information about the document but is not displayed to the reader; and a body, which contains the part of the document to be displayed by a browser. These parts are defined by the HEAD and BODY elements, respectively. The basic structure of all HTML documents is then (commentary appears as HTML comments):

<HTML>
   <HEAD>
      <!-- ...elements valid in the document HEAD -->
   </HEAD>
   <BODY>
      <!-- ...elements valid in the document BODY -->
   </BODY>
</HTML>

Note how Figure 6.1 follows this outline. The outer HTML element declares the enclosed text to be an HTML document. Directly inside this lie the HEAD and the BODY. The BODY contains the text and associated HTML markup instructions of the material to be displayed. The HEAD, which must appear before the BODY, contains elements that define information about the document, such as its title. Certain elements can only appear in the HEAD, while others can only appear in the BODY.
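
As a minimal sketch of this outline filled in, the following document places a TITLE in the HEAD and a heading and paragraph in the BODY. The title and text are hypothetical.

```html
<HTML>
   <HEAD>
      <!-- TITLE is an element that can appear only in the HEAD -->
      <TITLE>A Minimal Example Document</TITLE>
   </HEAD>
   <BODY>
      <!-- H1 and P are elements that can appear only in the BODY -->
      <H1>A Minimal Example Document</H1>
      <P>This paragraph is part of the BODY and is displayed by the browser.</P>
   </BODY>
</HTML>
```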

The preceding structure is modified somewhat for FRAME-based documents. For these, the basic document structure is:

<HTML>
   <HEAD>
      <!-- ...elements valid in the document HEAD -->
   </HEAD>
   <FRAMESET ..>
      <!-- ...FRAME and FRAMESET elements... -->
   </FRAMESET>
   <NOFRAMES>
      <BODY>
         <!-- ...elements valid in the document BODY -->
      </BODY>
   </NOFRAMES>
</HTML>

Here FRAMESET and NOFRAMES are the frame-related additions to the basic structure. Note that a BODY can be included within the NOFRAMES element; its content is displayed by browsers that do not understand FRAMESET. Frame elements are discussed in detail in Chapter 7.
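
A hedged sketch of a frame-based document, following the outline given above, might look like this. The file names, frame names, and column widths are hypothetical, chosen only for illustration.

```html
<HTML>
   <HEAD>
      <TITLE>A Two-Frame Example</TITLE>
   </HEAD>
   <!-- Hypothetical layout: two columns taking 30% and 70% of the window width -->
   <FRAMESET COLS="30%,70%">
      <FRAME SRC="menu.html" NAME="menu">
      <FRAME SRC="main.html" NAME="main">
   </FRAMESET>
   <NOFRAMES>
      <BODY>
         <!-- Shown only by browsers that do not understand FRAMESET -->
         <P>This document uses frames. Please see the
            <A HREF="main.html">main page</A> instead.</P>
      </BODY>
   </NOFRAMES>
</HTML>
```

Giving the NOFRAMES body a link to the main content is one simple way to keep the document usable in browsers without frame support.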

BODY Content: Block and Inline Elements

BODY content elements are divided into two broad categories: block and inline. Block elements define blocks of text, such as paragraphs or tables, while inline elements define sections of text or inserted objects such as images or applets that appear inline within the text. In general, inline elements can appear inside a block element, but not vice versa: For example, an H1 heading (a block element) can contain an EM element (an inline element), whereas the opposite is not true.
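
The nesting rule can be illustrated with a short fragment. The text and image file name are hypothetical.

```html
<!-- Allowed: the inline element EM inside the block element H1 -->
<H1>An <EM>Important</EM> Heading</H1>

<!-- Allowed: inline elements, including an image, inside a paragraph block -->
<P>Paragraph text with <EM>emphasis</EM> and an inline image
   <IMG SRC="icon.gif" ALT="icon">.</P>

<!-- Not allowed: the block element H1 inside the inline element EM -->
<EM><H1>A Misplaced Heading</H1></EM>
```

A browser will probably still display the last line in some form, but a validator would report it as an error.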

The distinction between block and inline is particularly important for internationalized documents—that is, those containing a mix of different languages and writing systems—since the two groups of elements inherit text layout directionality (text laid out from left to right, or right to left) in different ways. Furthermore, the block or inline nature of an element can be altered using Cascading Style Sheets—CSS lets an author change the basic formatting characteristics of an element to be either block or inline. These issues are discussed in more detail in Chapter 7.

The new INS and DEL elements are an exception to this block/inline distinction. INS and DEL can appear anywhere within the BODY of the document or within any element lying inside the BODY and denote content that has been inserted or deleted relative to some other version of the document. Thus INS or DEL can be either inline or block elements, depending on context. Please see the sections describing these elements for more details.
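
As a sketch of the two contexts just described, the fragment below uses INS and DEL first inline and then at block level. The text is hypothetical.

```html
<!-- Inline use: DEL and INS mark a change inside a paragraph -->
<P>The meeting is on <DEL>Tuesday</DEL> <INS>Thursday</INS> at noon.</P>

<!-- Block use: INS wraps an entire paragraph added in this version -->
<INS>
   <P>This paragraph was added in the revised version of the document.</P>
</INS>
```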
