HTML 4.0 Sourcebook: HTML in Detail


HTML 4.0 Sourcebook
(Publisher: John Wiley & Sons, Inc.)
Author(s): Ian S. Graham
ISBN: 0471257249
Publication Date: 04/01/98


HTML as a MIME Type

As discussed in Example 5 in Chapter 2, all data communicated over the Web have an associated MIME content-type to indicate the type of the data. In particular, the HTTP protocol uses these content types to communicate the type of data being sent out (or received) by a server: the appropriate content-type header field is included in the header message that precedes the data being sent. For example, a JPEG image file sent from an HTTP server to a client would have the message string

Content-Type: image/jpeg

as part of the HTTP header (see Chapter 9) that precedes the actual data. Similarly, when an HTML document is served, the header that precedes it contains the string

Content-Type: text/html

to indicate that the data are an HTML document and not just plain text. HTTP and MIME types are discussed in more detail in Chapter 9 and Appendix B found on the companion Web site.
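HTML 4.0 also lets a document repeat this content-type information in its own head, through a META element with an HTTP-EQUIV attribute. The fragment below is a minimal sketch of that idea; the ISO-8859-1 character set parameter is an assumed example value, and the header field sent by the server still takes precedence over anything stated inside the document.

```html
<HEAD>
  <TITLE> Environmental Change Project </TITLE>
  <!-- Mirrors the HTTP header field "Content-Type: text/html".
       The charset parameter is an assumed example value. -->
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
</HEAD>
```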

DOCTYPE Public Text Identifier

As mentioned several times, HTML is an evolving language. You can (and should!) formally specify the version of the language used in a document by including, as the first line in the text, a document type (DOCTYPE) declaration that contains a string known as a public text identifier. The declaration for HTML 4.0 is:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">

where the text inside double quotation marks is the identifier for the DTD that applies to the document. DOCTYPE specifications are placed at the start of a document by HTML editors (such as SoftQuad’s HoTMetaL) that rigorously enforce correct markup as defined by the DTD.
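As a sketch of where the declaration sits, the skeleton below places it on the first line, ahead of the HTML start tag. The body text is borrowed from the chapter's example document purely for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
<HEAD>
  <TITLE> Environmental Change Project </TITLE>
</HEAD>
<BODY>
<P>Welcome to the home page of the Environmental Change Project.
</BODY>
</HTML>
```

The W3C also defines other public text identifiers for HTML 4.0, such as "-//W3C//DTD HTML 4.0 Transitional//EN" for documents that use deprecated presentational markup; which one applies depends on the markup the document actually contains.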

Elements and Markup Tags

The overall structure of the HTML language was covered in Chapter 1. The following is a review of the basic concepts, using the document in Figure 6.1 as an example, with Figure 6.2 showing typical rendering of this example.

An HTML document is simply a text file in which certain strings of characters, called tags, delimit regions of the document and assign special meanings to them. In the jargon of SGML, these regions and the enclosing tags are called elements. The tags are strings of characters surrounded by the less-than (<) and greater-than (>) characters. For example

<H1>

is the start tag for an H1 (a heading) element, while the similar tag with a leading slash character

</H1>

is the corresponding end tag. The entire H1 element is then the string:

<H1> Environmental Change Project </H1>

Each element has a name,¹ which appears inside the tags and which defines what the element means. For example, the H1 element marks a level one heading. Elements that mark or contain blocks of text (such as H1) are also called containers. Most elements are containers that divide the document into blocks of text, which in turn may contain other elements containing other blocks of text, and so on. You can think of a document as a nested hierarchy of these elements, with the complete hierarchy defining the entire document.

¹In SGML terminology, the name of an element is formally called a generic identifier, or GI.

Figure 6.1 An example of a simple HTML document.

<HTML>
<HEAD>
  <TITLE> Environmental Change Project </TITLE>
</HEAD>
<BODY>
<H1> <A NAME="env-change"> Environmental </A> Change Project </H1>

<P>Welcome to the home page of the Environmental Change Project.
This project is different from other projects with similar
names.  In our case we actually wish to change the climate.
For example, we would like hot beaches in Northern
Quebec, and deserts near Chicago.

<P> So how will we do this.  Well we do the following:
<UL>
  <LI><A HREF="burn.html"><EM>Burn down</EM></A> more forests
  <LI>Destroy the <A HREF="http://who.zoo.do/ozone.html">Ozone</A> layer
  <LI>Breed more <A HREF="ftp://foo.do.do/cows.gif">cows</A> (for extra
      greenhouse gas)
</UL>
</BODY>
</HTML>

Figure 6.2 Display, using the Internet Explorer 3 browser, of the document listed in Figure 6.1.

Optional End Tags

In some cases, element end tags are optional. This is so when the end of an element can be unambiguously determined from subsequent tags. As an example, look at the LI elements in Figure 6.1. Each of these elements defines a single list item inside the UL unordered list element but does not require an </LI> end tag. This is because the end of a given item is implied by the next <LI> start tag or by the </UL> end tag ending the list.
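The two lists below are a small sketch of this rule, using shortened item text from Figure 6.1. Both forms describe the same structure; the second simply writes out the end tags that the first leaves implied.

```html
<!-- End tags omitted: each LI ends at the next <LI> or at </UL> -->
<UL>
  <LI>Burn down more forests
  <LI>Destroy the ozone layer
</UL>

<!-- The same list with every optional end tag written out -->
<UL>
  <LI>Burn down more forests</LI>
  <LI>Destroy the ozone layer</LI>
</UL>
```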

Empty Elements

Some elements (such as the IMG [insert an inline image], HR, and BR [line break] elements) do not “contain” anything and are called empty elements. In HTML, empty elements cannot have end tags.
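A brief sketch of the three empty elements named above follows. The image file name logo.gif and the ALT text are hypothetical, chosen only for illustration; note that none of the three elements has an end tag.

```html
<P>Our logo: <IMG SRC="logo.gif" ALT="Project logo">
<BR>
This sentence starts on a new line because of the BR element.
<HR>
```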

Element Attributes

Most elements have attributes, which are quantities that specify properties for a particular instance of an element. For example, the A (hypertext anchor) element can take the HREF attribute, which specifies the target of a hypertext link. Most attributes are assigned values. For example, HREF is assigned the URL of the target document, as in:

<A HREF="http://who.zoo.do/ozone.html"> Ozone </A> layer

Attributes are always placed inside the start tag. Attributes are often optional, in which case they can be left out.
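The fragment below sketches these points: attributes appear only in the start tag, an optional attribute can be dropped, and a few attributes are written without an assigned value. The TITLE text is an assumed example, and NOSHADE is used here only as one familiar valueless attribute.

```html
<!-- HREF gives the link target; TITLE is optional extra information -->
<A HREF="http://who.zoo.do/ozone.html" TITLE="About the ozone layer"> Ozone </A> layer

<!-- The same link with the optional TITLE attribute left out -->
<A HREF="http://who.zoo.do/ozone.html"> Ozone </A> layer

<!-- NOSHADE is an attribute written without an assigned value -->
<HR NOSHADE>
```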

Element Nesting

Elements are always nested, with this nesting reflecting the structure of the document (for example, emphasized text inside a paragraph, inside a form, inside the BODY). However, elements can never overlap. Thus the structure

<A HREF=..><EM>Burn down</EM></A> more forests

is valid HTML markup, while an overlapping arrangement, in which the A element is closed before the EM element that was opened inside it, is not. In addition, all elements have restrictions as to what can be nested inside them and where they, themselves, can be nested. Details of the allowed nestings are presented later in this chapter as each element is discussed.
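To make the nesting rule concrete, here is a side-by-side sketch of a properly nested pair and an overlapping pair. The HREF value burn.html is taken from Figure 6.1; the second line is an illustrative reconstruction of overlap, not markup to imitate.

```html
<!-- Valid: EM starts and ends inside A -->
<A HREF="burn.html"><EM>Burn down</EM></A> more forests

<!-- Invalid: EM starts inside A but ends outside it, so the two overlap -->
<A HREF="burn.html"><EM>Burn down</A> more forests</EM>
```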

Some browsers can recover from simple nesting errors, so that mistakes are often hard to spot. If you are lucky, you will get mail from someone questioning why he or she cannot properly view your document. A better choice is to use a validation tool, such as sgmls, to check your documents for mistakes. This option is discussed in “Web Management and Maintenance Tools” on the companion Web site.

Case-Insensitive Element and Attribute Names

Element and attribute names inside the markup tags are case insensitive. Thus, the strings <H1> and <h1> are equivalent, as are

<a HreF="Dir1/foo.html"><EM>Burn down</eM> </a> more forests

and

<A href="Dir1/foo.html"><em>Burn down</eM> </A> more forests

Element and attribute names are nevertheless usually written in uppercase, so that document developers can more easily see the markup tags and attributes.

Case-Sensitive Attribute Values

While element and attribute names are case insensitive, the values assigned to attributes are often case sensitive. An obvious example is a URL assigned to an HREF attribute. A URL can contain both directory and filename information. Many computers distinguish between upper- and lowercase characters in file and directory names, so it is crucial that case be preserved. For a document author, this is ensured by enclosing the attribute value in double quotes, as done in Figure 6.1.
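The contrast can be sketched as follows, reusing the Dir1/foo.html path from the earlier examples; the variant dir1/Foo.html and the link text are hypothetical.

```html
<!-- On a server with case-sensitive file names these can name two different files -->
<A HREF="Dir1/foo.html">first link</A>
<A HREF="dir1/Foo.html">second link</A>

<!-- Element and attribute names may still vary in case without changing the target -->
<a href="Dir1/foo.html">first link again</a>
```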

