HTML 4.0 Sourcebook: Introduction to the HyperText Markup Language


HTML 4.0 Sourcebook
(Publisher: John Wiley & Sons, Inc.)
Author(s): Ian S. Graham
ISBN: 0471257249
Publication Date: 04/01/98


Heading Elements

The first element in the BODY is an H1 element. H1 stands for a level 1 heading element. In HTML, headings come in six levels, H1 through H6, with H1 being the highest (most important) heading level, and H6 the lowest. A browser must then take the H1 element content and display it in a manner appropriate to a major heading. For example, Netscape Navigator (Figure 1.2) shows the heading

<H1>This is a Heading</H1>

as a large, boldfaced string of characters, left-justified and separated by a wide vertical space from the following text. Lynx, on the other hand (Figure 1.3), shows it as a capitalized text string, centered on the page. This comparison is designed to remind you of the point made at the beginning of this section: Different browsers may render the same elements in very different ways. HTML markup instructions are designed to specify the logical structure of the document far more than the physical layout. If documents take advantage of this feature, then the browser is free to find the best way to display items, such as headings, consistent with its own limitations.
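For reference, here are all six heading levels in a single fragment. The heading text is illustrative only. As the Navigator and Lynx comparison shows, the size, weight, and alignment of each level are chosen by the browser, so this markup says nothing about how the headings will actually look.

```html
<!-- Six heading levels, from most important (H1) to least (H6).
     Each browser decides how to display each level. -->
<H1>Level 1 heading</H1>
<H2>Level 2 heading</H2>
<H3>Level 3 heading</H3>
<H4>Level 4 heading</H4>
<H5>Level 5 heading</H5>
<H6>Level 6 heading</H6>
```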

Interpretation of Spaces, Tabs, and New Lines

Referring back to Figure 1.1, review the next few lines of text:

<P>Hello.  This is not a very exciting document.

I
    bet you were expecting <EM>poetry</EM>, or

some kind of <STRONG>exciting <BR> fact</STRONG> about the Internet and
the World Wide Web.

As shown in Figures 1.2 and 1.3, these lines are rendered as a continuous paragraph of text, ignoring the blank lines, extra spaces, and tabs that are present in the original file. The rendering of an HTML document largely ignores extra spaces, tabs, and blank lines, and treats any combination of these characters as a single word space. This means that extra space characters, line breaks, and indentations can be used to organize the logical layout of an HTML document, making it easier to see the placement of the tags relative to the text. This is done in the bottom half of Figure 1.1 (note how the <LI> list items are indented inside the list) and in most of the examples in this book. This concept will be familiar to anyone who has written computer programs or used typesetting languages such as TeX or Scribe and is equivalent to using spaces and tabs to make a computer program easier to read.
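As a small illustration, the two fragments below should be rendered identically, since every run of spaces, tabs, and line breaks collapses to a single word space. The sample text and list items are made up for this example; the list shows the same idea applied to indented <LI> items.

```html
<!-- Fragment A: laid out with extra spaces, tabs, and blank lines -->
<P>Whitespace     in
        the source

is collapsed.
<UL>
    <LI>First item
    <LI>Second item
</UL>

<!-- Fragment B: the same markup with no extra whitespace -->
<P>Whitespace in the source is collapsed.
<UL><LI>First item <LI>Second item</UL>
```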

Character Highlighting

The text quoted above from Figure 1.1 contains two additional elements: EM for emphasis and STRONG for strong emphasis. Note that these are logical descriptions of the enclosed text and do not directly specify a physical formatting style. The HTML specifications recommend that text marked with EM be italicized and that text marked with STRONG be rendered as bold. This is exactly what Netscape Navigator does, as shown in Figure 1.2. Lynx, on the other hand, renders both EM and STRONG as underlined text (Figure 1.3). Character-based programs such as Lynx can really do only four things to text: underline it, boldface it, force it to capital letters, or display it in reverse video. Given these limitations, Lynx cannot render every HTML element distinctly, so it renders EM and STRONG in the same way. A text-to-speech converter, by contrast, would have no problem distinguishing EM and STRONG: it could simply modify the spoken intonation to reflect the specified emphasis. Unfortunately, it is hard to include a text-to-speech example in a book!

Highlighting elements, such as EM and STRONG, can be placed almost anywhere you find regular text; the one exception is the TITLE element in the HEAD. The content of a TITLE element can only be text; there can be no HTML elements inside it. Recall that the text inside a TITLE is not part of the document, but simply a text string providing information about the document. Thus markup has no meaning here.
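The restriction is easy to see side by side. The title text below is hypothetical, and the two TITLE elements are alternatives rather than parts of one document, since a HEAD holds only one TITLE.

```html
<!-- Valid: TITLE contains plain text only -->
<TITLE>A Very Boring Document</TITLE>

<!-- Not valid: elements such as EM are not allowed inside TITLE -->
<TITLE>A <EM>Very</EM> Boring Document</TITLE>
```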

HTML has several other logical highlighting elements, such as CODE for computer code, KBD for keyboard input, VAR for a variable, DFN for the defining instance of a term, CITE for a short citation, and so on. HTML also has physical highlighting elements, such as B for boldfaced, I for italics, TT for typewriter font (fixed-width characters), and U for an underlined font. Where sensible, specify logical meaning for text strings rather than these physical styles: Logical styles assign true meaning to the associated text and give a browser more flexibility in determining the best presentation.
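A short fragment using several of these logical elements follows. The sentences, the command, and the identifiers are invented for illustration. How each element is displayed (for example, whether CODE and KBD appear in a fixed-width font) is again up to the browser.

```html
<!-- Logical highlighting: each element says what the text is -->
<P>To list the files, type <KBD>ls -l</KBD> and press Enter.
<P>The <CODE>strlen()</CODE> function returns the length of
the string <VAR>s</VAR>.
<P>A <DFN>hypertext link</DFN> is a connection from one document
to another.
<P>The full rules are given in <CITE>the HTML 4.0 specification</CITE>.
```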

Physical highlighting tags are particularly useful when translating from a word processor format that already contains tags for boldface, italics, or other physical styles, since these styles can be directly converted to their HTML equivalents. They are also useful for specifying physical formatting that is purely decorative and for which there is no important structural meaning.
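For example, a converter working from a word-processor file might map its bold, italic, fixed-width, and underlined runs directly onto the physical elements, as in this hypothetical output:

```html
<!-- Physical highlighting: each element says how the text should look,
     not what it means -->
<P><B>Meeting notes</B> <I>(draft)</I>
<P>Please save the file as <TT>notes.txt</TT> and <U>do not</U> rename it.
```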

Paragraphs and Vertical Spacing

Look at the next few lines, beginning with the string “Sorry.”:

<P>Sorry.  No such luck.       This document

does
contain examples of HTML markup, for example, here is an “unordered
list”:
<UL>

The <P> tag marks the beginning of a paragraph and is best thought of as marking the start of a paragraph container. Most browsers interpret the <P> that starts a paragraph by skipping a line, as shown in Figures 1.2 and 1.3. Note also that a paragraph mark can be anywhere in a line. For example, the three lines (note the blank line between the two lines of text):

the World Wide Web.

<P>Sorry.  No such luck.       This document

can equally well be written as:

the World Wide Web.  <P>Sorry.  No such luck.  This document

Recall that the rendering of an HTML document depends only on where the markup tags are located relative to the text they describe. Of course, putting <P> at the beginning of a line makes the “raw” HTML easier to read, and is thus a good idea.

Notice that Figure 1.1 does not include ending tags to mark the ends of the paragraphs. In HTML, the paragraph end tag </P> is optional. The rule is that a paragraph is ended by the next <P> tag that starts another paragraph or by any other tag that starts another block of text, such as a heading tag (<Hn>), a quotation tag (<BLOCKQUOTE>), or list tags (<UL>, <OL>, <DIR>, <MENU>, <DL>). Thus, the paragraph ending with the words “unordered list” is ended by the following <UL> tag that marks the beginning of an unordered list element.
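The two versions below should be treated the same way by a browser: in the first, the <UL> start tag implicitly ends the paragraph; in the second, the optional </P> is written out. The paragraph text follows Figure 1.1, but the list item is a hypothetical placeholder, since the figure's list content is not reproduced here.

```html
<!-- End tag omitted: the <UL> start tag implicitly ends the paragraph -->
<P>This document does contain examples of HTML markup, for example,
here is an "unordered list":
<UL>
  <LI>first list item
</UL>

<!-- Equivalent markup with the optional </P> written out -->
<P>This document does contain examples of HTML markup, for example,
here is an "unordered list":</P>
<UL>
  <LI>first list item
</UL>
```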

Don’t Use Empty Paragraphs

The HTML specification recommends that, if two or more adjacent elements describing the logical structure of the document require some special vertical spacing, only one of the spacing values (the larger) should be used and the other should be ignored. This implies that constructions like

<p><p><p><p>  This is a paragraph

should yield at most a single paragraph break. If you enter markup such as this, you will find that the spacing is rendered differently by different Web browsers: Some will leave extra space, and some will not.

Adding Extra Vertical Space

If you really need extra vertical space, try using consecutive line break elements:

<BR> <BR> <BR>

This is valid HTML, and it usually yields the extra spacing you require. The long-term solution for detailed formatting is found in style sheets, discussed in Chapter 7.
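As a preview of the style-sheet approach, here is a minimal sketch, assuming a browser with CSS support: the extra space is requested on the paragraph itself rather than with a run of empty elements. The 3em value is an arbitrary example. Browsers that do not understand style sheets ignore the STYLE attribute and fall back to their normal paragraph spacing.

```html
<!-- Request extra space above this paragraph with a style rule
     instead of consecutive <BR> elements. 3em is an arbitrary value. -->
<P STYLE="margin-top: 3em">This paragraph is preceded by extra space.
```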
