
HTML 4.0 Sourcebook
(Publisher: John Wiley & Sons, Inc.)
Author(s): Ian S. Graham
ISBN: 0471257249
Publication Date: 04/01/98

Chapter 1
Introduction to the HyperText Markup Language

What is a text markup language? A markup language is a way of describing, using instructions embedded within a document, what the different parts of the text mean or what they are supposed to look like. For example, suppose you want to indicate that the words “Frozen Albatross” should be displayed in boldface. A markup language might let you express this by typing

[beg_bold] Frozen Albatross [end_bold]

meaning: “turn on boldface, write the words ‘Frozen Albatross’, and then turn off boldface.” The text strings [beg_bold] and [end_bold] (in boldface italics for emphasis) are part of the markup language, which here turns boldface on and off. In general, a markup language has many such codes, or tags, to allow for a rich description of the document content and desired rendering.

Every electronic text processing tool uses some kind of markup language. Most of the time, this language is hidden from the user, although some word processors let you use a “reveal codes” command to display the actual markup commands—these commands are usually sequences of unprintable characters, unlike the printable characters used in the above example. Nevertheless, the idea is the same: A markup language is just a collection of codes, embedded in the document, that explain the meaning or desired formatting for the text.
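
As a rough illustration of how a tool might interpret such embedded codes, here is a minimal Python sketch. The [beg_bold] and [end_bold] tags are the hypothetical ones from the example above; the function names and the choice of ANSI terminal escape codes as the output format are assumptions made purely for illustration.

```python
# Hypothetical tag table for the toy markup used in the text.
# Mapping the tags to ANSI escape codes assumes the output device is a terminal.
ANSI_CODES = {
    "[beg_bold]": "\033[1m",  # turn boldface on
    "[end_bold]": "\033[0m",  # reset all formatting (turns boldface off)
}


def render_for_terminal(text: str) -> str:
    """Replace each markup tag with the terminal code that implements it."""
    for tag, code in ANSI_CODES.items():
        text = text.replace(tag, code)
    return text


def strip_markup(text: str) -> str:
    """Drop the tags entirely, for a device that cannot show boldface."""
    for tag in ANSI_CODES:
        text = text.replace(tag, "")
    # Collapse the whitespace left behind where the tags used to be.
    return " ".join(text.split())


if __name__ == "__main__":
    source = "[beg_bold] Frozen Albatross [end_bold]"
    print(render_for_terminal(source))
    print(strip_markup(source))
```

The document itself never changes; only the interpretation of the embedded codes does.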

Physical Versus Semantic Markup

There are two basic markup language approaches. The first is known as physical markup. In this approach, the markup tags explicitly say how the document should look, and contain commands such as: “indent 0.5 inches, print the words ‘Frozen Albatross’ using an 18 point Arial font, . . .” and so on. This is ideal for printing the text, but a poor fit if printing is not the primary goal. Suppose, for example, you want to display the document on a computer that is not capable of the requested formatting. In this case, the physical formatting information is useless, and the computer has no easy way of determining a good alternative presentation for the text.

The second approach is known as logical or semantic markup. Here, the markup language defines the meaning of the text and not how it looks. Using semantic markup, the previous example might be written as

[beg_heading] Frozen Albatross [end_heading]

which means: “the enclosed text, ‘Frozen Albatross’, is a heading.” The advantage is that the markup encodes the structural meaning of the text and not its physical representation. It is now easy to translate this heading into the formatting commands: “indent 0.5 inches, print the words ‘Frozen Albatross’ using an 18 point Arial font, . . .” should you be printing the document to paper, or into other instructions should you be presenting the document on some other medium, such as a computer display or a Braille reader. Thus, although semantic markup is more difficult (you have to think about what each part of the document means when you add the markup instructions), it is a much more powerful and flexible way of describing text and has become the technique of choice for modern document processing systems. This includes modern word processors such as Word or WordPerfect, which now incorporate many semantic markup features into their markup model. Indeed, Microsoft Word “Styles” are really just a way of defining semantic markup tags and relating that markup to desired physical formatting.
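
The translation from one semantic tag into several physical presentations can be sketched in a few lines of Python. This is only an illustration: the [beg_heading] and [end_heading] tags are the hypothetical ones used above, and the printer command syntax and the uppercase-and-underline convention for text displays are invented for the example.

```python
import re

# Hypothetical semantic tag from the text: [beg_heading] ... [end_heading].
HEADING = re.compile(r"\[beg_heading\]\s*(.*?)\s*\[end_heading\]")


def to_print_commands(text: str) -> str:
    """Translate each heading into (made-up) physical commands for a printer."""
    return HEADING.sub(r'indent 0.5in; font "Arial" 18pt; print "\1"', text)


def to_plain_text(text: str) -> str:
    """Translate each heading for a text-only display: uppercase, underlined."""
    def underline(match: re.Match) -> str:
        title = match.group(1).upper()
        return title + "\n" + "=" * len(title)

    return HEADING.sub(underline, text)


if __name__ == "__main__":
    source = "[beg_heading] Frozen Albatross [end_heading]"
    print(to_print_commands(source))
    print(to_plain_text(source))
```

Because the source records only that the text is a heading, supporting a new medium means adding one more translation function rather than editing the document.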

What is the HyperText Markup Language, or HTML? Despite all the hype, HTML is simply another markup language.* However, unlike the others, HTML is designed specifically for marking up electronic documents for delivery over the Internet and for presentation on a variety of different possible displays. As a result, HTML is very much a semantic markup language, designed to specify the logical organization of a text document; there are very few physical formatting commands in HTML. In addition, HTML has important extensions that allow for hypertext links from one document to another, as well as other extensions that allow for user input and user interaction.


*The rules of HTML are defined using a language called the Standard Generalized Markup Language or SGML. SGML is an extremely sophisticated tool for defining markup languages—HTML is just one example of this process. SGML is discussed in a bit more detail in Chapter 6.

It is important to stress these design principles, because they explain the large differences between authoring with HTML and writing documents using word processors. HTML was not designed to be the language of a “What You See Is What You Get” (WYSIWYG) word processor such as Word or WordPerfect. Instead, HTML takes a “What You Get Is What You Meant” (WYGIWYM) approach, such that authors must construct documents with sections of text (and/or images and other embedded objects, such as Java applets) marked as logical entities, such as titles, paragraphs, lists, quotations, and so on. The interpretation of these marked elements is then largely left up to the browser displaying the document. This approach builds enormous flexibility into the system and allows the same document to be displayed by browsers of very different capabilities.

Consequently, there are browsers for machines ranging from fancy UNIX graphics computers to plain-text terminals such as VT-100s or old 8086-based DOS computers. As an example, consider how different browsers present the same well-designed HTML document. A graphical browser like Netscape Navigator may present major headings in a large, slanted, and boldfaced font (since elegant typesetting is possible with graphics displays) and may include attractive inlined graphical elements indicated by the HTML markup. A text-only browser like lynx may just center the title, use a single font for all the text, and display alternative text descriptions in place of the images. A Braille browser would present the same text information in a completely different way. However, all these presentations will reproduce the logical organization and meaning of the original text document, since this information was built in using the HTML language.
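
To make the text-only case concrete, the following Python sketch uses the standard library's html.parser module to produce a plain-text rendering of a small HTML fragment. It is a toy, not a description of how lynx actually works: the class name, the 40-column width, and the choice to center H1 headings and show ALT text in square brackets are all assumptions for illustration.

```python
from html.parser import HTMLParser


class TextOnlyRenderer(HTMLParser):
    """Toy text-only renderer: centers H1 headings, shows ALT text for images."""

    def __init__(self, width: int = 40):
        super().__init__()
        self.width = width
        self.lines = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "h1":
            self._in_heading = True
        elif tag == "img":
            # Fall back to a generic label when no ALT text is supplied.
            alt = dict(attrs).get("alt") or "image"
            self.lines.append(f"[{alt}]")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_heading = False

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        if self._in_heading:
            text = text.center(self.width).rstrip()
        self.lines.append(text)


def render_text_only(html: str, width: int = 40) -> str:
    """Return a plain-text rendering of the given HTML fragment."""
    renderer = TextOnlyRenderer(width)
    renderer.feed(html)
    renderer.close()
    return "\n".join(renderer.lines)


if __name__ == "__main__":
    page = (
        "<H1>Frozen Albatross</H1>"
        "<P>A short note.</P>"
        '<IMG SRC="bird.gif" ALT="Photo of an albatross">'
    )
    print(render_text_only(page))
```

The renderer can only make these choices because the markup says what each piece of text is; a graphical renderer could read the same fragment and choose a large bold font and the image itself instead.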

Specifying Formatting—Cascading Style Sheets

Of course, a page designer can ignore the semantic markup approach and use embedded images, tables, and formatting-specific HTML tags to define quite precise page layout and design—albeit with a lot of hard work! However, this comes at the price of portability, accessibility, and speed—documents so designed will inevitably display poorly on some browsers or under some conditions (e.g., when image loading is disabled). Indeed, using large quantities of formatting-specific markup most often leads to large, slow-to-download documents. Clearly, a designer in the real world must make compromises between these issues, aware of the strengths and weaknesses of each design choice. The goal of this book is to give you the tools to make these choices wisely.

The modern approach is to use a second language (known as a style sheet) to define formatting details and to apply the instructions of this language to an HTML document while it is being displayed. The latest versions of the Netscape and Microsoft browsers (version 4 of each) support a style sheet language known as Cascading Style Sheets, or CSS. Using CSS, an author can define how particular HTML elements should be formatted, positioned, and displayed, without having this information coded into the HTML markup. Then, when the page is displayed, the browser will use the instructions to adjust the formatting and layout according to the rules specified in the CSS style sheet.
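
The separation between markup and formatting can be sketched as follows. This Python fragment keeps the formatting rules in a table of its own and only combines them with the HTML at the end. The rule table, the function names, and the specific property values are hypothetical; the properties themselves (font-family, font-size, margin-left, font-style) are ordinary CSS properties.

```python
# Hypothetical style rules, kept apart from the document: selector -> declarations.
STYLE_RULES = {
    "h1": {"font-family": "Arial", "font-size": "18pt", "margin-left": "0.5in"},
    "em": {"font-style": "italic"},
}


def to_css(rules: dict) -> str:
    """Serialize the rule table as CSS text, one rule per line."""
    blocks = []
    for selector, declarations in rules.items():
        body = " ".join(f"{prop}: {value};" for prop, value in declarations.items())
        blocks.append(f"{selector} {{ {body} }}")
    return "\n".join(blocks)


def attach_style(html_body: str, css: str) -> str:
    """Wrap purely semantic HTML in a page that carries the style sheet."""
    return (
        '<html><head><style type="text/css">\n'
        f"{css}\n"
        f"</style></head><body>{html_body}</body></html>"
    )


if __name__ == "__main__":
    body = "<h1>Frozen Albatross</h1><p>A <em>short</em> note.</p>"
    print(attach_style(body, to_css(STYLE_RULES)))
```

Changing the look of every heading means editing one entry in the rule table, while the HTML body stays untouched.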

CSS style sheets only work if the HTML markup is rigorously correct, that is, if the placement of the HTML elements follows the rules of the HTML language. Moreover, style sheets only work with the latest versions of Web browsers: the CSS instructions are totally ignored by Netscape Navigator 3 (and, for the most part, Internet Explorer 3) or earlier, so that HTML-specific formatting is needed for these older browsers.


Use of this site is subject to certain Terms & Conditions, Copyright © 1996-2000 EarthWeb Inc.
All rights reserved. Reproduction whole or in part in any form or medium without express written permission of EarthWeb is prohibited. Read EarthWeb's privacy statement.