Platinum Edition Using HTML 4, XML, and Java 1.2
(Publisher: Macmillan Computer Publishing)
Author(s): Eric Ladd
ISBN: 078971759x
Publication Date: 11/01/98

Syntactically speaking, this code contains several errors:

  The <TITLE> element is required, but the document has no title.
  The KEYWORD attribute in the <META> tag has a mismatched quotation mark.
  The value of the BGCOLOR attribute is invalid: a hexadecimal color value can contain only the digits 0–9 and the letters A–F, and it should be preceded by a pound sign (#).
  The opening and closing heading tags do not match.
  The closing </BODY> and </HTML> tags are missing.
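The BGCOLOR rule is mechanical enough to check in a few lines of code. The following Python sketch tests whether a value has the #RRGGBB form, that is, a pound sign followed by exactly six hexadecimal digits. The function name is_hex_color is our own, and the sketch deliberately ignores the named colors (such as "white") that HTML also accepts.

```python
import re

# A pound sign followed by exactly six hex digits (0-9, A-F, either case).
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")

def is_hex_color(value: str) -> bool:
    """Return True if value is a #RRGGBB hexadecimal color (hypothetical helper)."""
    return HEX_COLOR.fullmatch(value) is not None

for candidate in ["#FFFFCC", "FFFFCC", "#GGHHII", "#FFF"]:
    print(candidate, is_hex_color(candidate))
# Prints one line each: "#FFFFCC True", "FFFFCC False", "#GGHHII False", "#FFF False"
```

The second value fails because the pound sign is missing, the third because G, H, and I are not hexadecimal digits, and the fourth because it has only three digits.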

Despite these errors, Figure 11.1 shows the document as Netscape Navigator displays it, and it looks pretty good, doesn’t it? Why should authors bother with proper HTML syntax when the most popular browsers can usually work out what the author intended, even from flawed markup? The trouble is that this forgiveness has encouraged very sloppy HTML authoring habits. If the trend toward automated processing of Web documents continues, every document will need to follow proper syntax, because programs cannot reliably parse documents that do not.
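The difference between a forgiving reader and a strict one is easy to demonstrate. In the following Python sketch, the standard html.parser module reports whatever tags it sees without complaint, while xml.etree.ElementTree refuses markup whose tags do not match. The sample markup and the helper names (TagLogger, lenient_parse, strict_parse) are our own illustration, not the code shown in the figure.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Mismatched heading tags, like one of the errors listed earlier.
SLOPPY = "<BODY><H1>Welcome</H2></BODY>"

class TagLogger(HTMLParser):
    """Records every start and end tag the forgiving HTML parser reports."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

def lenient_parse(markup):
    """Return the tag events; html.parser never rejects the markup."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events

def strict_parse(markup):
    """Return the root element, or None if the markup is not well-formed XML."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None

print(lenient_parse(SLOPPY))  # [('start', 'body'), ('start', 'h1'), ('end', 'h2'), ('end', 'body')]
print(strict_parse(SLOPPY))   # None
```

The lenient parser hands the program an h1 start tag followed by an h2 end tag and leaves it to guess what was meant; the strict parser stops with an error instead, which is the behavior XML requires of its parsers.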

By now you may be losing confidence in HTML’s capability to meet the electronic publishing needs of the future. You are not alone. Many content providers raised the same concerns and pressed them until the W3C began to consider alternatives. One of the most obvious alternatives was to go back to HTML’s parent language, SGML. As the next section explains, however, SGML has problems of its own that keep it from being the markup language of the future.


FIGURE 11.1  Despite several syntax errors, Netscape Navigator still rendered this HTML document.

Problems with SGML

If you were to use SGML to mark up Web documents, you would certainly have no trouble with flexibility. After all, SGML is really a meta-language, or a language for defining other languages. It provides a vast set of features for devising description languages for documents as short as a single page or as long as several printed volumes.

But therein lies the problem. SGML is so vast that it is overkill for most kinds of Web publications. The SGML standard stretches on for pages and pages, making it more difficult for

  Content providers to mark up content
  Programmers to write parsers, browsers, and other processing programs

SGML has so many optional features that it is simply too cumbersome for the needs of Web publishers. Yet SGML is far more extensible and structured than HTML, and both qualities are highly desirable. How, then, can the Web publishing world harness the best qualities of SGML without its high-maintenance features?

XML: The Best of Both Worlds

The answer is XML, a simplified version of SGML that discards the SGML features that don’t apply to Web publishing. The result is a meta-language that offers SGML’s structure and flexibility without its complexity. Specifically, XML is

  Extensible. XML’s flexibility comes from letting you make up your own elements, so you can introduce tags that suit your publishing needs.
  Portable. The catch to inventing your own tags is that you must communicate their syntax to others. Fortunately, it is fairly simple to produce files, such as Document Type Definitions (DTDs), that capture the rules of your markup and enable others to read or process your XML documents correctly.
  Structured. XML inherits SGML’s strict adherence to structure. If a document is not properly structured (for example, if its elements overlap instead of nesting inside one another), it is not considered XML.
  Descriptive. XML elements do not specify how their content should be presented; presentation is handled separately, for example by style sheets. The elements are therefore free to describe the meaning of what they contain, which permits more “intelligent” handling by parsers and other processing programs.
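To make the Extensible and Descriptive points concrete, the following Python sketch parses a small document written in a made-up vocabulary. The element names catalog, item, name, and price are purely illustrative and belong to no standard. Because each element says what it contains, a program can total the prices without guessing at presentation markup. The sketch uses Python’s standard xml.etree.ElementTree module, which checks that the document is well-formed but does not validate it against a DTD.

```python
import xml.etree.ElementTree as ET

# A made-up vocabulary: <catalog>, <item>, <name>, and <price> are not HTML tags,
# and each tag name says what the content it encloses means.
CATALOG = """<catalog>
  <item sku="A1">
    <name>Widget</name>
    <price currency="USD">2.50</price>
  </item>
  <item sku="B2">
    <name>Gadget</name>
    <price currency="USD">4.25</price>
  </item>
</catalog>"""

def item_names(xml_text):
    """Collect the text of every <name> element, in document order."""
    root = ET.fromstring(xml_text)
    return [name.text for name in root.iter("name")]

def total_price(xml_text):
    """Sum every <price> element; the tag, not the layout, identifies the prices."""
    root = ET.fromstring(xml_text)
    return sum(float(price.text) for price in root.iter("price"))

print(item_names(CATALOG))   # ['Widget', 'Gadget']
print(total_price(CATALOG))  # 6.75
```

An HTML page showing the same catalog would mark the prices only as, say, table cells, so a program would have to infer which cells hold prices; here the element name carries that meaning directly.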

XML retains the best features of SGML without its intricacies, which makes it a much more accessible language. XML also keeps some of the good things about HTML: it is easy for Web document authors to use, and it is easy for programmers to write software that reads and renders it. The rest of this chapter introduces the work done so far on developing an XML standard. After an overview of some basic concepts, you will read about elements and entities, the two major components of XML. You will then learn how to set up different kinds of links in XML documents. Finally, you will see how style sheets can specify the presentation of XML files and how XML is already slated for use in a number of specialty publishing areas.


NOTE:  XML is a very young language. The XML 1.0 recommendation was published in early 1998, and software developers are only now beginning to produce XML parsers and browsers. Content developers have already begun to devise applications of XML; for example, the Mathematical Markup Language (MathML) is a specialized markup language defined as an application of XML. Many other disciplines have their own XML-based markup languages in the works.

XML is eventually expected to supplant HTML as the “mother tongue” of Web publishing, but not overnight. When the World Wide Web Consortium (W3C) finalized the HTML 4.0 recommendation, it announced that the next version of HTML would be the first important step toward migrating to XML. It also estimated that developing this transitional language would take approximately 18 months, so you have plenty of time to learn the basics of XML and develop markup languages suited to your Web publishing activities.


XML Overview

Before you dive into the specifics of XML, it helps to be grounded in some basic ideas. From what you have read so far, you understand the motivation for creating XML: HTML is too limited, and SGML is too broad. Mastering a few fundamental concepts now will also make the remaining sections of this chapter easier to follow. These concepts include

  The different types of XML markup
  Document Type Definitions (DTDs)
  Valid XML documents
  Well-formed XML documents

The next three sections examine these ideas.
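As a preview of the well-formed concept, the following Python sketch uses the standard xml.etree.ElementTree parser as a well-formedness checker: any document the parser rejects cannot be XML at all. The sample documents and the is_well_formed name are our own. This parser does not validate documents against a DTD, so it can confirm only that a document is well-formed, not that it is valid.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML; DTD validity is NOT checked."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, each illustrating one common well-formedness error.
SAMPLES = {
    "properly nested": "<memo><to>Staff</to><body>Meeting at noon</body></memo>",
    "overlapping tags": "<memo><to>Staff<body></to>Meeting</body></memo>",
    "unquoted attribute": "<memo priority=high>Meeting</memo>",
    "two root elements": "<memo>One</memo><memo>Two</memo>",
    "unclosed element": "<memo><to>Staff</memo>",
}

for label, text in SAMPLES.items():
    print(f"{label}: {is_well_formed(text)}")
# Only "properly nested" prints True; every other sample prints False.
```

Each rejected sample would likely be displayed by a forgiving HTML browser, but an XML parser is required to treat any of these errors as fatal.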

