
Platinum Edition Using HTML 4, XML, and Java 1.2
(Publisher: Macmillan Computer Publishing)
Author(s): Eric Ladd
ISBN: 078971759x
Publication Date: 11/01/98



CHAPTER 12
Anatomy of an XML Document

by Simon North and Jim O'Donnell

In this chapter
XML Markup
A Sample XML Document
Logical Structure
Physical Structure
Markup Delimiters
Element Markup
Attribute Markup
Naming Rules
Comments
Character References

Just as student doctors begin their medical training by dissecting a human body and learning the nature and relation of the parts before they learn how to treat them, so this exploration of XML can begin with an examination of a small XML document with all its parts identified. In this chapter, we will look more closely at a sample XML document and break it down into its component parts. We will cover the following topics:

  The components of an XML document
  Logical and physical structures used in XML

XML Markup

XML has very simple rules for distinguishing between the content of a document and the XML markup elements used to describe it. Most of these rules are described in the following list:

  XML markup begins with either the less than symbol (<), which opens a tag, or the ampersand (&), which opens an entity or character reference.
  Three other characters, the greater than symbol (>), the apostrophe or single quote ('), and the double quotation mark ("), are used by XML for markup.
  To use these special characters as content within your document, you must use the corresponding general XML entity (shown in Table 12.1). XML entities are discussed in greater detail in Chapter 15, “XML Characters, Notations, and Entities.”
  Everything else not used to denote XML markup represents the content of the document.
  See “Entities,” p. 391.
Table 12.1 Predefined XML Entities

Character   Replacement

&           &amp;
'           &apos;
>           &gt;
<           &lt;
"           &quot;
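
The escaping rule and Table 12.1 can be tried out with a short Python sketch. It relies on the standard library's xml.sax.saxutils.escape and unescape functions; the helper names escape_content and unescape_content and the sample string are assumptions made for illustration, not part of the original text.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

# escape() handles &, < and > on its own; the two quote characters
# from Table 12.1 are supplied as extra replacements.
QUOTES = {"'": "&apos;", '"': "&quot;"}
QUOTES_BACK = {"&apos;": "'", "&quot;": '"'}


def escape_content(text):
    """Replace the five special characters with their predefined entities."""
    return escape(text, QUOTES)


def unescape_content(text):
    """Turn the five predefined entities back into plain characters."""
    return unescape(text, QUOTES_BACK)


raw = 'Fish & chips <"today\'s" special>'
escaped = escape_content(raw)
print(escaped)

# An XML parser resolves the entities again when it reads the content.
para = ET.fromstring("<para>" + escaped + "</para>")
print(para.text == raw)
```

Escaping only matters for content: the angle brackets of the surrounding para tags are left alone because they really are markup.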

A Sample XML Document

Listing 12.1 shows the XML code for a simple home page. This is a simple example, but it does contain all the important parts that you will find in nearly all XML documents.

Listing 12.1 Home.xml—A Simple XML Home Page


<?xml version="1.0"?>
<home.page>
   <head>
      <title>
         My Home Page
      </title>
      <banner source="topbanner.gif"/>
   </head>
   <body>
      <main.title>
         Welcome to My Home Page
      </main.title>
      <rule/>
      <text>
         <para>
            Sorry, this home page is still under construction.
            Please come back soon!
         </para>
      </text>
   </body>
   <footer source="foot.gif"/>
</home.page>
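
As a rough check that the listing is a complete XML document, it can be fed to a parser. The sketch below uses Python's standard xml.etree.ElementTree module and assumes the listing is typed with straight quotation marks; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Listing 12.1, typed with straight quotation marks.
HOME_XML = """<?xml version="1.0"?>
<home.page>
   <head>
      <title>
         My Home Page
      </title>
      <banner source="topbanner.gif"/>
   </head>
   <body>
      <main.title>
         Welcome to My Home Page
      </main.title>
      <rule/>
      <text>
         <para>
            Sorry, this home page is still under construction.
            Please come back soon!
         </para>
      </text>
   </body>
   <footer source="foot.gif"/>
</home.page>
"""

root = ET.fromstring(HOME_XML)

# Every element in document order, starting with the root.
names = [element.tag for element in root.iter()]
print(names)

# Text content keeps the whitespace used for indentation, so strip it.
title = root.find("head/title").text.strip()
print(title)

# Attributes are exposed through get() on each element.
banner_source = root.find("head/banner").get("source")
print(banner_source)
```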

In the following sections, we will break apart the above XML home page and describe what each part of it achieves and how it is set up.

The XML Declaration

<?xml version="1.0"?>

The XML declaration identifies what follows as being XML code and states what version of the XML standard the code complies with. Through its optional standalone attribute, which this sample omits, it can also specify whether the document can be treated as a standalone document (standalone="yes") or whether a DTD must also be retrieved to be able to make full sense of the contents. Creating XML DTDs will be discussed in Chapter 14, “Creating XML Document Type Definitions.”

See “Getting Sophisticated with External DTDs,” p. 364.

The XML declaration has the same form as a “processing instruction” (identified by the ? at its start and end), although the XML 1.0 specification defines it as a separate construct; for now it’s enough to treat it as a standard declaration. This declaration is not strictly compulsory (the fact that the document is XML code can also be announced by the Web server in the same way that is often done for HTML documents), but it is a good idea to get into the habit of always including such a declaration because it will increase the portability of your code.
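
To see what a parser actually reads from the declaration, here is a minimal sketch using Python's standard xml.parsers.expat module and its XmlDeclHandler callback. The function name read_declaration and the one-element test documents are assumptions made for illustration.

```python
import xml.parsers.expat


def read_declaration(xml_text):
    """Return the XML declaration's fields, or {} if there is none."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 (yes), 0 (no) or -1 (omitted).
        found["version"] = version
        found["standalone"] = {1: "yes", 0: "no"}.get(standalone, "omitted")

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_text, True)
    return found


with_declaration = read_declaration('<?xml version="1.0"?><home.page/>')
with_standalone = read_declaration(
    '<?xml version="1.0" standalone="yes"?><home.page/>'
)
without_declaration = read_declaration("<home.page/>")

print(with_declaration)
print(with_standalone)
print(without_declaration)
```

The third document parses without complaint even though it has no declaration, which matches the point that the declaration is optional.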

The Root Element

<home.page>
   …
</home.page>

Each XML document must have exactly one root element, and all the other elements must be completely enclosed in that element. In this document, the root element is delimited by the <home.page> start tag and the </home.page> end tag.

In XML, a non-empty element must consist of three things: a start tag, content (either text or other elements), and an end tag. The name that you use in the element start tag must exactly match the name you use in the end tag. If you want to use an odd combination of cases to increase the legibility of long names (for example, ThisIsAnIntelligibleName), you must be careful to exactly match the case usage in both the opening and the closing tags.
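
These rules are easy to test against a parser. The following sketch uses Python's standard xml.etree.ElementTree module; the helper is_well_formed and the tiny test documents are hypothetical and serve only to show which inputs a parser rejects.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One root element that encloses everything else: accepted.
one_root = is_well_formed("<home.page><body></body></home.page>")

# Two top-level elements: rejected, because only one root is allowed.
two_roots = is_well_formed("<head></head><body></body>")

# Start and end tag names that differ only in case: rejected.
case_mismatch = is_well_formed(
    "<ThisIsAnIntelligibleName></thisisanintelligiblename>"
)

# A child element that is not completely enclosed in its parent: rejected.
overlapping = is_well_formed("<home.page><body></home.page></body>")

print(one_root, two_roots, case_mismatch, overlapping)
```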

Empty XML Elements

<banner source="topbanner.gif"/>

<rule/>

<footer source="foot.gif"/>

Empty elements are a special case in XML. In SGML and HTML, it is obvious from the definition of an element (in the DTD) that it is empty and has no content. XML, in keeping with its developers’ design goals, requires you to be much more explicit. Indeed, you may well not be using a DTD at all, and so it could be quite hard to decide whether an element is, or should be, empty. Empty elements, therefore, have to be clearly identified as such, and to do so, a special empty tag close delimiter is used, /> as in

<empty_element/>

To maintain a certain degree of backward compatibility with SGML and HTML, instead of using the special empty tag close delimiter, you can write an ordinary start tag followed immediately by its end tag. The equivalent to the preceding code is

<empty_element></empty_element>
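
A parser treats both ways of writing an empty element as the same thing. The sketch below checks this with Python's standard xml.etree.ElementTree module; the helper is_empty is an assumption introduced for illustration, and the element names are borrowed from the sample document.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    """An element is empty when it has neither text nor child elements."""
    return element.text is None and len(element) == 0


short_form = ET.fromstring("<rule/>")
long_form = ET.fromstring("<rule></rule>")
banner = ET.fromstring('<banner source="topbanner.gif"/>')

# An empty element may still carry attributes, as banner does.
print(is_empty(short_form), is_empty(long_form), is_empty(banner))

# Once parsed, both spellings serialize identically.
print(ET.tostring(short_form) == ET.tostring(long_form))
```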

In our sample document, the three empty elements are used to denote a graphic image to be used as the banner and footer of the XML home page, as well as to indicate a rule, representing a division between the title and the main body of the home page.


