Platinum Edition Using HTML 4, XML, and Java 1.2: Anatomy of an XML Document


Platinum Edition Using HTML 4, XML, and Java 1.2
(Publisher: Macmillan Computer Publishing)
Author(s): Eric Ladd
ISBN: 078971759x
Publication Date: 11/01/98


Attributes to XML Element Tags

Element start tags can include one or more optional or mandatory attributes that give further information about the elements they delimit. The syntax for specifying an attribute is

<element_type_name attribute_name="attribute.value">

If elements were nouns, then attributes would be adjectives. We could, therefore, say

<fruit taste="sharp">

or even:

<problem size="huge" cause="unknown" solution="run.away">
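As a minimal sketch of how an XML processor hands attributes to an application, the snippet below parses the problem example with Python's standard-library xml.etree.ElementTree module. The choice of Python and the closing "/>" (added so that the start tag forms a complete, empty element) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The start tag from the text, closed with "/>" so that it forms a complete
# (empty) element the parser will accept on its own.
SOURCE = '<problem size="huge" cause="unknown" solution="run.away"/>'

element = ET.fromstring(SOURCE)

# The element type name is the "noun"; the attributes are its "adjectives".
print(element.tag)
for name, value in element.attrib.items():
    print(f"  {name} = {value}")
```

The parser returns the attributes as a mapping from attribute name to attribute value, which is all the application sees of them.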

Unlike SGML and HTML, where declaring the attributes of an element more than once is an error, XML handles multiple attribute declarations in a distinctive way. If an element is declared once with one set of attributes and then declared again with a different set of attributes, the two sets are simply merged. The first time you declare attributes for the fruit element, for instance, you might include the taste attribute used above. In a subsequent declaration for fruit, you can introduce a different attribute, such as color. Each time you do this, the declarations are merged to form the set of all possible attributes for that element.
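In DTD terms, this merging applies to attribute-list declarations. One way to observe it is to declare attributes for fruit in two separate declarations and record what the parser reports. The sketch below assumes Python's standard xml.parsers.expat module; the DTD itself, the color value, and the element content are invented for illustration.

```python
import xml.parsers.expat

# Two separate attribute-list declarations for the same element type.
DOC = """<?xml version="1.0"?>
<!DOCTYPE fruit [
  <!ELEMENT fruit (#PCDATA)>
  <!ATTLIST fruit taste CDATA #IMPLIED>
  <!ATTLIST fruit color CDATA #IMPLIED>
]>
<fruit taste="sharp" color="yellow">lemon</fruit>
"""

declared = {}

def on_attlist(element_name, attribute_name, attribute_type, default, required):
    # Called once per declared attribute; collect the names per element type.
    declared.setdefault(element_name, []).append(attribute_name)

parser = xml.parsers.expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist
parser.Parse(DOC, True)

print(declared)
```

Both declarations are accepted, and the element ends up with the union of the declared attributes rather than triggering an error.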

NOTE: An XML processor is a software package, library, or module that is used to read XML documents. The XML processor makes it possible for an XML application, such as a formatting engine or a viewer, to access the structure and content of an XML document.

Logical Structure

Conceptually, a big difference usually exists between XML and HTML markup. With a few exceptions, most HTML tags perform functions related to how the content is displayed. XML markup, on the other hand, is meant to convey what the content means.

XML uses its start tags and end tags as containers; the start tag, the content, and the end tag form a single element. Elements can therefore be considered the objects out of which an XML document is assembled. Each XML document must have exactly one root element, and all the other elements must be perfectly nested inside that element. Perfectly nested means that if an element contains other elements, those elements must be completely enclosed within that element.

Now look at what that means for our simple example of Listing 12.1. If we sketch out the structure of the elements in this XML document, we obtain the kind of tree structure of elements shown in Figure 12.1.

As you can see from Figure 12.1, the document has a tree-like structure with the root element (<home.page>) at the top of the tree (or the base, depending on how you look at it). All the elements that are inside this element are neatly contained within each other. An XML document must contain one, and only one, root element, and no element may lie partially or completely outside it, whether before or after it.
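These nesting rules are easy to check with a parser. The following sketch, which assumes Python's standard xml.etree.ElementTree module, feeds it one properly nested document, one with overlapping elements, and one with two root elements. The helper name is_well_formed and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(source):
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False

# Properly nested: every element is closed inside its container.
nested = "<home.page><head><title>Hi</title></head></home.page>"
# Overlapping: <title> is still open when </head> arrives.
overlapping = "<home.page><head><title>Hi</head></title></home.page>"
# Two root elements: <extra/> lies outside the single permitted root.
two_roots = "<home.page/><extra/>"

for label, source in [("nested", nested),
                      ("overlapping", overlapping),
                      ("two roots", two_roots)]:
    print(label, is_well_formed(source))
```

Only the first document is accepted; the parser rejects the other two before any application code sees their content.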

To make it easier to describe the relationships between elements, we say that an element is the parent of the elements that it directly contains. The elements directly inside an element are called its children. Elements that share the same parent element are called siblings.

In our simple example of Listing 12.1, <home.page> contains all the other elements, directly or indirectly; <text> is the parent of <para>, <title> is a child of <head>, and <title> and <banner> are siblings. Going down the element tree, each child element must be fully contained within its parent element. Sibling elements may not overlap.
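The parent, child, and sibling relationships can be read straight off a parsed tree. In the sketch below, the document is a hypothetical stand-in for Listing 12.1, reconstructed only from the relationships named in the text; the real listing may differ. The helper siblings and the element content are likewise invented, and Python's standard xml.etree.ElementTree module is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Listing 12.1, built only from the relationships
# described in the text; the actual listing may contain more than this.
SOURCE = """
<home.page>
  <head>
    <title>My Home Page</title>
    <banner source="topbanner.gif"/>
  </head>
  <text>
    <para>Welcome.</para>
  </text>
</home.page>
"""

root = ET.fromstring(SOURCE)

# ElementTree stores the children on each element but not the parent,
# so build a child -> parent lookup by walking the tree once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def siblings(element):
    """Return the tags of the other elements that share this element's parent."""
    parent = parent_of.get(element)
    if parent is None:
        return []  # the root element has no parent and therefore no siblings
    return [other.tag for other in parent if other is not element]

title = root.find("head/title")
para = root.find("text/para")
print(parent_of[title].tag, siblings(title))
print(parent_of[para].tag, siblings(para))
```

The root element is the only one missing from parent_of, which matches the rule that everything else must sit inside it.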

FIGURE 12.1 The logical structure of elements.

The arrangement of the elements in an XML document is called the logical structure. As you will see next, an XML document also has a physical structure, and to be usable (technically, in order to be “well formed”), the logical and the physical structure of an XML document must be consistent.

Physical Structure

One of the key concepts in XML is that of the entity. If you are to really understand XML, it is essential that you fully understand what entities are. Various types of entities exist, and it is the entities, far more than the elements, that determine how the XML processor deals with XML code. You will learn about entities in some detail in a later chapter, but for now it is enough to think of an entity as a physical storage unit. In practice, most entities are separate computer files.

The main entity that you work with all the time, although you will hardly ever notice it is there, is the document entity. This document (or root) entity, as we have seen, is logically divided into elements (other logical components exist that we will discuss later, but for now it is enough to concentrate on the elements).

Entities can reference other entities and cause them to be included in the XML document. You’ve already met some entities; the entities listed in Table 12.1, which we use to include markup characters in normal text, are in fact internal entities. For now, we’ll examine the basic reference to a graphics file that is so common in HTML Web pages, as it appears in our example in Listing 12.1:

<banner source="topbanner.gif"/>

The banner element’s source attribute refers to an external entity (it isn’t contained in the current document), which is an external graphics file. If this were HTML code, the graphic would appear in your Web browser at this point in the document. In XML terms, this graphics file is called an unparsed entity; the XML processor ignores the content of the entity and simply passes it on to the application.
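As a hedged preview of the formal machinery, the sketch below declares the graphic as an unparsed entity with a notation and lets the source attribute name that entity rather than the file. The DTD, the entity name topbanner, the notation name gif, and its identifier are all assumptions made for illustration. The example uses Python's standard xml.parsers.expat module.

```python
import xml.parsers.expat

# Assumed declarations: a notation for the GIF format, an unparsed entity
# that points at the file, and an attribute of type ENTITY that names it.
DOC = """<?xml version="1.0"?>
<!DOCTYPE banner [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY topbanner SYSTEM "topbanner.gif" NDATA gif>
  <!ELEMENT banner EMPTY>
  <!ATTLIST banner source ENTITY #REQUIRED>
]>
<banner source="topbanner"/>
"""

notations = {}
unparsed = {}

def on_notation(name, base, system_id, public_id):
    notations[name] = system_id

def on_unparsed_entity(name, base, system_id, public_id, notation_name):
    # The parser reports where the entity lives and which notation it uses;
    # it does not open or read the file itself.
    unparsed[name] = (system_id, notation_name)

parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.UnparsedEntityDeclHandler = on_unparsed_entity
parser.Parse(DOC, True)

print(notations)
print(unparsed)
```

Note that topbanner.gif does not need to exist for this to run: the processor passes the file name and notation to the application and leaves the content alone.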

XML is a little stricter than HTML about the inclusion of external graphics files. (As you will learn later, XML requires you to specify the notation or format that the graphic is in.) XML is also able to include far more than a simple graphic: it is possible to include an external XML source as an entity within another XML document. However, that is where potential problems begin.

XML is able to include entities that contain XML code, text, HTML code—almost anything. Depending on how the referenced entity is identified, it could be processed (parsed) by the XML processor as if the XML code had been in the original document (root entity) and not in an external file. To further complicate matters, that entity could reference another entity, and so on to infinity. Apart from the practical problems that this might cause (you can imagine trying to open a small document and getting several thousand linked pages!), this creates special problems when the included entities also contain markup.
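The chaining of entities, and the effect of markup inside them, can be sketched with internal entities, which keep the example self-contained (no external files are involved). The entity names and their text are invented for illustration, and Python's standard xml.etree.ElementTree module is assumed.

```python
import xml.etree.ElementTree as ET

# "greeting" references "site", and "signoff" carries markup of its own.
DOC = """<!DOCTYPE text [
  <!ENTITY site "Example Site">
  <!ENTITY greeting "Welcome to &site;">
  <!ENTITY signoff "<para>Thanks for visiting.</para>">
]>
<text>&greeting;&signoff;</text>"""

root = ET.fromstring(DOC)

# The nested reference is expanded as plain character data ...
print(root.text)
# ... while the markup inside "signoff" becomes a real child element.
print([child.tag for child in root])
print(root.find("para").text)
```

After parsing, nothing in the tree records that para came from an entity rather than from the document itself, which is why markup inside included entities has to nest correctly with the markup around it.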
