Platinum Edition Using HTML 4, XML, and Java 1.2: Creating XML Documents

Platinum Edition Using HTML 4, XML, and Java 1.2
(Publisher: Macmillan Computer Publishing)
Author(s): Eric Ladd
ISBN: 078971759x
Publication Date: 11/01/98

Anything, absolutely anything, that appears between the opening delimiter (<![CDATA[) and the closing delimiter (]]>) is not recognized as markup. You do not need to “escape” any markup characters in a CDATA section (in fact, you can’t, because the escape itself won’t be recognized). The only thing that is recognized is the closing delimiter (]]>), so this string cannot appear inside a CDATA section and, as a logical consequence, you cannot put one CDATA section inside another.
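
As a small illustration of the rule above, the following sketch wraps text that is full of markup characters in a CDATA section. The element name example is hypothetical and chosen only for this illustration; any well-formed document would do.

```xml
<?xml version="1.0"?>
<example>
   <!-- Everything inside the section is character data. The apparent tags, -->
   <!-- the ampersand, and the apparent entity reference are not interpreted. -->
   <![CDATA[Use <b>bold</b> & <i>italic</i>; &lt; stays as the four characters shown]]>
</example>
```

An application reading this document should receive the text exactly as typed, including the angle brackets and the literal characters &lt;.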

CAUTION:
Using markup characters in a CDATA section like this in an XML document, whose existence is built around markup, goes against the grain. An XML processor is therefore intended to be very strict with this feature. The opening string and closing string for a CDATA section must be used exactly as shown here. The slightest deviation—a tab or a space character somewhere inside one of the strings—will be punished immediately. Should you do this, the content of the CDATA section will either be treated as markup, or the rest of your document (as far as the next CDATA section that is closed properly) will be treated as part of the CDATA section and all the markup will be ignored.

CDATA sections are one of the recommended ways to embed application code (JavaScript or Perl code, for instance) in your XML code. You could place the embedded code in comments (as is often done in HTML documents), but the XML processor is not required to pass the comment text to an application. Therefore, a risk exists that the contents of comments will be stripped out before the application sees them.

You could declare your own element type to contain the embedded code (like the <script> element in HTML 4). This is permitted, but it goes against the spirit of generic markup. It also would not help much if your embedded code contained characters that could be interpreted as markup, because the content of such an element is parsed in the normal way by the XML processor.
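
To make the difference concrete, here is a minimal sketch of the same line of script code written twice inside a dedicated element. The element names page and script and the function swap are assumptions made up for this example, not part of any standard.

```xml
<?xml version="1.0"?>
<page>
   <!-- Without a CDATA section, each < and & in the code must be escaped. -->
   <script>if (a &lt; b &amp;&amp; b &lt; c) { swap(a, c); }</script>

   <!-- With a CDATA section, the same code can be written as is. -->
   <script><![CDATA[if (a < b && b < c) { swap(a, c); }]]></script>
</page>
```

Both script elements deliver the same characters to the application; the second is simply easier to write and to read.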

The other way to embed code, and probably the best way, is by using processing instructions, which will be discussed in the next section.

Processing Instructions

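The XML declaration that is (or at least should be) at the start of every XML document uses the same syntax as a processing instruction (strictly speaking, the XML specification treats it as a separate construct and reserves the name xml for it):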

<?xml version="1.0"?>

XML markup is meant to be generic, and in a perfect world, it would be. However, there will be times when you really do need to include instructions for specific applications. One of these applications could be a script interpreter, so, like CDATA sections, processing instructions are good places to put embedded code. Better still, whereas CDATA sections are purely a way of keeping characters from being interpreted as markup, processing instructions can be targeted at a particular application. This enables you, for example, to include two or more sets of embedded script code intended for different processors or interpreters and to identify them separately, as shown in this listing of a partial XML document:

<para>
   This is text containing two processing instructions,
      <?javascript I can put whatever I like here?>
      <?perl And I can put whatever I like here too?>
   one for each interpreter.
</para>

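Almost no restrictions exist on the content of a processing instruction (the XML processor does not even consider it part of the document’s character data). The content simply cannot contain the closing string ?>, which ends the instruction. The name that you choose (the target) must comply with XML’s naming rules, and names beginning with the letters xml, in any combination of upper- and lowercase, are reserved.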
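
The following sketch shows processing instructions that carry real-looking code rather than placeholder text. The targets javascript and perl follow the listing above; the variable and function names inside the instructions are invented for this example, and whether any application acts on them depends entirely on that application.

```xml
<?xml version="1.0"?>
<page>
   <!-- The target (javascript, perl) is an XML name. -->
   <!-- The rest of each instruction is handed to the application unparsed. -->
   <?javascript if (count < 10 && ready) { show("<b>ok</b>"); } ?>
   <?perl print "total < limit\n" if $total < $limit; ?>
   <para>Text of the page.</para>
</page>
```

Note that the angle brackets and ampersands inside the instructions need no escaping, just as in a CDATA section.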

Markup Declarations

Before we get into the details of declaring elements and attributes, we’ll quickly review where in the XML document these are made, as shown in the following:

<?xml version="1.0"?>
<!DOCTYPE page [
   <!-- this is where the internal DTD subset is located. -->
]>
<page>
   <!-- this is where the content of the root element is located. -->
</page>

As shown, the XML document begins with the XML declaration. At this stage, the document still does not have an external DTD, so the declaration as shown is sufficient. The DOCTYPE declaration follows, and the internal DTD subset is entered between its square brackets. Finally, between the start tag and end tag of the root element, <page>, is the content of the document.

Although the full syntax can be somewhat more complex than that which is shown here (the full syntax for DTDs will be shown in the next chapter), for use with an internal DTD subset only, the syntax takes the form

<!DOCTYPE document.type.name [ internal.subset ]>

where the document type name must be the same as the name of the XML document’s root element (<page> above) if the document is to be valid.

Element Declarations

The first kind of declaration is the element declaration. This takes the form

<!ELEMENT name content>

The name is a standard XML name constructed in accordance with the naming rules discussed in Chapter 12. The content part of the element declaration describes either a specific content in the form of the keyword EMPTY or the keyword ANY, or it consists of a so-called content model that describes the sequence and repetition of elements that are contained inside (are children of) this element.

Empty Elements

Empty elements have no content (they are forbidden to have any) and are marked up in either of the following two ways:

<empty.element/>

<empty.element></empty.element>

An empty element is declared like this:

<!ELEMENT empty.element EMPTY>
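
Put together, a complete document using an empty element might look like the sketch below. The element names page and pagebreak are hypothetical, and the declaration of page uses a simple content model (covered later in this chapter) only so that the document has a root element to hold the empty ones.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
   <!-- page must contain exactly two pagebreak elements. -->
   <!ELEMENT page (pagebreak, pagebreak)>
   <!ELEMENT pagebreak EMPTY>
]>
<page>
   <pagebreak/>
   <pagebreak></pagebreak>
</page>
```

Both forms of pagebreak are acceptable here, but even a single space between the start tag and end tag of the second form would count as content and make the document invalid.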

Unrestricted Elements

The opposite of an empty element is an unrestricted element. An unrestricted element can contain character data and any element that is declared elsewhere in the XML document’s DTD (in either the internal or the external DTD subset). Because we aren’t using an external DTD subset at this point, only the elements declared in the internal DTD subset are available.

An unrestricted element’s content is declared like this:

<!ELEMENT any.element ANY>

Note that with ANY you cannot specify the order in which the content must appear.
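
As a rough sketch of what an unrestricted element permits, the document below declares a hypothetical root element page as ANY, together with two invented child elements. The (#PCDATA) declarations, which say that title and para hold only text, are an assumption needed to make the example complete.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
   <!ELEMENT page ANY>
   <!ELEMENT title (#PCDATA)>
   <!ELEMENT para (#PCDATA)>
]>
<page>
   <para>Body text may come first,</para>
   <title>then a title,</title>
   loose character data is allowed as well,
   <para>and then another paragraph.</para>
</page>
```

The declared elements and the plain text can be mixed in any order and any number, but an element that is not declared in the DTD would still make the document invalid.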

Element Content Models

An element content model consists of a description, using a very simple grammar, of the elements that may appear in the content of the current element, in what order they may or must appear, and how often they may or must appear. Content models are used to describe the structure of your XML documents, for instance, declaring that chapter elements must appear within section elements. Other examples of situations these models describe will be given in the following sections.
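
To give a first impression of what such a model looks like, here is a minimal sketch. The element names report, title, and para are invented for this example; the comma (sequence) and plus sign (one or more) are part of the content model grammar.

```xml
<?xml version="1.0"?>
<!DOCTYPE report [
   <!-- A report is one title followed by one or more para elements, in that order. -->
   <!ELEMENT report (title, para+)>
   <!ELEMENT title (#PCDATA)>
   <!ELEMENT para (#PCDATA)>
]>
<report>
   <title>Quarterly Results</title>
   <para>Sales rose in the first quarter.</para>
   <para>Costs stayed flat.</para>
</report>
```

Under this model, a report with the title placed after the paragraphs, or with no paragraph at all, would be rejected by a validating XML processor.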
