Platinum Edition Using HTML 4, XML, and Java 1.2: Creating XML Documents

Platinum Edition Using HTML 4, XML, and Java 1.2
(Publisher: Macmillan Computer Publishing)
Author(s): Eric Ladd
ISBN: 078971759x
Publication Date: 11/01/98

Using occurrence indicators, we could generalize our return.address element as follows:

<!ELEMENT return.address (((business.name, attn?) | personal.name*), (street.address+, city, state, zip))>

This element breaks down as follows:

• First, a choice exists between two alternatives: business.name optionally followed by attn, or personal.name. The ? after attn indicates that it can occur either zero or one time. The * after personal.name indicates it can occur any number of times, including zero. This leads to the following possibilities for this part of the content model:

• Business name by itself

• Business name with one attn name

• One or more personal names

• Nothing (the zero option of the * on personal.name)

• Then, one or more street.address elements can be included.

• Finally, one each of city, state, and zip must be used.
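
To make the breakdown above concrete, here is a minimal sketch of a complete document that a validating parser should accept against this content model. It writes the model with one outer pair of parentheses around the whole sequence, which the XML grammar requires, and it assumes that every child element simply holds text. The address values are invented for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE return.address [
  <!-- The content model discussed above, wrapped in one outer group -->
  <!ELEMENT return.address (((business.name, attn?) | personal.name*),
                            (street.address+, city, state, zip))>
  <!-- Assumption: every child element holds plain text -->
  <!ELEMENT business.name  (#PCDATA)>
  <!ELEMENT attn           (#PCDATA)>
  <!ELEMENT personal.name  (#PCDATA)>
  <!ELEMENT street.address (#PCDATA)>
  <!ELEMENT city           (#PCDATA)>
  <!ELEMENT state          (#PCDATA)>
  <!ELEMENT zip            (#PCDATA)>
]>
<return.address>
  <business.name>Acme Widgets</business.name>
  <attn>Pat Jones</attn>
  <street.address>100 Main Street</street.address>
  <street.address>Suite 4</street.address>
  <city>Springfield</city>
  <state>IL</state>
  <zip>62701</zip>
</return.address>
```

Removing the attn element, replacing the first two elements with one or more personal.name elements, or dropping them altogether should all still be valid. Omitting city, state, or zip, or supplying no street.address at all, should not.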

NOTE: As you can see, occurrence indicators give you a little control over the frequency of occurrence of an element or group of elements (not at all, once, or an unlimited number of times). This all-or-nothing approach is a little too loose for a lot of possible XML applications. Therefore, initiatives such as Microsoft’s proposed XML-Data standard are very important. This standard, described at http://www.microsoft.com/standards/xml/default.asp, would give XML content authors more control over the data in their documents.

Character Content

One more type of element content is a little different from what we have discussed so far. Where text, and only text, is allowed inside an element, the content model identifies it with the keyword PCDATA (parsed character data). To prevent you from confusing this keyword with a normal element name (and to make it impossible for you to use it as a name), the keyword is prefixed by a hash character (#), which is called the reserved name indicator (RNI).

The following element declarations:

<!ELEMENT para (title, text)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT text (#PCDATA)>

would enable you to write this in your XML document:

<para>
   <title>My Life</title>
   <text>
      My life has been very quiet of late.
   </text>
</para>

An element whose content model is parsed character data cannot contain any further elements, so it is where the markup stops and normal text takes over.

Character Data Models
Don’t lose sight of the fact that XML’s content models are only concerned with the structure of an XML document; they make no attempt to control its content. An element that is totally devoid of data content will still match a #PCDATA content model.
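
As a small illustration of that point, the following sketch reuses the para declarations from earlier and leaves both text-only elements empty. A validating parser should still accept the document, because a #PCDATA content model is satisfied by zero characters.

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!ELEMENT para  (title, text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text  (#PCDATA)>
]>
<para>
   <!-- Both elements are present, as the structure requires, but hold no data -->
   <title></title>
   <text></text>
</para>
```

The structure is what gets checked: leaving out the title element entirely would be an error, while leaving it empty is not.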

Mixed Content

Elements that can contain text (parsed character data), elements, or both can be troublesome. Their content models are called mixed content models, and they require extra care. The important point is that it is difficult for an XML processor to distinguish between unintentional PCDATA (spaces, tabs, line endings, and so on) and element content. An accidental space between an end tag and the next start tag could lead to some confusion on the part of the XML processor.

To declare mixed content, you use the content model grammar you have learned so far, but you must use it in a particular way. The content model has to take the form of a single set of alternatives, starting with #PCDATA and followed by the element types that can occur in the mixed content, each declared only once. Except when #PCDATA is the only option (as you saw earlier), the * qualifier must follow the closing parenthesis:

<!ELEMENT pick (#PCDATA | eeney | meeney | miney | mo)*>
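
Here is a minimal sketch of a document that uses this mixed content model. It assumes the four child elements are declared as text-only, and the sentence inside pick is invented for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE pick [
  <!ELEMENT pick (#PCDATA | eeney | meeney | miney | mo)*>
  <!-- Assumption: each child element holds plain text -->
  <!ELEMENT eeney  (#PCDATA)>
  <!ELEMENT meeney (#PCDATA)>
  <!ELEMENT miney  (#PCDATA)>
  <!ELEMENT mo     (#PCDATA)>
]>
<pick>Catch a <eeney>tiger</eeney> by the <mo>toe</mo>, then let it go.</pick>
```

Because the whole group carries the * qualifier, text and the listed elements can appear in any order and any number of times, including not at all. The declaration cannot require, for example, that eeney come before mo.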

Attribute Declarations

Although you can declare only one element at a time, elements can have lots of attributes, and so the attributes are all declared at once in an attribute specification list. An attribute declaration has the form

<!ATTLIST element.name attribute.definitions>

It is normal practice to keep the attribute declaration for an element close to the declaration of the element itself, but there is absolutely no requirement to do so; it just makes maintenance easier.

Attribute Specification Lists

An attribute specification list consists of one or more attribute specifications (for readability they are often put on separate lines, but this is not required). An attribute specification list does the following for an element:

• It declares the names of allowed attributes.

• It states the type of each attribute.

• It may provide a default value for each attribute.

Each attribute specification consists of an attribute name, an attribute type, and a default declaration (such as #REQUIRED, #IMPLIED, or a quoted default value), in the form

attribute.name attribute.type attribute.default

Attribute Types

Three types of attributes exist:

• A string attribute is one whose value consists of any amount of character data.

• A tokenized attribute is an attribute whose value consists of one or more tokens that are significant to XML.

• An enumerated attribute type is an attribute whose value is taken from a list of declared possible values.
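
The following sketch declares one attribute of each kind for a single element. The book element and its owner attribute come from the examples in this section; the catalog and binding attributes and all the values are hypothetical. Each definition ends with a default declaration as defined in XML 1.0: #IMPLIED (optional), #REQUIRED (mandatory), or a quoted default value.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (#PCDATA)>
  <!ATTLIST book
    owner   CDATA         #IMPLIED
    catalog ID            #REQUIRED
    binding (hard | soft) "soft">
  <!-- owner:   string type, any character data, optional         -->
  <!-- catalog: tokenized type (ID), must be a unique XML name    -->
  <!-- binding: enumerated type, one of the listed values         -->
]>
<book owner="Hammersmith Public Library" catalog="b1042">My Life</book>
```

Because binding is not given in the start tag, a validating processor would supply the declared default, soft.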

String Types

The values of string types are simple strings of characters. Any attribute used in an XML document that does not have a DTD (neither an internal DTD subset nor an external DTD subset) is automatically treated as a string type attribute. An example of a string type declaration is

<!ATTLIST book owner CDATA #IMPLIED>

and you would then use it like this:

<book owner="Hammersmith Public Library">

You can also use an internal entity (in this case it’s given the more generic name general entity) in the value of a string type attribute:

<book owner="&my.local; Public Library">
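
For the entity reference to resolve, the entity has to be declared in the DTD. The sketch below shows one way this might look; the replacement text "Hammersmith" and the element content are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Hypothetical replacement text for the my.local entity -->
  <!ENTITY my.local "Hammersmith">
  <!ELEMENT book (#PCDATA)>
  <!ATTLIST book owner CDATA #IMPLIED>
]>
<book owner="&my.local; Public Library">My Life</book>
```

After the processor expands the reference, the value of owner is "Hammersmith Public Library". Only internal entities can be referenced from an attribute value, and their replacement text must not contain a < character.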