Platinum Edition Using HTML 4, XML, and Java 1.2
(Publisher: Macmillan Computer Publishing)
Author(s): Eric Ladd
ISBN: 078971759x
Publication Date: 11/01/98
CHAPTER 14 Creating XML Document Type Definitions
- In this chapter
- Why Have a DTD at All?
- DTDs and Validation
- Document Type Declarations
- Internal DTD Subset
- Standalone XML Documents
- Getting Sophisticated with External DTDs
- Developing the DTD from XML
- A Home Page DTD
- Richness and Entropy
- Visual Modeling
- XML DTDs from Other Sources
- Modeling Relational Databases
- Elements or Attributes?
- Saving Yourself Typing with Parameter Entities
- Modular DTDs
- Conditional Markup
- Optional Content Models and Ambiguities
Why Have a DTD at All?
Recall from Chapter 11, "Introduction to XML," that an XML Document Type Definition (DTD) is simply a set of rules that explains how to use XML markup. As long as an XML document is well formed, it does not need a DTD at all. In fact, as shown later in this chapter, it is possible to derive a DTD just by looking at the XML document. However, some important restrictions apply to an XML document that does not have a DTD.
If you want to be able to use an XML document without a DTD:
- All the attribute values in the XML document must be specified; you cannot have default values for them.
- The XML document cannot contain entity references other than the predefined ones (amp, lt, gt, apos, and quot).
- No attributes can be present whose values are subject to normalization.
- In elements whose content consists only of other elements, there can be no whitespace (space, tab, or other whitespace characters) between the start tag of the container element and the start tag of the first element contained in it. The following, for example, would be illegal:
<CHAPTER> <SECTION>………………… </SECTION></CHAPTER>
This is a subtle point. Without a DTD to tell the XML processor whether this whitespace is meaningful (PCDATA or preserved whitespace), the processor has no way of knowing whether to discard it.
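The behavior described above can be observed with any non-validating processor. The following is a minimal sketch that assumes Python's standard xml.etree.ElementTree module; the sample documents and the &copyright; entity name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# No DTD: the parser cannot know that CHAPTER holds only elements,
# so the space after <CHAPTER> is reported as character data.
chapter = ET.fromstring("<CHAPTER> <SECTION>text</SECTION></CHAPTER>")
leading_text = chapter.text  # whitespace before the first child element

# The five predefined entities work without any declarations.
predefined = ET.fromstring(
    "<P>&lt;TAG&gt; &amp; &quot;text&quot; &apos;</P>"
).text

# Any other entity reference needs a declaration; without a DTD the
# parse fails with a well-formedness error.
try:
    ET.fromstring("<P>&copyright;</P>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False

print(repr(leading_text), repr(predefined), undeclared_accepted)
```

The parser keeps the single space as the text of CHAPTER because nothing tells it that the space is safe to drop.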
DTDs and Validation
The DTD describes a model of the structure of the content of an XML document. This model says what elements must be present, which are optional, what their attributes are, and how they can be structured in relation to each other. Although HTML has only one DTD, XML enables you to create your own DTDs for your applications, which gives you complete control over the process of checking the content and structure of the XML documents created for that application. This checking process is called validation. Depending on what you, as the DTD developer, want to achieve, you can exercise almost complete control over the structure and create a strict DTD. When you validate XML documents that were created using this strict DTD, you can insist that certain elements be present, and you can enforce the set order you require. You can check that certain attribute values have been set and, to a limited degree, you can even check that these attribute values are of the right general type.
On the other hand, you can also make almost everything optional and create a loose DTD. You could even have parallel versions of the same DTD: one that lets you create draft versions of the XML that aren't complete, and another that rigidly checks that everything is present. It is even possible to insert switches into a DTD that turn the degree of strictness on and off.
When the completed XML document is validated, what is allowed and what is not are determined entirely by the choices you made in designing the DTD. The author of the document can then be warned, for example, if elements are not in the right place, as shown in Figure 14.1, or if required elements are missing, as shown in Figure 14.2. (You'll learn more about the application that generated these messages in Chapter 16, "XML DTD and Document Validation.")
FIGURE 14.1 Faulty structure warning.
FIGURE 14.2 Missing XML element warning.
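To make the difference between a strict and a loose DTD concrete, here is a toy sketch in Python. It is not a validating parser and not the application that produced the figures. It only imitates one part of validation by writing two content models as regular expressions over the names of an element's children. The element names (book, title, author, chapter) and both models are assumptions made up for this example.

```python
import re
import xml.etree.ElementTree as ET

def child_sequence(element):
    """Return the child element names as one space-terminated string."""
    return "".join(child.tag + " " for child in element)

def conforms(element, model):
    """Check the children of `element` against a regex content model."""
    return re.fullmatch(model, child_sequence(element)) is not None

# Hypothetical content models for a <book> element, written as regexes.
# Strict, like (title, author, chapter+): all required, order fixed.
STRICT = r"title author (chapter )+"
# Loose, like (title?, author?, chapter*): everything optional.
LOOSE = r"(title )?(author )?(chapter )*"

draft = ET.fromstring("<book><title>Draft</title></book>")
final = ET.fromstring(
    "<book><title>T</title><author>A</author><chapter>C</chapter></book>"
)
misordered = ET.fromstring(
    "<book><author>A</author><title>T</title><chapter>C</chapter></book>"
)

for name, doc in [("draft", draft), ("final", final), ("misordered", misordered)]:
    print(name, "strict:", conforms(doc, STRICT), "loose:", conforms(doc, LOOSE))
```

The draft passes only the loose model, the finished document passes both, and the misordered document passes neither, because even the loose model still fixes the order of the elements.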
Document Type Declarations
After you have decided to use a DTD, the first step is to associate it with an XML document by means of a document type declaration. The two terms are easy to confuse. The document type definition (DTD) is an XML description of the content model of a type (or class) of documents. The document type declaration is a statement in an XML file that identifies the DTD that belongs to the document and, if an external DTD file is used, identifies where the DTD entity (the file) can be found.
At its very simplest, a document type declaration looks like the following:
<!DOCTYPE DTD.name [ internal.subset ]>
DTD.name is the name of the DTD. When we come to the topic of validity later on, you will discover that the DTD name should be the same as the name of the root element of the document. So, a DTD designed for a book would be called book, or something similar, and the root element in the document would also be book. Don't forget that XML is case sensitive; if you call the DTD BooK, then you should have a root element BooK.
internal.subset is the contents of the internal DTD subset, the part of the DTD that stays in the XML document itself. We will investigate the internal DTD subset shortly; it contains local element, attribute, and entity declarations. Without the internal DTD subset, there wouldn't really be much point in including a document type declaration of this simple form.
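The naming rule can be checked mechanically. The sketch below assumes Python's standard xml.parsers.expat module, which is a non-validating parser, so it reports the DTD name and the root element name but leaves the comparison to you. The helper doctype_and_root and both sample documents are hypothetical.

```python
import xml.parsers.expat

def doctype_and_root(xml_text):
    """Return (DTD name from the DOCTYPE, name of the root element)."""
    found = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["doctype"] = name

    def on_start(name, attributes):
        if found["root"] is None:  # only the first start tag is the root
            found["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found["doctype"], found["root"]

matching = """<!DOCTYPE book [
  <!ELEMENT book (#PCDATA)>
]>
<book>Some text</book>"""

# Same letters, different case: the two names do not match.
mismatched = """<!DOCTYPE BooK [
  <!ELEMENT BooK (#PCDATA)>
]>
<book>Some text</book>"""

for sample in (matching, mismatched):
    dtd_name, root_name = doctype_and_root(sample)
    print(dtd_name, root_name, dtd_name == root_name)
```

Both samples are well formed, so the parser accepts them. Only a validating processor would reject the second one.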
Internal DTD Subset
For an XML document to be well formed, all the external entities must be declared in the DTD. If you design your application carefully, it may be possible for you to put all the declarations in the internal DTD subset. With all the declarations in the internal DTD subset, the XML processor would not need to read and process external documents.
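As a rough illustration of a document that carries all its declarations in the internal subset, the sketch below parses one with Python's standard xml.parsers.expat module. The status attribute, its default value, and the publisher entity are assumptions invented for this example. No external file is read.

```python
import xml.parsers.expat

# Every declaration the document needs lives in the internal DTD subset.
document = """<!DOCTYPE book [
  <!ELEMENT book (#PCDATA)>
  <!ATTLIST book status (draft|final) "draft">
  <!ENTITY publisher "Macmillan Computer Publishing">
]>
<book>Published by &publisher;</book>"""

attributes_seen = {}
text_pieces = []

def on_start(name, attributes):
    # Includes attributes supplied by ATTLIST defaults, not only
    # those written in the start tag.
    attributes_seen[name] = dict(attributes)

def on_text(data):
    # Character data may arrive in several pieces, so collect them.
    text_pieces.append(data)

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = on_start
parser.CharacterDataHandler = on_text
parser.Parse(document, True)

text = "".join(text_pieces)
print(attributes_seen["book"], text)
```

The start tag never mentions status, yet the parser reports it with the default value from the ATTLIST declaration, and the entity reference is replaced by the declared text.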
Note that having an internal subset does not affect the XML document's status as a standalone document. This can be a little confusing at first. The first line of an XML document is the XML declaration, which can include a standalone document declaration:
<?xml version="1.0" standalone="yes"?>
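A processor can report what the standalone document declaration says. The sketch below assumes Python's standard xml.parsers.expat module, whose XmlDeclHandler callback receives 1 for "yes", 0 for "no", and -1 when the standalone clause is omitted. The helper name standalone_flag and the sample documents are hypothetical.

```python
import xml.parsers.expat

def standalone_flag(xml_text):
    """Return expat's standalone flag: 1 for yes, 0 for no, -1 if omitted.

    Returns None when the document has no XML declaration at all.
    """
    seen = []

    def on_xml_decl(version, encoding, standalone):
        seen.append(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_text, True)
    return seen[0] if seen else None

declared_yes = '<?xml version="1.0" standalone="yes"?><book/>'
declared_no = '<?xml version="1.0" standalone="no"?><book/>'
omitted = '<?xml version="1.0"?><book/>'

for sample in (declared_yes, declared_no, omitted):
    print(standalone_flag(sample))
```

Note that both the version number and the standalone value must be enclosed in quotation marks for the declaration to be well formed.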