Platinum Edition Using HTML 4, XML, and Java 1.2
(Publisher: Macmillan Computer Publishing)
Author(s): Eric Ladd
ISBN: 078971759x
Publication Date: 11/01/98


When you use enumerated notations in an attribute declaration such as this, every notation you name must have been declared in the DTD before the XML processor reaches this part of the DTD. This means, for example, that if you use notations in the internal DTD subset, you must declare the notation in the internal DTD subset, too, and not in the external DTD subset (as you may remember, the internal DTD subset is read before the external DTD subset).
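
As a minimal sketch of this ordering rule, the following document declares two notations in the internal DTD subset before the attribute declaration that enumerates them. The element name figure, the attribute name format, and the helper applications are hypothetical and serve only as illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE figure [
   <!-- The notations come first, so they are already known
        when the attribute declaration below names them. -->
   <!NOTATION gif  SYSTEM "gifviewer.exe">
   <!NOTATION jpeg SYSTEM "jpegviewer.exe">

   <!ELEMENT figure (#PCDATA)>
   <!ATTLIST figure
      format NOTATION (gif | jpeg) #REQUIRED>
]>
<figure format="gif">A caption for the picture</figure>
```

Moving the two NOTATION declarations into the external DTD subset while leaving the ATTLIST declaration here would produce exactly the situation the paragraph above warns against.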

As you will no doubt have noticed, typing in the public identifiers for notations quickly becomes tedious and error-prone. It is a good idea to collect all your notation declarations into one file and to give that file an obvious name, such as “graphics.ent”, so that you remember what it contains. You can then reference the file in every DTD you create by using an external parameter entity declaration that points to it (the example here uses a file called “mysymbols.ent”):

<?xml version="1.0" standalone="no"?>
<!DOCTYPE chapter SYSTEM "chapter.dtd" [
   <!ENTITY % myentities SYSTEM "mysymbols.ent">
   %myentities;
]>
<chapter>
   <number/>
   …
</chapter>

If for no other reason, keeping all your notation declarations in a separate file means that you have to edit only one file when something changes, instead of every separate DTD, and you are less likely to forget a declaration you need.
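
To make the idea concrete, a shared declarations file might look something like the sketch below. The notation names and helper applications are assumptions made for illustration; they are not taken from any real DTD, and system identifiers are used here only to avoid inventing public identifiers.

```xml
<!-- mysymbols.ent: hypothetical contents, for illustration only -->
<!NOTATION gif  SYSTEM "gifviewer.exe">
<!NOTATION jpeg SYSTEM "jpegviewer.exe">
<!NOTATION wav  SYSTEM "wavplayer.exe">
```

Every DTD that pulls this file in through a parameter entity reference picks up all three declarations, so a change to a helper application has to be made in one place only.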

Entities

Without getting too involved in the precise technicalities of the terminology, XML brings markup a step closer to the world of object orientation, as in object-oriented programming. The basic object in XML’s world is the entity, be it the XML document entity itself, the elements it contains, or the internal and external entities that it references.

The DTD itself is, of course, also an external entity (a special type of parameter entity, in fact) that the document references, but the relationship is a little more complex than this. The DTD describes a class of XML document entities, of which the actual XML document is an instantiation.

The entities (excluding the XML document entity because it really isn’t of any further interest in this context) are divided into three types: character entities, general entities, and parameter entities. General entities can then be further subdivided into two other types: internal entities and external entities. To confuse things a little more, external entities are subdivided into parsed entities (these contain character data) and unparsed entities (these usually contain binary data). This hierarchy is shown in Figure 15.1. Finally, common usage plays with the terms so that parsed general entities are usually referred to as internal and external text entities, and unparsed external general entities are often just called binary entities.

You might find it easier, however, to just think of text entities, which can be either internal or external, and binary entities, which have to be external. This classification is actually helped by the way that you declare the entities. An internal text entity declaration looks like this:

<!ENTITY name "replacement text">

and an external text entity declaration looks like this:

<!ENTITY name SYSTEM "system.identifier">
<!ENTITY name PUBLIC "public.identifier" "system.identifier">

But character entity declarations (they are a special case of an internal text entity) look like this:

<!ENTITY name "&#code;">

and text and character entity references (they are identical) look like this:

&name;
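
The three kinds of declaration can be seen side by side in a small document. This is only a sketch: the element name memo, the entity names, and the file legal.txt are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
   <!ELEMENT memo (#PCDATA)>
   <!-- Internal text entity: the replacement text is in the declaration. -->
   <!ENTITY company "Example Widgets Inc.">
   <!-- External text entity: the replacement text lives in another file. -->
   <!ENTITY legal SYSTEM "legal.txt">
   <!-- Character entity: a name for character number 169, the copyright sign. -->
   <!ENTITY copy "&#169;">
]>
<memo>&copy; 1998 &company; &legal;</memo>
```

All three references in the memo element use the same &name; form, regardless of how the entity was declared.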

Note that an entity reference must not contain the name of an unparsed entity. Unparsed entities may be referred to only in attribute values declared to be of the type ENTITY or ENTITIES.


FIGURE 15.1  XML entity types follow a specific hierarchy.

Internal Entities

Internal entities are entities whose definitions contain their values. No separate physical storage object (file) exists, and the content of the entity is given in the declaration, although it may be necessary for the XML processor to resolve any entity and character references in the entity value to produce the correct replacement text. Internal entities are parsed and must not contain references to themselves, either directly or indirectly.
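
As a small sketch of what resolving references inside an entity value means, the entity rights below refers both to a character by number and to another internal entity. The element and entity names are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
   <!ELEMENT note (#PCDATA)>
   <!ENTITY publisher "Macmillan Computer Publishing">
   <!-- The value refers to a character by number and to another entity. -->
   <!ENTITY rights "&#169; 1998 &publisher;">
   <!-- Not allowed: if "first" contained &second; and "second" contained
        &first;, each entity would refer to itself indirectly. -->
]>
<note>&rights;</note>
```

After the processor has resolved everything, the content of the note element reads "© 1998 Macmillan Computer Publishing".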

Binary Entities

Binary entities contain unparsed data (graphics data, sound data, and so on). When they are declared, they must be associated with a notation, and that notation must also have been declared in the DTD:

<!NOTATION notation.name PUBLIC "public.identifier" "helper.application">
<!ENTITY entity.name SYSTEM "system.identifier" NDATA notation.name>

Binary entities can be referenced only in the value of an attribute that has been declared to be of type ENTITY or ENTITIES in the DTD:

<!ELEMENT element.name EMPTY>
<!ATTLIST element.name
   attribute.name ENTITY #REQUIRED>

And this binary entity would then be referred to in the XML document like this:

<element.name attribute.name="entity.name"/>
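
Put together with concrete names, the pieces might look like the following sketch. The notation gif, the helper application, the entity logo, and the element picture are all hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE picture [
   <!-- 1. Declare the notation that describes the data format. -->
   <!NOTATION gif SYSTEM "gifviewer.exe">
   <!-- 2. Declare the binary entity and tie it to that notation. -->
   <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
   <!-- 3. Declare an attribute of type ENTITY to carry the reference. -->
   <!ELEMENT picture EMPTY>
   <!ATTLIST picture
      source ENTITY #REQUIRED>
]>
<picture source="logo"/>
```

Note that the attribute value is the bare entity name, logo, and not an entity reference of the form &logo;.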

System Identifiers

A system identifier can be either a relative path to a file (for example, ..\..\graphics\home.gif) or an absolute path to a file (for example, C:\Program Files\LView\lview.exe).

A system identifier can also be a uniform resource identifier (URI). The URI is a generalization of the URL (uniform resource locator) system; you are probably used to seeing URLs in the form of World Wide Web addresses: http://www.rpi.edu/~odonnj/index.html

In this case, www.rpi.edu is my service provider’s Web server, and ~odonnj is a pointer that the UNIX system translates into my login (home) directory. The Web server then directs the Web browser to the designated Web page directory coupled to my login name, and index.html is the name of the file. URLs, like URNs (uniform resource names), are a type of URI, and URIs are a kind of superset. As far as we are concerned, the two are more or less synonymous, so I will simply call it a URL and be done with it. The syntax for a full URL looks like the following:

scheme://login-name:password@host:port/path

scheme names the “protocol,” and most of the host information (login-name, password, and port) is entered only when it is really needed. The scheme could be http (Hypertext Transfer Protocol), ftp (File Transfer Protocol), gopher (the Gopher protocol), news (Usenet news; this one breaks the rule because the protocol is actually nntp, which stands for Net News Transfer Protocol), wais (for Wide Area Information Servers), or file (for local file access). Several more exist, but many of them, such as mailto, wouldn’t make much sense for retrieving information.
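
As an illustration of how such URLs appear as system identifiers, here are a few entity declarations. Only the rpi.edu address comes from the text; the other host, login name, password, and paths are made up for the example.

```xml
<!-- Only scheme, host, and path are given; the rest is not needed. -->
<!ENTITY homepage SYSTEM "http://www.rpi.edu/~odonnj/index.html">

<!-- Every part filled in: scheme ftp, login-name guest, password secret,
     host ftp.example.com, port 21, path pub/book/chapter2.xml. -->
<!ENTITY chapter2 SYSTEM "ftp://guest:secret@ftp.example.com:21/pub/book/chapter2.xml">

<!-- Local file access through the file scheme. -->
<!ENTITY notes SYSTEM "file:///C:/docs/notes.xml">
```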

Public Identifier Resolution

The system identifier is reasonably straightforward. The public identifier is far more complicated in theory, but in practice it is a lot simpler.

