Introduction to XML For Web Developers
The Basic DTD

The Prolog and The Body

As we mentioned earlier, all documents are made up of a prolog and a body. The document prolog contains the XML declaration and the document body contains the actual marked-up document. Recall from previous sections that we had developed a CONTACTS XML document that looked something like the following:

     <?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>

          <!--End of prolog (the XML declaration must come first, so no comment may precede it)-->
          <!--Beginning of body-->

     <CONTACTS>

     <CONTACT>
     <NAME>Gunther Birznieks</NAME>
     <EMAIL>gunther@bobsfishstore.com</EMAIL>
     <PHONE>662-9999</PHONE>
     </CONTACT>

     <CONTACT>
     <NAME>Susan Czigonu</NAME>
     <EMAIL>susan@eudora.org</EMAIL>
     <PHONE>555-1234</PHONE>
     </CONTACT>

     </CONTACTS>

          <!--End of body-->

What we did not say earlier was that the prolog also holds the DTD.

The Basic DTD

The simplest usage of a DTD involves adding the DTD to the prolog portion of your XML document, just after the XML declaration.

The skeleton (not quite valid) of a DTD looks something like the following:

     <?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
     <!DOCTYPE ROOT_ELEMENT_NAME [
	ELEMENT_DEFINITIONS_GO_HERE
	]>
     BODY_GOES_HERE.......

In this case, we declare a document with a root element called ROOT_ELEMENT_NAME. For example, we might use the following syntax for our CONTACTS document:

     <?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
     <!DOCTYPE CONTACTS [
	ELEMENT_DEFINITIONS_GO_HERE
	]>
     BODY_GOES_HERE.......
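
The name given in the DOCTYPE is more than a label: it has to match the document's root element. As a rough illustration, here is a minimal Python sketch that uses the standard library's xml.dom.minidom to compare the two. The helper name root_matches_doctype is hypothetical, and because minidom is not a validating parser the comparison is done by hand. The ELEMENT_DEFINITIONS_GO_HERE placeholder is simply left empty so that the sample parses.

```python
from xml.dom import minidom

# The CONTACTS skeleton from above, with an empty declaration list
# and an empty body so that it is well-formed.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<!DOCTYPE CONTACTS [\n'
    ']>\n'
    '<CONTACTS></CONTACTS>\n'
)


def root_matches_doctype(document: str) -> bool:
    """Return True if the DOCTYPE name equals the root element's name.

    Hypothetical helper for illustration only: it checks this single
    rule and nothing else about the document.
    """
    dom = minidom.parseString(document)
    if dom.doctype is None:
        # No DOCTYPE at all, so there is nothing to match against.
        return False
    return dom.doctype.name == dom.documentElement.tagName


print(root_matches_doctype(SAMPLE))
```

Changing the body to use a different root element, while leaving the DOCTYPE as CONTACTS, would make the helper return False even though the document is still well-formed.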

The Document Type Definition and ETDs

As we mentioned parenthetically, the above DTD is "not quite valid". It really only says that the parser should expect a document with a root element named CONTACTS. It does not say anything about the contents or structure of that document. However, to be valid, a document's DTD must specify every detail of its structure!

To specify the structure, we must replace the "ELEMENT_DEFINITIONS_GO_HERE" placeholder with actual declarations. A Document Type Definition declares all of the valid document elements using Element Type Declarations (ETDs).

ETDs specify the name of each element and what children, if any, that element may have. An element's content may range from nothing at all, to plain parsed character data, to other elements (which may have children of their own), to any mix of the above.

ETDs follow the generic syntax:

	<!ELEMENT ELEMENT_NAME CHILDREN_DECLARATION>

In the case of our CONTACTS element we might see something like the following:

    <?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
    <!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS ANY>
	]>

    <CONTACTS>
    </CONTACTS>

In this case, the DTD defines an XML document containing a single root element named CONTACTS (don't forget XML is case sensitive) that may contain ANY (case sensitive) type of child, including parsed character data or other elements.

Note however, that though CONTACTS "could" contain other elements, no element other than CONTACTS is actually allowed by the DTD since no other elements are defined. All elements in an XML document must be defined in the DTD. Thus, the following XML, though well-formed, is invalid!

    <?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
    <!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS ANY>
	]>

    <CONTACTS>

    <CONTACT>
        <NAME>Roger Kaplan</NAME>
    </CONTACT>

    </CONTACTS>
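
The "every element must be declared" rule can be checked mechanically. Python's standard library does not ship a validating parser, so the following is only a minimal sketch that checks this one rule by hand using xml.parsers.expat; it is not a substitute for real DTD validation. The helper name undeclared_elements is hypothetical.

```python
import xml.parsers.expat

# The invalid document from above: CONTACT and NAME are used
# but never declared in the DTD.
INVALID = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<!DOCTYPE CONTACTS [\n'
    '    <!ELEMENT CONTACTS ANY>\n'
    ']>\n'
    '<CONTACTS>\n'
    '<CONTACT>\n'
    '    <NAME>Roger Kaplan</NAME>\n'
    '</CONTACT>\n'
    '</CONTACTS>\n'
)


def undeclared_elements(document: str) -> list[str]:
    """List element names used in the body but missing from the DTD.

    Hypothetical helper: it checks only this one validity rule.
    """
    declared = set()
    used = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once for every <!ELEMENT ...> declaration.
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    # Called once for every start tag in the body.
    parser.StartElementHandler = lambda name, attrs: used.append(name)
    parser.Parse(document, True)

    missing = []
    for name in used:
        if name not in declared and name not in missing:
            missing.append(name)
    return missing


print(undeclared_elements(INVALID))  # prints ['CONTACT', 'NAME']
```

An empty list means every element that appears in the body has a matching declaration; it does not prove that the document is valid in every other respect.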

NOTE: Unlike elements, parsed character data inside an element declared as ANY does not need to be declared. Thus, the following XML document would be valid:

    <?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
    <!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS ANY>
	]>

    <CONTACTS>

    Here is some plain parsed character data.

    </CONTACTS>

Returning to the invalid Roger Kaplan example: for that document to be valid, you must also define the CONTACT and NAME elements.

    <?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
    <!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS ANY>
	<!ELEMENT CONTACT (NAME)>
	<!ELEMENT NAME (#PCDATA)>
	]>

    <CONTACTS>

    <CONTACT>
        <NAME>Roger Kaplan</NAME>
    </CONTACT>

    </CONTACTS>

In this case, we have defined an XML document with a single root element named CONTACTS. CONTACTS may contain parsed character data or any declared child elements (ANY). In particular, CONTACTS may contain the child element CONTACT; CONTACT must contain exactly one child element named NAME, declared as (NAME); and NAME contains parsed character data (#PCDATA)!

NOTE: It is bad form to use the ANY keyword for any element other than the root element. Generally, you should make your DTD as conservative as you can. Think in terms of everything being denied except what you specifically allow.

Also, note that the order in which you specify ETDs does not matter. Thus,

	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT CONTACTS ANY>
	<!ELEMENT CONTACT (NAME)>

would work just as well as

	<!ELEMENT CONTACTS ANY>
	<!ELEMENT CONTACT (NAME)>
	<!ELEMENT NAME (#PCDATA)>

Finally, note that you may not specify elements with the same name but with different definitions such as:

	<!ELEMENT CONTACTS ANY>
	<!ELEMENT CONTACT (NAME)>
	<!ELEMENT CONTACT (EMAIL)>
	<!ELEMENT NAME (#PCDATA)>

The double declaration of CONTACT would cause a validity error.
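
A non-validating parser will not necessarily complain about such a duplicate, so it can be useful to look for it yourself. The following is a minimal Python sketch using xml.parsers.expat, on the assumption that expat reports each element declaration to the handler as it reads it. The helper name duplicate_declarations is hypothetical, and an empty CONTACTS body is added only so that the sample is a complete document.

```python
from collections import Counter
import xml.parsers.expat

# The declarations from above, wrapped in a complete document.
DOCUMENT = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<!DOCTYPE CONTACTS [\n'
    '    <!ELEMENT CONTACTS ANY>\n'
    '    <!ELEMENT CONTACT (NAME)>\n'
    '    <!ELEMENT CONTACT (EMAIL)>\n'
    '    <!ELEMENT NAME (#PCDATA)>\n'
    ']>\n'
    '<CONTACTS></CONTACTS>\n'
)


def duplicate_declarations(document: str) -> list[str]:
    """Return the element names that are declared more than once.

    Hypothetical helper: it only counts <!ELEMENT ...> declarations
    and performs no other validation.
    """
    counts = Counter()
    parser = xml.parsers.expat.ParserCreate()

    def on_element_decl(name, model):
        # Count every declaration seen for this element name.
        counts[name] += 1

    parser.ElementDeclHandler = on_element_decl
    parser.Parse(document, True)
    return sorted(name for name, seen in counts.items() if seen > 1)


print(duplicate_declarations(DOCUMENT))
```

Any name in the returned list is declared twice and needs to be merged into a single declaration before the DTD can be used for validation.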

The ANY and #PCDATA keywords are pretty straightforward. And in this case, the definition of the NAME element as a child of CONTACT was pretty simple as well.

However, as we mentioned before, the regular expression functionality offered through DTDs allows you to be very flexible in the declaration of elements and their children.

Let's take a look.
