eXtropia: the open web technology company

Introduction to XML For Web Developers
Introducing the Valid XML Document and the DTD  

In the last section, we reviewed the process of creating a "well-formed" XML document.

As you saw, there are many rules you must follow to ensure that your XML document is well-formed. But even when you write well-formed XML documents, you're not quite out of the woods! Making your document well-formed is only half the battle. You must also make sure that the document is valid.

A valid document is, by definition, a well-formed XML document. But validity goes one step further. A valid XML document also conforms to SGML, the parent standard of which XML is a subset, and as such can be read and interpreted as an SGML document.

To pass the SGML validity test, an XML document must conform to the specifications defined by a Document Type Definition (DTD). You can think of the DTD as defining the overall structure and syntax of the document. The DTD is in fact the meat of the "meta-markup" concept. The DTD defines the grammar and vocabulary of a markup language. In short, the DTD specifies everything a parser needs to know in order for that parser to interpret a well-formed XML document.

This "specification" can be as simple as listing all the valid components (such as elements, attributes, and entities) that an XML document may contain, or as complex as specifying relationships between those components (such as "element X must contain either element Y or element Z, but never both").
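
As a rough sketch of the "either Y or Z, but never both" rule, a DTD could declare element X with a choice between the two. The names X, Y, and Z are only the placeholders used above, and the text-only content of Y and Z is an assumption made for illustration:

```xml
<!-- X must contain exactly one child: either a Y or a Z, never both -->
<!ELEMENT X (Y | Z)>

<!-- Assumed for illustration: Y and Z hold plain text -->
<!ELEMENT Y (#PCDATA)>
<!ELEMENT Z (#PCDATA)>
```

The vertical bar means "choose one", so an X holding both a Y and a Z, or holding neither, would fail validation.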

For example, do you remember our CONTACT XML document from previous sections?

A CONTACT DTD might specify that every CONTACT has an <ADDRESS> element that must define <STREET>, <CITY>, <STATE>, and <ZIP> elements, in that particular order. Further, the DTD could specify that <ADDRESS> elements may contain multiple <STREET> elements (though they must contain at least one).
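
The rules just described might be written roughly as follows. This is only a sketch: the NAME element and the text-only content of each leaf element are assumptions made for illustration, and the real CONTACT document may be structured differently.

```xml
<!-- Hypothetical CONTACT DTD. NAME is an assumed element, not taken from the tutorial. -->
<!ELEMENT CONTACT (NAME, ADDRESS)>
<!ELEMENT NAME    (#PCDATA)>

<!-- One or more STREET elements, followed by CITY, STATE, and ZIP, in that order -->
<!ELEMENT ADDRESS (STREET+, CITY, STATE, ZIP)>
<!ELEMENT STREET  (#PCDATA)>
<!ELEMENT CITY    (#PCDATA)>
<!ELEMENT STATE   (#PCDATA)>
<!ELEMENT ZIP     (#PCDATA)>
```

Here the commas fix the order of the child elements, and the plus sign after STREET means "at least one, possibly more".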

To help you get a feel for the difference between well-formed XML and valid XML, consider the following well-formed English:

brown jumped
the the fox.
quick over dog

As you can see, all the words and punctuation represent well-formed elements of English. However, unless you are into absurdist poetry, the words and punctuation are virtually meaningless, and difficult to interpret (especially by a computer).

To be valid English, the words must conform to a standard grammatical structure. For example,

The quick brown fox jumped over the lazy dog.

In the case of the markup languages defined by XML, the DTD provides the grammatical structure to bring order to the elements of the language.

To specify grammatical rules, DTDs use content models, a notation much like regular expressions, which describe the patterns of elements that may appear within the XML document. A validating parser matches the document against these patterns to determine whether or not it is valid. Matching is done conservatively, so that anything not specifically allowed by the DTD is forbidden.
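
To illustrate the "anything not specifically allowed is forbidden" rule, here is a sketch of a document that is well-formed but not valid. The DTD is placed inside the document itself, and the address data and the COUNTRY element are made up for this example:

```xml
<?xml version="1.0"?>
<!DOCTYPE ADDRESS [
  <!ELEMENT ADDRESS (STREET+, CITY, STATE, ZIP)>
  <!ELEMENT STREET  (#PCDATA)>
  <!ELEMENT CITY    (#PCDATA)>
  <!ELEMENT STATE   (#PCDATA)>
  <!ELEMENT ZIP     (#PCDATA)>
]>
<!-- Well-formed, but NOT valid: ZIP appears before STATE,
     and COUNTRY is never declared in the DTD -->
<ADDRESS>
  <STREET>123 Example Street</STREET>
  <CITY>Springfield</CITY>
  <ZIP>00000</ZIP>
  <STATE>XX</STATE>
  <COUNTRY>Nowhere</COUNTRY>
</ADDRESS>
```

Every tag is properly opened, closed, and nested, so a non-validating parser would accept this document. A validating parser should reject it, because the children of ADDRESS do not match the declared pattern.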

Okay, enough about what DTDs are. Let's look at how you'll build them.
