eXtropia: the open web technology company

Introduction to XML For Web Developers
What is XML?

Like HTML, XML (Extensible Markup Language) is a markup language. It relies on rule-specifying tags and on a tag-processing application that knows how to handle those tags.

"The correct title of this specification, and the correct full name of XML, is 'Extensible Markup Language'. 'eXtensible Markup Language' is just a spelling error. However, the abbreviation 'XML' is not only correct but, appearing as it does in the title of the specification, an official name of the Extensible Markup Language.

The name and abbreviation were invented by James Clark; other options under consideration had included MGML (Minimal Generalized Markup Language), MAGMA (Minimal Architecture For Generalized Markup Applications), and SLIM (Structured Language for Internet Markup)." - Extensible Markup Language (XML) 1.0 Specs, The Annotated Version.

However, XML is far more powerful than HTML.

This is because of the "X": XML is "eXtensible". Rather than providing a set of predefined tags, as HTML does, XML specifies the rules by which you can define your own markup languages, each with its own set of tags. XML is a meta-markup language: it lets you define any number of markup languages based on the standards XML lays down.

"The design goals for XML are:
  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance."
- Extensible Markup Language (XML) 1.0 Specs, The Annotated Version.

Let's consider a very simple example: a new markup language called SCLML (Selena's Client List Markup Language). This language will define tags to represent contact people and information about them.

The set of tags will be simple but expressive. Unlike HTML tags such as <UL> and <LI>, SCLML tags can be understood just by reading the document.

<CLIENT_LIST>
  <CONTACT>
    <NAME>Gunther Birznieks</NAME>
    <COMPANY>Bob's Fish Store</COMPANY>
    <STREET>1234 4th St.</STREET>
    <CITY>New York</CITY>
    <ZIP>10024</ZIP>
  </CONTACT>

  <CONTACT>
    <NAME>Susan Czigonu</NAME>
    <STREET>9876 Hazen Blvd.</STREET>
    <CITY>San Jose</CITY>
  </CONTACT>
</CLIENT_LIST>

Note that XML is not limited to text markup. Its very extensibility means it could just as easily be applied to sound markup or image markup. A tag such as <EMPHASIZE> might be displayed as bold text but rendered audibly as a louder voice!

What you see above is a very simple "XML document". As you can see, it looks pretty similar to an HTML document.
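To see that a generic XML parser can read tags nobody defined in advance, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It works on a lightly tidied copy of the contact list, with each entry wrapped in a <CONTACT> element under a single enclosing root element, which a well-formed XML document requires. The root name <CLIENT_LIST> and the function describe_contacts are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# A copy of the SCLML contact list. <CLIENT_LIST> is a hypothetical root
# element name; XML needs exactly one element enclosing everything else.
SCLML_DOC = """\
<CLIENT_LIST>
  <CONTACT>
    <NAME>Gunther Birznieks</NAME>
    <COMPANY>Bob's Fish Store</COMPANY>
    <STREET>1234 4th St.</STREET>
    <CITY>New York</CITY>
    <ZIP>10024</ZIP>
  </CONTACT>
  <CONTACT>
    <NAME>Susan Czigonu</NAME>
    <STREET>9876 Hazen Blvd.</STREET>
    <CITY>San Jose</CITY>
  </CONTACT>
</CLIENT_LIST>
"""

def describe_contacts(xml_text):
    """Return one list of (tag, text) pairs for each <CONTACT> element."""
    root = ET.fromstring(xml_text)
    return [[(child.tag, child.text) for child in contact]
            for contact in root.findall("CONTACT")]

# The parser knows nothing about SCLML, yet it reads every custom tag.
for fields in describe_contacts(SCLML_DOC):
    for tag, text in fields:
        print(f"{tag}: {text}")
    print()
```

The parser only enforces XML's general syntax; what a <ZIP> or <COMPANY> tag means is still up to whoever processes the result.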

But don't forget, as we said before, it is not enough simply to encode (mark up) the data. For someone or something else to decode the data, a markup language must follow standard rules covering:

  1. The syntax for marking up
  2. The meaning behind the markup

In other words, a processing application must know what counts as valid markup (a tag, for instance) and what to do with that markup if it is valid. After all, how would Netscape know what to do with the above document? What in the world is a <COMPANY> tag? Is it a legal tag? How should it be displayed? Our markup language must somehow communicate the syntax of the markup so that the processing application will know what to do with it.

In XML, the definition of valid markup is handled by a Document Type Definition (DTD), which communicates the structure of the markup language. The DTD specifies what counts as a valid tag (the syntax for marking up).

We'll discuss the details of DTDs later. For now, just get comfortable with the idea of a DTD as a separate component to the equation.
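As a rough preview of the kind of information a DTD carries, here is a minimal Python sketch that writes structural rules as a plain dictionary (which child elements each element may contain) and checks a parsed document against them. This is a hand-rolled stand-in, not DTD syntax and not a DTD validator: the standard library's xml.etree.ElementTree parser does not validate against DTDs. The rule table, the root element <CLIENT_LIST>, and the function structure_errors are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD-like rules: for each element, the child elements it may
# contain. Leaf elements (which hold only text) allow no children.
ALLOWED_CHILDREN = {
    "CLIENT_LIST": {"CONTACT"},
    "CONTACT": {"NAME", "COMPANY", "STREET", "CITY", "ZIP", "EMAIL"},
    "NAME": set(), "COMPANY": set(), "STREET": set(),
    "CITY": set(), "ZIP": set(), "EMAIL": set(),
}

def structure_errors(xml_text, rules=ALLOWED_CHILDREN):
    """Return messages describing elements that break the structural rules."""
    errors = []
    root = ET.fromstring(xml_text)
    if root.tag not in rules:
        errors.append(f"unknown root element <{root.tag}>")
    # Walk every element and check each of its children against the rules.
    for parent in root.iter():
        allowed = rules.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return errors

good = "<CLIENT_LIST><CONTACT><NAME>Susan Czigonu</NAME></CONTACT></CLIENT_LIST>"
bad = "<CLIENT_LIST><CONTACT><PHONE>555-0100</PHONE></CONTACT></CLIENT_LIST>"
print(structure_errors(good))  # []
print(structure_errors(bad))   # ['<PHONE> is not allowed inside <CONTACT>']
```

Keeping the rules in their own table, apart from the document, mirrors the idea of the DTD as a separate component.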

Yet we must communicate the meaning of the markup as well as its syntax.

To specify what valid tags mean, XML documents are also associated with "style sheets," which tell a processing application such as a web browser how to present the data. A style sheet, the details of which we will discuss later, might specify display instructions such as:

  1. Whenever you see a <CONTACT> tag, display it using a <UL> tag. Similarly, convert </CONTACT> tags to </UL>.
  2. Replace every <NAME> tag with an <LI> tag and ignore </NAME> tags.
  3. Replace every <EMAIL> tag with an <LI> tag and ignore </EMAIL> tags.
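The three rules above can be sketched in Python as a small transformation. A real style sheet would state such rules declaratively; here they are hard-coded in a hypothetical function, sclml_to_html, which reads the document with the standard library's xml.etree.ElementTree. The sample wraps the contacts in a hypothetical <CLIENT_LIST> root, and its <EMAIL> address is invented for illustration.

```python
import html
import xml.etree.ElementTree as ET

def sclml_to_html(xml_text):
    """Apply the three display rules from the text to an SCLML document.

    <CONTACT> becomes a <UL> list; <NAME> and <EMAIL> become <LI> items.
    The rules say nothing about other tags, so this sketch skips them.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for contact in root.iter("CONTACT"):
        lines.append("<UL>")                      # rule 1: <CONTACT> -> <UL>
        for child in contact:
            if child.tag in ("NAME", "EMAIL"):    # rules 2 and 3
                # Emit only an opening <LI>; the closing </NAME> and
                # </EMAIL> tags are ignored, as the rules say.
                lines.append("<LI>" + html.escape(child.text or "", quote=False))
        lines.append("</UL>")                     # rule 1: </CONTACT> -> </UL>
    return "\n".join(lines)

# Hypothetical sample: a <CLIENT_LIST> root and an invented <EMAIL> address.
sample = """<CLIENT_LIST>
  <CONTACT>
    <NAME>Gunther Birznieks</NAME>
    <COMPANY>Bob's Fish Store</COMPANY>
  </CONTACT>
  <CONTACT>
    <NAME>Susan Czigonu</NAME>
    <EMAIL>susan@example.com</EMAIL>
  </CONTACT>
</CLIENT_LIST>"""

print(sclml_to_html(sample))
```

Because the rules only mention <CONTACT>, <NAME>, and <EMAIL>, the <COMPANY> line disappears from the output; a fuller style sheet would give a rule for every tag it wants displayed.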

In this example, the style sheet relies on HTML to define the formatting of SCLML. But if the XML document were being processed by a program other than a web browser, the HTML translation step might be skipped.

A processing application combines the rules in the style sheet, the structure defined by the DTD, and the data in the SCLML document, and presents the data according to those rules.

But wait, isn't this quite complex? Now instead of a single HTML document which defines the data and the rules to display the data, we have an SCLML document, a DTD, AND a style sheet. That's three pieces as opposed to just one.

Further, we need a processing agent that can do the work of putting the DTD, style sheet, and SCLML document together. Remember, web browsers are made to read a specific markup language (like HTML), not arbitrary markup languages. That means we have three documents to pull together plus one processing program to write or buy. What a mess.

"A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application." - Extensible Markup Language (XML) 1.0 Specs, The Annotated Version.
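In Python's standard library, xml.etree.ElementTree can play the role of the XML processor, while your own code is the application. The minimal sketch below, with hypothetical helper names is_well_formed and load_contacts, shows that split: the processor enforces XML's syntax rules and hands back a tree, and the application decides what the tags mean. Malformed input, such as a mismatched end tag, is rejected with ET.ParseError before the application sees any content.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the XML processor accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def load_contacts(xml_text):
    """Application logic: turn each <CONTACT> element into a dictionary."""
    root = ET.fromstring(xml_text)  # the XML processor does the parsing
    return [{child.tag: child.text for child in contact}
            for contact in root.findall("CONTACT")]

good = ("<CLIENT_LIST><CONTACT><NAME>Susan Czigonu</NAME>"
        "<CITY>San Jose</CITY></CONTACT></CLIENT_LIST>")
# </CITY> closes an element that was opened as <STREET>: not well-formed.
bad = ("<CLIENT_LIST><CONTACT><STREET>9876 Hazen Blvd.</CITY>"
       "</CONTACT></CLIENT_LIST>")

print(load_contacts(good))  # [{'NAME': 'Susan Czigonu', 'CITY': 'San Jose'}]
print(is_well_formed(bad))  # False
```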

However, though there are a few more hurdles to jump in order to use XML, there are several reasons why it is all worth it. Let's take a look at them...
