Like HTML, XML (also known as Extensible Markup Language) is a markup
language which relies on the concept of rule-specifying tags and the
use of a tag-processing application that knows how
to deal with the tags.
"The correct title of this specification, and the correct full name of XML, is "Extensible Markup
Language". "eXtensible Markup Language" is just a spelling error. However, the abbreviation "XML" is
not only correct but, appearing as it does in the title of the specification, an official name of the Extensible
Markup Language.
The name and abbreviation were invented by James Clark; other options under consideration had
included MGML (Minimal Generalized Markup Language), MAGMA (Minimal Architecture For Generalized
Markup Applications), and SLIM (Structured Language for Internet Markup)"
- Extensible Markup Language (XML) 1.0 Specs, The Annotated Version.
However, XML is far more powerful than HTML.
This is because of the "X": XML is extensible. Specifically, rather
than providing a fixed set of pre-defined tags, as HTML does, XML
specifies the standards with which you can define your own markup
languages with their own sets of tags. XML is a meta-markup language:
it allows you to define any number of markup languages
based upon the standards defined by XML.
"The design goals for XML are:
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs which process XML documents.
- The number of optional features in XML is to be kept to the absolute
minimum, ideally zero.
- XML documents should be human-legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance."
- Extensible Markup Language (XML) 1.0 Specs, The Annotated Version.
Let's consider a very simple example. Let's create a new markup
language called SCLML (Selena's Client List Markup Language). This
language will define tags to represent contact people and information about
contact people.
The set of tags will be simple, but expressive. Unlike HTML's
<UL> and <LI>, these tags can be understood immediately just by
reading the document.
<CONTACT>
<NAME>Gunther Birznieks</NAME>
<ID>001</ID>
<COMPANY>Bob's Fish Store</COMPANY>
<EMAIL>gunther@bobsfishstore.com</EMAIL>
<PHONE>662-9999</PHONE>
<STREET>1234 4th St.</STREET>
<CITY>New York</CITY>
<STATE>New York</STATE>
<ZIP>10024</ZIP>
</CONTACT>
<CONTACT>
<NAME>Susan Czigonu</NAME>
<ID>002</ID>
<COMPANY>Netscape</COMPANY>
<EMAIL>susan@eudora.org</EMAIL>
<PHONE>555-1234</PHONE>
<STREET>9876 Hazen Blvd.</STREET>
<CITY>San Jose</CITY>
<STATE>California</STATE>
<ZIP>90034</ZIP>
</CONTACT>
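To make the idea of a tag-processing application concrete, here is a minimal sketch in Python that reads contacts like those above using the standard library's xml.etree.ElementTree parser. The wrapping <CLIENTLIST> element is an assumption added for this sketch: a parser expects a single root element, and the listing above does not name one. The function name read_contacts is likewise invented for illustration, and the listing is abbreviated to a few fields.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the listing above.
# <CLIENTLIST> is a hypothetical root element added so the text parses
# as a single XML document.
SCLML = """
<CLIENTLIST>
  <CONTACT>
    <NAME>Gunther Birznieks</NAME>
    <ID>001</ID>
    <COMPANY>Bob's Fish Store</COMPANY>
    <EMAIL>gunther@bobsfishstore.com</EMAIL>
  </CONTACT>
  <CONTACT>
    <NAME>Susan Czigonu</NAME>
    <ID>002</ID>
    <COMPANY>Netscape</COMPANY>
    <EMAIL>susan@eudora.org</EMAIL>
  </CONTACT>
</CLIENTLIST>
""".strip()


def read_contacts(xml_text):
    """Return one dict per <CONTACT>, keyed by the child tag names."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in contact}
        for contact in root.findall("CONTACT")
    ]


contacts = read_contacts(SCLML)
for c in contacts:
    print(c["NAME"], "-", c["EMAIL"])
```

Because the tag names describe the data, the program can ask for "NAME" or "EMAIL" directly instead of guessing which list item holds which field.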
Note that the use of XML is not limited to text markup. The very extensibility of XML means that it could just as easily be applied to sound markup or image markup. A tag such as <EMPHASIZE> might be displayed textually as bold type but rendered audibly as a louder voice!
What you see above is a very simple "XML document". As you can see, it looks pretty similar to an HTML document.
But don't forget, as we said before, it is not enough to simply encode
(mark up) the data. For the data to be decoded by someone or something
else, the markup language must follow standard rules,
including:
- The syntax for marking up
- The meaning behind the markup
In other words, a processing application must know what counts as valid
markup (a particular tag, for example) and what to do with it when it is valid.
After all, how would a web browser such as Netscape
know what to do with the above document? What in the world is a
<PHONE> tag? Is it a legal tag? How should it be displayed? Our markup language must somehow communicate the
syntax of the markup so that the processing application will know
what to do with it.
In XML, the definition of valid markup is handled by a
Document Type Definition (DTD), which communicates the structure of
the markup language. The DTD specifies what it means to be a valid
tag (the syntax for marking up).
We'll discuss the details of DTDs later. For now, just get comfortable
with the idea of a DTD as a separate component of the equation.
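Python's standard library does not validate documents against DTDs, so the following is only a hand-rolled stand-in for the idea, not DTD syntax: a table of which child tags each element may contain, and a function that checks a document against it. The names ALLOWED_CHILDREN and is_valid, and the <FAX> tag used as the invalid case, are hypothetical and exist only for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for a DTD:
# parent tag -> set of child tags it is allowed to contain.
ALLOWED_CHILDREN = {
    "CONTACT": {"NAME", "ID", "COMPANY", "EMAIL", "PHONE",
                "STREET", "CITY", "STATE", "ZIP"},
}


def is_valid(element):
    """True if every child tag, at every level, is permitted by its parent's rule."""
    # Tags with no entry in the table may not contain child elements.
    allowed = ALLOWED_CHILDREN.get(element.tag, set())
    for child in element:
        if child.tag not in allowed or not is_valid(child):
            return False
    return True


good = ET.fromstring(
    "<CONTACT><NAME>Susan Czigonu</NAME><PHONE>555-1234</PHONE></CONTACT>"
)
# <FAX> is a made-up tag that the rule table does not list.
bad = ET.fromstring(
    "<CONTACT><NAME>Susan Czigonu</NAME><FAX>555-0000</FAX></CONTACT>"
)

print(is_valid(good))  # True
print(is_valid(bad))   # False
```

The point of the sketch is the separation of concerns: the rules live in their own table, apart from both the data and the program that reads it, which is the role the DTD plays for an XML document.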
Yet the syntax is only half of the story: we must also communicate
the meaning of the markup.
To specify what valid tags mean, XML documents are also associated with
"style sheets", which provide display
instructions for a processing application such as a web browser. A style
sheet, the details of which we will discuss later, might specify display
instructions such as:
- Any time you see a <CONTACT> tag, display it using a <UL>
tag. Similarly, </CONTACT> tags should be converted to </UL> tags.
- All <NAME> tags should be replaced with <LI> tags, and
</NAME> tags should be ignored.
- All <EMAIL> tags should be replaced with <LI> tags, and
</EMAIL> tags should be ignored.
And so on.
In this example, the style sheet uses the functionality of HTML to
define the formatting of SCLML. But if the XML document were being processed by a program other than a web browser, the HTML translation step might be bypassed.
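The substitution rules listed above can be sketched in a few lines of Python. This is not a real style sheet language, only an illustration of the translation step; the names LIST_ITEM_TAGS and to_html are hypothetical, and the sketch covers just the <CONTACT>, <NAME>, and <EMAIL> rules spelled out in the list.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical rule table: the SCLML fields that become HTML list items.
LIST_ITEM_TAGS = ("NAME", "EMAIL")


def to_html(contact):
    """Render one <CONTACT> element as an HTML unordered list."""
    lines = ["<UL>"]  # <CONTACT> becomes <UL>
    for child in contact:
        if child.tag in LIST_ITEM_TAGS:
            # The opening tag becomes <LI>; the closing tag is simply dropped.
            lines.append("<LI>" + escape(child.text or ""))
    lines.append("</UL>")  # </CONTACT> becomes </UL>
    return "\n".join(lines)


contact = ET.fromstring(
    "<CONTACT><NAME>Gunther Birznieks</NAME><ID>001</ID>"
    "<EMAIL>gunther@bobsfishstore.com</EMAIL></CONTACT>"
)
html_out = to_html(contact)
print(html_out)
```

Fields with no rule, such as <ID> here, are skipped. A program that is not a browser could ignore to_html entirely and work with the parsed data directly.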
A processing application combines the rules in the style sheet and
the DTD with the data in the SCLML document, and displays the data
according to those rules.
But wait, isn't this quite complex? Now, instead of a single HTML
document which defines both the data and the rules to display the data, we
have an SCLML document, a DTD, and a style sheet. That's three pieces
as opposed to just one.
Further, we need a processing agent that can do the work of putting the
DTD, style sheet, and SCLML document together. Remember, web browsers
are made to read one specific markup language (like HTML), not just any
markup language. That means we have three documents to pull together plus one processing program to write or buy. What a mess.
In fact, although there are a few more hurdles to jump in order to use XML, there are several reasons why all this is worth it. Let's take a look at them.