However cool the idea of escaping the limitations of
a basic tag set (like HTML) sounds, it isn't even close
to the best thing about XML.
The real power of XML comes from the fact that with XML, not only
can you define your own set of tags, but the rules specified by
those tags need not be limited to formatting rules. XML allows you
to define all sorts of tags with all sorts of rules, such as tags
representing business rules or tags representing data description
or data relationships.
Consider again the case of the contact list in SCLML.
Using standard HTML, a developer might use something like the following:
<UL>
<LI>Gunther Birznieks
<UL>
<LI>Client ID: 001
<LI>Company: Bob's Fish Store
<LI>Email: gunther@bobsfishstore.com
<LI>Phone: 662-9999
<LI>Street Address: 1234 4th St.
<LI>City: New York
<LI>State: New York
<LI>Zip: 10024
</UL>
<LI>Susan Czigonu
<UL>
<LI>Client ID: 002
<LI>Company: Netscape
<LI>Email: susan@eudora.org
<LI>Phone: 555-1234
<LI>Street Address: 9876 Hazen Blvd.
<LI>City: San Jose
<LI>State: California
<LI>Zip: 90034
</UL>
</UL>
While this may be an acceptable way to store and display
your data, it is hardly the most efficient or powerful. As you are
probably aware, there are many potential problems associated with
marking up your data using HTML. Three particularly serious problems
come to mind:
- The GUI is embedded in the data. What happens if
you decide that you like a table-based presentation better than a
list-based presentation? In order to change to a table-based
presentation, you must recode all your HTML! This could mean editing
many pages.
- Searching for information in the data is tough. How would you get
a quick list of only the clients in California? Certainly, some
type of script would be necessary. But how would that script work? It
would probably have to search through the file word for word looking
for the string "California". And even if it found matches, it
would have no way of knowing that "California" is related to
"New York", namely that both are states. The relationships between
pieces of data, which are crucial to powerful searching, are lost.
- The data is tied to the logic and language of HTML. What happens
if you want to present your data in a Java applet? Unfortunately,
your Java applet would have to parse the HTML document, strip out
the tags, and reformat the data. Non-HTML processing applications
should not be burdened with this extraneous work.
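To make the search problem concrete, here is a minimal Python sketch of the word-for-word scan described above, run against the HTML list from this section. The function name naive_search is hypothetical and exists only for illustration. Searching for "New York" returns both the city line and the state line, and the script can only tell them apart by relying on the label text that happens to sit in front of each value.

```python
# The HTML contact list from the example above, stored as plain text.
HTML = """\
<UL>
<LI>Gunther Birznieks
<UL>
<LI>Client ID: 001
<LI>Company: Bob's Fish Store
<LI>Email: gunther@bobsfishstore.com
<LI>Phone: 662-9999
<LI>Street Address: 1234 4th St.
<LI>City: New York
<LI>State: New York
<LI>Zip: 10024
</UL>
<LI>Susan Czigonu
<UL>
<LI>Client ID: 002
<LI>Company: Netscape
<LI>Email: susan@eudora.org
<LI>Phone: 555-1234
<LI>Street Address: 9876 Hazen Blvd.
<LI>City: San Jose
<LI>State: California
<LI>Zip: 90034
</UL>
</UL>
"""


def naive_search(html, term):
    """Return every line that contains term (hypothetical helper).

    This is a plain substring scan: it knows nothing about which
    client a line belongs to or what kind of value the line holds.
    """
    return [line for line in html.splitlines() if term in line]


hits = naive_search(HTML, "New York")
for line in hits:
    print(line)
```

The scan reports matching lines but not which client they belong to. Recovering that would mean writing more code that depends on the exact layout of the list.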
With XML, these and similar problems are solved. In XML, the
same page would look like the following:
<CLIENT>
<NAME>Gunther Birznieks</NAME>
<ID>001</ID>
<COMPANY>Bob's Fish Store</COMPANY>
<EMAIL>gunther@bobsfishstore.com</EMAIL>
<PHONE>662-9999</PHONE>
<STREET>1234 4th St.</STREET>
<CITY>New York</CITY>
<STATE>New York</STATE>
<ZIP>10024</ZIP>
</CLIENT>
<CLIENT>
<NAME>Susan Czigonu</NAME>
<ID>002</ID>
<COMPANY>Netscape</COMPANY>
<EMAIL>susan@eudora.org</EMAIL>
<PHONE>555-1234</PHONE>
<STREET>9876 Hazen Blvd.</STREET>
<CITY>San Jose</CITY>
<STATE>California</STATE>
<ZIP>90034</ZIP>
</CLIENT>
As you can see, custom tags are used to bring meaning to the data being
displayed. When stored this way, data becomes extremely portable
because it carries with it its description rather than its display.
Display is "extracted" from the data and, as we will see later,
incorporated into a "style sheet".
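As a rough illustration of display living apart from the data, the Python sketch below renders the same client records first as a list and then as a table without touching the data itself. It is not a style sheet, only a stand-in for the idea. Several things here are assumptions made for the example: the CLIENTS root element (added so the text parses as a single XML document), the helper names rows, as_list, and as_table, and the choice to show only three fields for brevity.

```python
import xml.etree.ElementTree as ET
from html import escape

# A subset of the client data from above. The <CLIENTS> wrapper is a
# hypothetical root element, assumed here so the parser sees one document.
XML = """\
<CLIENTS>
<CLIENT>
<NAME>Gunther Birznieks</NAME>
<COMPANY>Bob's Fish Store</COMPANY>
<STATE>New York</STATE>
</CLIENT>
<CLIENT>
<NAME>Susan Czigonu</NAME>
<COMPANY>Netscape</COMPANY>
<STATE>California</STATE>
</CLIENT>
</CLIENTS>
"""

FIELDS = ["NAME", "COMPANY", "STATE"]


def rows(root):
    """Yield one list of field values per CLIENT element."""
    for client in root.findall("CLIENT"):
        yield [client.findtext(tag, default="") for tag in FIELDS]


def as_list(root):
    """Render the clients as an HTML list, one item per client."""
    items = ["<LI>" + escape(", ".join(r), quote=False) for r in rows(root)]
    return "<UL>\n" + "\n".join(items) + "\n</UL>"


def as_table(root):
    """Render the same clients as an HTML table, one row per client."""
    body = []
    for r in rows(root):
        cells = "".join("<TD>" + escape(v, quote=False) + "</TD>" for v in r)
        body.append("<TR>" + cells + "</TR>")
    return "<TABLE>\n" + "\n".join(body) + "\n</TABLE>"


root = ET.fromstring(XML)
list_view = as_list(root)
table_view = as_table(root)
print(list_view)
print(table_view)
```

Switching from one presentation to the other means calling a different function. The XML text is never edited.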
Let's consider some of the benefits.
- With XML, the GUI is extracted. Thus, changes to display
do not require futzing with the data. Instead, a separate
style sheet will specify a table display or a list display.
- Searching the data is easy and efficient. Search engines can simply
parse the description-bearing tags rather than muddling in the data.
Tags provide the search engines with the intelligence they lack.
- Complex relationships like trees and inheritance can be
communicated.
- The code is much more legible to a person coming into the environment
with no prior knowledge. In the above example, it is obvious that
<ID>002</ID> represents an ID whereas <LI>002 might
not. XML is self-describing.
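The search benefit can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: the CLIENTS root element is hypothetical (added so the two records form one XML document), and clients_in_state and known_states are made-up helper names. Because the value sits inside a STATE tag, the query for "New York" matches the state and ignores the city of the same name.

```python
import xml.etree.ElementTree as ET

# The two CLIENT records from above, wrapped in a hypothetical
# <CLIENTS> root element so they form one well-formed XML document.
XML = """\
<CLIENTS>
<CLIENT>
<NAME>Gunther Birznieks</NAME>
<ID>001</ID>
<COMPANY>Bob's Fish Store</COMPANY>
<EMAIL>gunther@bobsfishstore.com</EMAIL>
<PHONE>662-9999</PHONE>
<STREET>1234 4th St.</STREET>
<CITY>New York</CITY>
<STATE>New York</STATE>
<ZIP>10024</ZIP>
</CLIENT>
<CLIENT>
<NAME>Susan Czigonu</NAME>
<ID>002</ID>
<COMPANY>Netscape</COMPANY>
<EMAIL>susan@eudora.org</EMAIL>
<PHONE>555-1234</PHONE>
<STREET>9876 Hazen Blvd.</STREET>
<CITY>San Jose</CITY>
<STATE>California</STATE>
<ZIP>90034</ZIP>
</CLIENT>
</CLIENTS>
"""


def clients_in_state(root, state):
    """Return the names of clients whose STATE element equals state."""
    return [
        client.findtext("NAME")
        for client in root.findall("CLIENT")
        if client.findtext("STATE") == state
    ]


def known_states(root):
    """Return every distinct STATE value, sorted alphabetically."""
    return sorted({c.findtext("STATE") for c in root.findall("CLIENT")})


root = ET.fromstring(XML)
print(clients_in_state(root, "California"))
print(clients_in_state(root, "New York"))
print(known_states(root))
```

The known_states helper shows the relationship that a plain text scan cannot see: "California" and "New York" are both values of the same kind, because both appear inside STATE tags.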