What is XML and why use it?
HTML is the foundation of the WWW and is perfect for presenting a multitude of web pages. Problems arise when large sites need a consistent look and feel with variable content. Creating 30 almost identical HTML files can be straightforward – maintaining each one and making sure that changes in one are reflected in all others is the most laborious and error-prone problem of web site design.
Enter XML. CSS began the process of separating the data (HTML) from the presentation (CSS), and sharing external CSS files across a range of HTML files greatly improved the situation. However, a change to the common HTML still meant changing every single file. XML continues the separation of data from presentation, to the point that one XML stylesheet (an XSL file) contains all the common HTML code and a separate CSS stylesheet contains the formatting code. The XML file itself only needs to contain the code that is specific to that page.
The CodeHelp XML site uses XML to reduce the total site size by 50% by removing the need to duplicate the basic HTML structure of the page. This HTML code is held in the XML stylesheet – one file that is used to build every page from the XML files linked to it. One file to update, one file to check. The larger the site, the greater the benefits of XML. Each page is then constructed from the stylesheet with only the customised data loaded from the XML file. Custom-written tags provide total control over where and how the data is included. The XML files used in CodeHelp contain only 20 or so tags. All the rest of the code (backgrounds, main index page links, positioning code, mailto links, other common images) is constructed on the fly from the stylesheet. Processing time is reduced because each page uses the same stylesheet from the browser cache instead of downloading another 12 KB of repeated data.
A note about XML, standards and browsers
The CodeHelp site uses XSL (eXtensible Stylesheet Language), which is a transformational language, not a simple formatting language like CSS. Microsoft Internet Explorer 5 can apply an XSL stylesheet to an XML file. CodeHelp links each XML file to both a CSS and an XSL stylesheet; IE picks the XSL version. Other XML-capable browsers (like Opera 4) only use the CSS stylesheet. XSL is a W3C standard which comes in two parts – a “transformation language” used for preparing documents for display, and a “formatting object set” that is used for actual visual styling. The formatting object set should still be considered a work in progress. However, the transformation language is the main use of XML within the CodeHelp site. It is the ability to transform the XML that provides the benefits of reducing the total size of the site (by reducing duplication of code) and the ability to write new pages in XML (less typing and fewer errors) and export accurate, reliable and precise HTML 4.
Within the CodeHelp site, the main difference between XML with CSS and XML with XSL is the lack of hyperlinks in the CSS version – the CSS cannot transform the XML data into an <a href></a> tag, it can only format the contents of the href, title and descriptive text which the XML contains. Strangely, there is a way of asking IE5 to create a hyperlink in a CSS/XML combination using the html: namespace. However, this appears not to function in Opera. If anyone finds an XML site which has functioning links when displayed in Opera (CSS/XML or XSL/XML), please contact me.
The XML language syntax
Watch those capitals – XML is very particular
HTML does not mind if you use <p> and then follow
with </P>: the result will be the same because tag names
are not case-sensitive in HTML. XML is completely case-sensitive. The
above example will cause an XML error.
XML errors are also different from HTML – an error in
the actual XML code in an XML page will cause the page to simply not
load. You’ll get a message saying where the error was found (albeit a
little cryptically sometimes) and nothing more will happen. Whereas an
HTML error usually leaves the page itself still displayed, an XML error
prevents anything being displayed except the error.
The most common error is case. Every tag must match the
definition and the template. If you define a custom tag as
<MyTag> in the stylesheet (.xsl file) and then use <MYTAG>
in the XML file, you will be shown the error. Reduce this error to a
minimum by only using CAPITALS for all tags.
The next error affects closing tags. If you are in the
habit of encoding HTML paragraphs as TEXT<p> instead of
<p>TEXT</p> then you will be in for a lot of work
correcting your converted XML. Netscape already has problems
implementing CSS in pages where the closing tag </p> is not used.
Tags that do not have a closing tag defined by HTML (like <meta>
and <br>) must be expressly closed in the XML by adding a /
to the tag as follows: <br/>.
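As a small illustration of the closing-tag rules above, the fragment below should be well formed under XML rules. The page text and the meta description are invented for the example and are not taken from the CodeHelp site.

```xml
<html>
  <head>
    <!-- An empty tag is closed by the trailing slash -->
    <meta name="description" content="Example page"/>
  </head>
  <body>
    <!-- Every paragraph has an explicit closing tag -->
    <p>First line<br/>Second line</p>
    <p>Another paragraph.</p>
  </body>
</html>
```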
Finally, XML even requires ordinary HTML tags to become
case-sensitive. Opening HTML tags used in the stylesheet must match the
corresponding closing tag exactly, including case. This can be one of
the hardest errors to correct. Consider:
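A sketch of the kind of markup being discussed, with hypothetical list text. The first list is well formed; the second, kept inside a comment so that the block as a whole stays parseable, shows mismatched case in both the list tags and the item tags.

```xml
<!-- Well formed: each closing tag matches its opening tag exactly -->
<ul>
  <li>text</li>
</ul>

<!-- NOT well formed if used as markup:
<UL>
  <li>text</LI>
</ul>
-->
```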
This causes TWO XML errors – the UL does not match ul
and the li does not match LI. Note the use of the </li> tag that
is otherwise optional in HTML.
A final error concerns the use of the <, >, ", ' and
& characters. These have special significance in a lot of web based
languages and you need to use the coded versions to prevent mysterious
errors:
For < use &lt;
For > use &gt;
For & use &amp;
For " use &quot;
For ' use &apos;
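As a rough illustration of the coded versions in use, the fragment below escapes the special characters inside element content. The tag names follow the CodeHelp examples used later in this article; the file name and text values are hypothetical.

```xml
<?xml version="1.0"?>
<CODEHELP>
  <NAVIGATE>
    <!-- The & in a query string must be written as &amp; -->
    <FILE>search.html?lang=xml&amp;page=2</FILE>
    <!-- Literal angle brackets in text become &lt; and &gt; -->
    <DESC>Tips &amp; tricks for the &lt;p&gt; tag</DESC>
    <!-- Quotes can be coded as &quot; and &apos; -->
    <CLICK>The &quot;p&quot; tag and it&apos;s closing partner</CLICK>
  </NAVIGATE>
</CODEHELP>
```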
Converting an existing HTML site to XML
The benefits of planning.
Your first XML site can seem daunting, but by upgrading
an existing site, you have a head start. Planning is just as important
in XML as in any good web design. XML focuses on the entire site,
instead of each individual page, so examine the pages within the site
and identify all those that contain repeated HTML code. Identify all
sections within those pages that contain data unique to that page.
Note that XML can be very specific. There’s no need to
repeat any data: XML can change tags one attribute at a time. The first
CodeHelp XML site used the <object> tag to load external HTML
content but as this tag has a long string of attributes, there was no
need to include the whole tag in the XML file. The position, codetype
and border settings are also common to all files. So the XML file only
contained the location of the specific file to load for that XML page.
Other data, like the customised links for each page,
adds only the href and title attributes. XML is not restricted to adding
or altering tag attributes – the text used for the link itself is also
imported from the XML page. Other text can also be included in this way.
Now you are ready to design your first custom tags.
Start with your main
tag, the equivalent of the <html> tag – use a name that includes
all facets of your XML site. Now, using your list of unique data
attributes, plan a unique name for each type of setting. e.g.
The main tag: <CODEHELP>
Tag to set customised links: <NAVIGATE>
Tag to set customised content: <PAGE>
The final part of this process is to define how your
data fits into these custom tags. e.g. NAVIGATE needs to hold data for
the href attribute, data for the title attribute and data for the text
link itself. I use FILE, DESC and CLICK respectively. PAGE only needs
one piece of data, the location of the page to load, LOAD.
Creating your XML stylesheet
First, convert the HTML.
Now, edit the file that contains the HTML for what will be the first page of your new XML site.
1. Save the file with an .xsl extension instead of .html
2. Delete any <!DOCTYPE … declaration and all <meta> tags.
The DOCTYPE will be changed to XML later in the XML file
itself. Meta data is not needed for this file.
3. Add the following code at the very top of the document, above the <html> tag:
4. At the end of the file, after the closing </html> tag, add the following code:
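The wrapper described in steps 3 and 4 can be sketched as follows. This is a minimal outline, not the CodeHelp file itself: the page title and body text are placeholders, and the xmlns:xsl value shown is the W3C XSLT 1.0 namespace, which is an assumption here. IE5 as originally shipped recognised the earlier working-draft namespace http://www.w3.org/TR/WD-xsl instead, so that value may need adjusting for the browser in use.

```xml
<?xml version="1.0"?>
<!-- Step 3: these opening lines go above the <html> tag.
     The namespace below is an assumption (W3C XSLT 1.0). -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">

    <!-- The converted page; title and body text are placeholders. -->
    <html>
      <head>
        <title>Placeholder title</title>
      </head>
      <body>
        <p>Existing page content stays here.</p>
      </body>
    </html>

  <!-- Step 4: these closing lines go after the </html> tag. -->
  </xsl:template>
</xsl:stylesheet>
```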
Load the xsl file into IE5 and prepare for lots of
errors! Even the best HTML pages will fail once parsed as XML. Take
note of each error and check the XML syntax page at CodeHelp. Or keep
reading and check out the test files at the end of this tutorial.
If you come across errors that just don’t seem to go
away, try removing that section of HTML. Better to get a partial but
working page displayed than a series of errors. Once the page is
correct, IE5 will display the .xsl data in a tree-like structure (hence
the need for those closing tags) with all the data contained within the
next layer of the tree.
Adding custom data to the page
Applying an XML template
In the .xsl file, insert the following line in place of
the entire tag that contains the data to be loaded from the XML file.
e.g. for NAVIGATE, I removed the <li><a href=""
title="">text</a></li> line in order to load customised
settings for the href, title and text. Insert:
<xsl:apply-templates select="CODEHELP/NAVIGATE" />
Make sure you specify the correct selection: replace CODEHELP/NAVIGATE with your main tag and the desired custom tag.
Now you need to create the template to use. At the end
of the .xsl file, after the final </xsl:template> line, add and
customise the following extract from codehelp.xsl to match your own tags.
Note: The <a> tag is not left half open at the
start (<a ). XML adds the attribute values inside the tag itself by
matching the attribute with the value-of the XML data. To insert plain
text, outside the tag, omit the xsl:attribute tags, as with the CLICK
data.
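A template along the lines of the note above might look like the sketch below. It is not the actual extract from codehelp.xsl: the match pattern and layout are assumptions, as is the XSLT 1.0 namespace on the enclosing stylesheet element. The FILE, DESC and CLICK names are the ones chosen earlier for the NAVIGATE tag.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The match="/" template holding the page HTML is omitted here. -->

  <!-- Builds one list item with a link for each NAVIGATE tag -->
  <xsl:template match="NAVIGATE">
    <li>
      <a>
        <!-- Attribute values are filled in from the XML data -->
        <xsl:attribute name="href"><xsl:value-of select="FILE"/></xsl:attribute>
        <xsl:attribute name="title"><xsl:value-of select="DESC"/></xsl:attribute>
        <!-- Plain link text: no xsl:attribute wrapper -->
        <xsl:value-of select="CLICK"/>
      </a>
    </li>
  </xsl:template>

</xsl:stylesheet>
```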
Where’s the data?
Creating the first XML file.
Now create a simple text file with an .xml extension and insert the following start code:
<?xml-stylesheet type="text/xsl" href="codehelp.xsl"?>
Remember to specify your .xsl file in place of codehelp.xsl
The XML file only contains the tags you entered in the
.xsl stylesheet. Start with your main tag, making sure all other tags
are enclosed within it. Then include each tag, matching the layout you
planned.
<DESC>All the HTML help your website needs</DESC>
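Putting the pieces together, a complete XML file for one page could be sketched as below. Only the stylesheet line and the DESC text come from this article; the XML declaration on the first line and the FILE, CLICK and LOAD values are hypothetical placeholders added for illustration.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="codehelp.xsl"?>
<CODEHELP>
  <!-- One customised link: href, title and link text -->
  <NAVIGATE>
    <FILE>html.xml</FILE>                               <!-- hypothetical -->
    <DESC>All the HTML help your website needs</DESC>
    <CLICK>HTML help</CLICK>                            <!-- hypothetical -->
  </NAVIGATE>
  <!-- The content to load for this page -->
  <PAGE>
    <LOAD>content.html</LOAD>                           <!-- hypothetical -->
  </PAGE>
</CODEHELP>
```

Further NAVIGATE tags can be added inside the main tag, one per link; the template is applied to each of them in turn.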
To get this sample .xml file to operate, try a cut down
version of codehelp.xsl: (Reproduced here only for comparison. There is
a link to the file itself on the next page.)
title="home page?">Your home page?</a></li>
<xsl:apply-templates select="CODEHELP/NAVIGATE" />
title="the mandatory link page">Links</a></li>
The Test Files
A reduced stylesheet and one XML file.
To help you with your development, these are the two
files that have been described in the previous pages. Note that not all
browsers will display the source code of these files as intended.
You may need to view the source code manually.
The .xsl link will display the file in IE5 using the tree structure
described earlier. Save the file as an .xsl file in order to work on the xml file offline.
The .xml link will load the XML test file using the cut-down stylesheet
on the server. Save this file as an .xml file in the same directory as the .xsl stylesheet.
Now load either file in IE5 to view the files offline.
Recommended XML sites for further study