Tip: Batch Processing XML with XSLT 2.0

Use Directory Listings in XML to Drive XSLT 2.0 Processing

A common problem with XSLT is that it takes only a single XML file as input. You can use a cross-platform Java™ tool to create an XML directory listing, then use XSLT to process every file in the directory from that listing. This tip covers installation and use of such a tool, as well as the corresponding XSL that processes multiple files from the directory listing.

Don’t you wish that XSLT processors like Saxon could use more than one file as input? Often, you’re faced with a directory of XML files that require conversion into HTML. You could run Saxon on each of them, but what if you want another file at the end that has an index to all the HTML files you’ve created?

What you need is an XML version of the directory listing. Then, you could use that XML file as the single input file to XSLT and process each file using XSLT. It would be wonderful if you could do the directory processing in XSLT directly. Unfortunately, with all the power of XSLT — and particularly XSLT 2.0 — the language still doesn’t have directory operations.

HXDLG to the Rescue!

While surfing the Web, I found an obscure little Java program called the HTML/XML Directory List Generator (HXDLG) on SourceForge (see Resources). One of the functions of HXDLG is to create either HTML or XML representations of directory listings. I downloaded the tool and ran the statement in Listing 1 from the command line.

Listing 1. Code to create an XML directory using HXDLG

java -jar hdlg.jar XML \
  /Users/jherr/Projects/ibm_xml_tips/filelist/testfiles/ \
  /Users/jherr/Projects/ibm_xml_tips/filelist/files.xml

The program takes three arguments. The first argument is the output type — either XML or HTML. The second argument is the directory path. The third argument is the path of the output XML file. The result looks something like the code in Listing 2.

Listing 2. The directory in XML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hdlg:filesystem SYSTEM
  "http://www.hdlg.info/XML/filesystem.dtd">
<hdlg:filesystem
    xmlns:hdlg="http://www.hdlg.info/XML/filesystem">
  <hdlg:folder name="testfiles"
      url="file:/Users/jherr/Projects/ibm_xml_tips/filelist/testfiles/">
    <hdlg:file name="test1.xml" size="179"
        type="unknown"
        url="file:/ibm_xml_tips/filelist/testfiles/test1.xml">
    </hdlg:file>
    <hdlg:file name="test2.xml" size="181"
        type="unknown"
        url="file:/ibm_xml_tips/filelist/testfiles/test2.xml">
    </hdlg:file>
    <hdlg:file name="test3.xml" size="181"
        type="unknown"
        url="file:/ibm_xml_tips/filelist/testfiles/test3.xml">
    </hdlg:file>
  </hdlg:folder>
</hdlg:filesystem>

That’s some high-end stuff. It has a Document Type Definition (DTD) and uses namespaces, and also has the file names and URLs that you’re looking for. With absolute paths, to boot!

Test It Out

To test this system, I’m using a sample set of test results in three different XML files: test1.xml, test2.xml, and test3.xml. I want to read them all and create corresponding HTML files for each one. Listing 3 shows one such sample test file.

Listing 3. A test file in XML

<?xml version="1.0" encoding="UTF-8"?>

<testrun run="test1">
  <test name="foo" pass="true" />
  <test name="bar" pass="true" />
  <test name="baz" pass="true" />
</testrun>

The first step is to run HXDLG to get the directory listing in XML. This directory listing contains the URLs of the test files and will be the input to the XSL stylesheet.
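
Before reading any of the listed files, it can be worth checking what the listing actually contains. The following is a minimal sketch, not part of the original tool or article code. It assumes the attribute names shown in Listing 2 (name, size, url) and an XSLT 2.0 processor, and it prints one line per entry whose name ends in .xml. The filter is an assumption about what a real directory might hold: non-XML files in the listing would otherwise fail to parse when they are loaded later.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Keep only entries whose name ends in .xml (an assumed filter;
         the listing in Listing 2 contains nothing else). -->
    <xsl:for-each select="//*:file[ends-with(@name, '.xml')]">
      <!-- XSLT 2.0: xsl:value-of joins a sequence with the separator -->
      <xsl:value-of select="@name, @size, @url" separator=" | "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against the listing in Listing 2, this should print lines along the lines of test1.xml | 179 | file:/ibm_xml_tips/filelist/testfiles/test1.xml, one per file.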

Reading from Multiple Files in XSL

For the first pass, I’m just going to read the files and print the name of each test run (see Listing 4). Doing so ensures that I can parse the directory structure and read the target files.

Listing 4. Printout of test names

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="text" indent="no"/>

  <xsl:template match="/">
    <!-- *:file matches file elements in any namespace -->
    <xsl:for-each select="//*:file">
      <!-- Load the XML document that the listing entry points to -->
      <xsl:variable select="document(@url)" name="contents" />
      <xsl:value-of select="$contents/testrun/@run" /><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

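The XSLT engine first matches the root of the directory listing, which is the input document, against the template. The template then iterates over each file element using xsl:for-each. The *:file test matches file elements in any namespace; that wildcard form is XPath 2.0 syntax, so the stylesheet needs an XSLT 2.0 processor such as Saxon even though it declares version 1.0. The fun stuff happens when I use xsl:variable to call the document function, which reads the XML file named by the url attribute into the variable. XSL makes reading XML documents a snap.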
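
Listing 4 selects file elements in any namespace with the *:file wildcard, which is XPath 2.0 syntax. If that feels too loose, the namespace can be bound explicitly instead. The sketch below is a variant of the same loop, not the article's original code. It assumes the namespace URI shown in Listing 2; the hdlg prefix used in the stylesheet is an arbitrary choice, and only the URI has to match.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:hdlg="http://www.hdlg.info/XML/filesystem"
                exclude-result-prefixes="hdlg"
                version="2.0">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Match only file elements in the listing's own namespace -->
    <xsl:for-each select="//hdlg:file">
      <!-- Read the referenced document and print its test run name -->
      <xsl:value-of select="document(@url)/testrun/@run"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The select expression here uses only XPath 1.0 syntax, so this form of the loop does not depend on the 2.0 wildcard name test.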

Now, with the contents of the XML test file in hand, I use the xsl:value-of tag to print the name of the test run, followed by a newline from the xsl:text tag (see Listing 5).

Listing 5. The output of the first XSL template

test1
test2
test3

The output shows one test run name for each of the three files, so the stylesheet is working so far. Now all I have to do is build the HTML for each test result. To do that, I’m going to use the xsl:result-document tag, a new feature of XSLT 2.0. (That’s why in Listing 6, the version attribute on the stylesheet tag has been bumped to 2.0.)

Listing 6. The stylesheet that creates the HTML files

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

  <!-- Unnamed definition: progress messages on the primary output are plain text -->
  <xsl:output method="text" indent="no"/>
  <!-- Named definition, selected by format="html" on xsl:result-document -->
  <xsl:output method="html" indent="yes" name="html"/>

  <xsl:template match="/">
    <xsl:for-each select="//*:file">
      <xsl:variable select="document(@url)" name="contents" />
      <xsl:variable select="replace(@url,'[.]xml','.html')"
          name="newfile" />
      Creating <xsl:value-of select="$newfile" />
      <xsl:result-document href="{$newfile}" format="html">
        <html><body>
          Test run: <xsl:value-of select="$contents/testrun/@run" />
        </body></html>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

Where I used to print the test run name, I now use a variable tag to build a new file name for the HTML. Using the XPath function replace, I take the original URL and replace the .xml extension with .html to create the new file name.
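
One detail of replace is worth seeing in isolation: the pattern is a regular expression, and every match is replaced, not just the last one. The sketch below is an illustration rather than part of the original stylesheet. The URL in it is hypothetical, chosen so that .xml appears twice, and the stylesheet can be run against any XML input because it ignores the input's content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical URL with ".xml" in a directory name as well -->
    <xsl:variable name="url" select="'file:/reports.xml.d/test1.xml'"/>

    <!-- Unanchored pattern, as in Listing 6: both occurrences change.
         Prints file:/reports.html.d/test1.html -->
    <xsl:value-of select="replace($url, '[.]xml', '.html')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Anchored with $: only the trailing extension changes.
         Prints file:/reports.xml.d/test1.html -->
    <xsl:value-of select="replace($url, '[.]xml$', '.html')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For the URLs in Listing 2 both patterns give the same result, so the anchored form only matters when .xml can appear elsewhere in a path.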

Next, I print the name of the file to let the user know what I’m creating. This is always a good idea because otherwise you would see nothing and have no idea whether the stylesheet did anything.

After I print the message, I use the xsl:result-document tag to create the new file, with some HTML that gives the name of the test run. One thing to notice here is that I had to use the format attribute to select the named html output definition. If I hadn’t done this, the new file would have used the default text output method and all the HTML tags would have been dropped.
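
A named output definition is not the only way to get HTML out of xsl:result-document. In XSLT 2.0 the instruction also accepts serialization attributes such as method and indent directly. The sketch below is an alternative to Listing 6 under that assumption, not the article's original approach; it drops the named xsl:output declaration and the progress message to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- The primary output stays plain text, as in Listing 6 -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//*:file">
      <!-- method and indent are set on the instruction itself,
           so no named output definition is needed -->
      <xsl:result-document href="{replace(@url, '[.]xml', '.html')}"
                           method="html" indent="yes">
        <html><body>
          Test run: <xsl:value-of select="document(@url)/testrun/@run"/>
        </body></html>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The named definition is still the tidier choice when several result documents share the same settings, because the settings then live in one place.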

Summary

Batch processing in XSLT 2.0 is simple if you have a directory listing utility that exports XML and know how to use the xsl:result-document tag to redirect the output of the engine. With these tools in hand, you no longer need fear the directory of XML files that you once might have merged into one mega-file to ease processing.
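
The index file raised at the start of this tip can be built with the same two pieces. The following is a sketch under stated assumptions rather than tested production code: the file name index.html is made up for illustration, and where it lands depends on how the processor resolves a relative href against its base output URI. Everything else follows Listing 6.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="text"/>
  <xsl:output method="html" indent="yes" name="html"/>

  <xsl:template match="/">
    <!-- One HTML page per listed file, as in Listing 6 -->
    <xsl:for-each select="//*:file">
      <xsl:result-document href="{replace(@url, '[.]xml', '.html')}" format="html">
        <html><body>
          Test run: <xsl:value-of select="document(@url)/testrun/@run"/>
        </body></html>
      </xsl:result-document>
    </xsl:for-each>

    <!-- Index page linking to every generated file.
         The name index.html is an assumption for this sketch. -->
    <xsl:result-document href="index.html" format="html">
      <html><body>
        <ul>
          <xsl:for-each select="//*:file">
            <li>
              <a href="{replace(@url, '[.]xml', '.html')}">
                <xsl:value-of select="document(@url)/testrun/@run"/>
              </a>
            </li>
          </xsl:for-each>
        </ul>
      </body></html>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```

The links reuse the absolute file: URLs from the listing, which keeps the sketch simple but ties the index to one machine; relative links would need the name attribute instead of url.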

Resources

• Visit the XSL standards site at the W3C, a handy reference to XSL technologies and standards.

• Check out the XPath page at the W3C, which provides version and standard information.

• Download Saxon, the popular XSL processor that was used in the creation of this article.

• Read Michael Kay’s XSLT 2.0 Programmer’s Reference, the bible of XSLT. It’s a fantastic introduction and a valuable reference work.

• While you’re at it, pick up XPath 2.0 Programmer’s Reference by Michael Kay — the ultimate reference by the man who wrote the W3C specification.

• See the HXDLG Web site for more information about the command-line tool and to download it.

• Read Code Generation in Action by Jack D. Herrington, which covers generating code for a wide variety of targets not limited to database access.

• Find out how XHTML 2.0 goes to great lengths to balance machine processing ability with authoring convenience in Micah Dubinko’s article “Linking in XHTML 2.0” (developerWorks, March 2005).

• Find hundreds more XML resources on the developerWorks XML zone.

• Learn how you can become an IBM Certified Developer in XML and related technologies.

May 4th, 2005 | XML

About the Author:

An engineer with more than 20 years of experience, Jack Herrington is currently Editor-in-Chief of the Code Generation Network. He is the author of Code Generation in Action. You can contact him at jack_d_herrington@codegeneration.net.
