Tip: Convert from HTML to XML with HTML Tidy
By Benoit Marchal, 2003-12-16
Further Processing
Because XHTML documents are valid XML documents, you can insert them into an XML workflow. More specifically, you can post-process them with regular XML tools (XSL processors, parsers, and the like).
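To make this concrete, here is a minimal sketch that uses Python's standard xml.etree.ElementTree parser rather than any tool from this tip. The XHTML sample is hypothetical, not Listing 2. Two details are worth noting. XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so queries must be namespace-qualified. The is_well_formed helper (a hypothetical name) also shows that legacy HTML such as an unclosed br element is rejected by a plain XML parser. That rejection is why the page has to go through Tidy first.

```python
import xml.etree.ElementTree as ET

# XHTML elements belong to this namespace, so every query must use it.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical XHTML in the style Tidy produces: lower-case tags,
# quoted attributes, and empty elements closed with "/>".
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Photo gallery</title></head>
<body>
<h1>Photo gallery</h1>
<p>First line<br />second line</p>
<img src="sunset.jpg" alt="Sunset" />
</body>
</html>"""


def is_well_formed(text):
    """Return True if a plain XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def summarize(xhtml_text):
    """Parse XHTML as ordinary XML and pull out the title and image files."""
    root = ET.fromstring(xhtml_text)
    title = root.findtext("h:head/h:title", namespaces=NS)
    images = [img.get("src") for img in root.iterfind(".//h:img", NS)]
    return title, images


if __name__ == "__main__":
    print(summarize(XHTML))
    # Legacy HTML with an unclosed <br> is not well-formed XML.
    print(is_well_formed("<p>a<br>b</p>"))
```

Forgetting the namespace is a common pitfall: a query for plain "head" finds nothing, because the parsed element's tag is actually "{http://www.w3.org/1999/xhtml}head".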
In fact, I am not happy with the XHTML vocabulary for this application. Because it is a publishing vocabulary, XHTML has very little structure, and I prefer to maintain photo galleries through the ad hoc XML vocabulary shown in Listing 3 (originally introduced in my tip "Divide and conquer large XML documents"). To illustrate an XML workflow, I have written a small XSL stylesheet, cleanup.xsl (see Listing 4), that retrieves the titles, file names, dates, and descriptions from the XHTML document. The stylesheet generates a more structured version of the document that is easier to work with.
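For readers who do not use XSL, the following is a rough Python analogue of the same idea. It is not Listing 4. It assumes a hypothetical XHTML layout in which each photo sits in a div with class "photo", holding an h2 title, an img whose src is the file name, and paragraphs with class "date" and "description". The output element names (gallery, photo, title, file, date, description) are also hypothetical and are not the vocabulary of Listing 3. The function name to_gallery is illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical page layout: one <div class="photo"> per picture.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Gallery</title></head>
<body>
<div class="photo">
  <h2>Sunset</h2>
  <img src="sunset.jpg" alt="Sunset" />
  <p class="date">2003-06-21</p>
  <p class="description">Evening over the bay.</p>
</div>
</body>
</html>"""


def text_of(parent, path):
    """Namespace-aware text lookup that returns "" when the element is missing."""
    return parent.findtext(path, default="", namespaces=NS).strip()


def to_gallery(xhtml_text):
    """Map each hypothetical photo block to a <photo> element in a new tree."""
    root = ET.fromstring(xhtml_text)
    gallery = ET.Element("gallery")
    for div in root.iterfind(".//h:div[@class='photo']", NS):
        photo = ET.SubElement(gallery, "photo")
        img = div.find("h:img", NS)
        ET.SubElement(photo, "title").text = text_of(div, "h:h2")
        ET.SubElement(photo, "file").text = img.get("src", "") if img is not None else ""
        ET.SubElement(photo, "date").text = text_of(div, "h:p[@class='date']")
        ET.SubElement(photo, "description").text = text_of(div, "h:p[@class='description']")
    return gallery


if __name__ == "__main__":
    print(ET.tostring(to_gallery(SAMPLE), encoding="unicode"))
```

An XSL stylesheet expresses the same mapping declaratively with templates. The Python version simply makes each extraction step explicit. In both cases the output is plain XML that downstream tools can consume.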
First published by IBM developerWorks
