Tip: Convert from HTML to XML with HTML Tidy

By Benoit Marchal
2003-12-16


Further Processing

Because XHTML documents are well-formed XML documents, you can insert them into an XML workflow. More specifically, you can post-process them with regular XML tools (XSL processors, parsers, and the like).

In fact, I am not very happy with the XHTML vocabulary for this application. Because it's a publishing vocabulary, XHTML carries very little structure, so I prefer to maintain photo galleries through the ad hoc XML vocabulary shown in Listing 3 (originally introduced in my tip, Divide and conquer large XML documents). To illustrate an XML workflow, I have written a small XSL stylesheet (see Listing 4) that retrieves the titles, file names, dates, and descriptions from the XHTML document. The stylesheet generates a more structured version of the document that is easier to work with.
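The same kind of extraction can also be sketched with an ordinary XML parser instead of XSL. The Python example below is only an illustration under stated assumptions: the XHTML layout (a div with class "photo" holding an h2, an img, and two classed paragraphs) and the target element names (gallery, photo, title, file, date, description) are hypothetical stand-ins, not the actual markup of index.xml or the vocabulary of Listing 3.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so every lookup must be qualified.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical XHTML excerpt; the real index.xml layout may differ.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Photo gallery</title></head>
  <body>
    <div class="photo">
      <h2>Harbour at dawn</h2>
      <img src="harbour.jpg" alt="Harbour at dawn"/>
      <p class="date">2003-08-14</p>
      <p class="description">Fishing boats leaving the harbour.</p>
    </div>
    <div class="photo">
      <h2>Old town</h2>
      <img src="oldtown.jpg" alt="Old town"/>
      <p class="date">2003-08-15</p>
      <p class="description">Narrow streets in the old town.</p>
    </div>
  </body>
</html>"""


def text_of(parent, path):
    """Return the stripped text of the first match, or an empty string."""
    node = parent.find(path, NS)
    return (node.text or "").strip() if node is not None else ""


def xhtml_to_gallery(xhtml_source):
    """Build a <gallery> element (assumed vocabulary) from an XHTML string."""
    root = ET.fromstring(xhtml_source)
    gallery = ET.Element("gallery")
    for div in root.iterfind(".//h:div[@class='photo']", NS):
        photo = ET.SubElement(gallery, "photo")
        img = div.find("h:img", NS)
        ET.SubElement(photo, "title").text = text_of(div, "h:h2")
        ET.SubElement(photo, "file").text = (
            img.get("src", "") if img is not None else ""
        )
        ET.SubElement(photo, "date").text = text_of(div, "h:p[@class='date']")
        ET.SubElement(photo, "description").text = text_of(
            div, "h:p[@class='description']"
        )
    return gallery


gallery = xhtml_to_gallery(SAMPLE_XHTML)
print(ET.tostring(gallery, encoding="unicode"))
```

The parser accepts the document only because Tidy has already turned it into well-formed XML; the original HTML would likely be rejected. Note that the XHTML namespace has to be supplied on every path, a detail that is easy to overlook when moving from HTML to XML tools.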



First published by IBM developerWorks

