Tip: Convert from HTML to XML with HTML Tidy
By Benoit Marchal, 2003-12-16
Preserve Legacy Web Sites With This Handy Utility
Level: Introductory
This tip demonstrates how to convert HTML documents to XML (or more specifically, XHTML) with a simple, open source tool, HTML Tidy. This conversion is useful for webmasters who are migrating to XML. It can also help XML converts who have to interface with legacy HTML tools.
One of the challenges that webmasters face when converting from pure HTML to XML/XSL is the preservation of their legacy Web sites. Because it would be too costly to dump the old site and start again from scratch, an automated procedure that brings the HTML site to XML is required.
Even XML converts have to deal with HTML files: Many products have added an option for exporting HTML documents -- an option you might want to integrate into your Web site.
This tip discusses HTML Tidy, a powerful tool to help convert old HTML pages to newer standards, such as XML. Tidy is distributed as open source.
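As a rough illustration of the kind of conversion this tip describes, the sketch below wraps a Tidy command line in Python. It assumes a `tidy` executable is on the PATH, and the option set (`-q`, `-asxml`, `-numeric`, `-o`) is one reasonable choice rather than necessarily the exact invocation used later in the tip. The helper names `build_tidy_command` and `convert` are hypothetical; the file names `index.html` and `index.xml` are borrowed from the listings.

```python
import shutil
import subprocess


def build_tidy_command(source, target):
    """Return the argument list for converting an HTML file to XHTML with Tidy.

    The choice of options is an assumption made for this sketch.
    """
    return [
        "tidy",
        "-q",        # quiet: suppress non-essential output
        "-asxml",    # write the document as well-formed XHTML
        "-numeric",  # use numeric rather than named entities
        "-o", str(target),
        str(source),
    ]


def convert(source, target):
    """Run Tidy and return its exit status, or None if Tidy is not installed."""
    if shutil.which("tidy") is None:
        return None
    result = subprocess.run(build_tidy_command(source, target))
    # Tidy exits with 0 on success, 1 if it issued warnings, 2 on errors.
    return result.returncode
```

Calling `convert("index.html", "index.xml")` would then write the XHTML version next to the original page. Keeping the command construction in its own function makes it easy to inspect or adjust the options before running Tidy over a whole site.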
Tutorial Pages:
» Preserve Legacy Web Sites With This Handy Utility
» Tool Of The Trade
» Listing 1. index.html (an excerpt)
» Tidying Up
» Listing 2. index.xml (an excerpt)
» Further Processing
» Listing 3. index-transform.xml (an excerpt)
» Listing 4. cleanup.xsl
» Conclusion
First published by IBM developerWorks
