XML and Scripting Languages

By Parand Tony Darugar
2005-05-18


Tree-based processing

The methodologies we have discussed so far process the XML document as a stream -- as the file is parsed, a handler is called for each tag encountered. This is an efficient way to process XML, in terms of both memory usage and processing time. Certain tasks, however, are difficult to do on a stream. Imagine, for example, needing to move or rearrange certain segments of the document, or to sort items within it. Because we receive the document as a stream, we would have to store the components ourselves before sorting or rearranging them. A mechanism that stored the components automatically would make such tasks substantially easier.
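To make the difficulty concrete, here is a minimal sketch of sorting items with a stream parser. It is written in Python with the standard xml.sax module purely for illustration and is not code from this article; the <item> element, the ItemCollector class, and the sorted_items function are hypothetical names. Notice that the handler has to buffer every item by hand before any sorting can happen.

```python
import xml.sax


class ItemCollector(xml.sax.ContentHandler):
    """Buffers the text of every <item> so the items can be sorted later."""

    def __init__(self):
        super().__init__()
        self.items = []       # completed item texts, in document order
        self._buffer = None   # text pieces of the <item> being read, else None

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []          # start collecting text for this item

    def characters(self, content):
        if self._buffer is not None:   # only keep text found inside an <item>
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._buffer = None


def sorted_items(xml_text):
    """Parse xml_text as a stream and return the item texts in sorted order."""
    handler = ItemCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    # Sorting is only possible now, after the whole stream has been stored.
    return sorted(handler.items)


SAMPLE = "<list><item>pear</item><item>apple</item><item>fig</item></list>"
print(sorted_items(SAMPLE))
```

The bookkeeping in the three handler methods is exactly the storage work that a tree-based approach would do automatically.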

XML documents are required to be well-formed: every element must be closed and properly nested inside its parent. This makes it easy to store them as trees. A popular technique for working with XML documents is to first parse them into a tree data structure, and then to operate on the tree. The Document Object Model (DOM), Grove, and Twig (see Resources) all use this model. It allows a great deal of flexibility in dealing with the documents: the components of the document can be accessed in random order, rearranged, added, or removed.
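As a rough illustration of the tree model, the sketch below parses a small document into a tree and then rearranges, adds, and removes components. It uses Python's standard xml.etree.ElementTree module rather than DOM, Grove, or Twig, so it shows the general idea and not the API of any of those libraries; the <item> element and the rearrange function are hypothetical names.

```python
import xml.etree.ElementTree as ET


def rearrange(xml_text):
    """Parse xml_text into a tree, edit the tree, and serialize it again."""
    root = ET.fromstring(xml_text)          # the whole document is now in memory

    # Rearrange: sort the child elements by their text content.
    root[:] = sorted(root, key=lambda el: el.text or "")

    # Add: append a new <item> at the end.
    ET.SubElement(root, "item").text = "kiwi"

    # Remove: drop the first child (random access by position).
    root.remove(root[0])

    return ET.tostring(root, encoding="unicode")


SAMPLE = "<list><item>pear</item><item>apple</item><item>fig</item></list>"
print(rearrange(SAMPLE))
# prints <list><item>fig</item><item>pear</item><item>kiwi</item></list>
```

No handlers or manual buffering are needed here, because the parser has already stored every component in the tree.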

Tree-based methodologies do have some drawbacks, however. They require the entire XML document to be parsed, and the tree data structure to be created, before any processing or business logic can take place. Since the tree is generally stored in memory, these methods have much larger memory footprints than stream-based methods. The problem is made worse by the fact that storing the document in memory as a tree takes several times as much storage as the original XML document. For larger documents both costs can be significant -- the parsing and tree creation time becomes substantial, and the memory requirements can exceed the available resources.

Tree-based processing of XML documents will be discussed in a future article. The remainder of this article will use stream-based processing, as described above.



Tutorial Pages:
» Converting XML to HTML
» Simple substitution
» Function-based substitution
» Tree-based processing
» Active XML documents
» Storing tag contents
» Retrieving the rules
» Acting on the rules
» Next steps
» Resources


First published by IBM DeveloperWorks

