Converting XML to HTML
For the purposes of this article, we will use a stock quote, expressed as XML, as our input file:
|
This simple encoding captures information typically found in a stock quote. The formatting demonstrates certain XML features, such as attributes and empty tags. The actual XML file used in this article contains several stock_quote elements, to form a portfolio of stocks.
This XML file was created using a script to convert the Spreadsheet Format stock quotes provided by the finance.yahoo.com Web site into XML.
Simple substitution
A simple method for transforming the XML source into HTML is to define pieces of HTML to be substituted for each XML tag. Using the popular XML::Parser module for Perl (see Resources), based on James Clark’s Expat parser, we can parse the XML document and define callback routines for performing the substitutions.
Here is a simple invocation of the XML::Parser:
|
This parses the given file, invoking the function start_handler each time a tag is started, and end_handler each time a tag is ended. The contents of the tag are processed by the char_handler function.
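As a rough sketch of what such an invocation can look like, the following registers three handlers through XML::Parser's Handlers option and records the order in which they fire. The one-quote sample document, its attribute names, and the @events array are assumptions made for illustration, not the article's actual input or listing.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical one-quote sample; the article's real input file is larger.
my $xml = <<'XML';
<stock_quotes>
  <stock_quote>
    <symbol>IBM</symbol>
    <price type="ask" value="109.1875"/>
    <volume>7050200</volume>
  </stock_quote>
</stock_quotes>
XML

my @events;    # record of callbacks, in the order the parser fires them

# Every handler receives the parser object first, then the tag name or text.
sub start_handler {
    my ($expat, $tag, %attrs) = @_;
    push @events, "start $tag";
}

sub end_handler {
    my ($expat, $tag) = @_;
    push @events, "end $tag";
}

sub char_handler {
    my ($expat, $text) = @_;
    push @events, "char $text" if $text =~ /\S/;    # ignore pure whitespace
}

my $parser = XML::Parser->new(
    Handlers => {
        Start => \&start_handler,
        End   => \&end_handler,
        Char  => \&char_handler,
    },
);

$parser->parse($xml);    # parsefile($path) does the same for a file on disk
print "$_\n" for @events;
```

Note that the empty price tag still triggers both a start and an end callback, even though it has no contents.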
Given these callback functions, we can implement our simple substitution algorithm. First, we define a few substitutions:
|
And now we write the handlers to perform the substitutions:
|
The start_handler function simply prints the value to be substituted for the given tag. char_handler outputs the data it receives, which is the content of the tags. (The full program, with a few additions to handle attributes, is listed separately.) Running the program on our XML file, we get the following output:
|
The full output is available. With this approach, a simple XML-to-HTML transformation amounts to defining a table of substitutions.
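Put together, the substitution approach can be sketched roughly as follows. The HTML fragments in the two tables and the $html accumulator are hypothetical choices for illustration; the article's own substitution table is not reproduced here.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical substitution tables: HTML to emit at the start and end of a tag.
my %start_html = (
    stock_quotes => "<table>\n",
    stock_quote  => "<tr>",
    symbol       => "<td><b>",
    volume       => "<td>",
);
my %end_html = (
    stock_quotes => "</table>\n",
    stock_quote  => "</tr>\n",
    symbol       => "</b></td>",
    volume       => "</td>",
);

my $html = '';    # collected output, so it can be printed or inspected

sub start_handler {
    my ($expat, $tag) = @_;
    $html .= $start_html{$tag} if exists $start_html{$tag};
}

sub end_handler {
    my ($expat, $tag) = @_;
    $html .= $end_html{$tag} if exists $end_html{$tag};
}

sub char_handler {
    my ($expat, $text) = @_;
    $html .= $text if $text =~ /\S/;    # skip whitespace between tags
}

my $parser = XML::Parser->new(
    Handlers => {
        Start => \&start_handler,
        End   => \&end_handler,
        Char  => \&char_handler,
    },
);

$parser->parse('<stock_quotes><stock_quote><symbol>IBM</symbol>'
             . '<volume>7050200</volume></stock_quote></stock_quotes>');
print $html;
```

Tags without an entry in the tables are simply passed over, and a production version would also escape the character data before writing it into HTML.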
Function-based substitution
Substitution-based transformations are easy to implement and understand, but don’t give us the ability to implement logic. We may want to take different actions based on the contents or attributes of a tag, or connect to a database to compare the contents of the tag with the stored value. We need more than simple, one-to-one substitutions; we need the ability to perform functions for each tag.
XML::Parser provides a method for invoking functions for each tag in the XML document. For each tag, the parsing module calls a function with the tag’s name. Thus we can define a set of functions that perform the transformations, connect to databases, and implement our business logic.
To enable the function callbacks based on tag names, we need to invoke the parser with the "Subs" style. We also need to specify which namespace the function callbacks reside in, via the "Pkg" option:
|
This will cause a series of function callbacks based on the tags and the contents of the XML file. The start of a tag will invoke a function with the same name as the tag in the SubHandlers namespace. The contents of the tag will be handled by the char_handler function, and the end of the tag will invoke a function with the same name as the tag, with an "_" appended (for example, for the end of the tag symbol, the function SubHandlers::symbol_() will be called).
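The dispatch described above can be sketched as follows, assuming a cut-down document with only two tags. The @calls array is a hypothetical tracing aid; the Style, Pkg, and Handlers options are the ones XML::Parser documents.

```perl
use strict;
use warnings;
use XML::Parser;

my @calls;    # trace of which functions the parser invoked

package SubHandlers;

# One function per start tag, and one with "_" appended per end tag.
sub stock_quote  { push @calls, 'stock_quote' }
sub stock_quote_ { push @calls, 'stock_quote_' }
sub symbol       { push @calls, 'symbol' }
sub symbol_      { push @calls, 'symbol_' }

package main;

# Tag contents are still delivered through an ordinary Char handler.
sub char_handler {
    my ($expat, $text) = @_;
    push @calls, "char_handler($text)" if $text =~ /\S/;
}

my $parser = XML::Parser->new(
    Style    => 'Subs',
    Pkg      => 'SubHandlers',
    Handlers => { Char => \&char_handler },
);

$parser->parse('<stock_quote><symbol>IBM</symbol></stock_quote>');
print "$_\n" for @calls;
```

Tags that have no matching function in the package are skipped, so only the tags we care about need handlers.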
Our XML file will cause the following sequence of function calls:
|
Now, it is a simple matter to write the transformation functions. We can still perform simple substitutions:
|
which use the stock symbol, contained in the contents of the symbol tag, to insert an image of the same name in the resulting HTML page. We can also implement more complicated logic:
|
The first parameter passed to the function is a handle to the parser itself, followed by the name of the tag (element), optionally followed by the tag attributes as attribute_name, attribute_value pairs. In the above case we print a different label based on the type attribute of the price tag.
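A handler of this kind might look like the sketch below. The label text, the value attribute name, and the sample type values are assumptions for illustration; only the type attribute itself is taken from the description above.

```perl
use strict;
use warnings;
use XML::Parser;

my $html = '';

package SubHandlers;

# Hypothetical labels keyed by the price tag's type attribute.
my %label_for = (ask => 'Asking price', open => 'Opening price');

sub price {
    # Parser handle, tag name, then attribute name/value pairs.
    my ($expat, $tag, %attrs) = @_;
    my $type  = defined $attrs{type}  ? $attrs{type}  : '';
    my $value = defined $attrs{value} ? $attrs{value} : '';
    my $label = exists $label_for{$type} ? $label_for{$type} : 'Price';
    $html .= "<td>$label: $value</td>\n";
}

package main;

my $parser = XML::Parser->new(Style => 'Subs', Pkg => 'SubHandlers');
$parser->parse(<<'XML');
<stock_quote>
  <price type="ask" value="109.1875"/>
  <price type="open" value="108"/>
  <price type="dayhigh" value="109.6875"/>
</stock_quote>
XML
print $html;
```

Because the attributes arrive as a flat list of pairs, copying them into a hash makes the lookup by name straightforward.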
The full program is available as a separate listing. Running the program on our XML file produces much more attractive output. A sample looks like:
|
Tree-based processing
The methodologies we have discussed so far are based on processing the XML document as a stream — in the course of parsing the file, handlers are called as each tag is encountered. This provides an efficient means of processing XML, both in terms of memory usage and processing time. Certain tasks, however, are somewhat difficult to do. Imagine, for example, needing to move or rearrange certain segments of the document, or sorting items within the document. Because we receive the document as a stream, we would need to store the components before sorting or rearranging them. A mechanism that would store the components automatically would make such tasks substantially easier.
XML documents are required to be well-formed, with every element properly nested inside its parent, which makes them easy to store as trees. A popular technique for working with XML documents is to first parse them into a tree data structure, and then to operate on the tree. The Document Object Model (DOM), as well as Grove and Twig (see Resources), use this model. This enables a great deal of flexibility in dealing with the documents: the components of the document can be accessed in random order, rearranged, added, or removed.
Tree-based methodologies do have some drawbacks, however. They require the parsing of the entire XML document, as well as the creation of the tree data structure, before the processing and business logic take place. Since the tree data structure is generally stored in memory, these methods have much larger memory footprints than stream-based methods. The problem is exacerbated by the fact that storing the document in memory as a tree takes several times as much storage as the original XML document did. For larger documents both of these can be significant: the parsing and tree-creation time becomes substantial, and the memory requirements can overrun the available resources.
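To make the trade-off concrete, here is a small sketch that uses XML::Parser's built-in Tree style (rather than DOM, Grove, or Twig) to load a document fully into memory and then list the stock symbols in sorted order. The two-quote document and the child_text helper are hypothetical.

```perl
use strict;
use warnings;
use XML::Parser;

my $xml = '<stock_quotes>'
        . '<stock_quote><symbol>MSFT</symbol></stock_quote>'
        . '<stock_quote><symbol>IBM</symbol></stock_quote>'
        . '</stock_quotes>';

# The Tree style returns [ tag, [ \%attrs, child_tag, child_content, ... ] ];
# text appears as the pseudo-tag 0 followed by the string.
my $tree = XML::Parser->new(Style => 'Tree')->parse($xml);
my ($attrs, @children) = @{ $tree->[1] };

# Hypothetical helper: text of the first child element named $name, or ''.
sub child_text {
    my ($content, $name) = @_;
    my @kids = @{$content}[1 .. $#$content];    # skip the attribute hash
    while (my ($tag, $body) = splice(@kids, 0, 2)) {
        return $body->[2] if $tag eq $name;     # body is [ \%attrs, 0, 'text' ]
    }
    return '';
}

my @symbols;
while (my ($tag, $body) = splice(@children, 0, 2)) {
    push @symbols, child_text($body, 'symbol') if $tag eq 'stock_quote';
}

my @sorted = sort @symbols;
print "@sorted\n";
```

The whole document has to be parsed before the first symbol can be examined, which is exactly the cost discussed above.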
Tree-based processing of XML documents will be discussed in a future article. The remainder of this article will use stream-based processing, as described above.
Active XML documents
Converting XML documents for display is a typical first task in working with XML, and serves as a good introduction to the machinery involved. The real power of XML, however, lies in its ability not only to transmit information, but also to trigger actions based on the transmitted information. We will examine a sample application that uses these active documents to implement simple stock trading rules.
The basic scenario is as follows: a stock quote service will periodically send an XML document with the latest prices and volume for our chosen stocks (in the format of the XML file we have been using thus far). Our application will decide whether to buy or sell based on the stock quotes and a set of rules stored in our database.
For this simple application we will only buy or sell, using the asking price and volume as the criteria. The price and volume will be received from the XML file, and the rules will be retrieved from a MySQL database (see Resources). The rules will be evaluated, and if buying or selling is required, the corresponding command will be issued.
Storing tag contents
In the earlier display-oriented applications, we only needed to output the tag contents. For our stock application, we need to access and store the contents of certain tags (such as price and volume) to compare them with the buy/sell criteria.
Our strategy for storing tag contents under the stream-based processing model has three steps, which follow the order in which the parser reports events. First the start tag is encountered, which lets us prepare a storage location for the contents. Next the contents of the tag are encountered, and the char_handler function stores them in the prepared location. Finally the end tag is encountered, which lets us perform any necessary cleanup and close the storage location.
The storage location will be set up in the tag start function, and closed in the tag end function:
|
volume defines the store_contents variable, setting the storage location for the contents of the tag. volume_ subsequently undefines store_contents, making sure contents of other tags do not get stored in the same location.
char_handler needs to be modified to allow storage of the contents:
|
This checks if the variable store_contents is defined, and, if so, stores the data in the storage hash. The ::state namespace is used to separate the storage and state variables from the parser and handler namespaces.
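The three pieces described above might fit together as in this sketch. The variable names $::state::store_contents and %::state::storage are our guesses at the names implied by the text, and the sample document is hypothetical.

```perl
use strict;
use warnings;
use XML::Parser;

package SubHandlers;

# Start of <volume>: name the slot that char_handler should fill.
sub volume {
    $::state::store_contents = 'volume';
    $::state::storage{volume} = '';
}

# End of </volume>: stop storing, so text from other tags is not captured.
sub volume_ { undef $::state::store_contents }

package main;

sub char_handler {
    my ($expat, $text) = @_;
    return unless defined $::state::store_contents;
    # Append rather than assign: the parser may deliver text in several pieces.
    $::state::storage{ $::state::store_contents } .= $text;
}

my $parser = XML::Parser->new(
    Style    => 'Subs',
    Pkg      => 'SubHandlers',
    Handlers => { Char => \&char_handler },
);

$parser->parse('<stock_quote><symbol>IBM</symbol>'
             . '<volume>7050200</volume></stock_quote>');
print "volume = ", $::state::storage{volume}, "\n";
```

The contents of symbol are ignored here because no start function prepared a slot for them; adding symbol and symbol_ functions in the same pattern would capture that tag as well.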
Using this technique we can store the contents of the tags we are interested in. In the case of the price tag, the values of interest are expressed as attributes of the tag. We can store these as we encounter them:
|
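Storing attribute values needs no help from char_handler, since the attributes arrive with the start-tag callback. A minimal sketch, assuming the price tag carries type and value attributes and that prices are kept in a nested hash keyed by type (both assumptions):

```perl
use strict;
use warnings;
use XML::Parser;

package SubHandlers;

# <price type="..." value="..."/>: attributes follow the tag name as
# name/value pairs, so they can be stored as soon as the tag starts.
sub price {
    my ($expat, $tag, %attrs) = @_;
    return unless defined $attrs{type} && defined $attrs{value};
    $::state::storage{price}{ $attrs{type} } = $attrs{value};
}

package main;

my $parser = XML::Parser->new(Style => 'Subs', Pkg => 'SubHandlers');
$parser->parse('<stock_quote><price type="ask" value="109.1875"/>'
             . '<price type="open" value="108"/></stock_quote>');
print "ask = ", $::state::storage{price}{ask}, "\n";
```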
Retrieving the rules
Our buy/sell rules are stored in the following table:
|
symbol is the stock symbol. field names the quantity the criterion applies to (in this case either price or volume). value is the threshold: the rule is triggered when the quoted field exceeds it. action describes the type of action to take (in this case either buy or sell).
Thus the following row from the rules table:
|
means if the price of the IBM stock is greater than 120, issue a buy order. And
|
means if the trading volume of Microsoft stock is over 65000000, issue a sell order.
Retrieving these rules from the database is a simple matter using the Perl DBI/DBD extensions. The connection to the database can be created at the start of processing, and kept open until the end. For each stock, the applicable rules can be retrieved by selecting from the rules table based on the stock symbol.
The tag stock_quotes is the outermost tag, meaning its start will trigger the first handler callback, and its end the last. This provides the perfect place for establishing and closing the database connection.
|
The rules can be retrieved by selecting based on the stock symbol:
|
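The sketch below shows the general shape of such a lookup with DBI. To keep it runnable without a database server, it assumes an in-memory SQLite database (through DBD::SQLite) in place of MySQL; with MySQL the DSN would name the mysql driver and supply real credentials. The column types and the rules_for helper are also assumptions, and only the table and column names come from the text.

```perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory SQLite stands in for the article's MySQL server.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });

# Hypothetical schema and sample rows matching the two rules described above.
$dbh->do('CREATE TABLE rules (symbol TEXT, field TEXT, value REAL, action TEXT)');
my $insert = $dbh->prepare('INSERT INTO rules VALUES (?, ?, ?, ?)');
$insert->execute('IBM',  'price',  120,      'buy');
$insert->execute('MSFT', 'volume', 65000000, 'sell');

# Fetch the rules that apply to one stock symbol, as a list of hash refs.
sub rules_for {
    my ($symbol) = @_;
    my $sth = $dbh->prepare(
        'SELECT field, value, action FROM rules WHERE symbol = ?');
    $sth->execute($symbol);
    return @{ $sth->fetchall_arrayref({}) };
}

my @ibm_rules = rules_for('IBM');
for my $rule (@ibm_rules) {
    print "if $rule->{field} exceeds $rule->{value}: $rule->{action}\n";
}

$dbh->disconnect;
```

In the parser-driven program, the connect call would sit in the stock_quotes start function and the disconnect call in stock_quotes_, as described earlier. The placeholder (?) keeps the symbol out of the SQL text itself.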
Acting on the rules
Each stock quote is contained within a stock_quote tag. By the time the end tag for stock_quote is reached, all of the necessary information has been stored (the stock symbol, price, and volume). Thus we can act on the rules in the stock_quote_ function:
|
The applicable rules for the given stock symbol are retrieved, and the comparison is performed. If the rule applies, the take_action function is called, which in this case is simply a stub.
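The comparison step on its own can be sketched as follows. To stay self-contained, the rules are held in a plain array instead of being fetched from the database, and the quote is passed in as a hash instead of being read from the stored tag contents; evaluate_quote, @orders, and the sample figures are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for rows fetched from the rules table.
my @rules = (
    { symbol => 'IBM',  field => 'price',  value => 120,      action => 'buy'  },
    { symbol => 'MSFT', field => 'volume', value => 65000000, action => 'sell' },
);

my @orders;    # what the stubbed take_action was asked to do

sub take_action {
    my ($action, $symbol) = @_;
    push @orders, "$action $symbol";    # stub: a real system would place an order
}

# Work that the stock_quote_ function would do once a quote is complete.
sub evaluate_quote {
    my ($quote) = @_;
    for my $rule (grep { $_->{symbol} eq $quote->{symbol} } @rules) {
        my $current = $quote->{ $rule->{field} };
        next unless defined $current;
        take_action($rule->{action}, $quote->{symbol})
            if $current > $rule->{value};
    }
}

evaluate_quote({ symbol => 'IBM',  price => 121.5,  volume => 7050200 });
evaluate_quote({ symbol => 'MSFT', price => 115.25, volume => 40000000 });
print "$_\n" for @orders;
```

With these sample figures only the IBM rule fires, because the Microsoft volume stays below its threshold.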
The complete program is available as a separate listing, as well as the schema for creating the rules table. Running the program with the original XML file produces the following output:
|
Next steps
You can apply these techniques to larger projects, yielding fast and flexible XML-based systems. Solutions that use a scripting language for the transformation and command logic, while a high-performance C/C++-based parser handles the parsing of the XML document, offer a best-of-breed approach: the speed of a lower-level language combined with the ease of scripting.




