All about JAXP, Part 1
By Brett McLaughlin
2005-07-15
Dealing with DOM
If you think you need to take a break to gear up for the challenge of DOM, you can skip the rest. Using DOM with JAXP is nearly identical to using SAX with JAXP; all you do is change two class names and a return type, and you are pretty much there. If you understand how SAX works and what DOM is, you won't have any problem.
The primary difference between DOM and SAX is the structure of the APIs themselves. SAX consists of an event-based set of callbacks, while DOM builds an in-memory tree structure. With SAX, there's never a data structure to work on (unless the developer creates one manually), so SAX by itself doesn't give you the ability to modify an XML document. DOM does provide this functionality. The org.w3c.dom.Document interface represents an XML document and is made up of DOM nodes that represent the elements, attributes, and other XML constructs. So JAXP doesn't need to fire SAX callbacks; it's responsible only for returning a DOM Document object from parsing.
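To make the "DOM lets you modify the document" point concrete, here is a minimal sketch. The class name DomModifySketch, the sample XML, and the "reviewed" attribute are all made up for illustration; only the JAXP and DOM calls themselves are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the article's download.
public class DomModifySketch {

    // Parses an XML string into a DOM tree, then changes that tree in memory.
    public static Document parseAndModify(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // The tree is an ordinary data structure, so it can be edited.
        Element root = doc.getDocumentElement();
        root.setAttribute("reviewed", "true");

        // New nodes are created by the Document and then attached.
        Element note = doc.createElement("note");
        note.appendChild(doc.createTextNode("added after parsing"));
        root.appendChild(note);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAndModify("<book><title>JAXP</title></book>");
        Element root = doc.getDocumentElement();
        System.out.println("reviewed = " + root.getAttribute("reviewed"));
        System.out.println("note elements = "
            + root.getElementsByTagName("note").getLength());
    }
}
```

Nothing comparable exists on the SAX side: once a callback has fired, the event is gone unless your handler stored it somewhere.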
A look at the DOM parser factory
With this basic understanding of DOM and the differences between DOM and SAX, you don't need to know much more. The code in Listing 3 looks remarkably similar to the SAX code in Listing 1. First, a DocumentBuilderFactory is obtained (in the same way that SAXParserFactory was in Listing 1). Then the factory is configured to handle validation and namespaces (in the same way that it was in SAX). Next, a DocumentBuilder instance, the analog to SAXParser, is retrieved from the factory (in the same way . . . you get the idea). Parsing can then occur, and the resultant DOM Document object is handed off to a method that prints the DOM tree:
Listing 3. Using the DocumentBuilderFactory
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
// DOM
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class TestDOMParsing {
    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println ("Usage: java TestDOMParsing " +
                                    "[filename]");
                System.exit (1);
            }
            // Get Document Builder Factory
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            // Turn on validation, and turn off namespaces
            factory.setValidating(true);
            factory.setNamespaceAware(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new File(args[0]));
            // Print the document from the DOM tree and
            // feed it an initial indentation of nothing
            printNode(doc, "");
        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not " +
                               "support the requested features.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Error occurred obtaining Document " +
                               "Builder Factory.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    private static void printNode(Node node, String indent) {
        // print the DOM tree
    }
}
Two problems can arise with this code (as with SAX in JAXP): a FactoryConfigurationError and a ParserConfigurationException. The cause of each is the same as it is with SAX. Either a problem is present in the implementation classes (resulting in a FactoryConfigurationError), or the parser provided doesn't support the requested features (resulting in a ParserConfigurationException). The only difference between DOM and SAX in this respect is that with DOM you substitute DocumentBuilderFactory for SAXParserFactory, and DocumentBuilder for SAXParser. It's that simple. (You can view the complete code listing, which includes the method used to print out the DOM tree; see Download.)
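Listing 3 leaves printNode() as a stub, and the real version lives in the download. If you just want a feel for what walking the tree involves, here is one rough way it could be done. This is a sketch and not the downloadable code: the class name PrintNodeSketch, the extra StringBuilder parameter, the render() helper, and the two-space indent are all assumptions made for the example, and it prints only elements, attributes, and non-blank text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not the printNode() from the article's download.
public class PrintNodeSketch {

    // Appends one line per element or non-blank text node, indenting children.
    public static void printNode(Node node, String indent, StringBuilder out) {
        boolean isElement = node.getNodeType() == Node.ELEMENT_NODE;
        if (isElement) {
            out.append(indent).append('<').append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.append(' ').append(attr.getNodeName())
                   .append("=\"").append(attr.getNodeValue()).append('"');
            }
            out.append(">\n");
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (text.length() > 0) {
                out.append(indent).append(text).append('\n');
            }
        }
        // Other node types (the Document itself, comments, and so on)
        // print nothing here, but their children are still visited.
        String childIndent = isElement ? indent + "  " : indent;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            printNode(children.item(i), childIndent, out);
        }
    }

    // Convenience helper for the example: parse a string and render its tree.
    public static String render(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        printNode(doc, "", out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(render("<a x=\"1\"><b>hi</b></a>"));
    }
}
```

The recursion starts at the Document node itself, which is why Listing 3 can pass doc straight to printNode() with an empty indent.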
Working with the DOM parser
Once you have a DOM factory, you can obtain a DocumentBuilder instance. The methods available to a DocumentBuilder instance are very similar to those available to its SAX counterpart. The major difference is that variations of the parse() method do not take an instance of the SAX DefaultHandler class. Instead they return a DOM Document instance representing the XML document that was parsed. The only other difference is that two methods are provided for SAX-like functionality:
setErrorHandler(), which takes a SAX ErrorHandler implementation to handle problems that might arise in parsing
setEntityResolver(), which takes a SAX EntityResolver implementation to handle entity resolution
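If you haven't written these SAX helpers before, the following sketch shows one possible shape for them. Everything named here (HandlerSketch, CollectingErrorHandler, EmptyEntityResolver, and the validate() method) is hypothetical; a real application would pick its own policy for what to log, what to rethrow, and how to resolve entities.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical example class; the handler policies are assumptions.
public class HandlerSketch {

    // Records warnings and recoverable errors; rethrows fatal errors.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<String>();

        public void warning(SAXParseException e) {
            messages.add("warning: " + e.getMessage());
        }

        public void error(SAXParseException e) {
            messages.add("error: " + e.getMessage());
        }

        public void fatalError(SAXParseException e) throws SAXException {
            throw e;
        }
    }

    // Answers every external entity request with empty content,
    // so the parser never reaches out to the file system or network.
    static class EmptyEntityResolver implements EntityResolver {
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }
    }

    // Parses with validation on and returns whatever the handler collected.
    public static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        CollectingErrorHandler handler = new CollectingErrorHandler();
        builder.setErrorHandler(handler);
        builder.setEntityResolver(new EmptyEntityResolver());

        builder.parse(new InputSource(new StringReader(xml)));
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        // <b/> is not allowed by the inline DTD, so validation should complain.
        String invalid = "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>";
        for (String message : validate(invalid)) {
            System.out.println(message);
        }
    }
}
```

The sample input uses only an internal DTD subset, so the resolver is registered but never actually called; it is there to show where an EntityResolver plugs in.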
Listing 4 shows examples of these methods in action:
Listing 4. Using the JAXP DocumentBuilder class
// Get a DocumentBuilder instance
DocumentBuilder builder = builderFactory.newDocumentBuilder();
// Find out if this builder is configured to validate
boolean isValidating = builder.isValidating();
// Find out if this builder is configured to be namespace-aware
boolean isNamespaceAware = builder.isNamespaceAware();
// Set a SAX ErrorHandler
builder.setErrorHandler(myErrorHandlerImpl);
// Set a SAX EntityResolver
builder.setEntityResolver(myEntityResolverImpl);
// Parse, in a variety of ways
// Use a file
Document doc = builder.parse(new File(args[0]));
// Use a SAX InputSource
doc = builder.parse(mySaxInputSource);
// Use an InputStream (no DefaultHandler is passed, unlike SAX)
doc = builder.parse(myInputStream);
// Use a URI
doc = builder.parse("http://www.newInstance.com/xml/doc.xml");
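Listing 4 is a fragment that leans on variables defined elsewhere. If you'd like something you can compile as is, the sketch below runs the same small document through three of the parse() variations. The class name ParseOverloadsSketch, the sample XML, and the example.com system ID are placeholders invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; names and sample data are assumptions.
public class ParseOverloadsSketch {

    static final String XML = "<greeting>hello</greeting>";

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    static InputStream xmlStream() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    // parse(InputStream): just the stream, no handler argument.
    public static String rootViaInputStream() throws Exception {
        Document doc = newBuilder().parse(xmlStream());
        return doc.getDocumentElement().getNodeName();
    }

    // parse(InputStream, String): the second argument is a system ID used
    // as the base for resolving relative URIs. It is not fetched here
    // because the sample document has no external references.
    public static String rootViaInputStreamWithSystemId() throws Exception {
        Document doc = newBuilder().parse(xmlStream(), "http://example.com/xml/");
        return doc.getDocumentElement().getNodeName();
    }

    // parse(InputSource): handy when the XML is already in memory as text.
    public static String rootViaInputSource() throws Exception {
        Document doc = newBuilder().parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootViaInputStream());
        System.out.println(rootViaInputStreamWithSystemId());
        System.out.println(rootViaInputSource());
    }
}
```

Each variation hands back a Document, which is the one real change from the SAX versions of parse().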
If you're a little bored reading this section on DOM, you're not alone; I found it a little boring to write because applying what you've learned about SAX to DOM is so straightforward.
First published by IBM developerWorks
