Helping ordinary people create extraordinary websites!
GET OUR NEWSLETTER
Your Email:
 

All about JAXP, Part 1

By Brett McLaughlin
2005-07-15


Starting with SAX

SAX is an event-driven methodology for processing XML. It consists of many callbacks. For example, the startElement() callback is invoked every time a SAX parser comes across an element's opening tag. The characters() callback is called for character data, and then endElement() is called for the element's end tag. Many more callbacks are present for document processing, errors, and other lexical structures. You get the idea. The SAX programmer implements one of the SAX interfaces that defines these callbacks. SAX also provides a class called DefaultHandler (in the org.xml.sax.helpers package) that implements all of these callbacks and provides default, empty implementations of all the callback methods. (You'll see that this is important in my discussion of DOM in the next section, Dealing with DOM.) The SAX developer needs only extend this class, then implement methods that require insertion of specific logic. So the key in SAX is to provide code for these various callbacks, then let a parser trigger each of them when appropriate. Here's the typical SAX routine:
  1. Create a SAXParser instance using a specific vendor's parser implementation.
  2. Register callback implementations (by using a class that extends DefaultHandler, for example).
  3. Start parsing and sit back as your callback implementations are fired off.

JAXP's SAX component provides a simple means for doing all of this. Without JAXP, a SAX parser instance either must be instantiated directly from a vendor class (such as org.apache.xerces.parsers.SAXParser), or it must use a SAX helper class called XMLReaderFactory (also in the org.xml.sax.helpers package). The problem with the first methodology is obvious: It isn't vendor neutral. The problem with the second is that the factory requires, as an argument, the String name of the parser class to use (that Apache class, org.apache.xerces.parsers.SAXParser, again). You can change the parser by passing in a different parser class as a String. With this approach, if you change the parser name, you won't need to change any import statements, but you will still need to recompile the class. This is obviously not a best-case solution. It would be much easier to be able to change parsers without recompiling the class.

JAXP offers that better alternative: It lets you provide a parser as a Java system property. Of course, when you download a distribution from Sun, you get a JAXP implementation that uses Sun's version of Xerces. Changing the parser -- say, to Oracle's parser -- requires that you change a classpath setting, moving from one parser implementation to another, but it does not require code recompilation. And this is the magic -- the abstraction -- that JAXP is all about.

Sneaky SAX developers
I'm hedging a bit. With a little clever coding, you can make a SAX application pick up the parser class to use from a system property or a properties file. However, JAXP gives you this same behavior without any work at all, so many of you are better off going the JAXP route.

A look at the SAX parser factory
The JAXP SAXParserFactory class is the key to being able to change parser implementations easily. You must create a new instance of this class (which I'll look at in a moment). After the new instance is created, the factory provides a method for obtaining a SAX-capable parser. Behind the scenes, the JAXP implementation takes care of the vendor-dependent code, keeping your code happily unpolluted. This factory has some other nice features, as well.

In addition to the basic job of creating instances of SAX parsers, the factory lets you set configuration options. These options affect all parser instances obtained through the factory. The two most commonly used options available in JAXP 1.3 are to set namespace awareness with setNamespaceAware(boolean awareness), and to turn on DTD validation with setValidating(boolean validating). Remember that once these options are set, they affect all instances obtained from the factory after the method invocation.

Once you have set up the factory, invoking newSAXParser() returns a ready-to-use instance of the JAXP SAXParser class. This class wraps an underlying SAX parser (an instance of the SAX class org.xml.sax.XMLReader). It also protects you from using any vendor-specific additions to the parser class. (Remember the discussion about the XmlDocument class earlier in this article?) This class allows actual parsing behavior to be kicked off. Listing 1 shows how you can create, configure, and use a SAX factory:

Listing 1. Using the SAXParserFactory



import java.io.OutputStreamWriter;
import java.io.Writer;

// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;

// SAX
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TestSAXParsing {
public static void main(String[] args) {
try {
if (args.length != 1) {
System.err.println ("Usage: java TestSAXParsing [filename]");
System.exit (1);
}
// Get SAX Parser Factory
SAXParserFactory factory = SAXParserFactory.newInstance();
// Turn on validation, and turn off namespaces
factory.setValidating(true);
factory.setNamespaceAware(false);
SAXParser parser = factory.newSAXParser();
parser.parse(new File(args[0]), new MyHandler());
} catch (ParserConfigurationException e) {
System.out.println("The underlying parser does not support " +
" the requested features.");
} catch (FactoryConfigurationError e) {
System.out.println("Error occurred obtaining SAX Parser Factory.");
} catch (Exception e) {
e.printStackTrace();
}
}
}

class MyHandler extends DefaultHandler {
// SAX callback implementations from ContentHandler, ErrorHandler, etc.
}

In Listing 1, you can see that two JAXP-specific problems can occur in using the factory: the inability to obtain or configure a SAX factory, and the inability to configure a SAX parser. The first of these problems, represented by a FactoryConfigurationError, usually occurs when the parser specified in a JAXP implementation or system property cannot be obtained. The second problem, represented by a ParserConfigurationException, occurs when a requested feature is not available in the parser being used. Both are easy to deal with and shouldn't pose any difficulty when using JAXP. In fact, you might want to write code that attempts to set several features and gracefully handles situations where a certain feature isn't available.

A SAXParser instance is obtained once you get the factory, turn off namespace support, and turn on validation; then parsing begins. The SAX parser's parse() method takes an instance of the SAX HandlerBase helper class that I mentioned earlier, which your custom handler class extends. See the code distribution to view the implementation of this class with the complete Java listing (see Download). You also pass in the File to parse. However, the SAXParser class contains much more than this single method.

Working with the SAX parser
Once you have an instance of the SAXParser class, you can do a lot more than just pass it a File to parse. Because of the way components in large applications communicate, it's not always safe to assume that the creator of an object instance is its user. One component might create the SAXParser instance, while another component (perhaps coded by another developer) might need to use that same instance. For this reason, JAXP provides methods to determine the parser's settings. For example, you can use isValidating() to determine if the parser will -- or will not -- perform validation, and isNamespaceAware() to see if the parser can process namespaces in an XML document. These methods can give you information about what the parser can do, but users with just a SAXParser instance -- and not the SAXParserFactory itself -- do not have the means to change these features. You must do this at the parser factory level.

You also have a variety of ways to request parsing of a document. Instead of just accepting a File and a SAX DefaultHandler instance, the SAXParser's parse() method can also accept a SAX InputSource, a Java InputStream, or a URL in String form, all with a DefaultHandler instance. So you can still parse documents wrapped in various forms.

Finally, you can obtain the underlying SAX parser (an instance of org.xml.sax.XMLReader) and use it directly through the SAXParser's getXMLReader() method. Once you get this underlying instance, the usual SAX methods are available. Listing 2 shows examples of the various uses of the SAXParser class, the core class in JAXP for SAX parsing:

Listing 2. Using the JAXP SAXParser class



// Get a SAX Parser instance
SAXParser saxParser = saxFactory.newSAXParser();
// Find out if validation is supported
boolean isValidating = saxParser.isValidating();
// Find out if namespaces are supported
boolean isNamespaceAware = saxParser.isNamespaceAware();
// Parse, in a variety of ways
// Use a file and a SAX DefaultHandler instance
saxParser.parse(new File(args[0]), myDefaultHandlerInstance);
// Use a SAX InputSource and a SAX DefaultHandler instance
saxParser.parse(mySaxInputSource, myDefaultHandlerInstance);
// Use an InputStream and a SAX DefaultHandler instance
saxParser.parse(myInputStream, myDefaultHandlerInstance);
// Use a URI and a SAX DefaultHandler instance
saxParser.parse("http://www.newInstance.com/xml/doc.xml",
myDefaultHandlerInstance);
// Get the underlying (wrapped) SAX parser
org.xml.sax.XMLReader parser = saxParser.getXMLReader();
// Use the underlying parser
parser.setContentHandler(myContentHandlerInstance);
parser.setErrorHandler(myErrorHandlerInstance);
parser.parse(new org.xml.sax.InputSource(args[0]));

Up to this point, I've talked a lot about SAX, but I haven't unveiled anything remarkable or surprising. JAXP's added functionality is fairly minor, especially where SAX is involved. This minimal functionality makes your code more portable and lets other developers use it, either freely or commercially, with any SAX-compliant XML parser. That's it. There's nothing more to using SAX with JAXP. If you already know SAX, you're about 98 percent of the way there. You just need to learn two new classes and a couple of Java exceptions, and you're ready to roll. If you've never used SAX, it's easy enough to start now.



Tutorial Pages:
» XML processing toolkit facilitates parsing and validation
» JAXP: API or abstraction?
» Starting with SAX
» Dealing with DOM
» Performing validation
» Changing the parser
» Summary
» Resources


First published by IBM developerWorks


 | Bookmark
Related Tutorials:
» Make Database Queries Without the Database
» Load List Values for Improved Efficiency
» 2 Ways To Implement Session Tracking
» A Simple Way to Read an XML File in Java
» Develop Aspect-Oriented Java Applications with Eclipse and AJDT
» Java Validation With Dynamic Proxies

Advertise with Us!


Tutorials Scripts Web Hosting Developer Manuals
Resources