All about JAXP, Part 1

By Brett McLaughlin
2005-07-15

JAXP: API or abstraction?
Strictly speaking, JAXP is an API, but it is more accurately called an abstraction layer. It doesn't provide a new means of parsing XML, nor does it add to SAX or DOM, or give new functionality to Java and XML handling. (If you're in disbelief at this point, you're reading the right article.) Instead, JAXP makes it easier to use DOM and SAX to deal with some difficult tasks. It also lets you handle, in a vendor-neutral way, some of the vendor-specific tasks that you might encounter when using the DOM and SAX APIs.

Going bigtime
In earlier versions of the Java platform, JAXP was a separate download from the core platform. With Java 5.0, JAXP has become a standard part of the Java platform. If you've got the latest version of the JDK (see Resources), then you've already got JAXP.

Without SAX, DOM, or another XML parsing API, you cannot parse XML. I have seen many requests for a comparison of SAX, DOM, JDOM, and dom4j to JAXP, but making such comparisons is impossible because the first four APIs serve a completely different purpose from JAXP. SAX, DOM, JDOM, and dom4j all parse XML. JAXP provides a means of getting to these parsers and the data that they expose, but doesn't offer a new way to parse an XML document. Understanding this distinction is critical if you're going to use JAXP correctly. It will also most likely put you miles ahead of many of your fellow XML developers.

If you're still dubious, make sure you have the JAXP distribution (see Going bigtime). Fire up a Web browser and load the JAXP API docs. Navigate to the parsing portion of the API, located in the javax.xml.parsers package. Surprisingly, you'll find only six classes. How hard can this API be? All of these classes sit on top of an existing parser. And two of them are just for error handling. JAXP is a lot simpler than people think. So why all the confusion?
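To make the point concrete, here is a minimal sketch of a DOM parse through JAXP. The class name JaxpDomSketch and the sample XML string are mine, chosen only for illustration. Notice that just two JAXP classes appear (DocumentBuilderFactory and DocumentBuilder), and that what comes back is an ordinary org.w3c.dom.Document from the DOM API rather than anything JAXP-specific.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
class JaxpDomSketch {

    // Parses an XML string and returns the name of its root element.
    static String rootElementName(String xml) throws Exception {
        // JAXP's job: locate a parser implementation and hand back a builder.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // The underlying parser does the actual parsing; the result is a
        // standard DOM Document, which is part of the DOM API, not of JAXP.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch.
        System.out.println(rootElementName("<book><title>JAXP</title></book>"));
    }
}
```

Everything after the call to parse() is plain DOM code, which is why comparing JAXP to DOM doesn't make much sense: you use one to get at the other.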

Sitting on top of the world
Even JDOM and dom4j (see Resources), like JAXP, sit on top of other parsing APIs. Although both APIs provide a different model for accessing data from SAX or DOM, they use SAX internally (with some tricks and modifications) to get at the data they present to the user.

Sun's JAXP and Sun's parser
A lot of the parser/API confusion results from how Sun packages JAXP and the parser that JAXP uses by default. In earlier versions of JAXP, Sun included the JAXP API (with those six classes I just mentioned and a few more used for transformations) and a parser, called Crimson. Crimson was part of the com.sun.xml package. In newer versions of JAXP -- included in the JDK -- Sun has repackaged the Apache Xerces parser (see Resources). In both cases, though, the parser is part of the JAXP distribution, but not part of the JAXP API.

Think about it this way: JDOM ships with the Apache Xerces parser. That parser isn't part of JDOM, but is used by JDOM, so it's included to ensure that JDOM is usable out of the box. The same principle applies for JAXP, but it isn't as clearly publicized: JAXP comes with a parser so it can be used immediately. However, many people refer to the classes included in Sun's parser as part of the JAXP API itself. For example, a common question on newsgroups used to be, "How can I use the XMLDocument class that comes with JAXP? What is its purpose?" The answer is somewhat complicated.

What's in a (package) name?
When I first cracked open the source code to Java 1.5, I was surprised at what I saw -- or rather, at what I did not see. Instead of finding Xerces in its normal package, org.apache.xerces, I found that Sun had relocated the Xerces classes to com.sun.org.apache.xerces.internal. (I find this a little disrespectful, but nobody asked me.) In any case, if you're looking for Xerces in the JDK, that's where it is.
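If you'd rather not dig through the source, a small sketch like the following should show which parser implementation is sitting behind the JAXP factories on your own installation. The class name WhichParser is made up for this example, and the names printed depend on your JDK and classpath; on a stock Sun JDK I would expect them to start with com.sun.org.apache.xerces.internal, but treat that as an assumption and check for yourself.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical class name, used only for this illustration.
class WhichParser {

    // Name of the concrete class that JAXP selected for DOM parsing.
    static String domFactoryImpl() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Name of the concrete class that JAXP selected for SAX parsing.
    static String saxFactoryImpl() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The abstract factories live in javax.xml.parsers; the classes
        // printed here belong to whichever parser is actually installed.
        System.out.println("DOM factory: " + domFactoryImpl());
        System.out.println("SAX factory: " + saxFactoryImpl());
    }
}
```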

First, the com.sun.xml.tree.XMLDocument class is not part of JAXP. It is part of Sun's Crimson parser, packaged in earlier versions of JAXP. So the question is misleading from the start. Second, a major purpose of JAXP is to provide vendor independence when dealing with parsers. With JAXP, you can use the same code with Sun's XML parser, Apache's Xerces XML parser, and Oracle's XML parser. Using a Sun-specific class, then, defeats the point of using JAXP. Are you starting to see how this subject has gotten muddied? The parser and the API in the JAXP distribution have been lumped together, and some developers mistake classes and features of one for part of the other.
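As a rough illustration of what vendor-neutral code looks like, the sketch below counts the elements in a document using SAX through JAXP. The class name VendorNeutralSax and the sample XML are invented for this example. The only imports are from javax.xml.parsers and org.xml.sax; nothing from com.sun, org.apache, or any other vendor package appears, so the same code should run unchanged on any JAXP-compliant parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class VendorNeutralSax {

    // Counts the start tags (elements) in an XML string.
    static int countElements(String xml) throws Exception {
        // JAXP picks the parser; this code never names a vendor class.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        // Single-element array so the anonymous handler can update the count.
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch: one root and two children.
        System.out.println(countElements("<a><b/><b/></a>")); // prints 3
    }
}
```

Compare this with code that imports com.sun.xml.tree.XMLDocument: the moment a vendor class shows up in your imports, swapping parsers means changing code.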

Now that you can see beyond all the confusion, you're ready to move on to some code and concepts.






First published by IBM developerWorks

