Thinking XML: State of the Art in XML Modeling
By Uche Ogbuji, 2005-05-16
Formal schemata, informal transparency
One common misconception about XML is that if you just define a schema, others will know how to process the XML instances and interoperate with your system. This may be true, depending on how the schema is authored, but generally not as a result of features of the schema language itself. Listing 1 is a sample RELAX NG schema (compact syntax) snippet:
Listing 1. Sample RELAX NG schema using annotations to provide semantic clues
For those not familiar with RELAX NG, the first line is a namespace declaration for Dublin Core, which is a popular vocabulary for metadata elements such as titles, descriptions, attributions, and other library-like properties. The second line defines an element named purchase-order. The line beginning dc:description is an annotation using the namespace prefix declared earlier to indicate that the intent of the annotation is to provide information that conforms to the Dublin Core description element. The next four lines define an attribute named id, with a plain text value. This attribute definition has an annotation of its own, giving the intended meaning of the attribute. The line after all that is a comment. Notice that in this example I use annotations to provide information that's important to understanding the semantics of the schema, whereas I use the comment to convey incidental information. An example of a document that conforms to this schema is: .
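The distinction between annotations and comments can be seen by loading a schema into a program. The following is a minimal sketch, not the schema from Listing 1: it assumes an equivalent written in the RELAX NG XML syntax (where annotations are elements from a foreign namespace), and the description wording, the `SCHEMA` string, and the `collect_annotations` helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"   # RELAX NG namespace
DC = "http://purl.org/dc/elements/1.1/"       # Dublin Core elements namespace

# Hypothetical schema in RELAX NG XML syntax; the annotation text is invented.
SCHEMA = """\
<element name="purchase-order"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:description>Top-level element of a purchase order</dc:description>
  <attribute name="id">
    <dc:description>Unique identifier for the order</dc:description>
    <text/>
  </attribute>
  <!-- Rest of the schema would follow here -->
</element>
"""

def collect_annotations(schema_text):
    """Map (kind, name) to the dc:description annotation of each definition."""
    root = ET.fromstring(schema_text)
    found = {}
    for node in root.iter():
        if node.tag in (f"{{{RNG}}}element", f"{{{RNG}}}attribute"):
            desc = node.find(f"{{{DC}}}description")  # direct child only
            if desc is not None and node.get("name"):
                kind = node.tag.split("}")[1]          # "element" or "attribute"
                found[(kind, node.get("name"))] = desc.text.strip()
    return found

for key, text in collect_annotations(SCHEMA).items():
    print(key, "->", text)
```

The annotations come back as data attached to the definitions they describe, while the comment never reaches the program at all, because the default ElementTree parser discards comments. A program can retrieve the annotation text this way, but it still takes a person to interpret it.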
If Listing 1 is the purchase order schema that Acme Organization comes up with, then Zenith Organization, acting separately, might come up with the schema in Listing 2.
Listing 2. Sample RELAX NG schema similar to Listing 1
Notice that the annotations are similar, but the actual element and attribute names are different. A corresponding example document might be: . A person can look at the two schemata above and recognize from the annotations the equivalence of the purchase-order element in one to the po element in the other, and the id attribute in one to the number attribute in the other. In this way, semantic transparency is achieved through informal means. A person has to use imprecise natural language skills to make sense of the annotations, rather than some strict and unambiguous definition.
The problem is that this process does not scale. The above example has simple, one-to-one mappings between data elements in the two vocabularies, and annotations that you can readily compare in a casual reading. More realistic situations involve more complex schemata with less predictable mappings and subtler differences in annotations and other such informal descriptions. In such cases, it might be very difficult to achieve semantic transparency through natural language schema annotations.
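To make the simple case concrete, here is a rough sketch of what a developer might write after reading both schemata and deciding by hand that po corresponds to purchase-order and number corresponds to id. The lookup tables, the `zenith_to_acme` function, and the sample value `A1234` are assumptions for illustration; nothing here is derived from the schemata automatically.

```python
import xml.etree.ElementTree as ET

# Hand-written equivalences, as a person would infer them from the annotations.
ELEMENT_MAP = {"po": "purchase-order"}       # Zenith element -> Acme element
ATTRIBUTE_MAP = {("po", "number"): "id"}     # (Zenith element, attribute) -> Acme attribute

def zenith_to_acme(xml_text):
    """Rename Zenith elements and attributes to their Acme equivalents."""
    def convert(node):
        out = ET.Element(ELEMENT_MAP.get(node.tag, node.tag))
        for name, value in node.attrib.items():
            out.set(ATTRIBUTE_MAP.get((node.tag, name), name), value)
        out.text = node.text
        out.tail = node.tail
        for child in node:
            out.append(convert(child))
        return out

    return ET.tostring(convert(ET.fromstring(xml_text)), encoding="unicode")

# Hypothetical Zenith document; the order number is made up.
print(zenith_to_acme('<po number="A1234"/>'))
```

A table like this works only while every name in one vocabulary has exactly one counterpart in the other. Once a mapping involves splitting, merging, or reinterpreting values, each case needs its own hand-written logic, which is the scalability problem described above.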
DTDs do not provide directly for annotation, but other popular schema languages do: RELAX NG, W3C XML Schema (WXS), and Schematron. In these languages, you can structure annotations themselves for machine consumption, providing more reliable routes to semantic transparency; I'll cover some such techniques in future articles. Unfortunately, such techniques are not very well taught, discussed, or even analyzed, partly because many people involved with XML mistakenly believe that semantic transparency is not a pressing concern, or that it is something that XML in itself already provides for. In my own biased view, one particular distraction has interfered with the focus on semantic transparency.
First published by IBM developerWorks
