How to make sure information gets to the right people
All knowledge management solutions face the challenge of putting the right information in front of the right people. It’s possible to confront this challenge with the right technology. Todd Sundsted demonstrates how to use Java technology, the Java Message Service (JMS) API, and XML to build a messaging infrastructure that routes messages based on their content.
Information comes in many forms. It may be physical — paper-based forms and memoranda — or purely electronic — e-mail, e-fax, USENET and knowledge-base postings, and other archives. And information covers many domains, from change notifications, status reports and updates, and process changes to casual communication.
Much of the electronic information is conveniently packaged, transferred, and stored in the form of a message. A key limitation of this message-based electronic information is the way it is addressed. Most of this information is either addressed to specific persons or, in the case of most general knowledge-base information, is addressed to no one in particular.
What happens if the specified address is no longer valid? Perhaps the intended recipient is no longer with the company. Presumably someone still handles their function. Who should handle the message now? What if others besides the intended recipient are interested in the message’s content? What if no destination is specified?
One possible solution to these problems is content-based routing. Content-based routing seeks to route messages, not by a specified destination, but by the actual content of the message itself. In a typical application, a message is routed by opening it up and applying a set of rules to its content to determine the parties interested in its content.
Content-based routing and filtering networks are extremely flexible and very powerful. When built upon established technologies such as MOM (message-oriented middleware), JMS (Java Message Service), and XML (Extensible Markup Language), they are also reasonably easy to implement.
In this article I will demonstrate how to use Java technology, the JMS API, and XML to build a messaging infrastructure that routes messages based on their content.
Messaging and the JMS API
I’m going to assume you’re familiar with the basic functionality, capabilities, and behavior of MOM products and generic APIs such as JMS. If you’re not, I suggest reading the articles mentioned in the Resources. I’m also going to assume you’re familiar with XML and its uses. If not, the same advice applies.
MOM products and content-based routing are not a perfect match. Most MOM products, and JMS in particular, do not support content-based routing out of the box. As you will see, though, bridging the gap takes only a modest amount of programming.
A simple content-based routing system with agents
The figure below illustrates a simple content-based routing design.
This content-based routing system is built around two principal types of entities: routers (of which there is usually one) and agents (of which there is usually more than one).
Agents are the ultimate consumers of messages. An agent typically works on behalf of a user. An agent is responsible for publishing a user’s interests with the routers of the system, and forwarding messages it receives from routers to the user.
Routers, as their name suggests, route messages. They examine the content of the messages they receive, apply rules to that content, and forward the messages as the rules dictate.
In addition to routers and agents, a system may also include harvesters. Harvesters specialize in finding interesting information, packaging it up as a formatted message, and sending it to a router. They mine many sources of information, including mail transfer agent (Sendmail or Microsoft Exchange Server) message stores, news servers, databases, and other legacy systems. Harvesting is an interesting subject in its own right, but this article does not discuss it in detail.
The system described above is built on top of JMS. The JMS API provides a feature-rich substrate for building distributed message-based applications. Like most middleware, it provides functionality that would be challenging to write by hand.
At the heart of the system, represented in the figure by the oval, is the message broker. It manages multiple message queues on behalf of both the routers and the agents. It ensures that messages are delivered reliably and promptly to the interested entities.
Although JMS itself doesn’t care, both the agents and the routers expect messages to be well-formed XML. The routers in particular use their understanding of the structure of well-formed XML to examine the messages for meaningful content.
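Because both routers and agents rely on messages being well-formed XML, a receiver may want to check that before doing anything else. The following is a minimal sketch using only the JDK's built-in parser. The XmlMessages class and its method names are hypothetical and are not taken from the article's source.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: parses a message body and reports whether it is well-formed XML. */
final class XmlMessages {
    private XmlMessages() {}

    /** Returns the parsed document, or null if the text is not well-formed XML. */
    static Document parseOrNull(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler, which prints parse errors to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(text)));
        } catch (SAXException e) {
            return null; // the parser rejected the text: not well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the text parses as a well-formed XML document. */
    static boolean isWellFormed(String text) {
        return parseOrNull(text) != null;
    }
}
```

For instance, XmlMessages.isWellFormed("<message><subject>status</subject></message>") accepts the message, while a body with a mismatched closing tag is rejected. This checks well-formedness only, not validity against a DTD or schema.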
XML and XPath
XML is, as I’m sure most of you know, an excellent language for representing many kinds of structured information. Its use as a message format allows us to leverage an existing body of XML-related technology to implement the rule and routing functions.
Although XPath has been a W3C recommendation since November 1999, most people aren’t familiar with its role and syntax. Therefore, I’ll stop for a moment to explain its use.
Recall that an XML document is logically a tree of elements. Because a tree is acyclic, there is exactly one path from the root node to any other node. XPath provides a simple, clean language for specifying and selecting subsets of the nodes in the tree based on the path to those nodes from the root.
The following examples only hint at the power of XPath, but they will give you a feel for the syntax:
- Select all elements of type CHILD that are children of the root element ROOT: /ROOT/CHILD
- Select all elements of type NODE that appear anywhere in the XML document: //NODE
- Select the first element of type CHILD that is a child of the root element ROOT: /ROOT/CHILD[1]
- Select all elements of type NODE that have an attribute named attr: //NODE[@attr]
- Select all elements of type NODE that have an attribute named attr with the value ‘x’: //NODE[@attr='x']
- Select all elements the name of which starts with the letter ‘X’: //*[starts-with(name(), 'X')]
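To try selections like those in the list above, the JDK's javax.xml.xpath package is enough. The sketch below is illustrative only: the XPathDemo class and its sample document are made up for this example, and the expressions in the note that follows are the standard XPath 1.0 spellings of the six descriptions.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical demo class: counts the nodes an XPath expression selects. */
final class XPathDemo {
    private XPathDemo() {}

    // Made-up sample document: two CHILD elements, three NODE elements, one XTRA element.
    static final String SAMPLE =
          "<ROOT>"
        + "<CHILD><NODE attr='x'/></CHILD>"
        + "<CHILD><NODE attr='y'/><NODE/></CHILD>"
        + "<XTRA/>"
        + "</ROOT>";

    /** Returns how many nodes the expression selects in the given XML text. */
    static int count(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(xml)),
                          XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + expression, e);
        }
    }
}
```

Against SAMPLE, /ROOT/CHILD selects 2 nodes, //NODE selects 3, /ROOT/CHILD[1] selects 1, //NODE[@attr] selects 2, //NODE[@attr='x'] selects 1, and //*[starts-with(name(), 'X')] selects 1 (the XTRA element).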
To route the messages that arrive at the router, the router needs information in the form of rules that specify who is interested in what types of information.
The simplest technology for useful rules might be regular expressions. However, since the messages use XML, a better solution is to use XPath notation. The result is almost as easy to implement.
Let’s consider a typical rule. A router rule comprises two pieces of information: an XPath string to be matched against received XML documents, and a destination specifying where to send the XML document if the XPath expression is satisfied.
The router applies the rules it has on hand to the XML documents it receives, and passes those that match to the appropriate destination.
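A rule of this shape, and the step of applying a set of rules to one message, might look roughly like the sketch below. The RoutingRule and RuleTable names are hypothetical, destinations are plain strings standing in for JMS destinations, and treating unparsable input as "no match" is a choice made for this example rather than something the article specifies.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical rule: an XPath expression plus the name of a destination. */
final class RoutingRule {
    final String xpath;
    final String destination;

    RoutingRule(String xpath, String destination) {
        this.xpath = xpath;
        this.destination = destination;
    }

    /** True if the expression, evaluated as a boolean, is satisfied by the XML text. */
    boolean matches(String xml) {
        try {
            // A node-set result converts to true when it selects at least one node.
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(xml)),
                          XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            return false; // assumption: unparsable XML or XPath counts as "no match"
        }
    }
}

/** Hypothetical rule set held by a router. */
final class RuleTable {
    private final List<RoutingRule> rules = new ArrayList<>();

    void add(RoutingRule rule) {
        rules.add(rule);
    }

    /** Returns each destination whose rule the message satisfies, once, in rule order. */
    List<String> destinationsFor(String xml) {
        List<String> matched = new ArrayList<>();
        for (RoutingRule rule : rules) {
            if (rule.matches(xml) && !matched.contains(rule.destination)) {
                matched.add(rule.destination);
            }
        }
        return matched;
    }
}
```

With rules for /report[@type='status'] and //project[text()='apollo'] registered, a status report about project apollo would be sent to both destinations, while a message that satisfies no rule goes nowhere.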
Let’s close by taking a look at parts of the code for the agent and the router classes. The complete source is available for download (see Resources).
The following code contains the functional component of the router class.
Routers listen for JMS messages on two queues: a control queue, which is used by agents to register themselves and their interests with the router, and a data queue, which is used by harvesters to send messages to the router.
The StartService() method above is called by the router’s main() method after the JMS layer has been initialized. It starts two services: one that manages the control queue and one that manages the data queue.
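The two-service structure can be sketched without a JMS provider by letting in-memory BlockingQueue objects stand in for the control and data queues. This is not the article's router class: RouterSketch, Registration, and the per-agent inboxes are hypothetical, and the sketch keeps only one rule per agent. A real implementation would use JMS sessions and receivers instead.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Sketch of a router; BlockingQueues stand in for the JMS control and data queues. */
final class RouterSketch {
    /** Hypothetical control message: the XPath rule an agent registers under its name. */
    static final class Registration {
        final String xpath;
        final String agent;
        Registration(String xpath, String agent) { this.xpath = xpath; this.agent = agent; }
    }

    private interface Step { void run() throws InterruptedException; }

    final BlockingQueue<Registration> control = new LinkedBlockingQueue<>();
    final BlockingQueue<String> data = new LinkedBlockingQueue<>();
    // agent name -> inbox (stand-in for a per-agent JMS queue)
    final Map<String, BlockingQueue<String>> inboxes = new ConcurrentHashMap<>();
    // agent name -> registered XPath rule (one rule per agent in this sketch)
    private final Map<String, String> rules = new ConcurrentHashMap<>();

    /** Starts one daemon thread per queue: a control service and a data service. */
    void startService() {
        startDaemon(() -> register(control.take()));
        startDaemon(() -> route(data.take()));
    }

    /** Control service step: record an agent's rule, creating its inbox first. */
    void register(Registration r) {
        inboxes.putIfAbsent(r.agent, new LinkedBlockingQueue<>());
        rules.put(r.agent, r.xpath);
    }

    /** Data service step: copy the message to every agent whose rule it satisfies. */
    void route(String xml) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (matches(rule.getValue(), xml)) {
                inboxes.get(rule.getKey()).add(xml);
            }
        }
    }

    private static boolean matches(String xpath, String xml) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(xml)),
                          XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            return false; // assumption: unparsable input counts as "no match"
        }
    }

    private static void startDaemon(Step step) {
        Thread t = new Thread(() -> {
            try {
                while (true) step.run(); // handle one queue entry per iteration
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly when interrupted
            }
        });
        t.setDaemon(true);
        t.start();
    }
}
```

Because the two services run on separate threads, a message placed on the data queue immediately after a registration may be routed before that registration takes effect. The sketch makes no attempt to order the two.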
The following code contains the functional component of the agent class.
Agents are pretty simple. The method above is at the heart of an agent. It is called by the class’s main() method after the basic JMS layer has been initialized. When an agent is started, it must be given a rule template, in the form of an XPath expression.
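One way an agent could publish its rule template is to wrap it in a small XML control message. The sketch below covers only that registration half and leaves out the JMS send. The register/rule message layout and the AgentSketch class are assumptions made for illustration, since the article does not show its control message format here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Sketch of an agent's registration step; the message layout is hypothetical. */
final class AgentSketch {
    private final String name;
    private final String ruleTemplate; // XPath expression given at startup

    AgentSketch(String name, String ruleTemplate) {
        this.name = name;
        this.ruleTemplate = ruleTemplate;
    }

    /** Builds a control message that carries the agent's name and its XPath rule. */
    String registrationMessage() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element register = doc.createElement("register");
            register.setAttribute("agent", name);
            Element rule = doc.createElement("rule");
            // Building with DOM lets the serializer escape characters such as '<'.
            rule.setTextContent(ruleTemplate);
            register.appendChild(rule);
            doc.appendChild(register);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** How a receiver might read one field back out of the control message. */
    static String extract(String xml, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

A router receiving this message could recover the agent's name with extract(message, "/register/@agent") and the rule with extract(message, "/register/rule"). Building the message with DOM rather than string concatenation keeps rules that contain characters such as < intact across the round trip.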
While not a complete solution from beginning to end, the design presented here is a reasonable example of the use of the JMS API and XML in a content-based message routing system. By building the system on top of Java technology, the JMS API, and XML, a working system can be built with a minimum of programming effort.