Working XML: Fundamentals of Web publishing with XML

By Benoit Marchal
2004-06-28


Getting started

To get started, download Eclipse and the XM plug-in for Eclipse (see Resources for links). XM is a project of the Working XML column that enhances Eclipse to support Web publishing with XML and XSL. XM is also available as standalone software that is ideal for batch processing. To prepare this column, I used Eclipse 2.1 and XM 0.9. Follow the instructions on the Eclipse and XM Web sites to install the software.

Launch Eclipse, then click Project from the File > New menu. In the dialog box that opens (see Figure 1), select ananas.org and XM Project, then click Next. Enter a project name, such as mysite, then click Finish.

Figure 1. Creating a new project


The new project appears in the navigator. When you open the project, you see that it contains three directories: publish, rules, and src, as shown in Figure 2. If you don't see the navigator, click Navigator from the Window > Show View menu.

Figure 2. The new project in the navigator


Source directory
The src (source) directory holds your XML documents as well as your images and other support files. The plug-in creates a sample file to get you started. You should edit it to insert your own content and add as many other XML files as needed. Every XML file in the src directory becomes an HTML page on the Web site.
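To make the relationship between src and publish concrete, here is a minimal Python sketch of how a file under src might map to a file under publish. It is not XM's code: the function name publish_path is hypothetical, and the sketch assumes that the plug-in mirrors the src directory layout, gives each generated page the document's name with an .html extension, and copies images and other support files unchanged.

```python
from pathlib import PurePosixPath


def publish_path(src_file: str) -> str:
    """Return the assumed publish/ location for a file under src/.

    Hypothetical mapping, not XM's documented behavior: the src/
    layout is mirrored, .xml documents become .html pages, and
    support files such as images keep their names.
    """
    relative = PurePosixPath(src_file).relative_to("src")
    if relative.suffix == ".xml":
        # Each XML document becomes one HTML page.
        relative = relative.with_suffix(".html")
    return str(PurePosixPath("publish") / relative)


for name in ["src/index.xml", "src/articles/intro.xml", "src/images/logo.png"]:
    print(name, "->", publish_path(name))
```

Under these assumptions, src/index.xml maps to publish/index.html, while src/images/logo.png lands at publish/images/logo.png.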

The XML editors section introduces tools for writing XML documents. For the time being, just open the sample document in a text editor, such as the one built into Eclipse. The sample document uses a simplified version of DocBook with the following tags:

• article: The root element of the document
• articleinfo: Contains bibliographical information
• sect1: A document section
• sect1info: Contains the section title
• title: The title of the article or of a section; appears under articleinfo or sect1info
• copyright: Holds the copyright information as one or more year tags and one holder tag
• simpara: A paragraph
• ulink: A hyperlink
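The following sketch shows what a small document in this vocabulary might look like, embedded in a Python string and read with the standard xml.etree.ElementTree module. The content, the outline helper, and the url attribute on ulink (borrowed from full DocBook) are assumptions for illustration; the namespace that the real XM sample declares is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using the simplified DocBook tags listed above.
# The namespace of the real XM sample is omitted; the ulink "url"
# attribute follows full DocBook and is an assumption here.
SAMPLE = """<?xml version="1.0"?>
<article>
  <articleinfo>
    <title>My first page</title>
    <copyright>
      <year>2003</year>
      <year>2004</year>
      <holder>Jane Doe</holder>
    </copyright>
  </articleinfo>
  <sect1>
    <sect1info><title>Introduction</title></sect1info>
    <simpara>Visit the <ulink url="http://www.eclipse.org">Eclipse</ulink> site.</simpara>
  </sect1>
</article>"""


def outline(xml_text: str) -> list:
    """Return the article title followed by each section title."""
    root = ET.fromstring(xml_text)
    titles = [root.findtext("articleinfo/title")]
    titles += [s.findtext("sect1info/title") for s in root.findall("sect1")]
    return titles


print(outline(SAMPLE))
```

Running it prints ['My first page', 'Introduction']: the article title comes from articleinfo, and each section title comes from its sect1info.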

You can use other tags, but you must then edit the stylesheet to handle them.

As I mentioned, the sample document is derived from DocBook. However, it uses a different namespace to signal that it is not genuine DocBook. DocBook is a standard vocabulary for technical documentation. It was originally developed by O'Reilly and is now maintained by OASIS, an international association of XML users.

You might find that DocBook is a good choice to get started because it's available, it works, it's a standard, and it's popular (mostly because it's popular). Hundreds of existing XML tools work with DocBook, and more tools on the market means less work for you.

Other popular XML vocabularies for Web sites include NewsML from the International Press and Telecommunication Council (IPTC), the Web page DTD from Norman Walsh (Norman Walsh also maintains the DocBook vocabulary), and the Apache Cocoon DTD.

DTD or schema?
Should you use DTDs or schemas? In practice, it does not really matter. Both let you validate your documents against a given vocabulary. Schemas offer finer control than DTDs, such as datatypes for element and attribute values, but those features were introduced primarily for e-business and are less important in a publishing application. Since modern editors work equally well with DTDs and schemas, you can use whichever you prefer.

Rules directory
The rules directory contains the stylesheets. Most Web sites need only one stylesheet. The XM plug-in for Eclipse applies the default.xsl stylesheet to every document unless it is told otherwise. Consequently, if your site has only one stylesheet, save it as rules/default.xsl. If your site needs more stylesheets, save them under rules and add the following processing instruction to each document that does not use the default:

<?xml-stylesheet href="listing.xsl" type="text/xsl"?>

Beware! The processing instruction needs both pseudo-attributes: href points to the stylesheet (you can enter just the file name, because the XM plug-in automatically looks under the rules directory), and type must have the value text/xsl. Also remember that the processing instruction goes in the documents (in the src directory), not in the stylesheets (in the rules directory).
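The stylesheet lookup rule can be sketched in Python as follows. This is an illustration under stated assumptions, not how the XM plug-in is implemented: the function stylesheet_for is hypothetical, it reads the xml-stylesheet processing instruction with the standard xml.dom.minidom module (which keeps processing instructions that appear before the root element), and it rejects an instruction that lacks href or type="text/xsl" rather than guessing what XM does in that case.

```python
import re
from pathlib import PurePosixPath
from xml.dom import minidom

# Matches pseudo-attributes such as href="listing.xsl" inside a PI.
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(["'])(.*?)\2""")


def stylesheet_for(xml_text: str) -> str:
    """Return the rules/ path of the stylesheet for a src document.

    Sketch of the rule described above, not XM's own code: use the
    xml-stylesheet processing instruction if present, otherwise fall
    back to rules/default.xsl. Rejecting an incomplete PI is a choice
    made for this sketch.
    """
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            attrs = {m.group(1): m.group(3) for m in PSEUDO_ATTR.finditer(node.data)}
            if "href" not in attrs or attrs.get("type") != "text/xsl":
                raise ValueError('xml-stylesheet needs href and type="text/xsl"')
            # A bare file name is resolved under the rules directory.
            return str(PurePosixPath("rules") / attrs["href"])
    return "rules/default.xsl"


print(stylesheet_for("<article/>"))
print(stylesheet_for('<?xml version="1.0"?>\n'
                     '<?xml-stylesheet href="listing.xsl" type="text/xsl"?>\n'
                     '<article/>'))
```

The first call prints rules/default.xsl because the document has no processing instruction; the second prints rules/listing.xsl.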

Publish directory
Last but not least is the publish directory, where the plug-in generates your Web site. To put the site online, upload the contents of this directory to your Web server.

Another warning: never edit the files in the publish directory. If you're not happy with a Web page, change the XML document (in the src directory) or the stylesheet (in the rules directory) instead. Your goal is to automate publishing chores, and editing the generated site by hand defeats that goal. Furthermore, the plug-in may overwrite your changes the next time it regenerates the site.


First published by IBM developerWorks
