Parsing XML using PHP4

This tutorial explains how to parse (that is, read and interpret) information from an XML file using PHP. I will cover the very basics of XML (mainly its structure) and then jump right into reading and parsing XML files. This is not a tutorial on XML itself, just on parsing XML with PHP.

First, the prerequisites. Make sure you have the following up and running:

  • A web server with PHP installed. No special extensions are needed for this tutorial.
  • The ability to save files on the web server (either by uploading them or by saving directly via FTP).
  • A decent text editor, preferably one that has syntax highlighting support for PHP.

XML File Structure

eXtensible Markup Language, or XML as it’s commonly called, is primarily used to facilitate the interchange of information between environments that are not natively compatible (that is, they don’t support each other’s default file formats). An example would be a database server that can’t import Access files and has its own proprietary data format.

The key word in XML is extensible. This means that the structure of the file is left almost entirely up to its creator. There are just a few rules that you must follow to create XML (one such rule being that there can be only one root element). Other than that, the author has free rein over the tags, attributes, etc. that he may use. One more thing we must understand is that XML has no predefined tag set. Unlike HTML, which has a fixed set of tags (<p>, etc.), XML leaves it up to the author to define the tags for a file. In XML, a tag can be almost anything:

XML:

    <jim>info about jim</jim>
    <address type="home">555 Main Street</address>

A few more rules about XML and we will be on our way to our PHP code. XML documents must be well-formed. This means that there can be only one root element (the topmost element), all child elements must be nested properly (<p>foo<b>bar</b></p>, not <p>foo<b>bar</p></b>), and every element must have an end tag or be written as an empty tag such as <size />.
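If you want to see these rules enforced, here is a minimal sketch that asks PHP's built-in parser whether a string is well-formed. It uses xml_parser_create() and xml_parse(), which are explained later in this tutorial; the helper name is_well_formed is my own invention for this illustration and is not part of PHP.

```php
<?php
// Returns true when $xml is well-formed according to PHP's built-in XML parser.
// is_well_formed() is a hypothetical helper written for this example.
function is_well_formed($xml) {
    $parser = xml_parser_create();
    // Parse the whole string in one call; "true" marks it as the final chunk.
    $ok = xml_parse($parser, $xml, true);
    return $ok == 1;
}

var_dump(is_well_formed('<p>foo<b>bar</b></p>'));   // properly nested
var_dump(is_well_formed('<p>foo<b>bar</p></b>'));   // tags overlap
var_dump(is_well_formed('<a>one</a><b>two</b>'));   // two root elements
```

Only the first string should pass; the other two break the nesting rule and the single-root rule respectively.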

In XML, an element is also referred to as a node.

Once these simple rules are understood, we can start creating an XML document that an XML parser will understand.

Creating our XML File

Almost all the XML examples that I’ve seen use an address book, but just to be different (and to make things interesting), we are going to create an XML file that contains information about the images in a folder, and we will use it to create a very simple gallery.

The first step is to decide what information we want to store about each image. There are the usual suspects (the file name, the name of the image, the size), and we want to make sure this information is in our XML file. As I mentioned above, all XML documents contain a root tag. This will be the tag that starts and ends our XML file. Let’s call this tag <imageinfo>.

The next step is to decide what attributes a tag will have. An attribute is a property of the tag. For example, the <img> tag in HTML has the attribute src, which tells the user agent (browser) where to find the image.

For our example, we will create a size tag and give it the attributes width and height.

Most of the time developers are more concerned with parsing XML files than with writing them. However, knowing the basics of what to expect in an XML file helps when trying to debug a parser.

Let’s write our very basic XML file:

XML:

    <?xml version="1.0"?>
    <imageinfo>
        <image>
            <filename>AmericanPie.jpg</filename>
            <size width="300" height="300" />
            <name>Mmm…Pie</name>
        </image>
        <image>
            <filename>Cantaloupe.jpg</filename>
            <size width="300" height="300" />
            <name>Cantaloupe</name>
        </image>
        <image>
            <filename>CitrusSlices.jpg</filename>
            <size width="300" height="300" />
            <name>Citrus For Summer</name>
        </image>
    </imageinfo>

<?xml version="1.0"?> must be the first line in an XML file. It is called the XML declaration and identifies the file as an XML file to a parser. Nothing may come before it, not even whitespace.

Parsing with PHP

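The PHP engine comes with built-in functions that enable XML parsing using the expat library written by James Clark. These functions allow us to create our own XML parser. The XML functions we will use are:

  • xml_parser_create()
  • xml_set_element_handler()
  • xml_set_character_data_handler()
  • xml_parse()
  • xml_get_error_code()
  • xml_error_string()
  • xml_get_current_line_number()
  • xml_parser_free()

The other (non-XML) PHP functions that we will use are:

  • fopen()
  • fread()
  • feof()
  • eregi_replace()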

Creating our Parser

The first step is to create and set up our parser. The xml_parser_create() function creates the parser for us and returns a handle to it. We will then have to set up the different handlers so that the parser knows what to do with each type of information (be it an opening tag, a closing tag, the content between tags, etc.). Let’s first check that we can create our parser, which is perhaps the least confusing line of code:

PHP:

    if (! ($xmlparser = xml_parser_create()) )
    {
        die ("Cannot create parser");
    }

This code simply checks whether we can create a parser. If we can’t, there is no use in going any further, so it quits with an appropriate message. If your script stops here, either with this message or with an error saying that xml_parser_create() is an undefined function, then your PHP installation isn’t set up with the expat library. Most Unix/Linux based servers have the expat library as part of their PHP install. Check the XML reference section of the PHP manual for instructions on installing the expat library. Alternatively, you can send a support request to your host/ISP’s help desk.
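If you would rather find out up front whether the XML functions are there, a small check along these lines should do. This is only a sketch: xml_support_available is a name I made up, while extension_loaded() and function_exists() are standard PHP functions.

```php
<?php
// Reports whether the XML parser functions used in this tutorial are available.
// xml_support_available() is a hypothetical helper written for this example.
function xml_support_available() {
    return extension_loaded('xml') && function_exists('xml_parser_create');
}

if (xml_support_available()) {
    echo "XML parser functions are available";
} else {
    echo "The xml extension is missing from this PHP build";
}
```

Running this before anything else gives a friendlier message than a fatal error in the middle of the script.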

Once we have created our parser, it is time to configure it to handle our XML file. The xml_set_element_handler() function takes three arguments. The first one is a handle to our xml_parser (which is $xmlparser). The next argument is the name of a function that the parser will call when it finds an open tag, and the last argument is the name of a function that the parser will call when it reaches an ending tag. We are going to write the functions that will be called for each open and close tag. Sounds scary, but it really is very straightforward.

Setting up tag handlers

First, let’s write the function that will be called for an open tag. The name of the function can be any valid PHP function name. The function must accept three arguments, in this order: $parser, $name, $attribs.

$parser = handle to our parser
$name = name of the current tag
$attribs = an array containing any attributes of the current tag

We don’t have to worry about calling the function; the parser does that automatically as it goes through our XML file. With that in mind, let’s write our start tag function, which we will call start_tag (how creative, I know).

PHP:

    function start_tag($parser, $name, $attribs) {
        echo "Current tag : ".$name."<br />";
        if (is_array($attribs)) {
            echo "Attributes : <br />";
            // Walk the attribute array: $key is the attribute name, $val its value
            foreach ($attribs as $key => $val) {
                echo "Attribute ".$key." has value ".$val."<br />";
            }
        }
    }

Next, we will write the function that will be called when an ending tag is reached. This function, like our opening tag function, can have any name that’s valid in PHP. The ending tag function must take these parameters: $parser, $name.

$parser = handle to our parser
$name = name of the current tag

Let’s write our ending tag function (which we will call end_tag):

PHP:

    function end_tag($parser, $name) {
        echo "Reached ending tag ".$name."<br /><br />";
    }

We have now taken care of all the requirements for the xml_set_element_handler function, and now we can call it :

PHP:

    xml_set_element_handler($xmlparser, "start_tag", "end_tag");
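To see the order in which the parser calls the two handlers, here is a small self-contained sketch that records each call instead of printing it. The names record_start, record_end, collect_events and $events are made up for this illustration; only the xml_* functions come from PHP.

```php
<?php
$events = array();   // filled in by the two handlers below

// Called by the parser for every opening tag.
function record_start($parser, $name, $attribs) {
    global $events;
    $events[] = "open " . $name;
}

// Called by the parser for every closing tag.
function record_end($parser, $name) {
    global $events;
    $events[] = "close " . $name;
}

// Parses $xml and returns the list of handler calls in the order they happened.
function collect_events($xml) {
    global $events;
    $events = array();
    $parser = xml_parser_create();
    xml_set_element_handler($parser, "record_start", "record_end");
    xml_parse($parser, $xml, true);
    return $events;
}

echo implode(", ", collect_events('<image><size width="300" height="300" /></image>'));
// prints: open IMAGE, open SIZE, close SIZE, close IMAGE
```

Note that the empty <size /> tag still triggers both handlers, one right after the other.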

Setting up content (data) handlers

We have taken care of our starting and ending tags, so now we must deal with the actual content of a tag. The xml_set_character_data_handler function sets up the character data handling function for the parser. Since we know that our data is going to be character based, we will use this function. There are different xml_set functions for different types of data. You can view the list of different data handler functions in the PHP manual.

The xml_set_character_data_handler function takes two arguments. One is a handle to the parser, and the other is the name of the function to call for character data. Like the opening and closing tag functions, we have to write the character data handling function ourselves. Our function must accept these two arguments, $parser and $data:

PHP:

    function tag_contents($parser, $data) {
        echo "Contents : ".$data."<br />";
    }

Once the function is written, we can set up the parser to use it:

PHP:

    xml_set_character_data_handler($xmlparser, "tag_contents");

Our functions will just print out the information about our tags. We will later modify them so that we can actually do something useful with our information. At this stage we just want to make sure that our parser is working correctly.
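One thing worth knowing about character data handlers: as far as I know, the parser is free to hand over the text of a single element in more than one call (for example around an entity such as &amp;), so a handler that needs the whole text should append to a variable rather than overwrite it. Here is a minimal sketch of that idea; collect_text, element_text and $text are names I made up for the example.

```php
<?php
$text = "";   // accumulates the character data seen so far

// Character data handler: append, because one element's text may arrive in pieces.
function collect_text($parser, $data) {
    global $text;
    $text .= $data;
}

// Parses $xml and returns all of its character data joined together.
function element_text($xml) {
    global $text;
    $text = "";
    $parser = xml_parser_create();
    xml_set_character_data_handler($parser, "collect_text");
    xml_parse($parser, $xml, true);
    return $text;
}

echo element_text('<name>Citrus &amp; Lime</name>');   // prints: Citrus & Lime
```

The parser also decodes the entity for us, so the handler sees a plain ampersand.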

Starting up the parser

Now that the parser is set up and configured, we are ready to feed it our XML file and let it parse the information. This is the complicated part of the program, so extra attention is requested.

The first step is to open the xml file :

PHP:

    $filename = "sample.xml";
    if (!($fp = fopen($filename, "r"))) { die("cannot open ".$filename); }

This simple code will check to see if our program can open the file or not. It will quit with an appropriate message if it cannot.

Once the file is open, we must read it and feed it to the XML parser. Before we send the data to the parser, we are going to get rid of any whitespace between tags using a regular expression and the eregi_replace function:

PHP:

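    while ($data = fread($fp, 4096)) {
        // Strip the whitespace between a closing '>' and the next '<'.
        // eregi_replace() was removed in PHP 7; there the equivalent call
        // is preg_replace('/>\s+</', '><', $data).
        $data = eregi_replace(">"."[[:space:]]+"."<", "><", $data);
        // The third argument tells the parser whether this is the last chunk
        if (!xml_parse($xmlparser, $data, feof($fp))) {
            $reason = xml_error_string(xml_get_error_code($xmlparser));
            $reason .= " at line ".xml_get_current_line_number($xmlparser);
            die($reason);
        }
    }
    fclose($fp);
    xml_parser_free($xmlparser);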

Let’s step through this code:

  1. The fread() function reads the data from the XML file (given by the $fp handle) and stores it in $data.
  2. We use the eregi_replace function to get rid of the whitespace between tags in $data.
  3. We then check whether the data was parsed. If it wasn’t, we use the built-in XML error reporting functions to print out an informative error message.
  4. At the end, we free the parser (destroy it).
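The read-and-parse loop can also be wrapped in a function that returns the error message instead of calling die(). The sketch below is one way to do that, under a few assumptions of mine: parse_stream and temp_stream are made-up helper names, the demonstration data lives in temporary files created with tmpfile(), and the chunk size of 16 bytes is deliberately tiny so that the document really is fed to the parser in several pieces.

```php
<?php
// Feeds an open file handle to a fresh parser, $chunk_size bytes at a time.
// Returns "" on success, or a readable error message on failure.
function parse_stream($fp, $chunk_size) {
    $parser = xml_parser_create();
    do {
        $data  = fread($fp, $chunk_size);
        $final = feof($fp);   // true once the end of the file has been reached
        if (!xml_parse($parser, $data, $final)) {
            return xml_error_string(xml_get_error_code($parser))
                 . " at line " . xml_get_current_line_number($parser);
        }
    } while (!$final);
    return "";
}

// Demonstration helper: put a string into a temporary file and rewind it.
function temp_stream($contents) {
    $fp = tmpfile();
    fwrite($fp, $contents);
    rewind($fp);
    return $fp;
}

$good = temp_stream("<imageinfo>\n<image>\n</image>\n</imageinfo>");
$bad  = temp_stream("<imageinfo>\n<image>\n</imageinfo>");   // <image> is never closed

echo "good: [" . parse_stream($good, 16) . "]\n";
echo "bad:  [" . parse_stream($bad, 16) . "]\n";
```

The do/while form makes sure the parser is always told about the final chunk, even when the file size happens to be an exact multiple of the chunk size.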

Once we have verified that our parser is working properly, we are ready to actually do something with the data.

Creating the gallery

Now that our parser works, we are ready to modify it to actually make use of our information. To do this, we only have to change our custom functions that handle the data.

Let’s print out a nice little gallery using our images. Our gallery will just print each image with its dimensions, and a caption that is the name of the image. I will type out the modified functions, and then explain the code:

PHP:

    $current = "";
    function start_tag($parser, $name, $attribs) {
        global $current;
        $current = $name;
        if ($name == "IMAGEINFO") { echo "<table border=\"1\" width=\"50%\">"; }
        if ($name == "IMAGE") { echo "<tr><td>"; }
        // Open the <img> tag; it stays open until the size has been added
        if ($name == "FILENAME") { echo "<img src=\""; }
        // Relies on <size> following <filename>, as it does in our file
        if ($name == "SIZE") {
            if (is_array($attribs)) {
                foreach ($attribs as $key => $val) {
                    echo " ".strtolower($key)."=\"".$val."\"";
                }
            }
            echo " />";
        }
        if ($name == "NAME") { echo "</td></tr><tr><td><div align=\"center\">"; }
    }
    function end_tag($parser, $name) {
        global $current;
        // Forget the tag so stray text between tags is not printed
        $current = "";
        // Close the quote of the src attribute
        if ($name == "FILENAME") { echo "\""; }
        if ($name == "NAME") { echo "</div></td></tr>"; }
        if ($name == "IMAGEINFO") { echo "</table>"; }
    }
    function tag_contents($parser, $data) {
        global $current;
        if ($current == "FILENAME") { echo $data; }
        if ($current == "NAME") { echo $data; }
    }

Since the tag_contents() function doesn’t get the name of the current tag from the parser, we have to provide that information ourselves. In our start_tag() function, we set a global variable $current to the current tag name. The rest of the code just checks which tag we are on and prints out the appropriate HTML.
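Printing HTML straight from the handlers ties the output to the order of the tags in the file. A variation you may find easier to extend is to collect each image into an array first and build the HTML afterwards. The sketch below shows only the collecting part and rests on my own assumptions: the handler names (gather_start, gather_end, gather_text), the read_images helper and the shape of the record array are all invented for this example.

```php
<?php
$images  = array();   // one record per <image> element
$current = "";        // name of the element we are currently inside

function gather_start($parser, $name, $attribs) {
    global $images, $current;
    $current = $name;
    if ($name == "IMAGE") {
        // Start a new, empty record for this image.
        $images[] = array("filename" => "", "name" => "", "width" => "", "height" => "");
    }
    if ($name == "SIZE") {
        // Attribute names are upper-cased by the parser, just like tag names.
        $last = count($images) - 1;
        $images[$last]["width"]  = $attribs["WIDTH"];
        $images[$last]["height"] = $attribs["HEIGHT"];
    }
}

function gather_end($parser, $name) {
    global $current;
    $current = "";   // text between elements belongs to no field
}

function gather_text($parser, $data) {
    global $images, $current;
    $last = count($images) - 1;
    if ($current == "FILENAME") { $images[$last]["filename"] .= $data; }
    if ($current == "NAME")     { $images[$last]["name"]     .= $data; }
}

// Parses $xml and returns an array of image records.
function read_images($xml) {
    global $images, $current;
    $images  = array();
    $current = "";
    $parser = xml_parser_create();
    xml_set_element_handler($parser, "gather_start", "gather_end");
    xml_set_character_data_handler($parser, "gather_text");
    xml_parse($parser, $xml, true);
    return $images;
}

$xml = '<imageinfo><image><filename>Cantaloupe.jpg</filename>'
     . '<size width="300" height="300" /><name>Cantaloupe</name></image></imageinfo>';
print_r(read_images($xml));
```

With the records in hand, the table can be written with an ordinary loop, and the tags inside <image> can appear in any order.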

That’s it! Now you have a "skeleton" parser that you can modify to use with XML files (such as RSS feeds).

Notes

You’ll note that I am comparing tag names in upper case. By default the parser converts all tag names to upper case (this is called case folding). This behavior can be turned off by calling xml_parser_set_option() with the XML_OPTION_CASE_FOLDING option set to 0 after creating the parser.
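Here is a short sketch that shows the difference case folding makes. xml_parser_set_option() and XML_OPTION_CASE_FOLDING are part of PHP; the helper names remember_name, skip_end and tag_names are made up for this example.

```php
<?php
$names = array();   // tag names in the order the parser reports them

function remember_name($parser, $name, $attribs) {
    global $names;
    $names[] = $name;
}

function skip_end($parser, $name) {
    // Nothing to do for closing tags in this example.
}

// Returns the tag names found in $xml, with case folding switched on or off.
function tag_names($xml, $fold_case) {
    global $names;
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $fold_case ? 1 : 0);
    xml_set_element_handler($parser, "remember_name", "skip_end");
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<imageinfo><image></image></imageinfo>';
echo implode(",", tag_names($xml, true)) . "\n";    // prints: IMAGEINFO,IMAGE
echo implode(",", tag_names($xml, false)) . "\n";   // prints: imageinfo,image
```

If you switch folding off, remember to compare against the names exactly as they are written in your XML file.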

August 16th, 2005 | PHP, XML
