
Parsing Comma-Separated Values

By Doug Tidwell
2005-05-18


Parsing the comma-separated value (CSV) file

After I had the data file, I started looking around for some code to parse comma-separated values. I didn't find exactly what I wanted, but I did stumble across the Java class StreamTokenizer. This class lets you design a rudimentary parser fairly easily. You select the delimiter between tokens, and it parses the file, converts data to strings and numbers, and does some other nice things. You can view the code I wrote or download it.
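As a rough illustration of the StreamTokenizer approach, here is a minimal sketch that splits one CSV line into fields. It is not the csvParser code from this article; the class name CsvLineSketch and the method parseLine are hypothetical names chosen for the example. The sketch assumes each record fits on one line, treats the comma as the only delimiter, and keeps every field as a string (number parsing is switched off so that values such as 000010 keep their leading zeros).

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the article's csvParser class.
class CsvLineSketch {

    // Splits a single CSV line into its fields using StreamTokenizer.
    static List<String> parseLine(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.resetSyntax();        // drop the defaults (number parsing, '/' comments, ...)
        st.wordChars(0, 255);    // treat every character as part of a field...
        st.ordinaryChar(',');    // ...except the comma, which separates fields
        st.quoteChar('"');       // a double-quoted field may contain commas

        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == ',') {
                    // A comma closes the current field (possibly an empty one).
                    fields.add(current.toString());
                    current.setLength(0);
                } else if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == '"') {
                    current.append(st.sval);
                }
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("000010,CHRISTINE,I,HAAS"));
    }
}
```

Note the limits of this sketch: StreamTokenizer interprets backslash escapes inside quoted strings and does not understand the doubled-quote convention ("") that some CSV exporters use, so data containing either would need extra handling.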
Color-coding our colorful coding

This article features colorized code listings, something we're experimenting with here at dW. To generate our color-coded listings, I'm using a couple of open-source tools. First, I load the document (Java, HTML, XML, whatever) into Emacs. Emacs defines colors for keywords, comments, function names, and other programming language constructs -- about a dozen in all. After Emacs has loaded and colored a file, I use the HTMLize package, an open-source utility written in the ever-popular Emacs Lisp language. HTMLize takes a listing exactly as it appears in Emacs, then converts it to HTML. The result is a fully color-coded file that highlights keywords, comments, function names, and so on.

Let us know what you think about these new and improved code listings.

If you'd like to do this kind of thing yourself, see Resources for the appropriate links.

To run the parser, I typed:

java csvParser test.csv output.xml

That command opens and parses the CSV file and then converts it to XML. The XML for one employee looks like Listing 1. (You can also download output.xml.)

Listing 1. The XML output for one employee record from the CSV data sample

<?xml version="1.0"?>

<document>
  <row>
    <column1>000010</column1>
    <column2>CHRISTINE</column2>
    <column3>I</column3>
    <column4>HAAS</column4>
    <column5>A00</column5>
    <column6>3978</column6>
    <column7>19650101</column7>
    <column8>PRES</column8>
    <column9>18</column9>
    <column10>F</column10>
    <column11>19330824</column11>
    <column12>52750</column12>
    <column13>1000</column13>
    <column14>4220</column14>
  </row>
  <row>
    <column1>000020</column1>
    ...
</document>

When I started working on this, I thought the first line of the CSV file would contain the column names from DB2. I was going to use those names as the XML tag names. I didn't immediately find a way to get DB2 to export data in this format, so I just made up the column names, using the imaginative naming scheme you see in Listing 1.
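The column1, column2, ... naming scheme can be sketched in a few lines of Java. This is only an illustration of the idea, not the code from the article; RowToXmlSketch, rowToXml, and escape are hypothetical names. It assumes one record arrives as a list of strings and writes one <row> element per record, escaping the characters that would otherwise break the markup.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the article's csvParser class.
class RowToXmlSketch {

    // Escapes the characters that are not allowed in XML element content.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Builds one <row> element, naming the child elements column1..columnN.
    static String rowToXml(List<String> fields) {
        StringBuilder xml = new StringBuilder("<row>\n");
        for (int i = 0; i < fields.size(); i++) {
            String tag = "column" + (i + 1); // made-up names, numbered from 1
            xml.append("<").append(tag).append(">")
               .append(escape(fields.get(i)))
               .append("</").append(tag).append(">\n");
        }
        return xml.append("</row>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(rowToXml(Arrays.asList("000010", "CHRISTINE", "I", "HAAS")));
    }
}
```

Apart from whitespace, the output has the same shape as a row in Listing 1. If the CSV file did carry real column names in its first line, the same loop could use those names in place of the generated ones, provided they are valid XML element names.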



Tutorial Pages:
» Getting the data
» Parsing the comma-separated value (CSV) file
» Converting the generated XML
» Resources


First published by IBM developerWorks

