• Home

Logo

Navigation
  • Home
  • Articles
    • Content Writing
    • Design
    • General
    • Internet Marketing
    • Social Media
    • Tools and Tips
    • Usability
    • Web Hosting Articles
  • Tutorials
    • AJAX Tutorials
    • ASP Tutorials
    • C# Tutorials
    • CGI and Perl Tutorials
    • CSS Tutorials
    • Flash Tutorials
    • HTML Tutorials
    • Illustrator Tutorials
    • Java Tutorials
    • JavaScript Tutorials
    • Linux Tutorials
    • Miscellaneous Tutorials
    • MySQL Tutorials
    • Photoshop Tutorials
    • PHP Tutorials
    • Python Tutorials
    • Wireless Tutorials
    • WordPress Tutorials
    • XML Tutorials
  • Scripts
    • AJAX Scripts
    • ASP Scripts
    • ASP.NET Scripts
    • CGI & Perl Scripts
    • Flash Scripts
    • Java Scripts
    • JavaScript Scripts
    • PHP Scripts
    • Python Scripts
    • Remotely Hosted
    • Tools and Utilities
    • XML Scripts
  • Answers
  • Online Services
  • Tools

Parsing Comma-Separated Values

By Doug Tidwell | on May 18, 2005 | 0 Comment
XML Tutorials
  • Tweet
  • Share
  • Tweet
  • Share

Getting the data

To make a comma-separated value (CSV) file to work with, I used DB2′s EXPORT command for the sample database of employee records that ships with the product. The command I used was:

EXPORT to test.csv of DEL select * from employees

That generated the file test.csv , which you can download.

Parsing the comma-separated value (CSV) file

After I had the data file, I started looking around for some code to parse comma-separated values. I didn’t find exactly what I wanted, but I did stumble across the Java class StreamTokenizer. This class lets you design a rudimentary parser fairly easily. You select the delimiter between tokens, and it parses the file, converts data to strings and integers, and does some other nice things. You can view the code I wrote or download) it.

Color-coding our colorful coding

This article features colorized code listings, something we’re experimenting with here at dW. To generate our color-coded listings, I’m using a couple of open-source tools. First, I load the document (Java, HTML, XML, whatever) into Emacs. Emacs defines colors for keywords, comments, function names, and other programming language constructs — about a dozen in all. After Emacs has loaded and colored a file, I use the HTMLize package, an open-source utility written in the ever-popular Emacs Lisp language. HTMLize takes a listing exactly as it appears in Emacs, then converts it to HTML. The result is a fully color-coded file that highlights keywords, comments, function names, and so on.

Let us know what you think about these new and improved code listings.

If you’d like to do this kind of thing yourself, see Resources for the appropriate links.

To use this file, I typed:

java csvParser test.csv output.xml

That command opens and parses the CSV file and then converts it to XML. The XML for one employee looks like Listing 1. (You can also download output.xml.)

Listing 1. The XML output for one employee record from the CSV data sample

<?xml version="1.0"?>

<document>
<row>
<column1>000010</column1>
<column2>CHRISTINE</column2>
<column3>I</column3>
<column4>HAAS</column4>
<column5>A00</column5>
<column6>3978</column6>
<column7>19650101</column7>
<column8>PRES</column8>
<column9>18</column9>
<column10>F</column10>
<column11>19330824</column11>
<column12>52750</column12>
<column13>1000</column13>
<column14>4220</column14>
</row>
<row>
<column1>000020</column1>
...
</document>

When I started working on this, I thought the first line of the CSV file would contain the column names from DB2. I was going to use those names as the XML tag names. I didn’t immediately find a way to get DB2 to export data in this format, so I just made up the column names, using the imaginative naming scheme you see in Listing 1.

Converting the generated XML

Now that I had this XML-tagged data, I needed to write an XSLT style sheet that would convert the XML above into a more useful (and understandable) tag set. You can view the style sheet listing in a separate window, and you can also download it and view it locally.

To convert the generated XML document using the rules in the style sheet and then write the output to the file called employees.xml, I typed:

java org.apache.xalan.xslt.Process -in output.xml -xsl csv-stylesheet.xsl 

-out employees.xml

If you want to run the transform code, you’ll need the Xerces XML parser and version 1.0.0 of the Xalan style sheet processor (see Resources), both of which are free and available at xml.apache.org. The resulting XML looks like Listing 2. (You can also download employees.xml and employees.dtd.)

Listing 2. XML output of the document converted using csv-stylesheet.xsl



<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE employees SYSTEM "employees.dtd">
<employees>
<employee sex="F">
<serial_number>000010</serial_number>
<name>
<first_name>CHRISTINE</first_name>
<middle_initial>I</middle_initial>
<last_name>HAAS</last_name>
</name>
<department>A00</department>
<phone>3978</phone>
<date_of_hire year="1965" month="01" day="01"/>
<job_title>PRES</job_title>
<years_of_education>18</years_of_education>
<date_of_birth year="1933" month="08" day="24"/>
<salary>52750</salary>
<bonus>1000</bonus>
<commission>4220</commission>
</employee>
<employee sex="M">
...
</employee>
</employees>


Figure 1. Excerpt of the HTML table converted from the CSV data via the style sheet employee-table.xsl

Employee Sex Serial Number Department Job Title Date of Hire Years of Education
Phone Date of Birth Salary Bonus Commission Total Compensation
ADAMSON, BRUCE M 000150 D11 DESIGNER 02/12/1972 16
4510 05/17/1947 $25,280.00 $500.00 $2,022.00 $27,802.00
BROWN, DAVID M 000200 D11 DESIGNER 03/03/1966 16
4501 05/29/1941 $27,740.00 $600.00 $2,217.00 $30,557.00
GEYER, JOHN B. M 000050 E01 MANAGER 08/17/1949 16
6789 09/15/1925 $40,175.00 $800.00 $3,214.00 $44,189.00
GOUNOT, JASON R. M 000340 E21 FIELDREP 05/05/1947 16
5698 05/17/1926 $23,840.00 $500.00 $1,907.00 $26,247.00
HAAS, CHRISTINE I. F 000010 A00 PRES 01/01/1965 18
3978 08/24/1933 $52,750.00 $1,000.00 $4,220.00 $57,970.00

You can view the entire table in a separate window, and also view the HTML source.

I hope this helps. I’ve certainly learned some things along the way, and I hope you have too.

Resources

If you want to transform the XML documents, you’ll need the Xerces XML parser and the Xalan style sheet processor, both of which are free and available at xml.apache.org.

Share this story:
  • tweet

Author Description

Doug Tidwell is a Senior Programmer and Cyber Evangelist at IBM. He has more than a seventh of a century of programming experience and has been working with XML-like applications for several years. His work as a Cyber Evangelist is basically to look busy, and to help customers evaluate and implement XML technology. Using a specially designed pair of zircon-encrusted tweezers, he holds a Masters Degree in Computer Science from Vanderbilt University and a Bachelors Degree in English from the University of Georgia. He can be reached at dtidwell@us.ibm.com.

No Responses to “Parsing Comma-Separated Values”

You must be logged in to post a comment.

Connect With Us

RSSSubscribe 0Followers 492Likes
  • Popular
  • Recent
  • Comments
  • Creating Energy Spheres in Photoshop

    Apr 15, 2008 - 96 Comments
  • Easy Screen Scraping in PHP with the Simple HTML DOM Library

    Aug 6, 2008 - 20 Comments
  • Calculating date difference more precisely in PHP

    Mar 7, 2008 - 13 Comments
  • When Does Hosting Your Website in the Cloud Make Sense?

    Oct 8, 2010 - 2 Comments
  • Fun with the Microsoft Managed Extensibility Framework Part 2

    Oct 6, 2010 - 0 Comment
  • Fun with the Microsoft Managed Extensibility Framework Part 1

    Sep 22, 2010 - 0 Comment
  • Website Management on the go with the iPad

    I appreciated your post, but I was looking for something I didn't...
    November 24, 2012 - drmoderator
  • Creating Energy Spheres in Photoshop

    I'm a little stuck down here especially at the step of creating the...
    November 23, 2012 - sarah
  • Running background processes in PHP

    Can you give an example? As see it, you can use this only when you...
    November 16, 2012 - Shaked Klein Orbach
Developer Resources
  • Tutorial Directory
  • Learn HTML
  • Learn PHP
  • Learn CSS
  • Learn AJAX
  • Learn JavaScript
  • Learn Pear
  • White Papers
  • Resources
    • NetVisits Web Directory
    • Realtor Pixels
    • Answers On The Run
    • Ask A Geek
  • Recent Posts

    • When Does Hosting Your Website in the Cloud Make Sense?
    • Fun with the Microsoft Managed Extensibility Framework Part 2
    • Fun with the Microsoft Managed Extensibility Framework Part 1
    • Website Management on the go with the iPad
    • Code Contracts in C# 4.0 – Part 1

    Calendar

    May 2013
    M T W T F S S
    « Oct    
     12345
    6789101112
    13141516171819
    20212223242526
    2728293031  

    Recent Comments

    • drmoderator on Website Management on the go with the iPad
    • sarah on Creating Energy Spheres in Photoshop
    • Shaked Klein Orbach on Running background processes in PHP
    • Thomas Cuvillier on How To Upload Files Using PHP
    • rizal aditya on Extracting text from Word Documents via PHP and COM
    • Home
    © 2003 - 2013 DeveloperTutorials.com. All Rights Reserved. Privacy Policy.