Cultured Perl: Reading and Writing Excel Files with Perl

Using the Spreadsheet::WriteExcel and Spreadsheet::ParseExcel Modules

Only recently have the doors been opened to Microsoft Excel, the most popular spreadsheet application for the desktop. This article looks at reading and writing Excel files on Windows and Linux, using Perl and a few simple modules. The author of this article, Teodor Zlatanov, is an expert in Perl who has been working in the community since 1992 and who specializes in, among other things, open source work in text parsing.

Parsing Excel files presents a conundrum any way you look at it. Until last year, UNIX modules were completely unavailable, and data from Excel files for Windows could only be retrieved with the Win32::OLE modules. But things have finally changed, thanks to two Perl hackers and a lot of volunteer help and contributions!

Spreadsheet::WriteExcel and Spreadsheet::ParseExcel

In 2000, John McNamara and Takanori Kawai produced the Spreadsheet::WriteExcel and Spreadsheet::ParseExcel modules, respectively, and posted them on CPAN, which made it possible, though not easy, to extract data from Excel files on any platform.

As we’ll see later, Win32::OLE still offers a simpler, more reliable solution if you’re working with Windows, and the Spreadsheet::WriteExcel documentation itself recommends it for more powerful manipulations of data and worksheets. Win32::OLE comes with the ActiveState Perl toolkit, and can be used to drive many other Windows applications through OLE. Note that to use this module, the Excel engine (usually installed with Excel itself) must still be installed and licensed on your machine.

The applications that need to parse Excel data number in the thousands, but here are a few examples: exporting Excel to CSV, interacting with a spreadsheet stored on a shared drive, moving financial data to a database for reporting, and analyzing data not provided in any other format.

To follow along with the examples given here, you must have Perl 5.6.0 installed on your system. Preferably, your system should be a recent (2000 or later) mainstream UNIX installation (Linux, Solaris, BSD). Although the examples may work with earlier versions of Perl and UNIX, and with other operating systems, you should consider cases where they fail to function as exercises to solve.

Windows Example: Parsing

This section applies to Windows machines only. All the other sections apply to Linux.

Before you proceed, install ActiveState Perl (version 628 used here) or the ActiveState Komodo IDE for editing and debugging Perl. Komodo comes with a free license for home users, which you can get in a matter of minutes. (See Resources later in this article for the download sites.)

Installing the Spreadsheet::ParseExcel and Spreadsheet::WriteExcel modules using the ActiveState PPM package manager is difficult. PPM has no history, options are hard to set, help scrolls off the screen, and the default is to install modules ignoring dependencies. You can invoke PPM from the command line by typing “ppm” and issuing the following commands:

ppm> install OLE::Storage_Lite
ppm> install Spreadsheet::ParseExcel
ppm> install Spreadsheet::WriteExcel

The module install will fail in this case, because IO::Scalar is not yet available, so you may want to give up trying to find the problem with PPM, and switch to the built-in Win32::OLE module. However, by the time you read this, ActiveState may have released a fix for this problem.

With Win32::OLE from the ActiveState toolkit, you can dump a worksheet, cell by cell, using the code listed below:

#!/usr/bin/perl -w

use strict;
use Win32::OLE qw(in with);
use Win32::OLE::Const 'Microsoft Excel';

$Win32::OLE::Warn = 3; # die on errors...

# get already active Excel application or open new
my $Excel = Win32::OLE->GetActiveObject('Excel.Application')
    || Win32::OLE->new('Excel.Application', 'Quit');

# open Excel file
my $Book = $Excel->Workbooks->Open("c:/komodo projects/test.xls");

# You can dynamically obtain the number of worksheets, rows, and columns
# through the Excel OLE interface. Excel's Visual Basic Editor has more
# information on the Excel OLE interface. Here we just use the first
# worksheet, rows 1 through 4 and columns 1 through 3.

# select worksheet number 1 (you can also select a worksheet by name)
my $Sheet = $Book->Worksheets(1);

foreach my $row (1..4)
{
    foreach my $col (1..3)
    {
        # skip empty cells
        next unless defined $Sheet->Cells($row,$col)->{'Value'};

        # print out the contents of a cell
        printf "At ($row, $col) the value is %s and the formula is %s\n",
            $Sheet->Cells($row,$col)->{'Value'},
            $Sheet->Cells($row,$col)->{'Formula'};
    }
}

# clean up after ourselves
$Book->Close;
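
The comments in the listing mention that the number of worksheets, rows, and columns can be obtained dynamically. The following is a minimal sketch of one way to do that, assuming the Excel object model's Worksheets->Count, UsedRange, Rows->Count, and Columns->Count properties behave as described in Excel's Visual Basic documentation. The last_index helper and the use of a command-line file name are illustrative choices, not part of the original listing, and the sketch has not been run against every Excel version.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: last index of a run that starts at $first and
# holds $count items. UsedRange need not start at row 1 or column 1,
# so the count alone is not the last row or column number.
sub last_index {
    my ($first, $count) = @_;
    return $first + $count - 1;
}

# The OLE calls only make sense on Windows with Excel installed,
# so they are skipped everywhere else.
if ($^O eq 'MSWin32' && @ARGV) {
    require Win32::OLE;

    # get already active Excel application or open new
    my $Excel = Win32::OLE->GetActiveObject('Excel.Application')
        || Win32::OLE->new('Excel.Application', 'Quit')
        or die "Could not start Excel\n";

    # Excel generally expects a full path here
    my $Book = $Excel->Workbooks->Open($ARGV[0])
        or die "Could not open $ARGV[0]\n";

    # walk every worksheet instead of assuming there is only one
    for my $i (1 .. $Book->Worksheets->Count) {
        my $Sheet = $Book->Worksheets($i);
        my $Used  = $Sheet->UsedRange;

        my $last_row = last_index($Used->Row,    $Used->Rows->Count);
        my $last_col = last_index($Used->Column, $Used->Columns->Count);

        printf "%s: rows %d-%d, columns %d-%d\n",
            $Sheet->Name, $Used->Row, $last_row, $Used->Column, $last_col;
    }

    $Book->Close;
}
```

The bounds printed for each sheet could replace the hard-coded 1..4 and 1..3 ranges in the loops above.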

Note that you can assign values to cells very easily in the following way:

$Sheet->Cells($row, $col)->{'Value'} = 1;

Linux Example: Parsing

This section applies to UNIX, and specifically Linux. It has not been tested under Windows.

It would be difficult to give a better example of parsing with Linux than the one provided in the documentation for the Spreadsheet::ParseExcel module, so I will show that example and then explain how it works.

Listing 3: parse-excel.pl

#!/usr/bin/perl -w

use strict;
use Spreadsheet::ParseExcel;

my $oExcel = new Spreadsheet::ParseExcel;

die "You must provide a filename to $0 to be parsed as an Excel file" unless @ARGV;

my $oBook = $oExcel->Parse($ARGV[0]);
my($iR, $iC, $oWkS, $oWkC);
print "FILE :", $oBook->{File} , "\n";
print "COUNT :", $oBook->{SheetCount} , "\n";

print "AUTHOR:", $oBook->{Author} , "\n"
    if defined $oBook->{Author};

for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++)
{
    $oWkS = $oBook->{Worksheet}[$iSheet];
    print "--------- SHEET:", $oWkS->{Name}, "\n";
    for(my $iR = $oWkS->{MinRow} ;
        defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ;
        $iR++)
    {
        for(my $iC = $oWkS->{MinCol} ;
            defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ;
            $iC++)
        {
            $oWkC = $oWkS->{Cells}[$iR][$iC];
            print "( $iR , $iC ) =>", $oWkC->Value, "\n" if($oWkC);
        }
    }
}

This example was tested with Excel 97. If it does not work, try converting to the Excel 97 format. The perldoc page for Spreadsheet::ParseExcel claims Excel 95 and 2000 compatibility as well.

The spreadsheet is parsed into a top-level object called $oBook. $oBook has properties to aid the program, such as “File,” “SheetCount,” and “Author.” The properties are documented in the perldoc page for Spreadsheet::ParseExcel, in the workbook section.

The workbook contains one or more worksheets; iterate through them using the workbook’s SheetCount property. Each worksheet has MinRow and MinCol properties and corresponding MaxRow and MaxCol properties, which together give the range of cells the worksheet actually uses. These properties are documented in the perldoc page for Spreadsheet::ParseExcel, in the worksheet section.
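
To make the MinRow/MaxRow and MinCol/MaxCol properties more concrete, here is a small sketch that turns them into the familiar A1-style range. It assumes the indices are zero-based, as the Spreadsheet::ParseExcel documentation describes; the col_letters and sheet_range helpers and the sample hash are hypothetical names introduced for illustration only.

```perl
#!/usr/bin/perl -w
use strict;

# Convert a zero-based column index to Excel letters: 0 => A, 26 => AA.
sub col_letters {
    my ($col) = @_;
    my $name = '';
    for (my $n = $col + 1; $n > 0; $n = int(($n - 1) / 26)) {
        $name = chr(ord('A') + ($n - 1) % 26) . $name;
    }
    return $name;
}

# Describe the used range of a worksheet in A1 notation. Returns
# nothing for an empty worksheet, where MaxRow or MaxCol is undefined.
sub sheet_range {
    my ($oWkS) = @_;
    return unless defined $oWkS->{MaxRow} && defined $oWkS->{MaxCol};
    return col_letters($oWkS->{MinCol}) . ($oWkS->{MinRow} + 1)
         . ':'
         . col_letters($oWkS->{MaxCol}) . ($oWkS->{MaxRow} + 1);
}

# A plain hash stands in for a parsed worksheet here; a real worksheet
# from $oBook->{Worksheet}[$iSheet] exposes the same four keys.
my %sample = (MinRow => 0, MaxRow => 3, MinCol => 0, MaxCol => 2);
print sheet_range(\%sample), "\n";    # prints A1:C4
```

Checking for an undefined MaxRow or MaxCol mirrors the defined tests in Listing 3, which guard against worksheets that contain no cells.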

Cells can be obtained from a worksheet through the Cells property; that’s how the $oWkC object is obtained in Listing 3. Cell properties are documented in the perldoc page for Spreadsheet::ParseExcel, in the Cell section. There does not seem to be a way, according to the documentation, to obtain the formula listed in a particular cell.
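
Exporting Excel to CSV was one of the applications mentioned earlier, and the Cells property is all it takes. What follows is a hedged sketch rather than a finished tool: it reuses only the calls shown in Listing 3 (Parse, Worksheet, Cells, Value), handles the first worksheet only, and assumes Parse returns a false value when the file cannot be read. The csv_field and csv_line helpers are hypothetical names, and the quoting follows the common convention of doubling embedded quotes.

```perl
#!/usr/bin/perl -w
use strict;

# Quote a single value for CSV: wrap it in double quotes when it
# contains a comma, a double quote, or a line break, and double any
# embedded quotes. Undefined values become empty fields.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ($value =~ /[",\r\n]/) {
        $value =~ s/"/""/g;
        return qq("$value");
    }
    return $value;
}

# Join a list of cell values into one CSV line (without the newline).
sub csv_line {
    return join ',', map { csv_field($_) } @_;
}

# Only load Spreadsheet::ParseExcel when a file name was given.
if (@ARGV) {
    require Spreadsheet::ParseExcel;

    my $oBook = Spreadsheet::ParseExcel->new->Parse($ARGV[0])
        or die "Could not parse $ARGV[0]\n";
    my $oWkS = $oBook->{Worksheet}[0];    # first worksheet only

    # An empty worksheet has no MaxRow/MaxCol, so nothing is printed.
    if (defined $oWkS->{MaxRow} && defined $oWkS->{MaxCol}) {
        for my $iR ($oWkS->{MinRow} .. $oWkS->{MaxRow}) {
            my @values;
            for my $iC ($oWkS->{MinCol} .. $oWkS->{MaxCol}) {
                my $oWkC = $oWkS->{Cells}[$iR][$iC];
                push @values, $oWkC ? $oWkC->Value : '';
            }
            print csv_line(@values), "\n";
        }
    }
}
```

Saved under a name of your choosing, the script would be run with the spreadsheet as its only argument and its output redirected to a .csv file. Missing cells are written as empty fields so that every row keeps the same number of columns.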

Conclusion

If you are using a Windows machine, stick with the Win32::OLE modules unless you don’t have Excel at all on your machine. Win32::OLE is the easiest way to get Excel data right now, although the Spreadsheet::WriteExcel and Spreadsheet::ParseExcel modules are catching up.

On UNIX, especially Linux, go with the Spreadsheet::WriteExcel and Spreadsheet::ParseExcel modules for programmatic access to Excel data. But be forewarned that these are fairly young modules, and they may not be perfect for you if you need stability.

You may also consider packages like Gnumeric and StarOffice (see Resources), which are freely available and offer a full GUI interface and import/export capabilities for Excel files. These are useful if you don’t need programmatic access to the Excel data. I have used both applications and find them wonderful for day-to-day tasks.

Resources

• Read Ted’s other Perl articles in the “Cultured Perl” series on developerWorks.

• ActiveState produces the wonderful ActiveState Perl toolkit and the Komodo development environment.

• Visit CPAN for all the Perl modules you ever wanted.

• Gnumeric offers a full GUI interface and import/export capabilities for Excel files.

• Perl.com offers abundant Perl information and related resources.

• perldoc.com has Perldoc information online.

• The Spreadsheet::WriteExcel modules have made it possible to extract data from Excel files on any platform. More information is also available from CPAN.

• Spreadsheet::ParseExcel works in conjunction with Spreadsheet::WriteExcel. More information is also available from CPAN.

• “Programming Perl, Third Edition” by Larry Wall, Tom Christiansen, and Jon Orwant (O’Reilly & Associates, 2000) is the best guide to Perl today, up-to-date with 5.005 and 5.6.0.

• Learn about IBM’s ExcelAccessor Bean Suite Project, which allows you to access the properties of a cell in an Excel workbook.

• Browse more Linux resources on developerWorks.

• Browse more Open source resources on developerWorks.

May 1st, 2004 | CGI and Perl

About the Author:

Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work, Perl, text parsing, three-tier client-server database architectures, and UNIX system administration. Contact Ted with suggestions and corrections at tzz@bu.edu.
