Using locked DBM files with CGI-driven forms saves client data without DBMS overkill
Many Web-deployed applications are written within elaborate database-driven server-side development frameworks such as PHP and Java™ servlets, but for simple applications (for example, where the entire dataset fits comfortably within a Web server’s RAM), data persistence can be easily accomplished using locked DBM files in conjunction with the Perl
MLDBM module. This article presents a real-world example — a Web-based voting application — that highlights the use of minimal external modules, forgoes using client-based cookies, and takes advantage of CGI attributes.
It’s no secret that software is becoming increasingly complex and that we are seeing additional layers added to systems in an effort to keep software components modular. The primary result should be that these systems are now easier to maintain and more scalable, yet sometimes these techniques are simply overkill that results in over-designed software. In other cases, developers choose an overly complex but well-known technology rather than investigate a simpler but less familiar alternative.
After all, if all one has is a hammer, then every problem looks like a nail.
I was recently asked to design a small program to tally election ballots for a university student organization. It was a simple project insofar as it was limited to serving no more than 500 students over the course of one week; afterwards, the results would be tallied and released.
Since this project required an almost trivial level of service, there would be no benefit to passing queries to and from an external database. Instead, the script could rapidly read and write data structures directly. Still, I wanted to put something together that was a little bit better designed than a few pages of spaghetti code. I wanted a well-thought out, self-contained design that would also offer simplified deployment.
CGI considerations: Simplicity vs. complexity
Perl seemed an obvious language to choose for this project — it is supported on most typical platforms, and there are many convenient libraries available in the Perl library repository, CPAN.
As to underlying architectures, the Common Gateway Interface (CGI) was the first widely used approach to extend Web servers to provide interactive content. Developers often speak of the superiority of newer standards such as JSP, .NET, mod_perl, PHP, and ISAPI, and they are correct to point out CGI’s deficiencies. But in the case of this project, a CGI script that counts votes for several hundred users hardly constitutes a large-scale application, since all ballot information can easily be held in the system RAM of the Web server. This allows the entire lookup table to be loaded into memory each time a user submits a request to read or write data.
Furthermore, the logical sequence of drafting a ballot, confirming a ballot, and tallying the results lends itself to splitting up the logical data into three different physical files. This would minimize attempts to open locked files.
There would also be no real penalty for a transaction failing occasionally due to a locked file. Whether a transaction failed due to network problems or a locked file, the result would be the same: The user would simply click a second (or third) time, then the vote would most likely be counted on one of those attempts. This behavior should be kept in mind, however, as a different application might not be as forgiving of an inability to process concurrent transactions.
For this project, CGI offers several advantages:
- It needs no special Web server enhancements.
- It requires no database engine (in this simple case).
- It can be incrementally developed.
- It can even be upgraded later through the use of accelerators such as mod_perl.
Keep in mind, however, that due to platform constraints, CGI applications (those that create new processes) run much slower on Win32 systems. Also, the Apache Web server is still considered a Linux™/UNIX®-hosted application, although it may run fine under Windows®. The Resources section offers information on another (non-IIS) Web server alternative for Win32 systems, as well as a classic explanation of the original CGI specification on the original National Center for Supercomputing Applications (NCSA) site.
Functional design considerations
Let’s dive right into a major concern for this simple project: the functionality of the design.
Here’s the idea. The user is presented with an initial screen requesting that she enter her e-mail address and select several candidates from a Web form. The submission of the selections is recorded as a draft ballot entry locally, then an e-mail verification is sent to the supplied e-mail address. In this case, I am assuming that a verified e-mail address is sufficient to establish a user’s identity.
This brings up the issue of multiple voting. Practically speaking, I can think of no way to make it impossible for users to vote multiple times using multiple e-mail addresses, but we can limit the tallying of votes so that only one vote is allowed per e-mail account. This verification e-mail contains a link back to the originating CGI script, which allows comparison of the link to the record saved in a local DBM file. If the two records match, a ballot entry is made in the cast ballot table and the vote is tallied. If records do not match, no entry is made and the vote is not scheduled for confirmation. Instead, a new confirmation e-mail is generated, with a new verification record in the database. This overwrites any draft ballot entry for the associated e-mail address, effectively starting the process again from the beginning.
If the records match, the voter may confirm the draft ballot. At this point, if the voter changes her mind, she could simply return to the Web form and enter a new draft ballot, which would replace the previous one. This design presents a reasonably secure system; as long as each voting user has one and only one acceptable e-mail account, there is reasonable assurance that users could not vote twice. (I’ll return to this later.)
Let’s get into the system details.
Details: Hash keys
The use of hashed keys to create associative arrays in Perl allows on-the-fly development of complex data structures. When you combine this feature with the ability to store these (arbitrarily complex) structures in a binary DBM file, you can develop what amounts to a tiny database system. The missing component that allows all of this to work is provided by the
MLDBM module allows complex Perl hashes to be almost seamlessly stored in a local file. The
MLDBM::Sync module makes it possible to safely lock these file, using the
$sync->ReadLock methods. After loading or saving the structure of interest, a subsequent call to the
UnLock() method flushes I/O and unties the variable. (See the Perl documentation regarding the
MLDBM::Sync module for more information —
man 3 MLDBM::Sync.)
Essentially, the logic flow is as simple as that shown in Listing 1.
Once I had decided what the underlying conditional flow would be, the only remaining task was to construct objects that would fit into this sequence of actions. As I noted previously, the necessary hash data structures were retrieved and updated using
tie‘d variables and
MLDBM file locking. The objects used were more like smart structures than full-fledged objects; the data was passed between them in a way that paralleled the progress of a ballot from an initial draft to a final counted vote.
In other words, the list of ballots was used to construct a
DraftBallot, which in turn was used to create the
BallotBox classes. In this way there was minimal coupling to the main ballot CGI program.
On another tangent, while I understand that it is generally considered poor practice to have constructors that rely on external resources such as files (since they could fail and wind up in an unpredictable state), the code in this case was much easier to understand by doing just that. Since Perl doesn’t rely on pointers, I can see no reason not to take advantage of this simplification.
Details: E-mail gotchas
Allowing users to send e-mail from your Web server is a risky move since spammers could potentially exploit your host to send unsolicited e-mail. In order to minimize the possibility of such an exploit, the script always checks to determine if the e-mail is being sent to an acceptable address. You could further tighten the system by modifying the validation method
voter_is_okay() within the
DraftBallot class to consult a list of acceptable e-mail addresses. Effectively, this would require users to register in advance to vote.
Other methods of preventing duplicate votes could involve collecting IP addresses or setting cookies on the client, but I rejected these approaches as it is likely that many students would use shared public terminals on campus.
Details: Not-so-secret ballots
The call to the
$castBallot->dumpHTMLentrys() method would echo back a detailed accounting of who voted for whom. In practice I would comment this call out, scheduling the Web server to be shut down when elections were over by using the Linux
at batch command.
With the server off, you can uncomment this section and restart with the Web server temporarily set to listen only on the localhost address. The full results would be then be echoed back to anyone clicking on a previously submitted link, which can be collected through a copy sent to a dedicated free e-mail account.
When using SASL authentication, you have two options: Either point the script to a machine that can forward the e-mail using correct credentials, or rewrite the script to contact the external SMTP server directly using the Perl
As I thought about this workflow scheme, I realized that the necessity of using a
GET-based verification link and the use of unencrypted verification links made it a trivial exercise to read these links and construct a false confirmation ballot based on a specific e-mail address and some known verification links. In order to discourage this, while still allowing the easy debugging via unencrypted links, I decided to add one more twist to the verification process: adding a unique identifier to each draft ballot.
This identifier was based on the operating system process identifier (PID) of the executing script. This was combined with a random number in order to make it difficult to predict the URL to validate a draft ballot. I was concerned about this because a malicious individual might be able to deconstruct the very visible URL patterns in order to create false confirmation ballots. This is the one part of the code that would not translate directly to a
mod_perl version, since it relies on the use of the PID of the running Perl instance concatenated with random digits. If the form is generated from a reused
mod_perl instance, the PID number will not necessarily change between invocations.
In hindsight, I realize that the best way to obfuscate this link would be to use an MD5-generated hash value, effectively hiding all voter information. This would have the dual benefit of being pretty tough to forge, while remaining portable to
mod_perl-based scripts. The drawback would be that the code is slightly more difficult to debug by inspecting the exchange of information between the client and the server.
Details: File layout
Installation requires three types of directories on the Web server:
- A writable directory for saving user submissions
- An area for the CGI to run from
- An area where static data (such as CSS, logo images, and possibly a file containing more detailed instructions) can reside
Also note that permissions will most likely need to be tweaked so that the Web server can write to the directory for the DBM files.
Listing 2 shows the creation of some typical directories on the Web server.
Strictly speaking, only the cgi-bin (/var/www/cgi-bin) and DBM (/var/www/db) directories are absolutely essential, since they hold the script executable and the voting data, respectively. The layout shown in Listing 1 is specific to Linux, and the user and group names of the Web server process may vary, but the essential point is that there are several components that need to be placed in the correct area of the file system to be available to the Web server. After copying the support files into their respective directories, be sure to update any aliases in the Web server configuration files such as httpd.conf.
After creating the directories as described in Listing 2, copy the files from the ZIP file into the analogous subdirectories on your system. Most importantly, the ballot, DraftBallot.pm, BallotBox.pm, and CastBallot.pm files should be in the cgi-bin directory. Only three nonstandard Perl modules are required; the process of installing them is presented in Listing 3 (see the modules’ README files for more details).
Details: Static vs. dynamic DNS
While it would have been possible for me to set up this service from a site with an assigned domain on a static IP address, I felt that a dynamic DNS would offer several security advantages. Normally a server is not reachable from the Web at large without a static IP address, but a dynamic DNS service allows me to set up a temporary resolvable machine name under another top-level domain. This allows me to quickly appear and disappear from the Internet, minimizing my exposure to black hats. Best of all, the service is free.
It is also worth noting that it may be advisable to configure the server to listen on a nonstandard high-numbered port such as 8000 since many ISPs block incoming connection requests on port 80. A client (voter) could then typically be referred to the vote server only through a link from a well-known static address, such as a school-supplied Web page. When voting is completed, the server offering the Web application can be completely removed from the Web remotely without shutting it down or reconfiguring it. There would be no possibility of any vulnerabilities affecting the referring page (which may be administered by someone else). This is an especially important consideration in politically sensitive environments. (See the Resources section for further details on using a dynamic DNS service.)
Details: Is GET harmful?
Browsers can maintain state by transferring data to referred pages using the
POST methods, as well as through cookie information contained within the headers passed to the server. In order to confirm that a ballot was sent from a real person (or at least from an active e-mail account), draft ballots would be sent to an e-mail address for confirmation. Additionally, cc: or bcc: messages could be used later for reference. As I mentioned earlier, the most straightforward way to accomplish this is to send an
HTTP GET structured link to the voter. Still, some authors claim that
GETs that update records are simply bad form. In this case, though, any user subsequently clicking on a link will simply receive an update as to the current tally of votes for each candidate, so no harm is done.
Other possible improvements
Other security considerations could and should be taken into account when using this script. Any program that allows an external entity to input data is vulnerable to malicious activity, such as buffer overflows and embedded control characters. Conversely, the use of a dedicated routine to read and write local DBM files has at least one benefit: There is no possibility of SQL injection when there is no SQL back end to access.
In a nod to the need to filter incoming data, I set the variables
$CGI::POST_MAX to very strict values. Additionally I recommend the following:
- String all incoming variables of all unexpected characters and truncate the length to a reasonable limit.
- A lot of runtime data is kept inside the script. The advantage to this practice is that there are fewer files to distribute and to set permissions for. The disadvantage is that users may not want to edit code, and the code becomes less clear. A possible compromise may be to take advantage of kludges such as the
DATApseudo-file handle to tuck data at the end of a script.
- File locking is a very tricky issue and race conditions abound. It seemed that for every guideline I found spelling out the correct way to lock files, a subsequent correction was posted. I tried to minimize the time files were open and leverage the locking mechanism provided for the
- Perl modules are not put in their own path away from the CGI, so they could theoretically be executed from the cgi-bin directory. It is recommended that these modules not be set as executable.
- PHP is practically ubiquitous on Linux platforms, so I would consider porting this script in PHP if the need to reimplement it arose. However, I am not sure there is a PHP equivalent to the
- The layout of the voting form is considered unfair by some since it presents the first candidate as a default.
- I did not use perldoc. I should.
Having the opportunity to set up this system, while attempting to keep it simple and self contained, allowed me to discover some very useful Perl modules. I found the process of refining features and developing functional specifications for such a simple project to be both interesting and enjoyable. I hope the list of considerations when crafting this type of system will help you with similar projects.
mod_perldocumentation offers excellent advice on file locking, race conditions, and other gotchas.