///Mondo Math Libs

Mondo Math Libs

A look at some of the math libraries for Linux

¬†Programmers generally fall into two groups when it comes to using math. One group doesn’t use floating point much, if at all, and typically needs integers only for the usual mundane purposes like loop control variables, counters, address arithmetic, and other simple calculations. This group’s math needs rarely require the use of anything more exotic than a 32-bit signed integer. They deal with floating point arithmetic only when necessary. And when floating point arithmetic cannot be avoided, they tend to head for the path of least resistance, using whichever FP format is handy or forced on them. This group includes most system programmers, as well as non-scientific and non-financial coders.

The second group includes the financial, scientific, and hobbyist programmers who don’t just crunch numbers but mercilessly grind them until their CPU glows cherry red. They use online handles like "mantissaMan" and "sqrt-neg-one," and tell jokes with scientific notation punch lines. If you’re wondering whether you are in this second group, you probably aren’t.

This sort of thinking let me to write about multiple-precision math (hereafter MPM) libraries. The fact that there’s a seemingly endless list of implementations of MPM libs available on the net for Linux makes this topic valuable to both groups in our little programmer taxonomy.

Ground rules

Before I get into the specifics of various libraries and where to find them, let’s agree on some terminology.

The most widely quoted and used standard for floating point numbers is IEEE-754, which defines several data formats and exactly how various bit patterns in the fields are interpreted. (See Resources later in this column for an online guide to IEEE-754.)

Mantissa and exponent refer to the two parts of binary-format floating point numbers as they are normally represented. The mantissa is the "base" part of the number, and the exponent is the power of ten (usually) the base is raised to. In decimal, the number 5.12e+3 has a mantissa of 5.12 and an exponent of 3.

Precision measures a number’s "size" in terms of the exact number of digits it can represent. This can be deceptive to the floating-point neophyte, however. Don’t think that because an IEEE-754 single precision number can store a maximum number of 3.40292347e+38 that it can represent every possible floating point number up to that value (or down to the negative of that value). An IEEE-754 single has a mantissa of only 23 bits, and you can’t possibly represent 38 decimal digits in only 23 bits. This is typical of binary floating point number representations: their exponents exceed their grasp. But precision doesn’t just refer to floating point values; it’s also applicable to integers.

I can’t stress enough that floating point is one area where you should pay attention to exactly how your data is represented, particularly when dealing with software-only implementations, such as MPM libs. In those cases, you’re using an artificial construct that was simply invented by a programmer, and not something supported in hardware.

Binary vs. decimal

This leads us to the next issue: binary vs. decimal representation of floating point numbers. The most commonly used floating point formats, whether IEEE-754 conforming or otherwise, are based on binary representations, just as integers are. Unfortunately, this leads to the endless discussion in newsgroups about unexpected results from some floating point operations. The explanation is simple — there are many fractional values that can be represented precisely in decimal notation yet have no exact form in binary. This most often surfaces after repeated calculations with one or more such numbers, leading the result to drift from the expected answer. Or it surfaces when you print a result at high precision, and instead of getting a nice round number like 74.00000, you get 73.99998. If you think this behavior is inconvenient in your astronomy or ray tracing program, imagine the headache it causes scientists or financial institutions, where even a small mistake can be catastrophic in more ways than one.

The most obvious solution to this problem is simply to represent floating point values in decimal form, as in the BCD (binary-coded decimal)/"packed decimal" numbers used in mainframes for decades, and countless other custom formats cooked up by inventive programmers. This will solve the representation problem, but usually at a cost in runtime performance, because the math operations are all carried out entirely in software (as in the MPM libraries discussed below), instead of in hardware in the form of a floating point unit (FPU).

By contrast, multiple-precision integers are almost universally represented as binary numbers, because there’s no benefit to going decimal, as there is with floating point values. Keeping the components in binary actually improves performance in most cases, because the most common approach is to represent each multiple-precision integer as an array of smaller integers, each of which is a format natively supported by the hardware.

Some contenders

So, you find that you need to use truly huge integers or floating point values in a Linux program. Where do you turn? In this section I’ll give you a brief overview of some of the more interesting and useful MPM library packages available on the Net. By no means an exhaustive list, or even a "best of the Net" list, this is just a good starting place.

If you find any math libraries for Linux that are worthy of mention, please e-mail me at the address at the start and end of this column, because this is one topic that I will likely return to in future installments.

apfloat
This package includes support for arbitrary precision numbers in four forms: integers, floating point numbers, rational numbers, and complex numbers. The apfloat source code includes two archives: a common, base portion plus one tailored to various C++ compilers (including a generic ANSI C++ version), as well as versions optimized for 386/486 or Pentium systems. While the package’s author says apfloat should be usable with nearly every C++ compiler, it includes a makefile for gcc, which makes building the library under Linux very easy.

The supported math functions include the basic four arithmetic operations plus inverse root, trigonometric functions (normal, arc, hyperbolic, and inverse hyperbolic), power, exponentiation, a function to calculate pi to a desired precision, and more. There is also a stream output operator.

The package includes full source code plus several demonstration and test programs.

apfloat

  • Main sites: www.jjj.de/mtommila/apfloat/
  • Download page: http://www.jjj.de/mtommila/apfloat/1.51/
  • Maximum precision: Limited only by memory (including disk space)
  • Language: C++
  • Licensing: Freeware, free for non-commercial use
  • Documentation: The apfloat manual is a 35-page PostScript file available from the downloads page. It provides a good introduction to the library, as well as several appendices on number theory and algorithms.

gmp
gmp stands for GNU Multiple Precision, and is, of course, the GNU entry in the field. It supports arbitrary precision operations on signed integers, rational numbers, and floating point numbers, and is, of course, quite at home with Linux.

One interesting facet of gmp is the degree to which it has been tailored to various hardware platforms. The manual points out that the package contains "carefully optimized assembly code for these CPUs: DEC Alpha, Amd 29000, HPPA 1.0 and 1.1, Intel Pentium and generic x86, Intel i960, Motorola MC68000, MC68020, MC88100, and MC88110, Motorola/IBM PowerPC, National NS32000, IBM POWER, MIPS R3000, R4000, SPARCv7, SuperSPARC, generic SPARCv8, and DEC VAX. Some optimizations also for ARM, Clipper, IBM ROMP (RT), and Pyramid AP/XP."

Not surprisingly, the manual also points out that gmp places "a general emphasis on speed (as opposed to simplicity or elegance)." This is a fair statement; gmp is very fast, but because it is written in C, it provides a lower level interface than the other MPM libraries mentioned here. In particular, the lack of operator overloading places an additional burden on the programmer and anyone modifying his or her code in the future. The lack of destructors is more problematic, in that it opens the door for the all-too-familiar memory leak problems, because the gmp data types use dynamically allocated memory, which must be freed via explicit calls to _clear functions.

The gmp package is also pretty sparse in terms of provided math functions. It supports numerous variations of the basic functions, plus square roots, I/O, and random numbers, but little else.

gmp

  • Main and download page: http://www.swox.com
  • Maximum precision: Limited by memory
  • Language: C (the main Web page mentions plans to design a C++ wrapper, but there doesn’t seem to be any progress on that front yet)
  • Licensing: GPL (GNU public license)
  • Documentation: The manual is online in HTML format at http://www.swox.com/gmp/manual/gmp_toc.html, or you can download it in DVI or PostScript format.

If you experiment with gmp, be sure to download copies of the patches from the main site, because they fix some subtle bugs in the support for rational and floating point numbers.

hfloat
hfloat is a floating-point-only library that was developed under Linux using gcc 2.7.x and also includes the author’s fxt-library, a collection of several FFT implementations. hfloat allows you to choose any number from 2 to 65536 for the radix used in number representations, giving it some notable flexibility. The supported operations include square root, n-th power, some trigonometric functions (not as many as apfloat), and logarithms.

hfloat has a few quirks to watch out for, including initialization syntax (described in "Hints and tips" below), and the way it represents a number’s radix and precision. For example, for a base-10 number you use a radix of 10,000, not 10, and you measure precision in LIMBs, which are 16-bit unsigned integers. These aren’t major problems, particularly for programmers working with this sort of package, but I recommend you pay attention to the author’s descriptions and examples of these items in the manual.

The source code for hfloat is also somewhat unusual in that the author has broken it into several compilation units (the base class, one for the "guts", one for functions, etc.). This will be visible to you only if you intend to modify the package, but it adds a bit to the learning curve as you figure out where everything is.

hfloat

  • Main and download site: http://www.jjj.de/hfloat/hfloatpage.html
  • Maximum precision: Limited only by memory
  • Language: C++
  • Licensing: GPL (GNU Public License)
  • Documentation: The 15-page manual is available as a separate download, in DVI, PostScript, or Acrobat/PDF format. The manual is a bit terse, but it does cover basic usage of the package and its general layout.

Java’s BigInteger and BigDecimal

 

Java coders are probably familiar with the java.math package and two of its more interesting classes, BigInteger and BigDecimal. These are both arbitrary precision numeric classes, the first being binary, and the second being decimal, as you might have guessed from the names.

The good news is that these classes have been part of the standard Java implementation since JDK 1.1, so they’re "free" to programmers and require no extra work to distribute/install them on a user’s system (assuming Java itself is installed).

The primary drawback of these classes is that they’re pretty minimalist in comparison with some of the MPM libraries. They support the main four arithmetic operations and some other minor manipulations, but that’s it. There are no trigonometric, log, exponentiation, or other features available. As such, they make a good foundation for Java programmers, but they’re not enough to support many scientific or advanced applications.

One other interesting note is that IBM has made its own drop-in replacement for Sun’s BigDecimal class available. You can download it and documentation from IBM’s site (see Resources). IBM’s and Sun’s BigDecimal classes differ in two areas. IBM improves in a couple of areas on the Sun version, such as making some conversions to less precise data types throw exceptions instead of simply returning bad data. IBM has also added a greater level of control over rounding, but without sacrificing backwards compatibility with existing code.

Java’s BigInteger and BigDecimal

Hints and tips
You could write several sizable chapters or even an entire book on the benefits and pitfalls of MPM libraries, but here are a few things to watch out for as you experiment.

  • Always remember that MPM libraries are quirky animals: Every one has its own personality, and you should never assume you can pick one up, slap it into an existing project, and start coding away. At a minimum you should count on experimenting with the library at two levels: a very simple, naive one, in which you just exercise features to gain familiarity with the package, and also a more intense and application-specific one that should reveal the library’s oddities and give you a feel for its performance as you’ll use it. (And performance can be a particularly thorny issue, because in many cases the authors of some of these libraries have lavished attention on some parts and left others with minimal and slower implementations.)
  • Consider initialization issues: Virtually all MPM libraries will let you initialize variables in their extended precision formats using as constants either the native environment’s numbers or string values. This makes sense; it would be silly if you couldn’t initialize a multiple-precision variable to anything larger than what a native number will hold. But you can run into trouble with decimal format multiple-precision floating point variables. For example, if you initialize such a variable with the literal constant 0.1, you’ll get a nasty surprise. Your compiler will normally convert 0.1 into a binary format, resulting in a repeating decimal value, which will be used to initialize your variable, so your variable won’t start off containing exactly 0.1. You can avoid this problem by using a string constant of "0.1" instead of a numeric constant. This will get the desired data into your variable and bypass your compiler’s translation efforts.
  • Be careful about semantics: Some libraries, like hfloat, do some unusual things with what seems like "normal" statements. The hfloat manual explains that the line hfloat a = 1234; does not initialize the hfloat a to 1234, but instead sets a’s precision to 1234 digits and initializes it to 0. This is probably not what you would expect, and could lead to some spectacular bugs.
  • Know the package’s resource requirements: Some libraries require a bit more tender loving care in order to work properly. In some cases you can simply use the library as provided or recompile it, and then drop it into your project as you would any other dynamically linked library. But some libraries require environment variables to be set or other libraries to be present. These details aren’t a concern for your own work, but if you plan to distribute a program that uses one of these libraries, it can quickly turn into more trouble than your users are willing to accept.
  • Another aspect of resource requirements is memory usage. Many of the MPM libraries will use all available memory in extreme circumstances, such as dealing with truly huge numbers, but they give you a way to limit the precision of the numbers and the memory used, or to decide when to stage values to disk instead of keeping them in memory. Depending on how widely distributable your application must be, these controls could mean the difference between a successful product rollout and a flop (no pun intended).
  • Handling exceptions, overflows, and underflows: This is another area where MPM libraries vary greatly. Some ignore such issues completely, and others spend considerable effort to give you complete control over and information about how math operations are performed, and the results. Some packages use arbitrary precision representations, as opposed to a fixed precision, eliminating overflow/underflow problems.

Hints and tips

You could write several sizable chapters or even an entire book on the benefits and pitfalls of MPM libraries, but here are a few things to watch out for as you experiment.

  • Always remember that MPM libraries are quirky animals: Every one has its own personality, and you should never assume you can pick one up, slap it into an existing project, and start coding away. At a minimum you should count on experimenting with the library at two levels: a very simple, naive one, in which you just exercise features to gain familiarity with the package, and also a more intense and application-specific one that should reveal the library’s oddities and give you a feel for its performance as you’ll use it. (And performance can be a particularly thorny issue, because in many cases the authors of some of these libraries have lavished attention on some parts and left others with minimal and slower implementations.)
  • Consider initialization issues: Virtually all MPM libraries will let you initialize variables in their extended precision formats using as constants either the native environment’s numbers or string values. This makes sense; it would be silly if you couldn’t initialize a multiple-precision variable to anything larger than what a native number will hold. But you can run into trouble with decimal format multiple-precision floating point variables. For example, if you initialize such a variable with the literal constant 0.1, you’ll get a nasty surprise. Your compiler will normally convert 0.1 into a binary format, resulting in a repeating decimal value, which will be used to initialize your variable, so your variable won’t start off containing exactly 0.1. You can avoid this problem by using a string constant of "0.1" instead of a numeric constant. This will get the desired data into your variable and bypass your compiler’s translation efforts.
  • Be careful about semantics: Some libraries, like hfloat, do some unusual things with what seems like "normal" statements. The hfloat manual explains that the line hfloat a = 1234; does not initialize the hfloat a to 1234, but instead sets a’s precision to 1234 digits and initializes it to 0. This is probably not what you would expect, and could lead to some spectacular bugs.
  • Know the package’s resource requirements: Some libraries require a bit more tender loving care in order to work properly. In some cases you can simply use the library as provided or recompile it, and then drop it into your project as you would any other dynamically linked library. But some libraries require environment variables to be set or other libraries to be present. These details aren’t a concern for your own work, but if you plan to distribute a program that uses one of these libraries, it can quickly turn into more trouble than your users are willing to accept.
  • Another aspect of resource requirements is memory usage. Many of the MPM libraries will use all available memory in extreme circumstances, such as dealing with truly huge numbers, but they give you a way to limit the precision of the numbers and the memory used, or to decide when to stage values to disk instead of keeping them in memory. Depending on how widely distributable your application must be, these controls could mean the difference between a successful product rollout and a flop (no pun intended).
  • Handling exceptions, overflows, and underflows: This is another area where MPM libraries vary greatly. Some ignore such issues completely, and others spend considerable effort to give you complete control over and information about how math operations are performed, and the results. Some packages use arbitrary precision representations, as opposed to a fixed precision, eliminating overflow/underflow problems.

Resources

  • Sun’s Numerical Computation Guide has an extensive introduction to floating point, the IEEE-754 spec, and Sun’s standard math libraries.
  • "What Every Computer Scientist Should Know About Floating Point Arithmetic" is a 78-page article that contains more detail than you probably knew existed about floating point math.
  • Two ftp sites with numerous MPM libraries, papers, and related materials are:

  • William Kahan is a professor at the University of Berkeley, and has several publications on his Web site related to floating point issues. (In particular, see his paper "How Java’s Floating-Point Hurts Everyone Everywhere" for an interesting take on the subject.)
  • 2010-05-26T17:15:44+00:00 August 22nd, 2003|Linux|0 Comments

    About the Author:

    Eric Olson is a software engineer at Lakeview Technologies, which specializes in high availability, disaster recovery, clustering, and data replication infrastructure software and services. He has developed in the Java programming language for over six years, and has worked with many different business object frameworks including EJB technology and IBM WebSphere business components.

    Leave A Comment