A Developer’s Guide to Linux Emulators and How They Operate
Computers have been emulating other computers for a long time, often to access a legacy application or to use applications written for a popular OS on a system with a more stable, responsive OS. As Linux™ grows in popularity, developers need to examine their options when planning binaries that will run on non-Linux systems. This article examines what emulators do and looks at hardware and software emulation issues in detail.
For years, computers have been emulating other computers. A common reason to emulate older computers is nostalgia, and indeed, many emulators can run a broad variety of video games with perfect fidelity. Another reason to emulate another computer is to access application software that exists only on a specific platform.
In general, application emulation targets platforms that possess the larger market shares. For instance, the WINE project attempts to provide a way to run Windows® binaries, because — let’s face it — there are many more cool applications for Windows than there are for Linux (although, as they point out, WINE Is Not an Emulator).
However, in recent years Linux has proven to be a stable and versatile operating system; consequently, its market share has grown. And along with the growth of market share has come a spike in interest in emulating Linux. This article reviews the current state of Linux binary emulation on other systems and highlights some of the issues that Linux developers should keep in mind to make life easier for the people running their binaries in emulation.
The Basic Emulator
The idea of an emulator is simple. Computers are predictable enough that, if you want to know exactly what a computer would do when given a certain piece of code, you can find out by building a model of that computer and running the code on the model. Of course, there’s a certain amount of overhead involved, but if the computer you’re emulating is much older than the computer doing the emulation, the emulation may well be faster than the original.
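To make the "model of a computer" idea concrete, here is a minimal sketch of a fetch-decode-execute loop. The machine, its four opcodes, and the names run and countdown are invented for illustration; they do not correspond to any real processor or emulator.

```python
# A toy accumulator machine, invented purely for illustration.
# Each instruction is an (opcode, operand) pair.
LOAD, ADD, JNZ, HALT = "LOAD", "ADD", "JNZ", "HALT"


def run(program, max_steps=10_000):
    """Emulate the toy machine and return its final accumulator value."""
    acc = 0  # the emulated machine's single register
    pc = 0   # the emulated program counter
    for _ in range(max_steps):
        op, arg = program[pc]  # fetch
        pc += 1
        if op == LOAD:         # decode and execute
            acc = arg
        elif op == ADD:
            acc += arg
        elif op == JNZ:        # jump to address arg if acc is non-zero
            if acc != 0:
                pc = arg
        elif op == HALT:
            return acc
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    raise RuntimeError("step limit reached")


# A small program that counts down from 3 to 0.
countdown = [(LOAD, 3), (ADD, -1), (JNZ, 1), (HALT, 0)]
```

Every emulated instruction costs several host operations (a fetch, a comparison chain, an update), which is the overhead mentioned above.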
Some emulation layers, such as NetBSD’s Linux emulation layer, merely provide emulation of the software part of an environment, taking system calls from a Linux binary and handing back results that look like a Linux kernel was being used. Others, such as VirtualPC, may emulate the whole computer, including the processor. Emulating the processor is slower but can produce better compatibility.
Emulators as a Distribution Format
Although this article focuses on ways to run Linux binaries on other platforms, distribution of compiled binaries has its place as well. As Linux emulation becomes more widespread, the Linux binary format becomes a viable way to distribute simple programs without giving out source code. Linux binaries can be run on a broad variety of systems, admittedly sometimes at a cost — there are challenges in using the Linux binary format as a general distribution format.
Emulation usually isn’t enough to let you run a shared object built for one system in a program built for another. If your product is mostly distributed as a shared library object, it probably can’t be loaded on other platforms.
There are those who would argue that using the Linux binary format for distribution of code to other platforms is crazy. It may be crazy, but it works. For a few years, my primary Web browser was running under emulation (to say nothing of word processors, document converters, and even credit-card processing software).
Many of the software applications we like to use are commercial, and commercial software vendors benefit greatly from being able to distribute a single binary that runs on a great number of platforms. Given the variety of Linux emulation available, the Linux binary format is starting to look like a real software distribution option.
Oh, and porting source code is a much different task than distribution; frequently, porting is a much easier task.
Full Hardware Emulators
A full hardware emulator simulates an entire machine; not just the processor but the rest of the machine as well. For instance, an emulated computer will act as though it has its own keyboard controller and video card.
Full hardware emulation is especially common for running programs written for older machines. A popular example is the MAME arcade game emulator, which emulates the hardware of various old arcade machines.
Full hardware emulators are in some ways the simplest way to do emulation. A lot of work goes into building a full hardware emulator, but once you’ve got it, everything should just work. For instance, VirtualPC on the Macintosh started supporting Linux in version 3.
Hardware emulation can get you around problems you can’t easily bypass otherwise. For instance, I once had a BIOS flash utility that was distributed only in the format of a self-extracting image file for DOS. Worse, it only ran on a machine with an actual floppy on a traditional ISA floppy controller (my Windows desktop machine had an LS-120 drive). Emulation to the rescue! I ran the program under an emulator, writing the data to a USB floppy drive plugged into a Mac.
Hardware emulation has its downside, too. A lot of effort goes into making everything work. If you want a network, you need to emulate a network chip well enough for the operating system to run on it. Furthermore, emulating foreign instructions can be very expensive. Often, a system like this will work nearly perfectly, but timing-related functionality may be unreliable.
Full hardware emulators have been in use for a long time. They are at their best when handling legacy systems and code that can take the speed hit from emulation.
Nonetheless, users who want to run x86 Linux binaries on a Macintosh or any other non-x86 machine may well rely on one of the currently available x86 emulators. Most utility programs will run perfectly well (if slowly, perhaps) on systems like this. The one major concern is that users of such systems may install smaller or older Linux distributions in the hope of improving performance. Someone running an emulated machine with 32 MB of memory is unlikely to run the latest version of KDE.
Partial Hardware Emulators
Partial hardware emulators are an intermediate solution: they emulate a computer, but only a computer of the type they’re actually hosted on. Because the emulated code can mostly run directly on the host processor, programs like this cut the cost of emulation and generally perform at speeds comparable to the host machine. Examples include the Serenity Virtual Station and VMware.
These systems are most useful when you have applications for a variety of systems and need to run them all at once. Like full hardware emulators, systems like this will be running a full Linux OS environment, and your program should be fine as long as it’s reasonably portable across Linux systems. However, once again, portability to older versions of Linux will help a lot. People using a virtual machine may want to run an older, smaller version of Linux on it.
Software Emulators
In the world of emulation, software emulators are where life gets interesting. A software emulator is not running your program on a virtual machine — it’s running it on the fly without a virtual machine. These programs work by setting up an environment in which a program’s code can run normally, but attempts by the program to access the operating system get routed through an emulation layer of some sort. WINE is a great example (albeit for Windows), although it is officially not an emulator.
Some software emulators are explicitly invoked by the user, like the lxrun program available for SCO and Solaris systems. Others are built into a UNIX® kernel’s support for loading binary images: if a program doesn’t look like a valid native binary, the kernel can check it against a table of possible emulators, each of which examines the file to see whether it can run it.
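The table-of-emulators idea can be sketched as follows. This is an illustration under stated assumptions, not how any particular kernel does it: the host is assumed to be NetBSD, the probes look only at the ELF magic number and the EI_OSABI byte of the header, and the names is_native, linux_probe, EMULATORS, and pick_loader are hypothetical. Real kernels generally need more evidence than this, since many Linux binaries carry a generic OSABI value.

```python
ELF_MAGIC = b"\x7fELF"   # first four bytes of every ELF file
EI_OSABI = 7             # index of the OS/ABI byte in the ELF identification
ELFOSABI_NETBSD = 2
ELFOSABI_LINUX = 3


def _osabi(header):
    """Return the OS/ABI byte of an ELF header, or None if it is not ELF."""
    if header[:4] != ELF_MAGIC or len(header) <= EI_OSABI:
        return None
    return header[EI_OSABI]


def is_native(header):
    # Assumption for this sketch: the host is a NetBSD system.
    return _osabi(header) == ELFOSABI_NETBSD


def linux_probe(header):
    return _osabi(header) == ELFOSABI_LINUX


# Hypothetical table of (emulation name, probe function) pairs.
EMULATORS = [("linux", linux_probe)]


def pick_loader(header):
    """Return 'native', the name of a matching emulation, or None."""
    if is_native(header):
        return "native"
    for name, probe in EMULATORS:
        if probe(header):
            return name
    return None
```

Supporting another binary format means appending one more probe to the table; the native loading path does not change.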
Software emulators often offer the best user experience. There’s no special setup, no large disk images. The programs just run (most of the time). Access to system calls, shared libraries, and file system structures raises a number of issues, though, so we’ll cover them next.
System calls
System calls are the easiest and the hardest part of emulation. A system call has a well-defined interface, and the calling mechanism can generally be easily detected and handled — that’s the easy part. The hard part is that the system call may be difficult or impossible to implement reasonably.
Traditionally, the big killer in Linux emulation was the clone() system call. This call provided a brute force way to get simple threading by creating two processes that shared a number of things that could include memory, file descriptors, signal handling — in other words, anything and everything. Unfortunately, if your operating system didn’t provide a good analogue to this, there was simply no way to implement the system call.
Worse, since clone() showed up when POSIX threads were not well or widely supported and was often used as a substitute for them, a lot of programs used it in a variety of exciting, complicated, and (need I say) unexpected ways.
If you want people to run your binaries, try to stay away from OS-specific system calls; favor standard POSIX system calls. This is a good practice in software development.
A kernel-based emulator traps the system calls when they reach it. A user-space emulator such as lxrun waits for the application to try to make a system call. Because the Linux system call facility is not the same as the system call facility on Solaris or SCO UNIX, the result is a segmentation fault. The lxrun program then acts like a debugger, correcting the fault and continuing — but in fact, it has intercepted the system call, made a corresponding system call to the underlying operating system, and patched everything up. Clever!
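Once a system call has been intercepted, by either route, the remaining work is translation. The sketch below shows one plausible shape for it: a table keyed by Linux system call number, with anything missing reported as "not implemented". The numbers are the 32-bit x86 Linux values; the handler names, the HANDLERS table, and dispatch are hypothetical, and this is not a description of how lxrun or NetBSD actually implement it.

```python
import os

# Linux system call numbers on 32-bit x86.
LINUX_SYS_WRITE = 4
LINUX_SYS_GETPID = 20
LINUX_SYS_CLONE = 120
LINUX_ENOSYS = 38  # Linux errno value for "function not implemented"


def sys_write(fd, data):
    # Maps directly onto the host's own write().
    return os.write(fd, data)


def sys_getpid():
    return os.getpid()


# Hypothetical translation table: Linux call number -> host implementation.
# clone() is deliberately absent, standing in for a call with no host analogue.
HANDLERS = {
    LINUX_SYS_WRITE: sys_write,
    LINUX_SYS_GETPID: sys_getpid,
}


def dispatch(number, *args):
    """Run the host equivalent of a trapped Linux system call.

    Follows the Linux kernel convention of returning a negative errno
    value on failure.
    """
    handler = HANDLERS.get(number)
    if handler is None:
        return -LINUX_ENOSYS
    try:
        return handler(*args)
    except OSError as exc:
        # Simplification: a real emulator must translate host errno
        # values into the numbers Linux binaries expect.
        return -exc.errno
```

A call such as dispatch(LINUX_SYS_CLONE) falls through to the ENOSYS branch, which is roughly what a binary sees when the host has no reasonable way to implement a call.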
File system structures
The problem with file systems is often more subtle. It’s easy enough to access the file system. What’s not easy is finding the files there that you expect.
If your program is running in emulation, the file system you access may be substantially different from the file system you had when you were developing the program. For instance, if your program uses the /proc file system (commonly used to get access to kernel status and information), it’s possible that a feature common in more recent kernels will be absent on an older system.
Linux developers have a big advantage here over developers on proprietary systems, because different Linux distributions arrange files differently, so most programmers have a good sense of how to avoid being too dependent on file system layouts. Nonetheless — sometimes — a file name will have a perfectly good reason for being encoded in a program.
A solution to this dilemma, adopted in more than one emulator, is to set up an extra layer of interpretation for file system calls. For instance, in NetBSD’s Linux emulation code, file accesses are checked first against the files in /emul/linux and only after that against the files in the system’s real root directory. This allows the system to provide “overrides” for system files when Linux binaries won’t work with the standard files.
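The lookup order described here can be sketched in a few lines. This is a user-space illustration of the idea only, not NetBSD's kernel code; the function name resolve and its parameters are hypothetical, with /emul/linux taken from the description above as the default shadow root.

```python
import os


def resolve(path, emul_root="/emul/linux", real_root="/"):
    """Return the host path an emulated binary would see for an absolute path.

    The shadow tree under emul_root is checked first; only if nothing
    exists there does the lookup fall back to the real root.
    """
    relative = path.lstrip("/")
    shadow = os.path.join(emul_root, relative)
    if os.path.lexists(shadow):
        return shadow
    return os.path.join(real_root, relative)
```

With the defaults, resolve("/lib/libc.so.6") returns the copy under /emul/linux if one is installed there, and /lib/libc.so.6 on the host otherwise.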
In fact, the main use for this is in libraries and other support files, but a number of system binaries are provided as well. For instance, if a Linux binary were to try to call uname to get a kernel version, it would be very confused if it got back a NetBSD version number. Instead, it gets the Linux version numbers it’s expecting.
Shared libraries
As mentioned above, shared libraries are a good candidate for being found by the emulated binaries but not by system binaries. Because the details of shared library formats and ABIs may vary from one system to another, you can’t just assume that all the systems can share a given library. Names will clash; for instance, the current NetBSD and SUSE 7.3 both have a file called libncurses.so.5. Getting the right one of those is important.
Shared libraries bring up another point for developers. It’s important to know what library version the different systems are using. Right now, NetBSD is using SUSE 7.3 shared libraries for its Linux emulation. There’s code to grab the 9.1 shared libraries, but there’s also a warning that they aren’t stable with the kernel-level emulation.
Emulation packages tend to lag a bit behind the rest of the marketplace. Even if you think that most of your prospective users will have reasonably current Linux distributions, the emulator crowd will almost all be a bit behind the times.
Shared libraries bring up another concern: not every system contains all of them. Emulation packages often don’t have every last shared library installed. And, to make it more fun, their users are less likely to be able to easily install a missing package.
In these cases, it’s a good idea to minimize dependencies, both on new features and on non-core shared libraries. Emulator users are likely to run into these issues.
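One way to soften the failure is to check for optional libraries up front and report what is missing, rather than failing at load time. The sketch below assumes a Python launcher or installer script wrapped around the real program; missing_libraries is a hypothetical helper. Note that ctypes.util.find_library consults the linker configuration of the system the script runs on, so the result reflects that environment only.

```python
from ctypes.util import find_library


def missing_libraries(names):
    """Return the library names (without 'lib' prefix or suffix) not found."""
    return [name for name in names if find_library(name) is None]
```

A launcher could call missing_libraries(["ncurses", "z"]) and print a clear message naming the absent libraries, which is far more helpful to an emulator user than an unexplained loader error.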
Don’t get tricked into using static libraries as insurance against these problems. A static library can introduce its own new dependencies, and you can’t check them as easily. It doesn’t do any good to rework an algorithm to avoid an unportable system call if you statically link with a library that uses it. Dynamic linking allows you to build a program that will run on a much broader variety of systems.
Programs calling other programs
There is one special case that seems to bite people more than any other, especially with installers. On many systems, the default shell you get by calling /bin/sh is not bash. This means that scripts that assume bash extensions may not work on other systems.
This gets into an especially tricky bit of logic in the emulator. The operating system probably knows enough to check the Linux path for relevant Linux binaries when a binary is executed, and it will likely have a copy of bash installed there. But when you run a script, the kernel doesn’t see this as a Linux binary; it sees a script with an interpreter path, and it’s no longer running in emulation mode when it tries to load the interpreter.
Portable shell scripting techniques pay off here. This is one of the most common issues users face when running emulated applications. The installer may fail to run because it’s a nonportable shell script.
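A rough pre-release check can catch the most obvious cases. The sketch below scans a script that claims /bin/sh as its interpreter for a handful of constructs that bash accepts but POSIX sh does not guarantee. The pattern list is a small, hand-picked assumption rather than a complete test, and bashisms is a hypothetical helper name.

```python
import re

# A few bash-only constructs; deliberately incomplete.
BASH_ONLY = [
    (re.compile(r"\[\[\s"), "[[ ... ]] test"),
    (re.compile(r"^\s*function\s+\w+"), "function keyword"),
    (re.compile(r"^\s*source\s"), "source builtin (use '.')"),
    (re.compile(r"<<<"), "here-string"),
]


def bashisms(script_text):
    """Return (line number, description) pairs for a #!/bin/sh script.

    Scripts that name any other interpreter are not checked.
    """
    lines = script_text.splitlines()
    if not lines or not lines[0].startswith("#!"):
        return []
    interpreter = lines[0][2:].split()
    if not interpreter or interpreter[0] != "/bin/sh":
        return []
    found = []
    for number, line in enumerate(lines[1:], start=2):
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        for pattern, label in BASH_ONLY:
            if pattern.search(line):
                found.append((number, label))
    return found
```

Line-by-line matching like this will produce false positives (for example, inside quoted strings) and miss plenty of real problems; running the installer under a strict POSIX shell remains the more reliable test.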
Like Normal Development, Only More So
The things to keep in mind when making life easier for users who might run your program in emulation are the same things you should keep in mind when developing any software:
• Follow applicable standards as much as possible.
• Avoid “special features.”
• Don’t push the envelope.
And don’t build your code to depend on something that was just released a month ago if you can possibly avoid it. That will only serve to shrink your effective target market.
Resources
• The lxrun Linux emulator runs on SCO and Solaris systems entirely in user space and does not require kernel modifications.
• Sun maintains a page about using lxrun on Solaris.
• The IBM Redbook Linux Applications on pSeries talks about porting rather than emulation, but the approach to porting — a compatibility library — is useful.
• Emulate legacy operating systems on Linux (developerWorks, June 2003) provides an overview of operating system emulations for Linux systems.
• The article What to watch out for when writing portable shell scripts has more information on portable shell scripting.
• The WINE project: we’ll just offend them by linking to it from an article about emulation, because WINE Is Not an Emulator.
• VMware is one of the best-known virtual machine emulation solutions.
• Serenity Virtual Station is one of the virtual machine emulation solutions in the marketplace.
• Donn Seeley’s paper at Usenix in 2000, “LAP: a little language for OS emulation,” discussed the issues encountered in developing Linux emulation for BSD/OS.