Go to the source to learn Linux basics and build the right Linux for you
Linux® From Scratch (LFS) and its descendants represent a new way to teach users how a Linux operating system works. LFS is based on the assumption that compiling a complete operating system piece by piece not only teaches how the operating system works but also allows an independent operator to build systems optimized for speed, footprint, or security.
Many authors have written about UNIX® flavors, delving into the mysteries of scheduling, memory management, multiprocessing and threading, file systems, and the interaction between users and the kernel. The author writing about Linux has an advantage over UNIX authors: The Linux kernel is unlikely to split into competing forks, corporate upheavals notwithstanding. The GNU General Public License (GPL), the existence of a centralized research lab in the Open Source Development Labs (OSDL), and Linus Torvalds’ unassailable position make Linux, luckily, a slow-moving target.
Why UNIX internals matter
Different UNIX kernels do not agree on much apart from what could be described as a certain family resemblance. The various UNIX flavors have an advantage, though, that Linux seems to lack: All UNIX flavors are supposed to be full operating systems. Linux, often described as "just a kernel" (an arbitrary definition if ever there were one), presents a core of common functionality and implementations that do not change fundamentally whether the kernel runs on an underpowered Pentium® II machine or on a Symmetric Multiprocessing (SMP) system. To simplify matters even more, one could say that the further you get from a Linux kernel, the more variety you’re likely to find, while UNIX systems tend to be diverging implementations of various UNIX/POSIX standards.
Things are never quite as simple as that. Inspecting Linux kernel and system-level code is likely to be a time-intensive affair and of somewhat limited use in the real world. The LFS project aims to remedy the problem of limited system-level intelligibility on Linux. The fact that the kernel needs a large number of libraries and tools before a Linux system can perform even basic tasks has often been commented upon. But what if a somewhat more sophisticated user who wants a slim-line Linux distribution does not want to download several gigabytes of binaries that rule out any chance to optimize the system or to throw out all those pesky, unnecessary tools? What if a very sophisticated user refuses to accept the diktat of various community distributions and wants to run a Linux/Apache/MySQL/PHP (LAMP)-type application stack from a CD? LFS comes to the rescue.
Linux From Scratch
The LFS project is, obviously, based on the source files that are sufficient — but not necessary — to make up a basic Linux system. It has moved beyond the Linux kernel and the device drivers, because to produce a working Linux system, you have to add a complete compiler tool chain, a number of Linux assembler utilities, the glibc system library, system configuration tools, and tools connected to userland shell access. LFS is predicated on the assumption that a Linux or UNIX power user with some knowledge of scripting wants to get to know the workings of a complete usable system without having to delve into the kernel code itself.
To get acquainted with the way a Linux system works, the creators of LFS decided that compiling the system by following the tree of module dependencies would be a natural way to get to know the mechanics of an operating system in general and Linux in particular. After users have mastered the compilation process, they can start eliminating those parts of the dependency tree connected to system components that are irrelevant to supporting the operating system’s primary purpose. It is feasible, for instance, to eliminate the compiler tool chain itself after compilation is complete. Embedded LAMP stacks can make do without a full set of command-line utilities. Configuration utilities might be dropped, as well, and most users can make do with one, instead of the plethora of file systems Linux tends to support.
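The idea of building along a dependency tree and then pruning it can be sketched in a few lines of Python. This is only an illustration: the package map below is a simplified, hypothetical example (the real LFS build order is given in the LFS book and involves several bootstrap passes), and the function names build_order and prune are invented for this sketch.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, heavily simplified map: package -> packages built before it.
DEPENDS_ON = {
    "binutils": set(),
    "gcc": {"binutils"},
    "glibc": {"gcc"},
    "bash": {"glibc"},
    "coreutils": {"glibc"},
    "apache": {"glibc"},
}


def build_order(depends_on):
    """Return an order in which every package follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def prune(depends_on, wanted):
    """Keep only the packages that `wanted` needs, directly or indirectly."""
    keep, stack = set(), list(wanted)
    while stack:
        pkg = stack.pop()
        if pkg not in keep:
            keep.add(pkg)
            stack.extend(depends_on[pkg])
    return {pkg: depends_on[pkg] for pkg in keep}


order = build_order(DEPENDS_ON)
minimal = prune(DEPENDS_ON, {"apache"})
print(order)
print(sorted(minimal))
```

In this toy map, pruning for a system whose only purpose is to run the web server drops the shell and the command-line utilities while keeping everything the server is built on.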
One important part of the LFS system is the large number of source files available as tarballs. Documentation is the other vital part, and arguably the most important one. Indeed, it would be perfectly possible to take an up-to-date LFS book file and create an LFS distribution, because each download location and the characteristics of each source file and its dependencies are described in the LFS book in detail. The procedures for compiling each group of source files, from the kernel to the compiler to the shell, have been written up, and you can find alternative routes, where they are possible, in LFS books describing systems with different characteristics. Another part of the LFS system that is unlikely to be in the toolkit of the average power user is the boot script needed to boot the system after the basic LFS system has been pieced together.
Now for the big caveat of LFS distributions: What a courageous distribution builder needs is a working Linux distribution, including a complete compiler tool chain and a suite of file system-creation utilities. Naturally, all source-based Linux distributions need to be bootstrapped using a particular compiler version, which is by no means identical from distribution release to distribution release. LFS is not the only system in this field, but it is the only system that allows you to work directly with individual source files. Most other source-based Linux systems, such as Source Mage and MyGeOS, provide a complete download, which users are well advised to use. LFS makes no such assumption, and stripping down the LFS framework is encouraged.
LFS presupposes a functioning Linux distribution installed on nonexotic hardware, even though LFS itself is probably less demanding as far as configuration tools and scripting are concerned. To compile LFS, you need to prepare a partition and a file system, and you also need to compile a compiler and system library. It is a fairly nerve-racking procedure if done by hand, but it definitely increases your confidence in dealing with the rest of the installation. The compilation of the whole system tends to take from an hour to four days, depending on the age of the underlying hardware and your command-line dexterity.
If you’re willing to retain much of the book installation and keep changes to the installation proposed in the LFS book to a minimum (and this is a fairly big assumption), you could also use the automated installation routine to install an LFS-based distribution. The installation routine is not presented in the LFS book, but is available as an XML-based description under the name Automated Linux From Scratch (ALFS). The active installation is available as a C-based tool that uses ncurses to give some semblance of a graphical installation. The installer is also known as nALFS and presents an extremely flexible package installation framework. It needs a functioning Linux system with a working C compiler and XML parser to work. A working LFS system would suffice.
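To give a feel for what an XML-based build description means in practice, here is a minimal Python sketch that reads such a description and turns it into an ordered plan. The element and attribute names (profile, package, step, name) are hypothetical and chosen only for illustration; they are not the actual ALFS profile syntax, and the sketch prints commands rather than running them.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: element names and commands are illustrative only.
PROFILE = """
<profile>
  <package name="bash">
    <step>./configure --prefix=/usr</step>
    <step>make</step>
    <step>make install</step>
  </package>
  <package name="coreutils">
    <step>./configure --prefix=/usr</step>
    <step>make</step>
    <step>make install</step>
  </package>
</profile>
"""


def load_plan(xml_text):
    """Return a list of (package name, [commands]) in document order."""
    root = ET.fromstring(xml_text)
    plan = []
    for pkg in root.findall("package"):
        steps = [step.text.strip() for step in pkg.findall("step")]
        plan.append((pkg.get("name"), steps))
    return plan


plan = load_plan(PROFILE)
for name, steps in plan:
    print(name)
    for command in steps:
        # A real installer would execute the command; this sketch only lists it.
        print("   ", command)
```

Keeping the description as data rather than as a shell script is what makes this kind of installer easy to extend: adding a package means adding an entry, not rewriting the driver.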
Automated Linux From Scratch
ALFS has a purpose that goes beyond LFS itself. LFS on its own teaches the inner workings of a Linux-based operating system, but it does not include a single graphical user interface (GUI). Neither does LFS permit connecting to a network or, indeed, the Internet. ALFS can simplify extending the system — for example, by adding the libraries enabling Internet access or by installing the X libraries required for graphical desktops.
The creators of LFS recognized the need for other varieties of source-based Linux systems. For those who want to go beyond LFS and add the X Window System, GNOME, and networking support, another LFS derivative was created: Beyond Linux From Scratch (BLFS). The trio of LFS books (and let’s not forget that we are talking about books, not distributions) forms a triangle standing on one of its angles: The basic LFS build is the foundation for an automated compilation and, if required, for a full source-based Linux distribution. BLFS turns the basic Linux system into a full user-ready Linux system. ALFS simplifies installing and extending a source-based Linux installation. The compilation of the complete source-based system is guided by a script you can leave to run on its own after you have tuned it to the hardware on which it’s running. You can extend the installation sequence easily after you (or the installation engineer) have decided which packages are required to run, say, a particular office application suite. ALFS comes in handy here, too, as it lends itself to network-wide installations from source.
The final member of the LFS family addresses a particularly important aspect of source-based Linux: security. The common-sense approach to security for someone who does not intend to rely on patches delivered from your Linux distribution server of choice would be to track security advisories for selected core libraries and applications. For LFS implementers, the problem is somewhat different: It would be difficult, although not impossible, to audit Linux kernel code, and perhaps a number of libraries and utilities central to the internal functioning of a Linux-based operating system.
Code audits are extremely time-consuming, and adding a large number of patches is advisable only if patch servers are maintained centrally by dedicated staff. It is, however, possible to replace some libraries with versions that have been rewritten from the ground up to reflect new approaches to security problems. A good example is making process identifiers extremely difficult to guess by allocating them randomly from a reasonably large pool of numbers. The OpenBSD project has pioneered this method, which has found its way into various UNIX flavors and Linux distributions.
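The principle behind randomized process identifiers can be shown with a toy allocator. This Python sketch is not how OpenBSD or any kernel implements it; the class name RandomPidAllocator and the default upper bound are assumptions made for the example.

```python
import secrets


class RandomPidAllocator:
    """Toy allocator that hands out unused process IDs in random order."""

    def __init__(self, max_pid=32768):
        self.max_pid = max_pid  # assumed upper bound, not a kernel constant
        self.in_use = set()

    def allocate(self):
        # Valid IDs are 1 .. max_pid - 1, so that many can be in use at once.
        if len(self.in_use) >= self.max_pid - 1:
            raise RuntimeError("no free process IDs")
        while True:
            # Draw from a cryptographically strong source, then retry on clash.
            pid = secrets.randbelow(self.max_pid - 1) + 1
            if pid not in self.in_use:
                self.in_use.add(pid)
                return pid

    def release(self, pid):
        """Return an ID to the pool when its process exits."""
        self.in_use.discard(pid)


allocator = RandomPidAllocator()
print([allocator.allocate() for _ in range(5)])
```

With sequential allocation, an attacker who sees one identifier can predict the next; with random allocation from a large pool, each guess has only a small chance of being right.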
A fairly new project known as Hardened Linux From Scratch (HLFS) takes this approach to security under Linux. The project, which presupposes a fairly decent grasp of LFS and some parts of BLFS, uses several utilities and libraries that do not tend to be standard in most Linux systems.
Possibly the most important addition to HLFS is the Stack-Smashing Protector (SSP), which you enable by using a gcc directive. SSP was developed to protect against stack-smashing attacks, which belong to the most common class of security threats affecting Linux systems. Other security goodies include a first-class random number generator and the compilation of position-independent executables: Executable code that would typically be linked to fixed addresses is instead built like a shared library, so that executables and their libraries can hide their addresses by randomizing them. Of course, a large number of patches are available and can be sourced from the HLFS Web site.
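The mechanism behind stack-smashing protection is easier to see in a toy model. The Python sketch below is a simulation only, not the SSP implementation: it lays out a fake stack frame as buffer, guard value (the "canary"), and return address, and checks the guard after an unbounded copy. The Frame class and its method names are invented for this example. In GCC itself, the protection is switched on with options such as -fstack-protector.

```python
import secrets

CANARY_SIZE = 8
ADDRESS_SIZE = 8


class Frame:
    """Toy stack frame laid out as [buffer][canary][return address]."""

    def __init__(self, buf_size, return_address):
        self.buf_size = buf_size
        self.canary = secrets.token_bytes(CANARY_SIZE)  # unpredictable guard
        self.memory = (
            bytearray(buf_size)
            + self.canary
            + return_address.to_bytes(ADDRESS_SIZE, "little")
        )

    def unchecked_copy(self, data):
        """Mimic an unbounded copy: long input spills past the buffer."""
        end = min(len(data), len(self.memory))
        self.memory[:end] = data[:end]

    def canary_intact(self):
        """The check a protected function performs before it returns."""
        start = self.buf_size
        return bytes(self.memory[start:start + CANARY_SIZE]) == self.canary

    def return_address(self):
        return int.from_bytes(self.memory[-ADDRESS_SIZE:], "little")


safe = Frame(16, 0x401000)
safe.unchecked_copy(b"A" * 16)      # fits the buffer exactly
smashed = Frame(16, 0x401000)
smashed.unchecked_copy(b"A" * 32)   # overruns canary and return address
print(safe.canary_intact(), smashed.canary_intact())
```

Because the guard sits between the buffer and the return address, an overflow that reaches the return address has to overwrite the guard first, and the program can abort instead of jumping to an attacker-chosen location.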
The growing LFS family
The LFS family of Linux builds is, in many ways, a method for giving back the power to construct Linux-based operating systems to the people who started it all: the hackers. But the most important result for the creators of LFS seems to be that through LFS, all Linux distributions have become intelligible to interested users. By allowing users to build a Linux distribution piece by piece and by helping users see a Linux-based operating system as a system of many parts, alternative approaches to building Linux distributions become possible.
More generally, users do not need to be programmers to change the way a Linux distribution is built: The bit of scripting users learn by building an LFS system is sufficient. An LFS specialist can change and extend the very composition of a Linux distribution without impairing its basic structure. This functionality is particularly important for organizations that have the manpower and expertise to maintain Linux systems, but not the financial wherewithal to buy commercial support from consultancies and corporations. LFS-based Linux systems have been demonstrated for educational purposes and for large networks. It is likely they will be used in other areas, as well.