A Design Process for Standardized Testing of Linux
Automating software testing allows you to run the same tests over a period of time, ensuring that you are really comparing apples to apples and oranges to oranges. In this article, Linux Test Project team members share their methodology and rationale, as well as the scripts and tools they use to stress-test the Linux® kernel.
In testing the stability of Linux kernel releases, there is a need to clearly state and document why a release is stable or unstable. Yet no documented and proven system-wide stress test currently exists that can test the stability of the Linux kernel in its entirety. This article provides a method for creating a system-wide Linux stress test and proving the legitimacy of the results. Different Linux developers, users, and distributions use their own methods for testing kernel stability. However, the basis for their decisions about which tests to run, the kernel code those tests cover, and the stress levels they attain are unpublished, which greatly reduces the value of the results.
Using lab machines and tests available for Linux from the Linux Test Project test suite, we developed a combination of tests, based on system resource utilization statistics, to adequately stress the system. We analyzed this combination test to determine which sections of the Linux kernel are exercised during test execution. We then modified the combination test to achieve the highest percentage of code coverage while maintaining the desired high level of system stress. The final result is a stress test that covers enough of the Linux kernel to be useful for stability statements, and that has the system usage and kernel code coverage data to support it.
The four steps to this combination test method are: test selection, system resource utilization evaluation, kernel code coverage analysis, and final stress test evaluation.
Selecting Tests
Test selection involves choosing tests that accomplish two goals:
• The tests should drive resource utilization to high levels in the main kernel areas, such as the CPU(s), memory, I/O, and networking.
• The tests should adequately cover the kernel code to help support the stability statement produced from their results.
Whenever possible, use tests that are automated or easily modified to support automation. Automation allows for quicker, repeatable testing and helps reduce the risk of human error. Another consideration is whether a test allows free publication of its results. It is also good to choose tests and test suites that adhere to the open source methodology.
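One way to approach the first selection goal is to treat it as a coverage problem: every resource area should be stressed by at least one chosen test. The following Python sketch uses a greedy set-cover heuristic for that. The catalog of test names and the resource tags attached to them are hypothetical placeholders, not an LTP inventory; in practice the tags would come from watching each test with top and sar.

```python
# Greedy set-cover sketch for choosing a stress combination.
# CANDIDATES is a hypothetical catalog, not real LTP test names.
CANDIDATES: dict[str, set[str]] = {
    "cpu_bound_test": {"cpu"},
    "mem_alloc_test": {"memory", "cpu"},
    "disk_io_test": {"io"},
    "net_stream_test": {"network", "io"},
}

# The four resource areas the combination must stress.
AREAS = {"cpu", "memory", "io", "network"}


def select_tests(candidates: dict[str, set[str]], areas: set[str]) -> list[str]:
    """Pick tests until every area is stressed by at least one of them.

    Each round takes the test covering the most still-uncovered areas.
    Names are scanned in sorted order and max() keeps the first best match,
    so ties always resolve the same way and the result is repeatable.
    """
    uncovered = set(areas)
    chosen: list[str] = []
    while uncovered:
        best = max(sorted(candidates), key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError(f"no candidate stresses: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


if __name__ == "__main__":
    print(select_tests(CANDIDATES, AREAS))
```

Greedy selection keeps the combination small, which makes the later trial-and-error tuning easier. It only addresses the first goal; whether the chosen tests cover enough kernel code still has to be checked with coverage analysis.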
Evaluating System Resource Utilization
The combination of selected tests must adequately stress the system’s resources. Four primary areas of the Linux kernel can affect system response and execution time:
• CPU: Time spent processing data on the CPU(s) of the machine
• Memory: Time spent reading and writing data to and from real memory
• I/O: Time spent reading and writing data to and from disk storage
• Networking: Time spent reading and writing data to and from the network
Test designers should use the following two well-known and widely used open source Linux resource monitoring tools to evaluate the resource utilization levels. (For links to download both of these tools, please see Resources later in this article.)
• top: An open source tool maintained by Albert D. Cahalan. It is included in most Linux distributions and works on the current 2.4 and 2.6 kernels.
• sar: Another open source tool, this one maintained by Sebastien Godard. It is also included in most Linux distributions and works on the current 2.4 and 2.6 kernels.
This system resource utilization evaluation phase usually requires multiple attempts to find the combination of tests that achieves the desired level of utilization. Over-utilizing one resource is always a concern: for example, a combination that is too I/O bound can leave the CPU underutilized, and vice versa. This part of the method is largely trial and error, repeated until all resources reach their desired levels.
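The trial-and-error loop is easier to keep honest when each attempt is checked against explicit targets. This sketch labels every resource as below, within, or above a target band. The bands are hypothetical, since the method does not prescribe numbers, and the sketch assumes network utilization has already been expressed as a percentage (for example, of link capacity).

```python
# Hypothetical target bands, in percent utilization, for each resource.
# The method itself does not prescribe numbers; tune these for your lab.
TARGETS: dict[str, tuple[float, float]] = {
    "cpu": (80.0, 99.0),
    "memory": (70.0, 95.0),
    "io": (50.0, 95.0),
    "network": (50.0, 95.0),
}


def check_balance(measured: dict[str, float],
                  targets: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Label each resource 'low', 'ok', or 'high' against its target band.

    Resources with no measurement are labeled 'missing' so that a gap in
    the data is not mistaken for a balanced result.
    """
    verdict: dict[str, str] = {}
    for resource, (low, high) in targets.items():
        value = measured.get(resource)
        if value is None:
            verdict[resource] = "missing"
        elif value < low:
            verdict[resource] = "low"
        elif value > high:
            verdict[resource] = "high"
        else:
            verdict[resource] = "ok"
    return verdict


if __name__ == "__main__":
    # An I/O-heavy attempt: disk is saturated while the CPU sits mostly idle.
    attempt = {"cpu": 35.0, "memory": 80.0, "io": 98.0, "network": 60.0}
    print(check_balance(attempt, TARGETS))
```

For the I/O-heavy attempt in the example, the CPU is reported as low and I/O as high, which is the kind of imbalance that calls for another round of adjusting the combination.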
The top tool is useful for quickly determining which resources (CPU, memory, or I/O) each test affects and how much of them it utilizes in a real-time fashion. The sar tool is useful for gathering network utilization statistics and recording snapshots of all utilization data to a file over a period of time.
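top derives its CPU figures from the kernel's cumulative counters in /proc/stat. The sketch below shows the same idea in miniature: take two samples of the aggregate cpu line and compute the share of elapsed time that was not idle. It assumes the standard field order (user, nice, system, idle, followed on newer kernels by iowait, irq, softirq, and steal) and counts iowait as not busy; it is an illustration, not a replacement for top or sar.

```python
import os
import time


def parse_cpu_line(line: str) -> list[int]:
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def cpu_busy_percent(before: str, after: str) -> float:
    """Percent of CPU time that was not idle between two /proc/stat samples.

    Field order: user, nice, system, idle, then on newer kernels iowait, irq,
    softirq, steal, guest, guest_nice. Only the first eight are summed because
    guest time is already included in user and nice. iowait is counted as not
    busy here; top and sar report it in a column of its own.
    """
    start = parse_cpu_line(before)[:8]
    end = parse_cpu_line(after)[:8]
    deltas = [b - a for a, b in zip(start, end)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("the two samples must be taken at different times")
    # Index 3 is idle; index 4 (iowait) is absent on older kernels such as 2.4.
    not_busy = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 100.0 * (total - not_busy) / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat, which holds the all-CPU totals."""
    with open(path) as stat:
        return stat.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1.0)
    print(f"CPU busy: {cpu_busy_percent(first, read_cpu_line()):.1f}%")
```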
Once a combination is chosen, run it for an extended amount of time to accurately evaluate resource utilization. How long to run it depends on the length of each test: assuming that multiple tests execute concurrently, the run must be long enough for the longest of these tests to complete. Keep the sar tool running during this evaluation as well. At the conclusion of the evaluation run, gather and evaluate the utilization levels for all four resources.
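Sizing the sar collection window follows directly from the rule above: it has to outlast the longest test. sar accepts an interval and a count, and -o saves the samples to a binary file that `sar -f` can report on later. The helper below builds such a command line; the output file name and the five-minute padding are assumptions, not values from the method.

```python
import shlex


def sar_command(longest_test_seconds: int, interval_seconds: int = 60,
                pad_seconds: int = 300, outfile: str = "stress_run.sar") -> list[str]:
    """Build a sar command line whose sampling window outlasts the longest test.

    sar's positional arguments are an interval and a count; -o saves the
    samples to a binary file that `sar -f` can read back later.
    pad_seconds and outfile are hypothetical defaults, not part of the method.
    """
    if interval_seconds <= 0 or longest_test_seconds < 0 or pad_seconds < 0:
        raise ValueError("durations must be non-negative and the interval positive")
    window = longest_test_seconds + pad_seconds
    # Integer ceiling division so the sampling window is never cut short.
    count = max(1, (window + interval_seconds - 1) // interval_seconds)
    return ["sar", "-o", outfile, str(interval_seconds), str(count)]


if __name__ == "__main__":
    # A 24-hour evaluation run sampled once a minute.
    print(shlex.join(sar_command(24 * 3600)))
```

For a 24-hour longest test, the helper produces `sar -o stress_run.sar 60 1445`: 1,440 one-minute samples plus five more for the padding.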
The following example shows sar output for CPU, memory, and network utilization:
Listing 1. Example output from sar
10:48:27       CPU     %user     %nice   %system   %iowait     %idle
10:48:28       all      0.00      0.00      0.00      0.00    100.00
10:48:29       all      3.00      0.00      1.00      0.00     96.00
10:48:30       all    100.00      0.00      0.00      0.00      0.00
10:48:31       all    100.00      0.00      0.00      0.00      0.00

02:27:31 kbmemfree kbmemused  %memused kbswpfree kbswpused  %swpused
02:29:31    200948     53228     20.94    530104         0      0.00
02:31:31    199136     55040     21.65    530104         0      0.00
02:33:31    198824     55352     21.78    530104         0      0.00
02:35:31    199200     54976     21.63    530104         0      0.00

02:27:31     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s
02:29:31      eth0    738.79    741.66  76025.55 136941.85
02:31:31      eth0    743.30    744.97  76038.82 136907.77
02:33:31      eth0    744.80    745.02  76135.53 136901.38
02:35:31      eth0    742.35    744.34  75947.45 136864.77
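To turn output like Listing 1 into numbers that can be compared between runs, one option is to average each column. The parser below handles one report at a time and assumes the 24-hour timestamp layout shown in the listing (a locale that prints AM/PM adds an extra column); it skips sar's own Average: summary lines.

```python
from typing import Optional


def parse_sar_report(text: str) -> tuple[list[str], list[list[str]]]:
    """Split one sar report into column names and data rows.

    Assumes 24-hour timestamps, so the first token of every line is the time
    and can be dropped. sar's own 'Average:' summary lines are skipped.
    """
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    data = [row[1:] for row in rows if row[0] != "Average:"]
    return header[1:], data


def column_average(text: str, column: str, iface: Optional[str] = None) -> float:
    """Average a numeric column, optionally only for one network interface."""
    columns, rows = parse_sar_report(text)
    if iface is not None:
        where = columns.index("IFACE")
        rows = [row for row in rows if row[where] == iface]
    index = columns.index(column)
    values = [float(row[index]) for row in rows]
    if not values:
        raise ValueError(f"no rows for column {column!r}")
    return sum(values) / len(values)


# The CPU report from Listing 1.
CPU_REPORT = """
10:48:27 CPU %user %nice %system %iowait %idle
10:48:28 all 0.00 0.00 0.00 0.00 100.00
10:48:29 all 3.00 0.00 1.00 0.00 96.00
10:48:30 all 100.00 0.00 0.00 0.00 0.00
10:48:31 all 100.00 0.00 0.00 0.00 0.00
"""

if __name__ == "__main__":
    print(column_average(CPU_REPORT, "%idle"))
```

Applied to the CPU report in Listing 1, the %idle column averages 49.0, which shows how lightly loaded samples at the start of a run can skew an average.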
Analyzing Kernel Code Coverage
Achieving adequate kernel coverage is another responsibility of a system stress test. Although the chosen combination of tests extensively utilizes the four main resources, it may be exercising only a small subset of the kernel. Thus, you should analyze coverage to ensure that the combination works as a system stress test and not merely as a system load generator. Currently, two open source tools can help in code coverage analysis of the Linux kernel:
• gcov: The GCC code coverage tool, applied to the kernel through a gcov-kernel patch maintained by the Linux Test Project. It analyzes the coverage of the kernel and reports which lines, functions, and branches are covered and how many times each was hit.
• lcov: An open source tool developed by IBM and maintained by the Linux Test Project. This tool consists of a set of Perl scripts that build on the text-based gcov output to produce HTML-based output. The output includes coverage percentages, graphs, and overview pages that allow quick browsing of coverage data. You can find both tools at the Linux Test Project (LTP) home page (see Resources for a link).
After loading the gcov module, execute every test in the system stress test combination. Although the original system stress test can, and should, run tests concurrently, this coverage run should be sequential: run each test once to completion, one after another, without repeating any test. A single sequential run reduces the amount of unpredictable, untargeted kernel code that gets executed as the kernel load-balances multiple concurrent tests. Run the gcov analysis after the final test finishes. As the last step in preparing the data for analysis, run the lcov tool and unload the gcov module.
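The ordering rule for the coverage run can be sketched as two small steps: collapse the concurrent mix, which may launch the same test several times, into a list where each test appears once in first-seen order, then run that list one test at a time. The test names and command lines are hypothetical, and loading the gcov module, running gcov, and running lcov are left out because their exact commands depend on the kernel patch in use.

```python
import subprocess


def sequential_plan(concurrent_mix: list[str]) -> list[str]:
    """Collapse a concurrent stress mix into a run-once, in-order list.

    The stress combination may start several copies of one test; for the
    coverage run each distinct test appears once, in first-seen order.
    dict.fromkeys keeps insertion order and drops duplicates.
    """
    return list(dict.fromkeys(concurrent_mix))


def run_sequentially(commands: dict[str, list[str]], plan: list[str]) -> dict[str, int]:
    """Run each planned test to completion, one after another.

    Returns each test's exit status. The next test starts only after the
    previous one has exited, so no two tests overlap.
    """
    results: dict[str, int] = {}
    for name in plan:
        completed = subprocess.run(commands[name], check=False)
        results[name] = completed.returncode
    return results


if __name__ == "__main__":
    # Hypothetical mix: the stress test runs two copies of the memory test.
    mix = ["mem_alloc_test", "disk_io_test", "mem_alloc_test", "net_stream_test"]
    print(sequential_plan(mix))
```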
The lcov tool generates an entire HTML tree that contains every line of code in the kernel, along with data on how many times, if at all, each line was executed. The tool quantifies the coverage data and generates coverage percentages for each section and file of the kernel. The following figure shows sample code coverage output:
Figure 1. Example of gcov output
The lcov maintainers defined the threshold for “adequate coverage” (shown in green), so that rating is only one opinion. Because the raw data is included, any reviewer can make his or her own judgment. After reviewing the coverage analysis, the test creator can modify the combination of tests to change or increase the amount of code covered.
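Because the “adequate” threshold is a judgment call, it can help to compute coverage directly from the raw data. lcov stores its results in a text tracefile in which SF: names a source file, each DA:<line>,<count> record gives the execution count for one line, and end_of_record closes the file's section. The sketch below reads those records and buckets each file's line coverage; the 90 and 75 percent thresholds are illustrative values to adjust, not values this method prescribes, and the sample file paths are hypothetical.

```python
def line_coverage(tracefile: str) -> dict[str, tuple[int, int]]:
    """Map each source file to (lines hit, lines instrumented).

    Reads the SF:, DA:<line>,<count>[,<checksum>] and end_of_record entries
    of an lcov tracefile; all other record types are ignored. Assumes one
    record per file (a later record for the same file replaces an earlier one).
    """
    coverage: dict[str, tuple[int, int]] = {}
    current = None
    hit = found = 0
    for raw in tracefile.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current, hit, found = line[3:], 0, 0
        elif line.startswith("DA:") and current is not None:
            count = int(line[3:].split(",")[1])
            found += 1
            if count > 0:
                hit += 1
        elif line == "end_of_record" and current is not None:
            coverage[current] = (hit, found)
            current = None
    return coverage


def classify(hit: int, found: int, high: float = 90.0, medium: float = 75.0) -> str:
    """Bucket a line-coverage rate; the thresholds are illustrative, not prescribed."""
    if found == 0:
        return "no data"
    rate = 100.0 * hit / found
    if rate >= high:
        return "high"
    if rate >= medium:
        return "medium"
    return "low"


# Hypothetical tracefile fragment for two kernel files.
SAMPLE = """SF:/usr/src/linux/kernel/fork.c
DA:10,5
DA:11,0
DA:12,2
DA:13,1
end_of_record
SF:/usr/src/linux/mm/mmap.c
DA:1,0
DA:2,0
end_of_record
"""

if __name__ == "__main__":
    for path, (hit, found) in line_coverage(SAMPLE).items():
        print(path, hit, found, classify(hit, found))
```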
Evaluating the Final Stress Test
The final step of the method verifies the system stress test. Execute the stress test on a kernel believed to be stable; the kernel included in a distribution usually, but not always, fills this requirement. Run the stress test over an extended period (a minimum of 24 hours is recommended), with the sar tool running as well, for two reasons:
• The extended run will help find any problems within the combination that would otherwise go unnoticed in a short “sniff test.”
• The data produced from sar forms your baseline for comparison in future test runs.
After the extended run concludes, you can decide, based on all the data gathered, whether this test combination is a good candidate for system stress testing.
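Once the baseline exists, later runs can be compared with it metric by metric. The sketch below reports any metric whose relative change exceeds a tolerance; the metric names and the 10 percent tolerance are assumptions. A flagged metric is a prompt to investigate, not proof that the kernel under test is unstable.

```python
def compare_to_baseline(baseline: dict[str, float], current: dict[str, float],
                        tolerance_pct: float = 10.0) -> dict[str, float]:
    """Return metrics whose relative change from the baseline exceeds the tolerance.

    Each value is the change in percent (positive means the current run is higher).
    Metrics missing from the current run, or with a zero baseline (where a
    relative change is undefined), are skipped.
    """
    drift: dict[str, float] = {}
    for metric, base in baseline.items():
        if metric not in current or base == 0:
            continue
        change = 100.0 * (current[metric] - base) / base
        if abs(change) > tolerance_pct:
            drift[metric] = change
    return drift


if __name__ == "__main__":
    # Hypothetical per-run averages, e.g. produced from sar data.
    baseline = {"cpu_user": 50.0, "mem_used": 21.5, "eth0_rxpck": 742.3}
    latest = {"cpu_user": 40.0, "mem_used": 22.0, "eth0_rxpck": 742.3}
    print(compare_to_baseline(baseline, latest))
```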
Figure 2. Summary of design process
The Linux Test Project used this design method when designing the Linux kernel stress test script ltpstress.sh. This application combines multiple tests from different areas of LTP’s test suite, along with memory and network traffic load generators. Before executing, the test adjusts its total memory usage according to how much real and virtual memory exist on the system. This test script is available through the LTP test suite (see Resources). The script was created under controlled laboratory conditions to ensure the accuracy of the results.
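How ltpstress.sh sizes its memory load is not described here, so the following is only a hypothetical illustration of the idea: read MemTotal and SwapTotal from /proc/meminfo, treat swap as the virtual portion, and target a fixed percentage of the sum. The 90 percent figure and the swap interpretation are assumptions, not the script's actual rule.

```python
import os


def meminfo_kb(text: str) -> dict[str, int]:
    """Parse 'Key:   value kB' lines of /proc/meminfo into integer kilobytes."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_budget_kb(text: str, percent: int = 90) -> int:
    """Hypothetical sizing rule: target a percentage of real memory plus swap.

    Treating SwapTotal as the 'virtual' part and the 90 percent default are
    assumptions for illustration, not the rule ltpstress.sh applies.
    """
    info = meminfo_kb(text)
    total_kb = info["MemTotal"] + info.get("SwapTotal", 0)
    return total_kb * percent // 100


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as meminfo:
        print(memory_budget_kb(meminfo.read()), "kB")
```

With the totals implied by Listing 1 (254176 kB of real memory and 530104 kB of swap), this rule would target 705852 kB.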
The IBM Linux Technology Center Test department uses this stress test, along with other tools and tests, as a relatively quick and easy way to help validate the stability of Linux kernel releases. Tests are conducted under laboratory conditions, as well as under simulated customer scenarios, to help ensure adequate coverage.
Resources
• Download the stress test shell script and a passel of other useful tests at the Linux Test Project home page.
• The mission of the IBM Linux Technology Center is to work directly with the Linux development community with a shared vision of making Linux succeed.
• The OSDL’s Linux Kernel Scalable Test Platform (STP) provides a framework where developers can test kernel patches against an online performance and scalability suite.
• Kernel comparison: Improvements in kernel development from 2.4 to 2.6 (developerWorks, February 2004) takes a look at the tools, tests, and techniques that helped make 2.6 a better kernel than any that have come before it.
• Kernel comparison: Web serving on 2.4 and 2.6 (developerWorks, February 2004) presents results from the IBM Linux Technology Center’s Web serving testing efforts.
• In Improving Linux kernel performance and scalability (developerWorks, January 2003), the Linux Technology Center Linux Kernel Performance team discusses how to quantify Linux performance for the purpose of comparing test results over time.
• Putting Linux reliability to the test (developerWorks, December 2003) documents the test results and analysis of the Linux kernel and other core OS components by the IBM Linux Technology Center.
• Inside the Linux kernel debugger (developerWorks, June 2003) shows you how to trace kernel execution and examine its memory and data structures.
• Find more resources for Linux developers in the developerWorks Linux zone.