What Do You Do When Your Tools Are Too Fat?
Tuning isn’t always about speed; sometimes other aspects of the application need fixing. When your application needs tuning, your first course of action is normally to monitor it with a profiler. But profiling is not always practical — sometimes for ironic reasons. In this installment of Eye on performance, Jack and Kirk relate their recent experiences with profiling a fat client — so fat, in fact, that it left no room for a profiler.
We haven’t needed to address the issue of tuning an application’s memory footprint for a while. Usually, the memory-related tuning requirements we see involve reducing garbage collection overhead, ideally by tuning the heap sizes and garbage collection algorithms and, where that fails, by reducing object churn using various techniques. But sometimes an application simply takes up too much memory, regardless of how efficient its allocation and garbage collection are.
A Trip to the Fat Farm
Recently we were given the assignment of reducing the memory footprint of a fat client. While the term “fat client” is often used to mean an ordinary GUI client application, in this case the client was nearing obesity. The client was running on the Windows platform, which imposes a 2 GB limit on process size. After subtracting the address space needed for the executable, native libraries, and various JNI products, the maximum heap size available to the application was around 1.2 or 1.3 GB. Unfortunately, some users of this application were getting very close to that limit because of the volume of data they were pouring through it. One obvious tuning option was to move to Unix machines, but that was ruled out as impractical — the customer preferred instead to slim down the application.
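You can see how close a JVM is running to its heap ceiling from inside the application itself, using the standard `Runtime` API. This hypothetical snippet (the class name is our own) simply reports the numbers:

```java
public class HeapHeadroom {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxHeap = rt.maxMemory();      // ceiling set by -Xmx (or the default)
        long committed = rt.totalMemory();  // heap currently reserved from the OS
        long used = committed - rt.freeMemory();
        System.out.printf("max=%dMB committed=%dMB used=%dMB%n",
                maxHeap >> 20, committed >> 20, used >> 20);
    }
}
```

When `used` creeps toward `maxHeap`, as it was doing for the heaviest users of this client, the next allocation spike can push the process into `OutOfMemoryError` territory.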
So, our task was set: profile the fat client and find out what was taking up all the space, then slim down some of those objects to a size that left headroom for later expansion or higher data volumes. Straightforward, we thought. It might take a while, of course, because object reduction is usually more of a long slog than a leap of brilliance, but in a heap of that size there had to be lots of extra flab that could be cut. Or so we thought.
The Usual Procedure
We began our usual footprint-reduction procedure: set up the test environment, specify a reproducible test, fire up the profiler, run the test, analyze the data, and look for tuning opportunities. Tick follows tock; we do this all the time . . . or so we thought. This time, we got to the “run the test” step, and the profiler fell over. So we tried again. And it died again. We changed the configuration of the profiler to minimize the overhead it needed and tried again. And it died again. There was simply not enough JVM heap space left over to accommodate the profiler for the entire run, let alone generate any useful profiling data. And we were using a classy commercial profiler, usually very reliable, so we were surprised.
Try, Try Again
Well, no matter: there are plenty more fish in the sea, and nowadays there are plenty more profilers to choose from (see Resources for a recent review of available profilers). Another day, another profiler, right? The sequence of testing was, unfortunately, remarkably similar. Profiler number two crashed the JVM at about the same point that profiler number one did. Like profiler number one, it was highly configurable, and configuring it to minimize overhead and the amount of data extracted got us that little bit further. Just like profiler number one. And it crashed. Just like profiler number one. Profiler number three was, sadly, no different.
A Cunning Profiler
Profiler number four, however, was cunningly different. For memory analysis of live objects (ignoring object creation and garbage collection, and looking only at a snapshot of the live objects at some point in time), profiler number four imposed no overhead at all on the JVM until the snapshot was requested. Success! For the first time, our test ran through to the point where we needed to take our snapshot, with the profiler running. We were happy. Then we triggered the snapshot, and the JVM crashed.
We tried it again, but the profiler needed too much extra process space to generate the snapshot. It just wouldn’t work. We were back to square one! There were still half a dozen more commercial profilers we could try, but the pattern was obvious. It was time for some lateral thinking.
Ironically, it was the sophistication of the profilers themselves that was our problem. We needed something simple. Of course, simple doesn’t always mean lower overhead, but it was worth a try, given the disappointing results we’d had so far with sophisticated profilers. So we started scanning through the open source profilers.
Back to the Beginning
We started with those that seemed targeted at memory analysis. Open source memory profiler number one seemed utterly simple — almost too simple. Its output would be of very limited use: only a list of classes and counts of objects by class. But that was as good a place as any to start. It crashed. We had that sinking feeling that we were headed down the same path as before. Open source profiler number two was even simpler than number one, though it actually gave more detailed information — a heap dump with one entry per object, showing each object’s size and class. As with the other profilers, we first tried it out at lower scales, and we could see the heap dump was going to be big — roughly the size of the heap, so we were looking at a 1 GB output file. We tried it. It crashed the VM. But it did give us a partial dump.
When you are dealing with data volumes like this, it is important to get a sense of the magnitude of the resources required and how long things are going to take. Dumping output to a 1 GB file can take many minutes. If you’re not prepared for how long an operation is likely to take, you may mistakenly think the process is hung when in fact it is chugging along, taking as long as it needs to dump out a gigabyte of formatted text. This open source profiler was working — but we had neglected to give it enough time in the first test. To compound the problem, we then forced it to give a second dump in the middle of the first — which caused it to crash. Fortunately, we figured out that the problem was us, not the profiler, and with slightly better awareness, we tried it again and it worked.
The heapprofile Profiler
So, which profiler worked? It was heapprofile, written by Matthias Ernst. It is nothing more than a page of C code using the Java Virtual Machine Profiler Interface (JVMPI) to dump the heap in the simplest format possible. You even have to compile it yourself; no pre-compiled executables are available from the Web site (see Resources). It was the level of simplicity we needed for this problem: no overhead at all, and no use of heap or JNI resources beyond the absolute minimum. It did nothing while the program was running; then, when we wanted a heap dump, it simply walked the heap, dumping each object’s size and class directly to a file — no building up of the in-memory structures that had made all the other profilers crash the JVM.
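A dump in that style — one line per live object, with its size and class — is straightforward to aggregate yourself. The following sketch assumes a hypothetical line format of “size className” (the real heapprofile format may differ) and totals bytes and instance counts per class:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class DumpSummarizer {
    // Aggregates a dump with one "size className" line per object
    // into per-class stats: [0] = total bytes, [1] = instance count.
    public static Map<String, long[]> summarize(BufferedReader in) throws IOException {
        Map<String, long[]> byClass = new HashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            int space = line.indexOf(' ');
            if (space < 0) continue;                 // skip malformed lines
            long size = Long.parseLong(line.substring(0, space));
            String cls = line.substring(space + 1);
            long[] stats = byClass.computeIfAbsent(cls, k -> new long[2]);
            stats[0] += size;
            stats[1] += 1;
        }
        return byClass;
    }

    public static void main(String[] args) throws IOException {
        // Tiny example dump, largest consumer printed first
        String dump = "24 java.lang.String\n16 java.lang.Integer\n24 java.lang.String\n";
        summarize(new BufferedReader(new StringReader(dump))).entrySet().stream()
            .sorted(Comparator.comparingLong(
                (Map.Entry<String, long[]> e) -> -e.getValue()[0]))
            .forEach(e -> System.out.println(e.getKey() + ": "
                + e.getValue()[0] + " bytes, " + e.getValue()[1] + " instances"));
    }
}
```

Because the aggregation keeps only one map entry per class, it can stream through a gigabyte-sized dump without itself needing a gigabyte of heap.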
Of course, we weren’t done yet. Now we needed to analyze the resulting data and use it to determine which objects the application was holding. Fortunately, the output format was easy to parse. Once we found the objects that were causing trouble, we had to find the allocation sites for those objects. To do that in a low-overhead way, we used the simple tactic of recompiling those few classes with a stack tracker in the constructor, a technique detailed in Jack’s book (see Resources). The technique involves creating — but not throwing — an exception in the constructor; that exception contains a stack trace identifying the allocation site. Then you tabulate those stacks for all the objects. Because most stacks are the same, you don’t actually have to store very much data — at most a few thousand strings identifying call stacks, with the number of instances linked to each stack.
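The constructor stack-tracker technique can be sketched roughly as follows. The class and method names here are our own illustration, not code from the book; the essential moves are creating an exception without throwing it and tabulating the resulting stacks in a shared map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: count instances per allocation site by creating
// (but never throwing) an exception in the constructor of the class
// under investigation.
public class AllocationSites {
    private static final Map<String, Integer> COUNTS = new ConcurrentHashMap<>();

    // Call this from the constructor of each suspect class.
    public static void record() {
        StackTraceElement[] stack = new Exception().getStackTrace();
        // Skip stack[0], which is record() itself; the remaining frames
        // (the constructor and its callers) identify the allocation site.
        StringBuilder key = new StringBuilder();
        for (int i = 1; i < stack.length; i++) {
            key.append(stack[i]).append('|');
        }
        COUNTS.merge(key.toString(), 1, Integer::sum);
    }

    public static Map<String, Integer> counts() {
        return COUNTS;
    }
}

// A suspect class, recompiled with the tracker call in its constructor.
class Suspect {
    Suspect() {
        AllocationSites.record();
    }
}
```

Since most allocations of a given class come from a handful of call sites, the map stays small — a few thousand keys at most — which matches the low-overhead requirement described above.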
Simple But Ugly
These are simple techniques, but they are not hugely productive. We would have much preferred to use a full-featured profiler to get the data out, especially because full-featured profilers present data in ways that make analysis much easier. We would have liked to view the heap from its roots and track down the bigger sections until we found objects that referenced a lot of the heap, but we didn’t have that option.
The techniques we used were ugly compared to our normal tuning assignments. The result, though, was that we identified some objects that were completely unnecessary and could be eliminated by changing the implementation of a few classes, and other objects that were necessary but could be slimmed down or collapsed together to reduce their space requirements. As is often the case with object reduction, there was no one solution that could change our fat client into a slim one. As with people, dieting is hard work for Java applications, too! And just like dieting, it always seems to take longer than you’d like to lose that flab. Sadly, even the couple of hundred megabytes that we shaved off this client didn’t leave it thin enough to run with a “real” memory profiler.
The Final Word
We saw a question once in a Unix discussion group that asked, “What do Unix gurus use to edit text?” The ensuing discussion had advocates for vi, Emacs, and many more. But the undoubted correct answer was “Unix gurus edit text with whatever is available.” The Java platform is blessed with some truly excellent profilers. But ultimately, you need data to analyze if you want to tune, and you have to be prepared to get that data whichever way you can.
Resources
• Using profilers properly is not as easy as you may think. That’s one of the reasons the authors offer hands-on training courses that show you which profilers are available and how to use them. Check the available training courses page at JavaPerformanceTuning.com for more information.
• Read a review of Java Application Performance Management tools, including a list of commercial profilers.
• Get more information about the heapprofile profiler.
• Dr. Heinz Kabutz explains how to track stack traces using Java code from wherever you want in your program.
• Here’s a comprehensive list of profilers, both open source and commercial.
• The Performance Inspector is a suite of performance-management tools for Linux.
• This article on JVMPI details the Java platform’s interface with profiling tools.
• WebSphere Studio integrates J2EE profiling into the development environment.
• Find hundreds of articles about every aspect of Java programming in the developerWorks Java technology zone.