Eye on performance: Wait leaks

Gain a Better Understanding of this Curious Race Condition

A fine line runs between performance tuning and debugging. Several categories of bugs, including memory errors and thread race conditions, frequently surface during performance tuning. This month, performance tuning experts Jack Shirazi and Kirk Pepperdine show how to spot one particular class of race condition, the wait leak.

Some types of bugs often fall to the performance tuner to fix, even though they are not, strictly speaking, performance problems. The out-of-memory error, often caused by object leaks, is one such type of bug. (We outlined how to handle those in “Trash talk,” an earlier installment of this column available in Resources.) Thread deadlocks and other threading problems, such as race conditions, also often fall to performance tuners, because they tend to show up only when the program is tested under load.

These bugs are often handed to performance tuners for good reason: the tools used to identify and eliminate performance and memory bottlenecks are the same tools that identify object leaks and race conditions. Deadlocks are relatively easy to identify. Once you notice that the application is frozen, a thread dump shows which threads hold the monitors that the other threads are blocked on. Race conditions, unfortunately, can be more elusive.

Wait Leaks

One class of race condition, which we call a wait leak, recently came to our attention. The basic issue is this: with the wait/notify idiom, one or more threads typically block in a wait() call until another thread notifies them that some condition has become true, at which point they exit the wait() call and carry on processing. The notifying thread calls notify() or notifyAll() to signal that waiting threads can now wake up.

This idiom has an obvious potential for a race condition, although we had not seen it in practice until recently. What happens if a thread is about to wait for a particular resource to become available, but another thread calls notify() just before the first thread enters the wait state? The notification is lost, because notify() and notifyAll() wake only threads that are already waiting. The result is a thread stuck in the wait state, even though the resource is available.
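The following sketch makes that race deterministic. It assumes a JDK 5.0 or later runtime (for System.nanoTime()), and the class and method names are hypothetical. The notifying thread finishes its notifyAll() before the main thread ever calls wait(), so the notification reaches nobody. A timeout is used only so the demonstration ends instead of hanging the way an untimed wait() would.

```java
// Deterministic version of the race: the notification is sent before the
// waiting thread calls wait(), so nobody receives it. Names are hypothetical.
public class LostNotificationDemo {
    private static final Object LOCK = new Object();

    /** Returns roughly how long (ms) a late waiter stays blocked after an early notifyAll(). */
    public static long millisBlockedAfterEarlyNotify() throws InterruptedException {
        Thread notifier = new Thread(new Runnable() {
            public void run() {
                synchronized (LOCK) {
                    LOCK.notifyAll();   // no thread is waiting yet, so this wakes no one
                }
            }
        });
        notifier.start();
        notifier.join();                // make sure the notification has already happened

        long start = System.nanoTime();
        synchronized (LOCK) {
            // An untimed wait() would block forever here; the 500 ms timeout
            // only lets the demonstration finish.
            LOCK.wait(500);
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Late waiter blocked for about "
                + millisBlockedAfterEarlyNotify() + " ms despite the earlier notifyAll()");
    }
}
```

The waiter sits out the full timeout: nothing in the wait/notify mechanism remembers that a notification was ever sent.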

Of course, there are ways to avoid this scenario; after all, it is a bug like any other. You need to check, atomically, whether the resource is available before entering the wait state. More specifically, you can either check whether the resource is available while inside the synchronized block and skip the wait if it is (the recommended, but potentially less scalable, solution), or use some of the more sophisticated synchronization classes and associated techniques available in JDK 5.0 (see Resources).
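Both approaches can be sketched briefly. The class, field, and method names below are hypothetical. The first approach records the condition in a flag that is read and written only while holding the same monitor used for wait/notify, so a signal that arrives early is remembered. The second uses java.util.concurrent.CountDownLatch from JDK 5.0, one of several classes that could serve here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Two illustrative ways to wait that do not leak when the signal comes first.
// Class, field, and method names are hypothetical.
public class GuardedWaitSketch {

    // Approach 1: a condition flag guarded by the same monitor used for wait/notify.
    private static final Object LOCK = new Object();
    private static boolean ready = false;   // guarded by LOCK

    public static void awaitReady() throws InterruptedException {
        synchronized (LOCK) {
            // Test the condition while holding the lock and wait only if it is
            // still false. The loop also copes with spurious wakeups.
            while (!ready) {
                LOCK.wait();
            }
        }
    }

    public static void signalReady() {
        synchronized (LOCK) {
            ready = true;       // record the state change first...
            LOCK.notifyAll();   // ...then wake any threads already waiting
        }
    }

    // Approach 2: a CountDownLatch stays released once counted down,
    // so an await() that arrives late returns immediately.
    public static boolean latchAfterEarlySignal() throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        latch.countDown();                          // signal before anyone waits
        return latch.await(1, TimeUnit.SECONDS);    // true means no timeout occurred
    }

    public static void main(String[] args) throws InterruptedException {
        signalReady();   // signal BEFORE waiting: the case that leaks with an unguarded wait()
        awaitReady();    // returns at once because ready is already true
        System.out.println("guarded wait returned");
        System.out.println("latch released: " + latchAfterEarlySignal());
    }
}
```

Running main prints "guarded wait returned" followed by "latch released: true". The flag approach holds one monitor for every check, which is where the scalability caveat above comes from.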

Wait leaks are certainly bugs, but our focus here is not how to fix them; it is how to identify them. In a complex application with tens or hundreds of threads, it can be difficult to spot a wait leak unless you have seen the symptoms before. Unlike a deadlock, there is no obvious telltale evidence such as two threads each waiting on a monitor the other holds. Instead, there are just lots of threads sitting in Object.wait() calls, which is quite normal for many applications.

Simulating a Wait Leak

The best way to learn how to spot wait leaks is to see one and understand what caused it. Listing 1 demonstrates a very simple wait leak. The WaitLeak class implements Runnable; each WaitLeak thread waits on a shared lock until it is notified, then terminates. In our simulation, a WaitLeakNotifier thread is started first, and then four WaitLeak threads are started, one every second. The WaitLeakNotifier sleeps for a given delay, notifies all threads sitting in the LOCK.wait() call in WaitLeak.run(), then terminates. The main method takes one parameter: the number of milliseconds the WaitLeakNotifier waits before notifying the waiting threads.

Listing 1. Wait leak simulation classes

public class WaitLeak implements Runnable
{
    static final Object LOCK = new Object(); // package-private: a private field is not accessible from WaitLeakNotifier

    public static void main(String[] args)
        throws Exception
    {
        int WAITTIME = Integer.parseInt(args[0]); // delay before notifying, in ms
        int NUMTHREADS = 4;

        (new Thread(new WaitLeakNotifier(WAITTIME))).start(); // notifier starts first
        for (int i = 0; i < NUMTHREADS; i++)
        {
            Thread.sleep(1000); // start one WaitLeak thread per second
            (new Thread(new WaitLeak())).start();
        }
    }

    public void run()
    {
        System.out.println("Starting thread " + Thread.currentThread());
        synchronized(LOCK)
        {
            try {
                LOCK.wait(); // unguarded: a notifyAll() sent before this call is lost
            } catch(InterruptedException e) {} // an interrupt also ends the wait
        }
        System.out.println("Terminating thread " + Thread.currentThread());
    }
}

class WaitLeakNotifier implements Runnable
{
    long waittime;

    public WaitLeakNotifier(long time)
    {
        waittime = time;
    }

    public void run()
    {
        // Sleep until waittime ms have elapsed, resuming if woken early
        long now = System.currentTimeMillis();
        long diff = 0;
        while( (diff = System.currentTimeMillis() - now) < waittime)
        {
            try {
                Thread.sleep(waittime - diff);
            } catch(InterruptedException e){}
        }
        // Wake only the threads waiting on LOCK right now; later arrivals miss it
        synchronized(WaitLeak.LOCK)
        {
            WaitLeak.LOCK.notifyAll();
        }
    }
}

Engineering a Race Condition

Figure 1 shows three possible scenarios, run with different delays before the notification is sent.

The top pane shows the program run with a relatively long delay, such as 10 seconds:

java WaitLeak 10000

All four WaitLeak threads start up and wait, receive the notification after 10 seconds, and then terminate.

The second pane of Figure 1 shows the program run with a delay that expires partway through WaitLeak startup, such as 2 or 3 seconds:

java WaitLeak 2000

In this scenario, the WaitLeak threads that enter wait() before the notification is sent receive it and terminate, but the WaitLeak threads that start after the notification is sent wait forever.

The third scenario, illustrated in the third pane of Figure 1, uses a very short delay, such as 1 millisecond:

java WaitLeak 1

Figure 1. Wait leaks in action

In this case, the WaitLeakNotifier sends its notification before any WaitLeak threads have started. None of them receives the notification, so all of them stay blocked in the wait state forever.

Listing 2 shows a thread dump taken a few minutes after startup in the second scenario. (To obtain a thread dump, press Ctrl+Break in the console on Windows or Ctrl+\ on Unix, or send the process a kill -3 signal on Unix.)

Listing 2. Thread stack dump for java WaitLeak 2000

"Thread-4" prio=5 tid=0x00a0eee8 nid=0xf04 in Object.wait() [2d1f000..2d1fd8c]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x1002c780> (a java.lang.Object)
    at java.lang.Object.wait(Unknown Source)
    at WaitLeak.run(WaitLeak.java:25)
    - locked <0x1002c780> (a java.lang.Object)
    at java.lang.Thread.run(Unknown Source)

"Thread-3" prio=5 tid=0x00a0c418 nid=0xc5c in Object.wait() [2cdf000..2cdfd8c]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x1002c780> (a java.lang.Object)
    at java.lang.Object.wait(Unknown Source)
    at WaitLeak.run(WaitLeak.java:25)
    - locked <0x1002c780> (a java.lang.Object)
    at java.lang.Thread.run(Unknown Source)

"Thread-2" prio=5 tid=0x00a0d7a0 nid=0x118c in Object.wait() [2c9f000..2c9fd8c]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x1002c780> (a java.lang.Object)
    at java.lang.Object.wait(Unknown Source)
    at WaitLeak.run(WaitLeak.java:25)
    - locked <0x1002c780> (a java.lang.Object)
    at java.lang.Thread.run(Unknown Source)
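Ctrl+Break and kill -3 are the standard ways to get a dump like Listing 2. Where a console or signals are awkward to reach, an application running on JDK 5.0 or later can also produce a simpler dump of itself with Thread.getAllStackTraces(). The sketch below (class name hypothetical) prints each thread's name, state, and stack. Unlike the JVM's own dump, it does not include the "waiting on" and "locked" monitor lines.

```java
import java.util.Map;

// Minimal in-process thread dump, assuming JDK 5.0+ (Thread.getAllStackTraces()
// and Thread.getState()). Class name is hypothetical. Monitor ownership is not shown.
public class InProcessThreadDump {

    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // State is read separately from the stack, so the two can briefly disagree
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

A thread stuck like those in Listing 2 would appear with state WAITING and an Object.wait frame at the top of its stack.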

Spotting a Wait Leak

A thread dump shows the symptoms of a wait leak, but the important evidence is what is missing from it: the thread that should be notifying the waiting threads. So you need some extra contextual information to identify a wait leak. Typically, one of two failure modes gets reported: an apparent deadlock, or a gradual degradation in application responsiveness.

Let’s first consider the standard deadlock-type problem report: the application is partially or completely frozen and no longer doing anything, although it might still respond to user-initiated events. The symptoms of a wait leak resemble those of a garden-variety deadlock, except that the thread dumps show no sign of deadlock. If you see that combination, consider that you might have a wait leak.
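Because a wait leak looks like a deadlock without the deadlock, a quick first check is to ask the JVM's deadlock detector. The sketch below assumes JDK 5.0 or later, where java.lang.management.ThreadMXBean provides findMonitorDeadlockedThreads(). The class name is hypothetical, and it inspects only the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: tell a monitor deadlock apart from a possible wait leak, assuming
// JDK 5.0+. Class name is hypothetical.
public class FrozenAppTriage {

    public static String triage() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] deadlocked = mx.findMonitorDeadlockedThreads();  // null if there are none
        if (deadlocked != null) {
            return "monitor deadlock among " + deadlocked.length + " threads";
        }
        int waiting = 0;
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that died after the IDs were collected.
            // WAITING also covers join() and parking, and some JVM housekeeping
            // threads may be counted here too.
            if (info != null && info.getThreadState() == Thread.State.WAITING) {
                waiting++;
            }
        }
        return "no deadlock; " + waiting + " threads in WAITING (possible wait leak?)";
    }

    public static void main(String[] args) {
        System.out.println(triage());
    }
}
```

A frozen application that reports no deadlock but many WAITING threads fits the pattern described above; the thread dump then tells you which wait() calls to inspect.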

The second scenario is a gradually overloaded, increasingly unresponsive application. Here, more and more threads enter the wait-leak state over time, so more and more threads that are supposed to be doing work simply sit idle. Eventually the application becomes clogged with threads waiting for a notification that will never come, and some resource is exhausted: a depleted thread pool, an out-of-memory error from too many threads, or simply an unresponsive application once it reaches the equivalent of the first, deadlock-type symptom. This kind of wait leak is possibly easier to diagnose, because you can compare thread dumps over time and see that the number of threads with a particular Object.wait() stack (possibly waiting on the same locks) keeps increasing. In one production example we saw, a server responded more and more slowly until, near the end, 43 wait-leak stacks became 108 wait-leak stacks just a couple of minutes later; shortly afterwards, the server stopped responding to requests.
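Comparing dumps by hand works, but the counting can also be sketched in code. The version below assumes JDK 5.0 or later and must run inside the JVM being examined (for example, from a diagnostic thread you add), since Thread.getAllStackTraces() sees only its own process. It groups threads in Object.wait() by the frame that called wait() and reports call sites whose count grew between two snapshots. The class and method names, and the 60-second default interval, are hypothetical choices.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "compare dumps over time", assuming JDK 5.0+. Names and the
// default interval are hypothetical.
public class WaitStackCounter {

    /** Counts threads currently in Object.wait(), keyed by the calling frame. */
    public static Map<String, Integer> snapshot() {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            // Skip the Object.wait frames on top (their exact names vary by JDK)
            int i = 0;
            while (i < stack.length
                    && stack[i].getClassName().equals("java.lang.Object")
                    && stack[i].getMethodName().startsWith("wait")) {
                i++;
            }
            if (i == 0 || i == stack.length) {
                continue;   // not in Object.wait(), or no caller frame available
            }
            String callSite = stack[i].toString();   // e.g. WaitLeak.run(WaitLeak.java:25)
            Integer old = counts.get(callSite);
            counts.put(callSite, (old == null) ? 1 : old + 1);
        }
        return counts;
    }

    /** Prints every call site whose waiting-thread count grew between snapshots. */
    public static void reportGrowth(Map<String, Integer> before, Map<String, Integer> after) {
        for (Map.Entry<String, Integer> e : after.entrySet()) {
            Integer old = before.get(e.getKey());
            int prev = (old == null) ? 0 : old;
            if (e.getValue() > prev) {
                System.out.println(e.getKey() + ": " + prev + " -> " + e.getValue()
                        + " threads in Object.wait()");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = (args.length > 0) ? Long.parseLong(args[0]) : 60000L;
        Map<String, Integer> before = snapshot();
        Thread.sleep(intervalMillis);
        reportGrowth(before, snapshot());
    }
}
```

Steady growth at a single call site over several intervals is the signal worth investigating; a count that rises and falls is more likely normal waiting.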

The Final Word

Interestingly, we don’t believe there is a way to spot wait leaks automatically, unless only wait-leak threads are left (as in our simulation, but an unlikely situation in a real application). In most cases, it would be very difficult to determine automatically that the code which is supposed to call notify() will never again execute for the monitors being waited on. So manual inspection may be the best we can do, and supporting it is the aim of this article: adding another tool to your performance-tuning arsenal. If you come across a wait leak, we’re sure you will spot it sooner or later, and we hope that with the help of this article it will be sooner.

Resources

• “Reference object leaks” (developerWorks, August 2003) provides a step-by-step technique for eliminating object leaks.

• “Question of the month: Handling OutOfMemoryError” (Java Performance Tuning, November 2003) gives more details on object leaks.

• “More flexible, scalable locking in JDK 5.0” (developerWorks, October 2004) offers some alternatives to using wait/notify.

• The Sun 1.4 JVM includes a deadlock detector.

• Issue 93 of The Java Specialists’ Newsletter includes an article on automatically detecting thread deadlocks in JDK 5.0.

• Read all the tips in the Eye on performance series.

• The authors’ Web site, Java Performance Tuning, offers a wealth of performance tips and suggested reading.

• Read all about performance tuning in Java Performance Tuning, 2nd Edition (O’Reilly 2003) by Jack Shirazi.

• Find hundreds of articles about every aspect of Java programming in the developerWorks Java technology zone.

• Visit the Developer Bookstore for a comprehensive listing of technical books, including hundreds of Java-related titles.

April 28th, 2005

About the Author:

Jack Shirazi is the Director of JavaPerformanceTuning.com and author of Java Performance Tuning (O'Reilly). In addition to his performance tuning focus, Jack also develops intelligent agent technology. Contact Jack at jack@JavaPerformanceTuning.com.

Kirk Pepperdine is the Chief Technical Officer at JavaPerformanceTuning.com and has been focused on Object technologies and performance tuning for the last 15 years. Kirk is a co-author of the book ANT Developer's Handbook. Contact Kirk at kirk@JavaPerformanceTuning.com.
