About, Disclaimers, Contacts

"JVM Anatomy Quarks" is the on-going mini-post series, where every post is describing some elementary piece of knowledge about JVM. The name underlines the fact that the single post cannot be taken in isolation, and most pieces described here are going to readily interact with each other.

The post should take about 5-10 minutes to read. As such, it goes deep on only a single topic, a single test, a single benchmark, a single observation. The evidence and discussion here might be anecdotal, and not actually reviewed for errors, consistency, writing style, syntactic and semantic mistakes, or duplicates. Use and/or trust this at your own risk.


Aleksey Shipilëv, JVM/Performance Geek
Shout out at Twitter: @shipilev; Questions, comments, suggestions: aleksey@shipilev.net

Question

How does JNI Get*Critical cooperate with GC? What is GC Locker?

Theory

If you are familiar with JNI, you know there are two sets of methods that can get you the array contents. There is the Get<PrimitiveType>Array* family of methods, and then there are these fellas:

void * GetPrimitiveArrayCritical(JNIEnv *env, jarray array, jboolean *isCopy);
void ReleasePrimitiveArrayCritical(JNIEnv *env, jarray array, void *carray, jint mode);

The semantics of these two functions are very similar to the existing Get/Release*ArrayElements functions. If possible, the VM returns a pointer to the primitive array; otherwise, a copy is made. However, there are significant restrictions on how these functions can be used.

— JNI Guide
Chapter 4: JNI Functions
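The pointer-versus-copy distinction in that quote can be sketched with a toy model in plain C. This is not real JNI; all the names here are invented for illustration. The "elements" path hands out a private copy that is only written back on release, while the "critical" path may hand out a direct pointer into the heap, which is why the object must not move while the pointer is held:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model (not real JNI; invented names): contrasts copy-based
 * Get<Type>ArrayElements-style access with the direct pointer that
 * GetPrimitiveArrayCritical may hand out. */

#define ARR_LEN 4

static int heap_array[ARR_LEN] = {1, 2, 3, 4};

/* Copy-based access: caller gets a private copy; changes become
 * visible only when "released" back. */
static int *get_elements_copy(void) {
  int *copy = malloc(sizeof(heap_array));
  memcpy(copy, heap_array, sizeof(heap_array));
  return copy;
}

/* Release with copy-back (models mode 0: copy back and free). */
static void release_elements_copy(int *copy) {
  memcpy(heap_array, copy, sizeof(heap_array));
  free(copy);
}

/* "Critical" access: a direct pointer into the heap, so writes are
 * visible immediately -- and the array must not move while held. */
static int *get_critical_direct(void) {
  return heap_array;
}
```

Writes through the copy stay invisible until release; writes through the direct pointer land in the "heap" immediately.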

The benefit of these is obvious: instead of providing you with a copy of the Java array, the VM may choose to return a direct pointer, thus improving performance. That obviously comes with caveats, which are listed further down:

After calling GetPrimitiveArrayCritical, the native code should not run for an extended period of time before it calls ReleasePrimitiveArrayCritical. We must treat the code inside this pair of functions as running in a "critical region." Inside a critical region, native code must not call other JNI functions, or any system call that may cause the current thread to block and wait for another Java thread. (For example, the current thread must not call read on a stream being written by another Java thread.)

These restrictions make it more likely that the native code will obtain an uncopied version of the array, even if the VM does not support pinning. For example, a VM may temporarily disable garbage collection when the native code is holding a pointer to an array obtained via GetPrimitiveArrayCritical.

— JNI Guide
Chapter 4: JNI Functions

These paragraphs are read by some as if the VM stops GC while a critical region is running.

Actually, the only strong invariant the VM has to maintain is that the "critically" acquired object does not move. There are different strategies an implementation can try:

  1. Disable the GC completely while any critical object is acquired. This is by far the simplest coping strategy, because it does not affect the rest of the GC. The downside is that you have to block GC for an indefinite time (basically committing to the mercy of users "release"-ing quickly enough), which might get problematic.

  2. Pin the object, and work around it during the collection. This is tricky to get right if collectors expect contiguous spaces to allocate in, and/or expect the collection to process the entire heap subspace. For example, if you pin an object in the young generation of a simple generational GC, you can no longer "ignore" what is left in young after the collection. Nor can you move the object from there, because that breaks the very invariant you want to enforce.

  3. Pin the subspace in heap that contains the object. Again, if GC is granular to entire generations, this is getting nowhere. But if you have regionalized heap, then you can pin a single region, and avoid GC for that region alone, keeping everyone happy.

We have seen people relying on JNI Critical to disable GC temporarily, but that only works for option 1, and not every collector employs simplistic behavior like that.

Can we see this in practice?

Experiment

As always, we can look into this by constructing an experiment that acquires an int[] array with JNI Critical, and then deliberately ignores the suggestion to release the array promptly after we are done with it. Instead, it allocates and retains lots of objects between the acquire and release:

public class CriticalGC {

  static final int ITERS = Integer.getInteger("iters", 100);
  static final int ARR_SIZE = Integer.getInteger("arrSize", 10_000);
  static final int WINDOW = Integer.getInteger("window", 10_000_000);

  static native void acquire(int[] arr);
  static native void release(int[] arr);

  static final Object[] window = new Object[WINDOW];

  public static void main(String... args) throws Throwable {
    System.loadLibrary("CriticalGC");

    int[] arr = new int[ARR_SIZE];

    for (int i = 0; i < ITERS; i++) {
      acquire(arr);
      System.out.println("Acquired");
      try {
        for (int c = 0; c < WINDOW; c++) {
          window[c] = new Object();
        }
      } catch (Throwable t) {
        // omit
      } finally {
        System.out.println("Releasing");
        release(arr);
      }
    }
  }
}

…​and the native parts:

#include <jni.h>
#include <CriticalGC.h>

static jbyte* sink;

JNIEXPORT void JNICALL Java_CriticalGC_acquire
(JNIEnv* env, jclass klass, jintArray arr) {
   sink = (*env)->GetPrimitiveArrayCritical(env, arr, 0);
}

JNIEXPORT void JNICALL Java_CriticalGC_release
(JNIEnv* env, jclass klass, jintArray arr) {
   (*env)->ReleasePrimitiveArrayCritical(env, arr, sink, 0);
}

We need to generate the appropriate headers, compile the native parts into a library, and then make sure the JVM knows where to find that library. Everything is encapsulated here.

Parallel/CMS

First, obvious thing, Parallel:

$ make run-parallel
java -Djava.library.path=. -Xms4g -Xmx4g -verbose:gc -XX:+UseParallelGC CriticalGC
[0.745s][info][gc] Using Parallel
...
[29.098s][info][gc] GC(13) Pause Young (GCLocker Initiated GC) 1860M->1405M(3381M) 1651.290ms
Acquired
Releasing
[30.771s][info][gc] GC(14) Pause Young (GCLocker Initiated GC) 1863M->1408M(3381M) 1589.162ms
Acquired
Releasing
[32.567s][info][gc] GC(15) Pause Young (GCLocker Initiated GC) 1866M->1411M(3381M) 1710.092ms
Acquired
Releasing
...
1119.29user 3.71system 2:45.07elapsed 680%CPU (0avgtext+0avgdata 4782396maxresident)k
0inputs+224outputs (0major+1481912minor)pagefaults 0swaps

Notice how GC does not happen between "Acquired" and "Releasing"; this is the implementation detail leaking out to us. But the smoking gun is the "GCLocker Initiated GC" message. GCLocker is a lock that prevents GC from running while a JNI critical region is held. See the relevant block in the OpenJDK codebase:

JNI_ENTRY(void*, jni_GetPrimitiveArrayCritical(JNIEnv *env, jarray array, jboolean *isCopy))
  JNIWrapper("GetPrimitiveArrayCritical");
  GCLocker::lock_critical(thread);   // <--- acquire GCLocker!
  if (isCopy != NULL) {
    *isCopy = JNI_FALSE;
  }
  oop a = JNIHandles::resolve_non_null(array);
  ...
  void* ret = arrayOop(a)->base(type);
  return ret;
JNI_END

JNI_ENTRY(void, jni_ReleasePrimitiveArrayCritical(JNIEnv *env, jarray array, void *carray, jint mode))
  JNIWrapper("ReleasePrimitiveArrayCritical");
  ...
  // The array, carray and mode arguments are ignored
  GCLocker::unlock_critical(thread); // <--- release GCLocker!
  ...
JNI_END

When a GC is attempted, the JVM checks whether anybody holds that lock. If anybody does, then at least for Parallel, CMS, and G1, the collection cannot continue. When the last critical JNI operation ends with "release", the VM checks if there are pending GCs blocked by GCLocker, and if there are, it triggers the GC. This yields the "GCLocker Initiated GC" collection.
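That bookkeeping can be modeled in a few lines of plain C. This is a toy single-threaded sketch with invented names, not the real HotSpot implementation: a counter tracks threads inside critical regions, a GC attempt while the counter is non-zero is merely remembered, and the last unlock fires the pending "GCLocker Initiated GC":

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of GCLocker semantics (simplified, invented names):
 * GC is deferred while any thread is inside a JNI critical region,
 * and the last unlock triggers the pending collection. */

static int  critical_count = 0;     /* threads in critical regions  */
static bool needs_gc       = false; /* GC requested while locked    */
static int  gcs_performed  = 0;

static void lock_critical(void) { critical_count++; }

static bool is_active(void)     { return critical_count > 0; }

/* A GC attempt while the locker is active is deferred, not run. */
static void attempt_gc(void) {
  if (is_active()) {
    needs_gc = true;     /* run it once the last region exits */
  } else {
    gcs_performed++;
  }
}

static void unlock_critical(void) {
  critical_count--;
  if (critical_count == 0 && needs_gc) {
    needs_gc = false;
    gcs_performed++;     /* the "GCLocker Initiated GC" */
  }
}
```

This mirrors the log above: the collection that was blocked between "Acquired" and "Releasing" runs right after the release.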

G1

Of course, since we are playing with fire — doing weird things in JNI critical region — it can spectacularly blow up. This is reproducible with G1:

$ make run-g1
java -Djava.library.path=. -Xms4g -Xmx4g -verbose:gc -XX:+UseG1GC CriticalGC
[0.012s][info][gc] Using G1
<HANGS>

Oops! It hangs all right. jstack will even say we are RUNNABLE, but waiting on some weird condition:

"main" #1 prio=5 os_prio=0 tid=0x00007fdeb4013800 nid=0x4fd9 waiting on condition [0x00007fdebd5e0000]
   java.lang.Thread.State: RUNNABLE
  at CriticalGC.main(CriticalGC.java:22)

The easiest way to get a clue about this is to run with a "fastdebug" build, which then fails on this interesting assert:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/shade/trunks/jdk9-dev/hotspot/src/share/vm/gc/shared/gcLocker.cpp:96), pid=17842, tid=17843
#  assert(!JavaThread::current()->in_critical()) failed: Would deadlock
#
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x15b5934]  VMError::report_and_die(...)+0x4c4
V  [libjvm.so+0x15b644f]  VMError::report_and_die(...)+0x2f
V  [libjvm.so+0xa2d262]  report_vm_error(...)+0x112
V  [libjvm.so+0xc51ac5]  GCLocker::stall_until_clear()+0xa5
V  [libjvm.so+0xb8b6ee]  G1CollectedHeap::attempt_allocation_slow(...)+0x92e
V  [libjvm.so+0xba423d]  G1CollectedHeap::attempt_allocation(...)+0x27d
V  [libjvm.so+0xb93cef]  G1CollectedHeap::allocate_new_tlab(...)+0x6f
V  [libjvm.so+0x94bdba]  CollectedHeap::allocate_from_tlab_slow(...)+0x1fa
V  [libjvm.so+0xd47cd7]  InstanceKlass::allocate_instance(Thread*)+0xc77
V  [libjvm.so+0x13cfef0]  OptoRuntime::new_instance_C(Klass*, JavaThread*)+0x830
v  ~RuntimeStub::_new_instance_Java
J 87% c2 CriticalGC.main([Ljava/lang/String;)V (82 bytes) ...
v  ~StubRoutines::call_stub
V  [libjvm.so+0xd99938]  JavaCalls::call_helper(...)+0x858
V  [libjvm.so+0xdbe7ab]  jni_invoke_static(...) ...
V  [libjvm.so+0xdde621]  jni_CallStaticVoidMethod+0x241
C  [libjli.so+0x463c]  JavaMain+0xa8c
C  [libpthread.so.0+0x76ba]  start_thread+0xca

Looking closely at this stack trace, we can reconstruct what happened: we tried to allocate a new object, the current TLAB could not satisfy the allocation, so we jumped to the slowpath to get a new TLAB. That allocation failed too, and we discovered we need to wait for the GCLocker-initiated GC. Enter stall_until_clear to wait for it…​ but since we are the thread holding the GCLocker, waiting here leads to deadlock. Boom.
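The self-deadlock condition the fastdebug assert guards against can be sketched as a toy decision model in plain C (simplified, invented names; the real VM asserts instead of returning a value): stalling for the GCLocker is fine for an ordinary thread, but fatal for the thread that itself holds a critical region.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the G1 slowpath above (simplified, invented names):
 * when allocation fails and the GCLocker is active, the thread must
 * stall until the locker clears -- unless it is itself inside a
 * critical region, in which case stalling would self-deadlock. */

enum alloc_result { ALLOC_OK, ALLOC_STALL, ALLOC_DEADLOCK };

static bool heap_has_room    = true;
static bool locker_is_active = false;
static bool self_in_critical = false;

static enum alloc_result attempt_allocation_slow(void) {
  if (heap_has_room)     return ALLOC_OK;
  if (!locker_is_active) return ALLOC_STALL;    /* wait for normal GC */
  /* GCLocker holds the GC back; we must wait for it to clear... */
  if (self_in_critical)  return ALLOC_DEADLOCK; /* ...but we hold it! */
  return ALLOC_STALL;    /* another thread holds it: safe to stall */
}
```

The hang we observed is exactly the ALLOC_DEADLOCK branch: the allocating thread waits for a lock it is itself holding.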

This is within the specification, because the test tried to allocate within the acquire-release window. Leaving the JNI method without the paired release was the mistake that exposed us to this. Had we not left, we could not have allocated between acquire and release without calling JNI functions, thus violating the "thou shalt not call JNI functions" principle.

You can tune up the test so the collectors do not fail this way, but then you will discover that GCLocker delaying the collection means GC can start when there is already too little space left in the heap, which forces a Full GC. Oops.

Shenandoah

As described in the theory above, a regionalized collector can pin the particular region holding the object, and leave that object alone without collection until the JNI Critical is released. This is what Shenandoah does in its current implementation.

$ make run-shenandoah
java -Djava.library.path=. -Xms4g -Xmx4g -verbose:gc -XX:+UseShenandoahGC CriticalGC
...
Releasing
Acquired
[3.325s][info][gc] GC(6) Pause Init Mark 0.287ms
[3.502s][info][gc] GC(6) Concurrent marking 3607M->3879M(4096M) 176.534ms
[3.503s][info][gc] GC(6) Pause Final Mark 3879M->1089M(4096M) 0.546ms
[3.503s][info][gc] GC(6) Concurrent evacuation  1089M->1095M(4096M) 0.390ms
[3.504s][info][gc] GC(6) Concurrent reset bitmaps 0.715ms
Releasing
Acquired
....
41.79user 0.86system 0:12.37elapsed 344%CPU (0avgtext+0avgdata 4314256maxresident)k
0inputs+1024outputs (0major+1085785minor)pagefaults 0swaps

Notice how the GC cycle started and finished while the JNI Critical was acquired. Shenandoah just pinned the region holding the array, and proceeded to collect other regions like nothing happened. It can even handle a JNI Critical acquire on an object in a to-be-collected region, by evacuating the object first, and then pinning the target region (which is obviously not in the collection set). This allows implementing JNI Critical without GCLocker, and therefore without GC stalls.
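Region pinning can also be sketched with a toy model in plain C (simplified, invented names; not Shenandoah's actual code): each region carries a pin count, and a GC cycle simply skips pinned regions instead of blocking the whole collection.

```c
#include <assert.h>

/* Toy model of region pinning (Shenandoah-style; simplified,
 * invented names): each heap region carries a pin count, and the
 * collector skips pinned regions instead of blocking the whole GC. */

#define NUM_REGIONS 8

static int pin_count[NUM_REGIONS];

static void pin_region(int r)   { pin_count[r]++; }
static void unpin_region(int r) { pin_count[r]--; }

/* Returns how many regions the cycle actually collected. */
static int run_gc_cycle(void) {
  int collected = 0;
  for (int r = 0; r < NUM_REGIONS; r++) {
    if (pin_count[r] == 0) {
      collected++;   /* evacuate/reclaim as usual */
    }
    /* pinned regions are left alone: objects in them cannot move */
  }
  return collected;
}
```

GC keeps making progress on every unpinned region, which is why the log above shows full cycles completing while the array was still acquired.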

Observations

Handling JNI Critical requires assistance from the VM: either disable GC with a GCLocker-like mechanism, pin the subspace containing the object, or pin the object alone. Different GCs employ different strategies to deal with JNI Critical, and side-effects visible when running with one collector, like delaying the GC cycle, may not be visible with another.

Please note that the specification says: "Inside a critical region, native code must not call other JNI functions", and this is the minimal requirement. The example above underlines the fact that, within the bounds the specification allows, the quality of implementation determines how badly it goes when you break the rules. Some GCs may let more things slide, others may be more restrictive. If you want to be portable, adhere to the specification requirements, not implementation details.

Or, if you rely on implementation details (which is a bad idea) and run into these problems using JNI, understand what the collectors are doing, and choose the appropriate GC.