Aleksey Shipilёv, @shipilev, aleksey@shipilev.net
Thanks to Gleb Smirnov, Vladimir Sitnikov, Alex Blewitt, Pavel Rappo, Doug Lea, Brian Goetz, Rory Hunter, Rafael Winterhalter, Paul Sandoz, Andrey Ershov and others for reviews, edits and helpful suggestions!
1. Introduction
Two years ago I painfully researched and built the JMM Pragmatics talk and transcript, hoping it would highlight the particular dark corners of the Java Memory Model for those who cannot afford to spend years studying the formalisms and deriving actionable insights from them. JMM Pragmatics has helped many people, but there is still lots of confusion about what the Memory Model guarantees, and what it does not.
In this post, we will try to follow up on particular misunderstandings about the Java Memory Model, hopefully with practical examples. The examples use the APIs from the jcstress suite, which are concise enough for a reader, and are runnable.
This is a fairly large piece of writing. In this age of the attention-scarce Internet, it would normally be presented as a series of posts, one section each. Let’s pretend we did that :) If you run out of time/patience to read it through, bookmark it, and restart from the section you left off at.
Most examples in this post came from public and private discussions we’ve had over the years. I would not claim they cover all memory model abuses. If you have an interesting case that is not covered here, don’t hesitate to drop me a mail, and we can see if it fits the narrative somewhere. concurrency-interest is the mailing list where you can discuss interesting cases in public.
1.1. Java Concurrency Stress Tests
The jcstress API is really simple, and best demonstrated by this trivial example:
@JCStressTest
@State
@Outcome(id = "1, 2", expect = ACCEPTABLE, desc = "T1 updated, then T2 updated.")
@Outcome(id = "2, 1", expect = ACCEPTABLE, desc = "T2 updated, then T1 updated.")
@Outcome(id = "1, 1", expect = ACCEPTABLE, desc = "Both T1 and T2 updated concurrently.")
class VolatileIncrementAtomicityTest {
    volatile int v;

    @Actor
    void actor1(IntResult2 r) {
        r.r1 = ++v;
    }

    @Actor
    void actor2(IntResult2 r) {
        r.r2 = ++v;
    }
}
The jcstress harness will concurrently execute the methods actor1 and actor2 on instances of VolatileIncrementAtomicityTest. Given an instance, one thread will be responsible for executing actor1 exactly once, and another thread will be responsible for executing actor2 exactly once. Thus an instance will be visited, eventually, by both threads, sometimes at the same time. Therefore, over the aggregation of many instances of VolatileIncrementAtomicityTest and executions producing results (samples), returned in instances of IntResult2, the test case above explores the atomicity of volatile increment.
If you run this test on just about any hardware, this would be the result:
[OK] net.shipilev.jmm.VolatileIncrementAtomicityTest
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation
            1, 1     1,543,069    ACCEPTABLE  Both T1 and T2 updated concurrently.
            1, 2    29,034,989    ACCEPTABLE  T1 updated, then T2 updated.
            2, 1    26,223,172    ACCEPTABLE  T2 updated, then T1 updated.
Notice how often the 1, 1 case shows up — this is when both threads have met on the same volatile field, and non-atomically incremented it.
Even if the 1, 1 case were absent, it would not mean that volatile increment is atomic. Concurrency tests are inherently probabilistic, and can generally show that something fails sometimes, not that something always passes. Or: the absence of evidence is not evidence of absence.[1]
1.2. Hardware and Runtime Modes
Since concurrency testing is probabilistic, we have three distinct setup problems to deal with:
- The tests should generally run as fast as they possibly can, gathering more samples. The more samples we have, the more chances there are to detect at least one interesting outcome. While we could just run a single test for hours, that is impractical for thousands of tests. Therefore, the infrastructure has to be very fast. Notice that in the example above we are getting tens of millions of samples for a short-running test. That is a 5-second run, which translates to 100ns per sample, mostly bottlenecked on the actual volatile stores. jcstress handles this by cleverly generating optimized runners for the tests. In this post, we will use the default jcstress modes, because they are sufficient to demonstrate most of the effects. For most examples, it is irrelevant how fast the machine is, although faster machines have more chances to reveal interesting behaviors.
- Even if we translate the test program while keeping the operations in the exact order they appear in the source code, the hardware may still execute them out of order. This complicates testing, because the ordering rules differ among processor architectures. A test that passes on one platform may well fail on another, because the first platform provides stronger ordering guarantees. In this post we will mostly use x86 and POWER as interesting architectures. The actual details of their micro-architectures are not relevant. It is enough to have their multi-core versions, and a decent JVM running on both. OpenJDK is freely available, buildable, and runnable on both platforms.
- Optimizing compilers/runtimes may run programs in surprising ways. Therefore, general testing requires trying different sets of runtime modes to uncover interesting behaviors. Sometimes the interesting outcomes are produced in a transient mode (e.g. when half of the code runs interpreted, and half compiled), since different parts of the runtime may implement concurrency slightly differently. Sometimes compilers produce a consistent compilation result within the allowed boundaries, i.e. sometimes they are not surprising enough. Using randomized compilation modes helps uncover more interesting behaviors. In HotSpot, we use the -XX:+StressLCM -XX:+StressGCM options, where available, to randomize instruction scheduling. jcstress routinely runs tens of different configurations. In this post, we will only show the configurations with interesting cases.
1.3. Model Recap
Before we go in, let us take a quick recap of the Java Memory Model. A more detailed explanation can be found in "JMM Pragmatics", so if, while reading this, you have a nagging feeling of misunderstanding, stop reading this post, re-read the JMM Pragmatics transcript, and then get back here.
The Big Picture: Java Memory Model (JMM) describes the allowed program outcomes by describing what executions are conforming. It does so by introducing actions, and orders over actions, and consistency rules that govern what actions+orders+constraints constitute a valid execution. If a program result can be explained by some valid execution, then this result is allowed under JMM.
There are several basic bits in the formalism:
- Program order (PO): defines a total order of actions within each thread. This order provides a connection between the single-threaded execution of the program and the actions it generates. Executions that have program orders inconsistent with the original program cannot be used to reason about that particular program’s outcomes.

- Synchronization order (SO): defines a total order over synchronization actions (which are explicitly enumerated in the spec: volatile read/writes, synchronized enter/exit, etc). It comes with two important consistency rules:

  - SO consistency: all reads that come later in the synchronization order see the last write to that location. This property disallows racy results over synchronization actions in conforming executions.

  - SO-PO consistency: the order of synchronization actions in SO within a single thread is consistent with PO in that thread. This means a conforming execution should have SO agree with PO.

  SO consistency and SO-PO consistency together mean that in all conforming executions, synchronization actions appear sequentially consistent.

- Synchronizes-with order (SW): a suborder of SO that covers the pairs of synchronization actions that "see" each other. This order serves as the connecting bridge between different threads.

- Happens-before order (HB): the transitive closure of the union of PO and SW. Unlike SO, HB is a partial order, and only relates some actions, not every pair of actions. Also, unlike SO, HB is able to relate actions that are not synchronization actions, which allows us to cover all important actions with ordering guarantees. HB comes with the following important consistency rule:

  - HB consistency: every read can see either the latest write in the happens-before order (notice the symmetry with SO here), or any other write that is not ordered by HB (this allows races).

- Causality rules: additional verifications on otherwise conforming executions, to rule out causality loops. This is verified by a special process of "committing" the actions from the execution, and checking that no self-justification of actions takes place.

- Final field rules: tangential to the rest of the model; these describe the additional constraints imposed by final fields, e.g. additional ordering between final field stores and their associated reads.
Once again, if you don’t understand what is written in this section, go and re-read the JMM Pragmatics transcript first. We will not stop to discuss the basics of the Java Memory Model here.
2. The General Misunderstandings
2.1. Myth: Machines Do What I Tell Them To Do
The first order of business is the confusion between the language specification and what hits the real hardware. It is easy, nay, comfortable, to read the language rules and think that this is exactly what the machine will do.
However, that is a very misleading way of thinking about the issue. The language specification describes the behavior of an abstract machine executing the program. It is the runtime’s job to emulate the behavior of that abstract machine. The point of contention here is that a compatible runtime is not obliged to compile the program exactly as it is written in the source code.
The actual requirement is much weaker: the runtime is obliged to produce results as if there is a compatible abstract machine execution that backs the results. What the runtime does to do the actual computation is up to the runtime. It’s all smoke and mirrors.
For example, if you write:
int m() {
    int a = 42;
    int b = 13;
    int r = a + b;
    return r;
}
…it would be remarkably odd to require that the language runtime actually does allocate storage for all three local variables, store values there, load them back, add them, etc. This whole method should be optimizable to something like this:
mov %eax, 55;
ret
In fact, it is optimizable in most languages, and it is allowed to happen because the observed result of the execution is one of the results of an abstract machine execution. As long as programs cannot call the runtime’s bluff (= detect something that the language specification disallows), the runtime is free to do stuff under cover.
This apparent disconnect between the intent described in the high-level language and what happens in reality is the cornerstone of high-level languages' success. Abstracting away the mess of physical reality allows programmers to concentrate on the stuff that matters: correctness, security, and pleasing customers with pixel-perfect rendering.
When you debug a program, debuggers try to reconstruct the illusions to provide e.g. step-by-step debugging, observing local variable values, etc. In Java, debuggers usually observe the state of the abstract Java machine, rather than the inner workings of the JVM. This sometimes requires deoptimizing the generated code, if the information associated with it is not enough to reconstruct the Java machine state.
Similarly, when JMM says (for example) that there are program actions tied into the synchronization order, it does not mean the actual physical implementation should emit those loads and stores into the machine code!
For example, if we have a program:
volatile int x;

void m() {
    x = 1;
    x = 2;
    System.out.println(x);
}
…this program is actually optimizable into:
mov %eax, 2 # first argument
call System_out_println
Even though the spec talks about the actions (x = 1), (x = 2), and (r1 = x) as synchronization actions tied into the synchronization order, blah blah blah, it does not mean the actual runtime has to perform all the program operations. Most runtimes would, though, because the analysis — whether someone should be able to observe (x = 1) — is generally rather complicated.[2]
As long as the runtime can keep up the appearance of maintaining the abstract machine semantics, it is free to do complex transformations. Lack of empirical evidence that the runtime performs some particular plausible transformation cannot serve as evidence of absence of any similar transformation.
Writing reliable software should be based on the actual language guarantees, not on anecdotal observations of what language runtimes are doing today. Once you start relying on anecdotes, prepare to suffer.
2.2. Myth: JSR 133 Cookbook Is JMM Synopsis
Quite a few folks who get themselves burned by the abstract JMM rules rest their gaze on the JSR 133 Cookbook for Compiler Writers. All those sweet, easy-to-understand barriers are much easier to grasp than the arcanery of the formal model. So, many boldly suggest the Cookbook is a brief description (or even a short equivalent) of the Java Memory Model.
But haven’t you read the Preface there?
The JSR 133 Cookbook is one possible, yet conservative, set of rules for implementing JMM. "One possible" means that a conforming implementation does not have to follow the Cookbook, as long as it satisfies the JMM requirements. "Conservative" means it does not go into the intricacies of the model, and instead provides a very simple, yet coarse, implementation. It might be unnecessarily strong for practical use. We can go even deeper in our conservatism, and still arrive at a JMM-conforming implementation: make sure the JVM runs on a single core, or have a Global Interpreter Lock, and then concurrency is trivial.
The Cookbook was written to aid the actual compiler writers to quickly come up with a conforming implementation. Are you a compiler writer looking for implementation guidance? No? Thought so. Move along then. The bad thing that happens after you digest the JSR 133 Cookbook is that you start to believe in…
2.3. Myth: Barriers Are The Sane Mental Model
…while in fact, they are not: they are merely an implementation detail. The easiest example why barriers are not reliable as a mental model, is the following simple test case with two back-to-back synchronized
statements:
@JCStressTest
@State
public class SynchronizedBarriers {
    int x, y;

    @Actor
    void actor() {
        synchronized (this) {
            x = 1;
        }
        synchronized (this) {
            y = 1;
        }
    }

    @Actor
    void observer(IntResult2 r) {
        // Caveat: get_this_in_order()-s happen in program order
        r.r1 = get_this_in_order(y);
        r.r2 = get_this_in_order(x);
    }
}
Naively, you may think the 1, 0 case is prohibited, because the synchronized sections should execute in an order consistent with program order.
Of course, without keeping the reads in order, the result is trivially achievable by simply reordering the reads — hence the get_this_in_order() caveat in the observer.
You may get a similar effect with non-inlined methods doing the reads: the optimizer cannot reorder across calls it cannot see into.
Let’s see what barriers tell us about code semantics. In pseudo-code, this will do:
void actor() {
    [LoadStore]   // between monitorenter and normal store
    x = 1;
    [StoreStore]  // between normal store and monitorexit
    [StoreLoad]   // between monitorexit and monitorenter
    [LoadStore]   // between monitorenter and normal store
    y = 1;
    [StoreStore]  // between normal store and monitorexit
    [StoreLoad]   // between monitorexit and monitorenter
}

void observer() {
    // Caveat: get_this_in_order()-s happen in program order
    r.r1 = get_this_in_order(y);
    r.r2 = get_this_in_order(x);
}
Yup, seems fine: x = 1 cannot go past y = 1, because it will meet barriers long before that.
However, JMM itself allows observing 1, 0, because the reads of x and y are not tied by any ordering constraints, and therefore there exists a plausible execution that justifies observing 1, 0. More formally: in whatever conforming execution you can imagine, the reads of x and y are not synchronization actions, so the SO rules do not apply to them. The reads are not tied into HB either, so no HB rules prevent them from reading the racy values. There are no causality loops in observing 1, 0 either.
Allowing this behavior in the model is intentional, for two reasons. First, the hardware should be able to perform independent operations in whatever order it wants to maximize performance. Second, this enables interesting and important optimizations.
For instance, in the example above, we can coalesce back-to-back locks:
void actor() {
    synchronized (this) {
        x = 1;
    }
    synchronized (this) {
        y = 1;
    }
}

// ...becomes:

void actor() {
    synchronized (this) {
        x = 1;
        y = 1;
    }
}
…which improves performance (because lock acquisition is costly), and allows further optimizations within the synchronized block. Notably, since the writes of x and y are independent, we may let the hardware execute them in an arbitrary order, or let the optimizers shift them around.
If you run the example above on an actual JVM and hardware, this is what happens on x86 with JDK 9 "fastdebug" build (needed to gain access to instruction scheduling fuzzing):
[OK] net.shipilev.jmm.LockCoarsening
    (fork: #1, iteration #1, JVM args: [-server, -XX:+UnlockDiagnosticVMOptions, -XX:+StressLCM, -XX:+StressGCM])
  Observed state   Occurrences             Expectation  Interpretation
            0, 0    43,558,372              ACCEPTABLE  All other cases are acceptable.
            0, 1        22,512              ACCEPTABLE  All other cases are acceptable.
            1, 0         1,565  ACCEPTABLE_INTERESTING  X and Y are visible in different order
            1, 1     1,372,341              ACCEPTABLE  All other cases are acceptable.
Notice the interesting case: that is our 1, 0. Surprise!
Disabling lock optimizations with -XX:-EliminateLocks trims the number of occurrences of this interesting case down to zero:
[OK] net.shipilev.jmm.LockCoarsening
    (fork: #1, iteration #1, JVM args: [-server, -XX:+UnlockDiagnosticVMOptions, -XX:+StressLCM, -XX:+StressGCM, -XX:-EliminateLocks])
  Observed state   Occurrences             Expectation  Interpretation
            0, 0    52,892,632              ACCEPTABLE  All other cases are acceptable.
            0, 1       163,611              ACCEPTABLE  All other cases are acceptable.
            1, 0             0  ACCEPTABLE_INTERESTING  X and Y are visible in different order
            1, 1     1,825,907              ACCEPTABLE  All other cases are acceptable.
On POWER, the interesting case is present even without messing with instruction scheduling, because hardware guarantees are weaker:
[OK] net.shipilev.jmm.LockCoarsening
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences             Expectation  Interpretation
            0, 0     7,899,607              ACCEPTABLE  All other cases are acceptable.
            0, 1         4,089              ACCEPTABLE  All other cases are acceptable.
            1, 0           162  ACCEPTABLE_INTERESTING  X and Y are visible in different order
            1, 1       240,682              ACCEPTABLE  All other cases are acceptable.
This example does not mean it is possible to enumerate all "dangerous" optimizations and disable them. Modern optimizers work as complicated graph matching-and-crunching machines, and reliably disabling a particular kind of optimization usually means disabling the optimizer completely.
There are other kinds of plausible optimizations around barriers that runtimes make today, or will choose to do in the future. Even the JSR 133 Cookbook has a "Removing Barriers" section that gives a short outline of what elision techniques are readily available.
Given that, how can you trust barriers, if they are routinely removable?
Barriers are implementation details, not a behavioral specification. Explaining the semantics of concurrent code with them is dangerous at best, and keeps you tidally locked with a particular runtime implementation.
2.4. Myth: Reorderings And "Commit to Memory"
The second part of the confusion is the notion of committing to memory. The pre-Java 5 memory model had the notions of "thread-local caches" and "main memory", and "flushing to main memory" had some meaning there. In the new, post-Java 5 model, this is not the case. However, while most people have moved on, they still implicitly rely on the "main memory" abstraction as the basis for their mental models. That naive mental model says: when operations finally commit to memory, memory serializes them.
This intuition is broken on two accounts. First, "memory" in a modern machine is a distributed, asynchronous system, so a single serialization point is not guaranteed. Second, most of the interesting things happen on the micro-scale of cache coherency, not in the "actual" main memory.
Here comes the "Independent Reads of Independent Writes" (IRIW) test. It is really simple:
@JCStressTest
@State
class IRIW {
    int x;
    int y;

    @Actor
    void writer1() {
        x = 1;
    }

    @Actor
    void writer2() {
        y = 1;
    }

    @Actor
    void reader1(IntResult4 r) {
        r.r1 = x;
        r.r2 = y;
    }

    @Actor
    void reader2(IntResult4 r) {
        r.r3 = y;
        r.r4 = x;
    }
}
But this example is profound for the way you understand concurrency. First of all, consider whether the outcome (r1, r2, r3, r4) = (1, 0, 1, 0) is plausible. One may come up with the "reordering" explanation: reorder r3 = y and r4 = x, and the result is trivially achievable.
Okay then, since we know where the troubles lie, let’s add fences to inhibit those pesky reorderings:
@JCStressTest
@State
public class FencedIRIW {
    int x;
    int y;

    @Actor
    public void actor1() {
        UNSAFE.fullFence();  (1)
        x = 1;
        UNSAFE.fullFence();  (1)
    }

    @Actor
    public void actor2() {
        UNSAFE.fullFence();  (1)
        y = 1;
        UNSAFE.fullFence();  (1)
    }

    @Actor
    public void actor3(IntResult4 r) {
        UNSAFE.loadFence();  (2)
        r.r1 = x;
        UNSAFE.loadFence();  (2)
        r.r2 = y;
        UNSAFE.loadFence();  (2)
    }

    @Actor
    public void actor4(IntResult4 r) {
        UNSAFE.loadFence();  (2)
        r.r3 = y;
        UNSAFE.loadFence();  (2)
        r.r4 = x;
        UNSAFE.loadFence();  (2)
    }
}
1 | Do full fences around stores, just in case. |
2 | Do load fences around loads, to nail them in place. Somewhat excessive, but safe, don’t you think? |
And then run it on POWER:
[OK] net.shipilev.jmm.FencedIRIW
    (fork: #7, iteration #1, JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation
             ...
      1, 0, 1, 0            47    ACCEPTABLE  Threads see the updates in the inconsistent order
             ...
Dang it! But how is that possible? We use those magic fences that inhibit reorderings!
The trouble comes from the fact that some machines, notably POWER, do not guarantee multi-copy atomicity: "either all processors see the write, or no processor does (yet)".[4] The absence of this property precludes the existence of a total store order, even if you put fences all around the stores.
An interesting tidbit: fences are usually specified as inhibiting local reorderings, but what we need here is something that enforces a global order, e.g. quiesces the inter-processor interconnect, or does other shady stuff! On POWER, there are hwsync instructions for that. Somewhat accidentally, fullFence maps to this instruction too, so replacing loadFence with fullFence in FencedIRIW would eliminate the unwanted outcome — but that would be a purely incidental property of fullFence!
If we put volatile into the IRIW example, the JVM takes additional steps to preserve sequential consistency,[5] notably putting hwsyncs before the volatile reads. Therefore, this example works like a charm:
@JCStressTest
@State
public class VolatileIRIW {
    volatile int x, y;

    @Actor
    public void actor1() {
        x = 1;
    }

    @Actor
    public void actor2() {
        y = 1;
    }

    @Actor
    public void actor3(IntResult4 r) {
        r.r1 = x;
        r.r2 = y;
    }

    @Actor
    public void actor4(IntResult4 r) {
        r.r3 = y;
        r.r4 = x;
    }
}
This example says "GOOD LUCK" to all the users of fences. Unless you understand the hardware in great detail, your code may work only by accident. Deviating from the language guarantees by constructing your own low-level synchronization requires much more knowledge than even the experienced folks possess.[6] Consider yourselves warned.
2.5. Myth: Commit Semantics = Commit to Memory
"But hey, Aleksey, see the part of the specification that actually calls out commits!":
The sad part is that, reading that chapter for the first time, you can get tunnel vision: meet the familiar word "commit", assign your own meaning to it, and disregard the rest of the chapter. That "commit" in JLS 17.4.8 is not about committing to memory; it is a part of the formal validation scheme that verifies there are no self-justifying action loops in the execution. This validation scheme does not generally preclude races, does not preclude observing different writes, or observing them in a non-intuitive order. It only precludes some selected bad cycles.
2.6. Myth: Synchronized and Volatiles Are Completely Different
Now, for something completely different. When I talk with people about the memory model, I am frequently surprised that many miss the beautiful symmetry between synchronized and volatile. Both induce synchronization actions. Unlock and volatile write are similar in their "release" action. Lock and volatile read are similar in their "acquire" action. This symmetry allows us to show examples with volatile-s, and almost immediately arrive at similar synchronized ones.
For example, memory effect-wise, these chunks of code are equivalent:
class B<T> {
    T x;

    public void set(T v) {
        synchronized (this) {
            x = v;
        } // "release" on unlock
    }

    public T get() {
        synchronized (this) { // "acquire" on lock
            return x;
        }
    }
}

class B<T> {
    volatile T x;

    public void set(T v) {
        x = v; // "release" on volatile store
    }

    public T get() {
        return x; // "acquire" on volatile load
    }
}
This symmetry allows constructing locks:
int a, b, x;

synchronized (lock) {
    a = x;
    b = 1;
}

int x;
volatile boolean busyFlag;

while (!compareAndSet(lock.busyFlag, false, true)); // burn for the mutual exclusion Gods
a = x;
b = 1;
lock.busyFlag = false;
The second block is a plausible implementation of a synchronized section (all right, without the wait/notify semantics): it wastes resources spin-looping on lock acquisition, but it is otherwise correct.
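To make that pseudo-code concrete, here is a minimal runnable sketch of such a spinlock, with AtomicBoolean standing in for the raw compare-and-set (the class and method names are mine; this is an illustration, not a production-grade lock):

import java.util.concurrent.atomic.AtomicBoolean;

class SpinLock {
    private final AtomicBoolean busyFlag = new AtomicBoolean(false);

    void lock() {
        // burn for the mutual exclusion Gods: a successful CAS acts as "acquire"
        while (!busyFlag.compareAndSet(false, true));
    }

    void unlock() {
        busyFlag.set(false); // the volatile store acts as "release"
    }
}

Everything written before unlock() is visible to whoever wins the subsequent lock(), by the very same release/acquire pairing that synchronized provides.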
3. Pitfalls: Little Deviations Are Fine
This section covers the basic mistakes users make, and shows how a little deviation from the rules can have devastating consequences.
3.1. Pitfall: Non-Synchronized Is Fine
Let’s take the volatile increment atomicity example and slightly modify it:
@JCStressTest
@State
public class VolatileCounters {
    volatile int x;

    @Actor
    void actor1() {
        for (int i = 0; i < 10; i++) {
            x++;
        }
    }

    @Actor
    void actor2() {
        for (int i = 0; i < 10; i++) {
            x++;
        }
    }

    @Arbiter
    public void arbiter(IntResult1 r) {
        r.r1 = x;
    }
}
In jcstress, @Arbiter methods execute after both @Actor-s have completed their work. This is very useful for asserting the final state, after all concurrent updates.
Intuitively, looking at the non-looped example from Java Concurrency Stress Tests, you can imagine that a conflict involving two threads may miss an update. The second intuitive stretch is that you can only lose updates, but never step back. Notably, if you ask people what the possible outcomes for the test above are, most will answer "somewhere between 10 and 20, inclusive".
But if you run the test, then:
[OK] net.shipilev.jmm.VolatileCounters
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation
              10       153,217    ACCEPTABLE  $x $y
              11       273,440    ACCEPTABLE  $x $y
              12       465,262    ACCEPTABLE  $x $y
              13       611,123    ACCEPTABLE  $x $y
              14       810,790    ACCEPTABLE  $x $y
              15     1,139,737    ACCEPTABLE  $x $y
              16     1,189,164    ACCEPTABLE  $x $y
              17     1,163,565    ACCEPTABLE  $x $y
              18     1,149,772    ACCEPTABLE  $x $y
              19       986,010    ACCEPTABLE  $x $y
              20     7,449,917    ACCEPTABLE  $x $y
               6             4    ACCEPTABLE  $x $y
               7         6,442    ACCEPTABLE  $x $y
               8        23,762    ACCEPTABLE  $x $y
               9        66,175    ACCEPTABLE  $x $y
What the Hell? Broken JVM! Broken hardware! Why don’t my unsynchronized counters work?
But, consider this timeline of events:
Thread 1: (0 ------------------> 1) (1->2)(2->3)(...)(9->10)
Thread 2:  (0->1)(1->2)(...)(8->9)  (1 ----------------------> 2)

[end result: 2]
Here, the first thread got stuck on the very first update, and after getting unstuck, destroyed the result of all nine updates by the second thread. It would then normally catch up, right? Not if the second thread reads "1" on its last iteration and gets stuck too, in the end reciprocally destroying the first thread’s updates. This leaves us with "2" as the end result.
Of course, the probability of this interference gets lower as you require longer interference windows: this is why our empirical test results cut off at "6". If we acquired more samples, we would eventually see lower values too.
This is a perfect example of how improperly synchronized code may produce oddly unintuitive results. The non-atomic counter may encounter catastrophic interference between the threads, setting it up for losing a virtually unbounded number of updates.
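For the record, the conventional fix is to make the whole read-modify-write atomic, e.g. with AtomicInteger — a sketch with the jcstress annotations omitted:

import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounters {
    final AtomicInteger x = new AtomicInteger();

    void actor1() {
        for (int i = 0; i < 10; i++) {
            x.incrementAndGet(); // atomic read-modify-write: no lost updates
        }
    }

    void actor2() {
        for (int i = 0; i < 10; i++) {
            x.incrementAndGet();
        }
    }

    // an arbiter would now always observe exactly 20
}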
3.2. Pitfall: Semi-Synchronized Is Fine
The most frequent and impactful misunderstanding of the model is truly heart-breaking. Surprisingly, it comes up even after studying the memory model in some detail. Consider this example again:
class Box {
    int x;
    public Box(int v) {
        x = v;
    }
}

class RacyBoxy {
    Box box;

    public synchronized void set(Box v) {
        box = v;
    }

    public Box get() {
        return box;
    }
}
Way too many folks would nod through your JMM explanation, and then say this code is properly synchronized. Their reasoning goes like this: the reference store is atomic, and therefore there is no need to care about anything else. What that reasoning misses is that the issue here is not access atomicity (e.g. whether you can see a partially written version of the reference itself), but the ordering constraints. The actual failure comes from the fact that reading a reference to an object and reading the object’s fields are distinct under the memory model.
Therefore, you really need to ask yourself, given this test:
@JCStressTest
@State
public class SynchronizedPublish {
    RacyBoxy boxie = new RacyBoxy();

    @Actor
    void actor() {
        boxie.set(new Box(42)); // set is synchronized
    }

    @Actor
    void observer(IntResult1 r) {
        Box t = boxie.get(); // get is not synchronized
        if (t != null) {
            r.r1 = t.x;
        } else {
            r.r1 = -1;
        }
    }
}
…is the outcome 0 plausible? JMM says yes, because there are conforming executions that read the Box reference from RacyBoxy, and then read 0 from its field.
That said, running the example on x86 would almost always lead to this:
[OK] net.shipilev.jmm.SynchronizedPublish
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation
              -1    43,265,036    ACCEPTABLE  Not ready yet
               0             0    ACCEPTABLE  Field is not visible yet
              42     1,233,714    ACCEPTABLE  Everything is visible
Hooray? Take that, Memory Model! But let’s run it on POWER:
[OK] net.shipilev.jmm.SynchronizedPublish
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation
              -1   362,286,539    ACCEPTABLE  Not ready yet
               0         2,341    ACCEPTABLE  Field is not visible yet
              42       616,150    ACCEPTABLE  Everything is visible
Oops, there is our "acceptable" outcome. Note that it plays against the broken mental model that says "synchronized emits a memory barrier at the end, and thus non-synchronized reads are fine". Memory consistency requires cooperation from both parties.
Writing reliable software should be based on the actual language guarantees, not anecdotal observations of what hardware is doing today — assuming you even know what hardware you will be running on in the future. Once you start to rely on empirical testing only, prepare to suffer.
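The straightforward fix here is to make both parties cooperate, e.g. by synchronizing the getter as well — a sketch (the class name is mine):

class NotSoRacyBoxy {
    Box box;

    public synchronized void set(Box v) {
        box = v;
    }

    public synchronized Box get() {
        return box;
    }
}

Now the lock release in set() pairs with the lock acquire in get(), and a reader that sees the new Box is also guaranteed to see the writes to its fields.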
3.3. Pitfall: Adding Volatiles Closer To The Problem Helps
The example above is fixable if we add some volatile-s. The important thing is to know where exactly to add them. For instance, you may see examples like this:
@JCStressTest
@State
public class SynchronizedPublish_VolatileMeh {
    volatile RacyBoxy boxie = new RacyBoxy();  (1)

    @Actor
    void actor() {
        boxie.set(new Box(42));
    }

    @Actor
    void observer(IntResult1 r) {
        Box t = boxie.get();
        if (t != null) {
            r.r1 = t.x;
        } else {
            r.r1 = -1;
        }
    }
}
1 | Oh yes, totally legit, let’s put volatile there |
This, however, is incorrect: the model only guarantees happens-before between the actual volatile store and the actual volatile load that observes that store. Making the container itself volatile does not help, because there is no volatile write to match the read with. So SynchronizedPublish_VolatileMeh fails in the same way SynchronizedPublish does. It would only help to put volatile on the RacyBoxy.box field, so that the volatile store in RacyBoxy.set matches the volatile load in RacyBoxy.get.
For the same reason, this example has surprising outcomes:
@JCStressTest
@State
class VolatileArray {
    volatile int[] arr = new int[2];  (1)

    @Actor
    void actor() {
        int[] a = arr;
        a[0] = 1;  (2)
        a[1] = 1;
    }

    @Actor
    void observer(IntResult2 r) {
        int[] a = arr;
        r.r1 = a[1];
        r.r2 = a[0];
    }
}
1 | volatile array, better be good! |
2 | volatile store? |
Even though the array itself is volatile, the reads and writes of the array elements do not have volatile semantics. Therefore, the outcome 1, 0 is plausible.
It can be clearly demonstrated on POWER:
[OK] net.shipilev.jmm.VolatileArray
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences             Expectation  Interpretation
            0, 0       704,015              ACCEPTABLE  Everything else is acceptable too.
            0, 1         1,291              ACCEPTABLE  Everything else is acceptable too.
            1, 0           118  ACCEPTABLE_INTERESTING  Ordering? You wish.
            1, 1    37,136,486              ACCEPTABLE  Everything else is acceptable too.
Advanced APIs, like Atomic{Integer,Long,Reference}Array, provide volatile semantics for element accesses, if needed. Since JDK 9, VarHandles provide assorted efficient memory access modes too.
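For illustration, here is how the array example could be rewritten with AtomicIntegerArray, so that the element accesses themselves carry the volatile semantics — a sketch with the jcstress plumbing stripped:

import java.util.concurrent.atomic.AtomicIntegerArray;

class AtomicArray {
    final AtomicIntegerArray arr = new AtomicIntegerArray(2);

    void actor() {
        arr.set(0, 1); // volatile-style store to the element
        arr.set(1, 1);
    }

    void observer(int[] r) {
        r[0] = arr.get(1); // volatile-style load of the element
        r[1] = arr.get(0); // observing (1, 0) is now forbidden
    }
}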
3.4. Pitfall: Releasing in Wrong Order
It is amazing to see how many folks glance over the synthetic examples, and do not map them onto their day-to-day code. Take this example:
@JCStressTest
@State
public class ReleaseOrderWrong {
    int x;
    volatile int g;

    @Actor
    public void actor1() {
        g = 1;
        x = 1;
    }

    @Actor
    public void actor2(IntResult2 r) {
        r.r1 = g;
        r.r2 = x;
    }
}
Here, the outcome 1, 0 can be observed empirically, which seems to look like a happens-before violation. What gives: didn’t we observe the volatile write g = 1, so why don’t we observe x = 1? The answer is that the release order is wrong: we have to do "some-writes" → volatile write → volatile read → "some-reads", in order to guarantee that the "some-reads" see the "some-writes". In this example, g = 1 and x = 1 come in the wrong order. The outcome in question can even be explained by a sequentially consistent execution!
Many would laugh this example away, and then proceed to do something like this:
public class MyListMyListMyListIsOnFire {
    private volatile List<Integer> list;

    void prepareList() {
        list = new ArrayList();
        list.add(1);
        list.add(2);
    }

    List<Integer> getMyList() {
        return list;
    }
}
Here, the volatile write of list comes first, and then the updates. If you think this guarantees that the callers of getMyList see the entire list contents, rewind to the motivating example above, and think again!
This failure is easy to demonstrate on x86:
@JCStressTest
@State
public class ReleaseOrder2Wrong {
    volatile List<Integer> list;

    @Actor
    public void actor1() {
        list = new ArrayList<>();
        list.add(42);
    }

    @Actor
    public void actor2(IntResult1 r) {
        List<Integer> l = list;
        if (l != null) {
            if (l.isEmpty()) {
                r.r1 = 0;
            } else {
                r.r1 = l.get(0);
            }
        } else {
            r.r1 = -1;
        }
    }
}
…yields:
[OK] net.shipilev.jmm.ReleaseOrder2Wrong
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences             Expectation  Interpretation
              -1    65,119,848              ACCEPTABLE  Reading null list
               0       252,169  ACCEPTABLE_INTERESTING  List is not fully populated
              42     1,980,313              ACCEPTABLE  Reading a fully populated list
Oops.
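The fix is to fully populate the list first, and publish it with the volatile write last, so the release comes after all the "some-writes" — a sketch:

import java.util.ArrayList;
import java.util.List;

public class ReleaseOrderRight {
    private volatile List<Integer> list;

    void prepareList() {
        List<Integer> l = new ArrayList<>(); // populate a local first...
        l.add(1);
        l.add(2);
        list = l; // ...and only then release via the volatile store
    }

    List<Integer> getMyList() {
        return list; // acquire: callers see either null, or the fully populated list
    }
}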
3.5. Pitfall: Acquiring in Wrong Order
The symmetric case is when you observe in a different order:
@JCStressTest
@State
public class AcquireOrderWrong {
    int x;
    volatile int g;

    @Actor
    public void actor1() {
        x = 1;
        g = 1;
    }

    @Actor
    public void actor2(IntResult2 r) {
        r.r1 = x;
        r.r2 = g;
    }
}
[OK] net.shipilev.jmm.AcquireOrderWrong
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation
            0, 0    60,839,389    ACCEPTABLE  All other cases are acceptable.
            0, 1           579    ACCEPTABLE  All other cases are acceptable.
            1, 0        41,053    ACCEPTABLE  All other cases are acceptable.
            1, 1    40,122,239    ACCEPTABLE  All other cases are acceptable.
In this case, the outcome 0, 1 is completely plausible, and can also be explained by a sequentially consistent execution: r.r1 = x (reads 0), then x = 1, g = 1, r.r2 = g (reads 1). Again, it is easy to laugh this example off, and then discover an actual bug hiding in slightly more complicated production code. Be vigilant!
3.6. Pitfall: Acquiring and Releasing in Wrong Order
Combining the two previous pitfalls into one major pitfall, by doing both the acquire and the release wrong:
@JCStressTest
@State
public class AcquireReleaseOrderWrong {
    int x;
    volatile int g;

    @Actor
    public void actor1() {
        g = 1;
        x = 1;
    }

    @Actor
    public void actor2(IntResult2 r) {
        r.r1 = x;
        r.r2 = g;
    }
}
…yields an interesting case too. Here, the outcome (1, 0) cannot be explained by any sequentially consistent execution! But of course, it is a naked data race, and here is our "bad" result:
[OK] net.shipilev.jmm.AcquireReleaseOrderWrong
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation
            0, 0   108,771,152    ACCEPTABLE  All other cases are acceptable.
            0, 1     1,137,881    ACCEPTABLE  All other cases are acceptable.
            1, 0        15,218    ACCEPTABLE  All other cases are acceptable.
            1, 1    29,451,719    ACCEPTABLE  All other cases are acceptable.
3.7. Avoiding Pitfalls
The most important thing you could ever learn from the Java Memory Model is the notion of safe publication. For volatiles, it goes like this: do all your writes before the volatile write, and do all your reads after the volatile read that observed it — then, and only then, the reads are guaranteed to see those writes.
Note the caveats: deviating just a little bit from them obliterates the guarantees, as the examples in this section show.[7] [8] This is the golden example:
@JCStressTest
@Outcome(id = "1, 0", expect = Expect.FORBIDDEN, desc = "Happens-before violation")
@Outcome(expect = Expect.ACCEPTABLE, desc = "All other cases are acceptable.")
@State
public class SafePublication {
    int x;
    volatile int ready;

    @Actor
    public void actor1() {
        x = 1;
        ready = 1;
    }

    @Actor
    public void actor2(IntResult2 r) {
        r.r1 = ready;
        r.r2 = x;
    }
}
The naming choices for special fields can help to quickly diagnose problems in the source code. See how calling the volatile field above ready clearly highlights the correct order of operations.
…and it indeed does not show the "bad" outcome on any platform. This is POWER: weak as its hardware model is, it is no match for the language guarantees:
[OK] net.shipilev.jmm.SafePublication
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation
            0, 0    69,358,115    ACCEPTABLE  All other cases are acceptable.
            0, 1     2,402,453    ACCEPTABLE  All other cases are acceptable.
            1, 0             0     FORBIDDEN  Happens-before violation
            1, 1    44,989,512    ACCEPTABLE  All other cases are acceptable.
Using the symmetry with synchronized, we can construct similar cases with Java locks.
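For instance, a minimal sketch of the synchronized analog of SafePublication (naming is mine) would go like this:

public class SafePublicationSync {
    int x;
    int ready;
    final Object lock = new Object();

    public void actor1() {
        x = 1;
        synchronized (lock) { // "release" on unlock
            ready = 1;
        }
    }

    public int actor2() {
        int r1, r2;
        synchronized (lock) { // "acquire" on lock
            r1 = ready;
        }
        r2 = x;              // if r1 == 1, then r2 == 1 is guaranteed
        return r1 * 10 + r2; // encodes (r1, r2); 10 is the forbidden case
    }
}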
4. Wishful Thinking: Hold My Beer While I Am…
This section describes the usual abuses by the more advanced JMM users, and explains why they do not work.
4.1. Wishful Thinking: My Code Has All Happens-Befores!
First, a little caveat about the phraseology. You can frequently see people handwave with "happens-before" like there is no tomorrow. Notably, faced with this code:
int x;
volatile int g;

void m() {
    x = 1;
    g = 1;
}

void r() {
    int lg = g;
    int lx = x;
}
…they say that g = 1 happens-before int lg = g. This train-wrecks further reasoning by logically arriving at the conclusion that int lx = x would always see x = 1 (since x = 1 hb g = 1, and int lg = g hb int lx = x too). This is a very easy mistake to make; you have to keep in mind that happens-before (and the other orders in the JMM formalism) applies to actions, not statements.
Having slightly different designations for actions helps to distinguish actions from statements. I like this nomenclature: write(x, V) writes value V to variable x; read(x):V reads value V from variable x. In that nomenclature, you can say write(g, 1) happens-before read(g):1, because now you describe the actual action, not just some abstract program statement.
For completeness, these are the valid executions under JMM:

- write(x, 1) →hb write(g, 1) … read(g):0 →hb read(x):0

- write(x, 1) →hb write(g, 1) … read(g):0 →hb read(x):1

- write(x, 1) →hb write(g, 1) →hb read(g):1 →hb read(x):1

This execution, on the other hand, has broken HB consistency — read(x) should be observing the latest write to x, but it does not:

- write(x, 1) →hb write(g, 1) →hb read(g):1 →hb read(x):0
Note that a good API spec is careful to speak about actions, and their connection with the actual observable events. E.g. the java.util.concurrent package spec says:
The methods of all classes in java.util.concurrent and its subpackages extend these guarantees to higher-level synchronization. In particular: Actions taken by the asynchronous computation represented by a Future happen-before actions subsequent to the retrieval of the result via Future.get() in another thread.
4.2. Wishful Thinking: Happens-Before Is The Actual Ordering
Now, the notion of orders somehow wrecks people’s minds into assuming that the "order" from set theory used in the JMM spec somehow relates to the physical execution order. Notably, people claim that if two actions are in program order, that is the order they execute in (which precludes any optimization!); or, if they are tied in happens-before, then they also execute in that happens-before order (which precludes any optimization too, given that HB is an extension of PO).
It goes into ridiculous examples, like this:
@JCStressTest
@State
public class ReadAfterReadTest {
    int a;

    @Actor
    void actor1() {
        a = 1;
    }

    @Actor
    void actor2(IntResult2 r) {
        r.r1 = a;
        r.r2 = a;
    }
}
The actual test is more complicated due to the need to dodge compiler optimizations, but this example will do. Just imagine that the compiler does not coalesce the common reads of a. Is the 1, 0 result plausible? E.g. if we have read 1 already, can we read 0 next time? JMM says we can, because the execution producing this outcome does not violate the memory model requirements. Informally, we can say that the decision on what a particular read can observe is made for each read in isolation. Since both reads are racy, both may return either 0 or 1.
This can be demonstrated even on otherwise strong hardware, like x86:
[OK] o.o.j.t.volatiles.ReadAfterReadTest
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences             Expectation  Interpretation
            0, 0    16,736,450              ACCEPTABLE  Doing both reads early.
            0, 1         3,941              ACCEPTABLE  Doing first read early, not surprising.
            1, 0        84,477  ACCEPTABLE_INTERESTING  First read seen racy value early, and the s...
            1, 1   108,816,262              ACCEPTABLE  Doing both reads late.
Happens-before order is only useful through the happens-before consistency rule, which describes what writes a particular read can observe. It does not mandate any particular physical execution order of actions.
This goes against intuition: indeed, many would argue that you cannot see 1 and then see 0 in two back-to-back reads from the same location. That argument hinges on the definition of "then", which for most people includes the intuitive/naive model of time, contaminated with the program order. But in the Java Memory Model, "then" is defined by the partial happens-before order (for some paired write-read actions) and its consistency rules, not by the program order itself. Therefore, it is useless to think of two independent reads as happening in any particular order here.[9]
Marking the field a with the volatile modifier precludes the 1, 0 outcome, because then the synchronization order consistency rule takes power over both reads: if the first read sees the a = 1 write, the second read should see it too.
4.3. Wishful Thinking: Synchronization Order Is The Actual Ordering
Next up, a more metaphysical question. Synchronization Order (SO) is specified to be a total order over synchronization actions. Getting back to the IRIW-like example:
@JCStressTest
@State
class IRIW {
    volatile int x, y;

    @Actor
    void writer1() {
        x = 1;
    }

    @Actor
    void writer2() {
        y = 1;
    }

    @Actor
    void reader1(IntResult4 r) {
        r.r1 = x;
        r.r2 = y;
    }

    @Actor
    void reader2(IntResult4 r) {
        r.r3 = y;
        r.r4 = x;
    }
}
We know that the (1, 0, 1, 0) outcome is precluded by the SO rules. But if we took two "CPU demons" and let them observe the machine state of the CPUs running the readers, would it be possible for these demons to have inconsistent worldviews?
The answer is: yes, of course. We could see that both CPUs have a mixed understanding of which event took place first: x = 1 or y = 1. Therefore, even though the specification requires that actions are in a total order, the actual physical execution order may differ, depending on your observation point.[10]
Having said that, the question we should be asking ourselves is not "what does the machine see?", but "what does the machine allow programs to see?". Some machines hide these details (and arguably pay for it with performance), some do not. In the latter case, runtime cooperation is required to avoid observing the unwanted states. In the end, the runtime only allows programs to see results consistent with some total order of events, even if physically it was chaos.
Do you want to talk to chaotic hardware? Nope? Use the language memory models then, and let others take care of this.
4.4. Wishful Thinking: Unobserved Synchronized Has Memory Effects
If you haven’t internalized it already: the Java Memory Model only guarantees ordering across matching releases and acquires. This means that unobserved releases/acquires have no meaning under the memory model, and are not required to produce memory effects. But quite often, you will see code like this:
void synchronize() {
    synchronized (something) {}; // derp... memory barrier?
}
We already know that barriers are implementation details, and may be omitted by the runtime. In fact, it is easy to demonstrate this with the following test:
@JCStressTest
@State
public class SynchronizedAreNotBarriers {
    int x, y;

    @Actor
    public void actor1() {
        x = 1;
        synchronized (new Object()) {} // o-la-la, a barrier.
        y = 1;
    }

    @Actor
    public void actor2(IntResult2 r) {
        r.r1 = y;
        synchronized (new Object()) {} // o-la-la, a barrier.
        r.r2 = x;
    }
}
Is the 1, 0 outcome plausible? Naively, with the synchronized-as-barrier model, it is forbidden. But it is easy for the runtime to exploit the simple fact that synchronizing on new Object() has no effect, and purge the blocks entirely. The resulting optimized code would not have any barriers. And so, even on x86, this happens:
[OK] net.shipilev.jmm.SynchronizedAreNotBarriers
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation
            0, 0     2,705,391    ACCEPTABLE  All other cases are acceptable.
            0, 1        40,709    ACCEPTABLE  All other cases are acceptable.
            1, 0        13,356    ACCEPTABLE  Racy read of x
            1, 1    61,341,794    ACCEPTABLE  All other cases are acceptable.
Unobserved synchronized blocks are not barriers. In fact, you cannot rely on the barrier interpretation at all, because runtimes will exploit the optimization opportunities without consulting you. The only way to build reliable software is to adhere to the language rules, not to shady techniques you barely understand.
4.5. Wishful Thinking: Unobserved Volatiles Have Memory Effects
A similar example concerns volatile
-s. It is tempting to read the JSR 133 Cookbook again, and imagine that since volatile
implementations mean barriers, we can use volatiles to get barrier semantics! This should totally work, right?
@JCStressTest
@State
public class VolatilesAreNotBarriers {
    static class Holder {
        volatile int GREAT_BARRIER_REEF;
    }

    int x, y;

    @Actor
    public void actor1() {
        Holder h = new Holder();
        x = 1;
        h.GREAT_BARRIER_REEF = h.GREAT_BARRIER_REEF;
        y = 1;
    }

    @Actor
    public void actor2(IntResult2 r) {
        Holder h = new Holder();
        r.r1 = y;
        h.GREAT_BARRIER_REEF = h.GREAT_BARRIER_REEF;
        r.r2 = x;
    }
}
If you run it on modern HotSpot, you will see that the -server (C2) compiler still leaves the barriers behind. But that seems to be an implementation inefficiency: while it purges both Holder instances and the volatile ops, it loses the association between the actual store and the relevant barrier shortly after parsing.
My duct-taped Graal runs show that Graal seems to eliminate both instances and the associated barriers on x86: the disassembly shows no barriers, and performance is 10x better in the actor methods — but, alas, we cannot run it on POWER yet. Still, we certainly would not like to throw compilers under the bus and say this is forbidden.
This actually means that volatiles and fences are not easily interchangeable. Volatiles are weaker than fences when it comes to reorderings. Fences are weaker than volatiles when you need to gain high-level properties like sequential consistency — unless you put fullFence-s everywhere.
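Since JDK 9, the supported way to issue standalone fences is the static methods on VarHandle; a minimal sketch (the class and field names are mine):

import java.lang.invoke.VarHandle;

class ExplicitFences {
    int x, y;

    void writer() {
        x = 1;
        VarHandle.releaseFence(); // keeps the preceding store before the following one
        y = 1;
    }

    void reader() {
        int r1 = y;
        VarHandle.acquireFence(); // keeps the following load after the preceding one
        int r2 = x;
        // Note: these only constrain local reordering; as the IRIW example
        // showed, they do not buy global ordering properties by themselves.
    }
}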
4.6. Wishful Thinking: Mismatched Ops Are Okay
From the example above, it can also be understood that releasing on one variable and acquiring on another is not guaranteed to provide any memory effects:
@JCStressTest
@State
public class YourVolatilesWillCallMyVolatiles {
    int x, y;
    volatile int see;
    static volatile int BARRIER;

    @Actor
    void thread1() {
        x = 1;
        see = 1; // release on $see
        y = 1;
    }

    @Actor
    void thread2(IntResult3 r) {
        r.r1 = y;
        r.r3 = BARRIER; // acquire on $BARRIER
        r.r2 = x;
    }
}
The outcome 1, 0 is allowed here. BARRIER is not really a barrier! Some VM implementations would still emit the same-looking barriers on both accesses, and thus accidentally provide the desired semantics, but this is not a language guarantee.
If you see code like this in the JDK Class Library, it does not mean you can use this approach without remorse. The Class Library sometimes makes assumptions about the exact JVM it is running on to bring the memory consistency guarantees to JDK users. It is very fragile to use the same code in third-party libraries.
4.7. Wishful Thinking: final-s Are Replaceable With volatile-s
Let’s put the final nail into the volatile-as-barrier coffin. Imagine you have a class with a volatile field initialized in the constructor, and suppose you publish an instance of this class via a race. Are you guaranteed to see the set value? E.g.:
@JCStressTest
@State
public class VolatileMeansEverythingIsFine {
    static class C {
        volatile int x;
        C() { x = 42; }
    }

    C c;

    @Actor
    void thread1() {
        c = new C();
    }

    @Actor
    void thread2(IntResult1 r) {
        C c = this.c;
        r.r1 = (c == null) ? -1 : c.x;
    }
}
…is the outcome 0 plausible here? "But volatile-s are so strong!" — one could say. "But the barriers there are so strong!" — somebody else could say. "Who cares about barriers, when causality prevents this!" — others would say.
The fact of the matter is that JMM allows seeing 0 in this example! Only final would preclude 0, and we cannot have a field both final and volatile. This is one of the peculiar properties of the model, which could be solved by forcing all initializations to perform as final ones, probably without a prohibitive performance cost.[11][12]
Notice this is yet another example of how evil data races are. The volatile modifier on the field itself does nothing to prevent the race, because it is in the wrong place: if it were used for releasing/acquiring the instance itself, everything would work fine.
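A minimal sketch of that correct placement (naming is mine): the volatile goes on the publishing reference, not on the field inside:

public class VolatileInTheRightPlace {
    static class C {
        int x; // no volatile needed here anymore
        C() { x = 42; }
    }

    volatile C c; // the publication itself is now a release/acquire pair

    void thread1() {
        c = new C(); // the volatile store releases the constructor writes
    }

    int thread2() {
        C local = c; // the volatile load acquires them
        return (local == null) ? -1 : local.x; // 0 is now forbidden
    }
}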
4.8. Wishful Thinking: TSO Machines Protect Us
Even when optimizers are not in the picture, minute details of code generation may affect the outcomes. The wishful thinking of many is that machines with Total Store Order (TSO) save us from bad things. This almost always hinges on the belief that if you cannot imagine why the compiler would mess up your code, then it wouldn’t. This gives rise to the urban legend that "safe initialization" is not required on x86, because it preserves the order of field initializations and publication.
Let’s construct a simple case where an object with four fields is initialized and published via a race:
@JCStressTest
@State
public class UnsafePublication {
    int x = 1;

    MyObject o; // non-volatile, race

    @Actor
    public void publish() {
        o = new MyObject(x);
    }

    @Actor
    public void consume(IntResult1 res) {
        MyObject lo = o;
        if (lo != null) {
            res.r1 = lo.x00 + lo.x01 + lo.x02 + lo.x03;
        } else {
            res.r1 = -1;
        }
    }

    static class MyObject {
        int x00, x01, x02, x03;
        public MyObject(int x) {
            x00 = x;
            x01 = x;
            x02 = x;
            x03 = x;
        }
    }
}
Even on x86 it will yield interesting behaviors, as if we don’t see all the stores from the constructor:
[OK] net.shipilev.jmm.UnsafePublication
    (fork: #1, iteration #1, JVM args: [-server])
  Observed state   Occurrences   Expectation  Interpretation
              -1    86,515,664    ACCEPTABLE  The object is not yet published
               0           751    ACCEPTABLE  The object is published, but all fields are 0.
               1           297    ACCEPTABLE  The object is published, at least 1 field is visible.
               2           211    ACCEPTABLE  The object is published, at least 2 fields are visible.
               3           953    ACCEPTABLE  The object is published, at least 3 fields are visible.
               4     4,057,524    ACCEPTABLE  The object is published, all fields are visible.
These kinds of failures are not theoretical! In practice, either safe publication or safe initialization (e.g. making all fields final) will prohibit the intermediate outcomes. See more at "Safe Construction and Safe Initialization".
Safe initialization is a very useful pattern that guards against racy publication. Do not ignore it or deviate from it in your code! It may save you days of debugging time.
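For reference, the safely initialized variant of MyObject from the test above would look like this — a sketch:

static class MyObject {
    final int x00, x01, x02, x03; // final fields => safe initialization

    public MyObject(int x) {
        // readers that observe the racy reference to MyObject are still
        // guaranteed to observe these constructor writes
        x00 = x;
        x01 = x;
        x02 = x;
        x03 = x;
    }
}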
Another horrifying kind of wishful thinking comes in a deceptively simple form: you may mistakenly believe it is easy to sanitize a racy result. Taking the example from this section: no matter what you do in consume(), it will not save you from the unfolding race. If your publisher does not cooperate with the consumer, all bets are off.
final-s may protect publishers from non-cooperating racy consumers, but not vice versa.
4.9. Wishful Thinking: Benign Races are Resilient
There are special forms of benign races (an oxymoron, if you ask me). Their benignity comes from the safe initialization rules. They usually take this form:
@JCStressTest
@State
public class BenignRace {
    @Actor
    public void actor1(IntResult2 r) {
        MyObject m = get();
        if (m != null) {
            r.r1 = m.x;
        } else {
            r.r1 = -1;
        }
    }

    @Actor
    public void actor2(IntResult2 r) {
        MyObject m = get();
        if (m != null) {
            r.r2 = m.x;
        } else {
            r.r2 = -1;
        }
    }

    MyObject instance;

    MyObject get() {
        MyObject t = instance;     // read once
        if (t == null) {           // not yet there...
            t = new MyObject(42);
            instance = t;          // try to install the new version
        }
        return t;
    }

    static class MyObject {
        final int x; // safely initialized
        MyObject(int x) {
            this.x = x;
        }
    }
}
This only works if the class is safely initialized (i.e. has only final fields), and the instance field is read only once. Both conditions are critical for the race to be benign in this example. If either condition is relaxed, the race suddenly and abruptly stops being benign.
For example, relaxing the safe initialization rule opens up the failure described in Pitfall: Semi-Synchronized Is Fine:
@JCStressTest
@State
public class NonBenignRace1 {
    ...
    static class MyObject {
        int x; // WARNING: non-final
        MyObject(int x) {
            this.x = x;
        }
    }
}
Relaxing the single read rule also breaks benignity, by setting up the failure described in Wishful Thinking: Happens-Before Is The Actual Ordering:
@JCStressTest
@State
public class NonBenignRace2 {
    ...
    MyObject instance;

    MyObject get() {
        if (instance == null) {
            instance = new MyObject(42);
        }
        return instance; // WARNING: second read
    }
    ...
}
This may appear counter-intuitive: if we read null from instance, we take the corrective action of storing a new instance, and that’s it. Indeed, if we have the intervening store to instance, we cannot see the default value afterwards: we can only see that store (since it happens-before us in all conforming executions where the first read returned null), or the store from the other thread (which is not null, just a different object). But interesting behavior unfolds when we don’t read null from instance on the first read. No intervening store takes place. The second read tries to read again, and being racy as it is, may read null. Ouch.
This is actually exploitable by Evil Optimizers:
T get() {
    if (a == null) {
        a = compute();
    }
    return a;
}
Introduce temporary variables (e.g. do SSA transform, and then some):
T get() {
    T t1 = a;
    if (t1 == null) {
        T t3 = compute();
        a = t3;
        return t3;
    }
    T t2 = a;
    return t2;
}
There are no ordering constraints on independent reads without intervening writes, so we can mess around with read ordering. First, observe that once control goes into the if branch, we never reach T t2 = a, so the intervening write to a is invisible to T t2 = a; move that read before the branch. Second, shuffle around the independent reads T t1 and T t2 — we can do that, they are independent reads:
T get() {
    T t2 = a;
    T t1 = a;
    if (t1 == null) {
        T t3 = compute();
        a = t3;
        return t3;
    }
    return t2;
}
This trivially gets you null at return t2.
Exploiting benign races is seldom profitable in ordinary code. In library code, it sometimes improves performance significantly, especially on non-TSO hardware where volatile reads are not cheap. But it takes utmost discipline to make the race really benign. When using benign races, it is your job to prove why the race is actually benign.
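A well-known shape of the benign race is hash code caching, in the spirit of what String.hashCode() does — a sketch (the class name is mine) that obeys both critical conditions:

class CachedHash {
    private final byte[] data; // final => safely initialized

    private int hash; // racy cache; 0 means "not computed yet"

    CachedHash(byte[] data) {
        this.data = data.clone();
    }

    @Override
    public int hashCode() {
        int h = hash; // the single racy read, into a local
        if (h == 0) { // either "not computed", or a genuine 0: recomputing is idempotent
            for (byte b : data) {
                h = 31 * h + b;
            }
            hash = h; // racy publish; every thread computes the same value
        }
        return h;
    }
}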
5. Horror Circus: Kinda Works But Horrifying
This section is provided as comic relief. The examples here are not a call to action. The examples here work in the same way your father-in-law is juggling chainsaws and still has two hands (this particular minute).
5.1. Horror Circus: Synchronizing on Primitives
In Java, every object is potentially a lock. This includes the wrappers for primitives, which makes it possible to synchronize on primitives (their boxed wrappers, actually). Alas, as we have seen before, synchronizing on new Object() does not work, and we need to make sure that primitive values map to the same wrapper objects. Luckily, the Java specification gives us another concession, and advertises that some small values are autoboxed to the same wrapper objects. These two (arguably overlooked) parts of the specification give rise to this contraption:
class HorribleSemaphore {
    final int limit;

    public HorribleSemaphore(int limit) {
        if (limit < 0 || limit > 128) {
            throw new IllegalArgumentException("Incorrect: " + limit);
        }
        this.limit = limit;
    }

    void guardedExecute(Runnable r) {
        synchronized (Integer.valueOf(ThreadLocalRandom.current().nextInt(limit))) {
            // no more than $limit threads here...
            r.run();
        }
    }
}
It does indeed allow no more than $limit threads to execute in guardedExecute at any given time. It is even faster than java.util.concurrent.Semaphore, but it comes with some serious caveats:
- No guarantees on the lower bound of executing threads — it does a little better than a single lock at managing contention, but that is it;

- The lock objects are static, which means that acquiring the lock on an Integer wrapper outside the HorribleSemaphore code robs its permits. Or maybe that’s a feature? Think about it: you can change the limit without involving the HorribleSemaphore instance! Talk about open for extension! SOLID design FTW.

- The implementation-imposed limit is 128 threads. Well, we can add +128 to the limit to claim the negative values too, which gives us a whopping 256 threads for an upper bound! Also, the JVM is future-proof enough to give us a flag that controls the autoboxing cache size — AutoBoxCacheMax — we can always tune it up!
5.2. Horror Circus: Synchronizing on Strings
Hi, Shady Mess here! Are you tired of those unnamed locks hanging around your program? Are you dying with multiple lock managers that pollute your precious bodily fluids? Now you can use Strings as locks! This is a limited offer only. The StringTable cannot hold any longer he comes he comes do not fight he com̡e̶s, ̕h̵is un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, unclaimed locks lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸ ̛l̕ik͏e liquid pain, the song of deadlocks will extinguish the voices of mortal man from the sphere I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful the final snuffing of the lies of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL IS LOST the pon̷y he comes he c̶̮omes he comes the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e not rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
Why have:
private final Object LOCK = new Object();

synchronized (LOCK) {
    ...
}
…when you can have:
synchronized("Lock") {
...
}
Of course, it comes with the caveat that the String instance you get from a String literal is shared within the execution. But this is a blessing in disguise — it is shared between class loaders, too! Which means you can synchronize code without figuring out how to pass static final-s between class loaders! Woo-hoo!
Also, nobody prevents us from carefully namespacing the locks:
synchronized("This is my lock. " +
"You cannot have it. " +
"Get your own String to synchronize on. " +
"There are plenty of Strings for everyone.") {
...
}
…which also lends itself for programmatic access:
public void doWith(String whosLockThatIs, Runnable r) {
    synchronized (("This is " + whosLockThatIs + " lock. " +
                   "You cannot have it. " +
                   "Get your own String to synchronize on. " +
                   "There are plenty of Strings for everyone.").intern()) {
        r.run();
    }
}

doWith("mine", () -> System.out.println("Peekaboo"));
What a magical language that gives me these powers!
Conclusion and Parting Thoughts
It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.
- Formal JMM rules are complex to understand. Not everyone is equipped with the time and capacity to figure out the corner cases, and so everyone inevitably uses a repertoire of working constructions. Alas, that repertoire often comes from urban legends, anecdotal observations, or blind guessing. We should stop that habit. Build the repertoire based on the actual language guarantees!

- To deal with 99.99% of all memory ordering problems, you have to know safe publication (the ordering rules of proper synchronization) and safe initialization (the safety rules protecting from inadvertent races). Both are explored in greater detail in "Safe Publication and Safe Initialization in Java". An additional 0.00999% is understanding how benign races work. At this level, the memory model rules are actually very simple and intuitive.

- The problems start when people try to dive deeper. Anything beyond the language/library guarantees exposes you to the intricate details of the inner workings of runtimes, compilers and hardware, which hardly anybody understands completely. Attempts to simplify them into "easy" interpretations like roach motel, barriers, etc. sometimes give incorrect results, as multiple examples above show. Forget how hardware works, forget how optimizers work! This understanding is fine for educational and entertainment purposes, but it is actively damaging for correctness proofs. Mostly because you will eventually equate "I cannot build an example of how it may fail" with "this never fails", and that hubris will bite you at the most unexpected moment.

- Some APIs provide you with bells and whistles that may sound exciting: lazySet/putOrdered, fences, acquire/release operations in VarHandles — but their existence does not mean you have to use them! Those are left for the power users who can actually reason about them. If you want to bend the rules "just a little bit", prepare yourself for jumping off a very large cliff, because that’s what you are doing.
All of the above was a buildup for…