About, Disclaimers, Contacts
"JVM Anatomy Quarks" is the on-going mini-post series, where every post is describing some elementary piece of knowledge about JVM. The name underlines the fact that the single post cannot be taken in isolation, and most pieces described here are going to readily interact with each other.
The post should take about 5-10 minutes to read. As such, it goes deep for only a single topic, a single test, a single benchmark, a single observation. The evidence and discussion here might be anecdotal, not actually reviewed for errors, consistency, writing 'tyle, syntaxtic and semantically errors, duplicates, or also consistency. Use and/or trust this at your own risk.
Aleksey Shipilëv, JVM/Performance Geek
Shout out at Twitter: @shipilev; Questions, comments, suggestions: aleksey@shipilev.net
Theory
If you read the Java Language Specification chapters that describe the base semantics of final variables, you will discover a spooky paragraph:
A constant variable is a final variable of primitive type or type String that is initialized with a constant expression (§15.28). Whether a variable is a constant variable or not may have implications with respect to class initialization (§12.4.1), binary compatibility (§13.1, §13.4.9), and definite assignment (§16 (Definite Assignment)).
JLS §4.12.4
Brilliant! Is this observable in practice?
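One of the implications the quoted paragraph lists, class initialization, can be sketched in a few lines. This is a minimal illustration, not something taken from the JLS: the "ClassInitDemo", "Holder" and "observe" names are made up for this sketch. It relies on §12.4.1, which says that using a static field triggers class initialization only when that field is not a constant variable.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassInitDemo {
    // Events are recorded here in the order they happen.
    static final List<String> EVENTS = new ArrayList<>();

    static class Holder {
        // Constant variable: primitive type, initialized with a constant expression.
        static final int CONSTANT = 42;
        // Not a constant variable: a method call is not a constant expression.
        static final int COMPUTED = Integer.parseInt("42");

        static {
            EVENTS.add("Holder initialized");
        }
    }

    // Hypothetical helper: reads both fields once and returns the recorded events.
    static synchronized List<String> observe() {
        if (EVENTS.isEmpty()) {
            int a = Holder.CONSTANT;   // inlined by javac, Holder is not touched
            EVENTS.add("read CONSTANT = " + a);
            int b = Holder.COMPUTED;   // real field read, initializes Holder first
            EVENTS.add("read COMPUTED = " + b);
        }
        return EVENTS;
    }

    public static void main(String... args) {
        observe().forEach(System.out::println);
    }
}
```

The "Holder initialized" event should land between the two reads: the read of CONSTANT never touches the class, while the read of COMPUTED has to initialize it first.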
Practice
Consider this code. What does it print?
import java.lang.reflect.Field;

public class ConstantValues {
    final int fieldInit = 42;
    final int instanceInit;
    final int constructor;

    {
        instanceInit = 42;
    }

    public ConstantValues() {
        constructor = 42;
    }

    static void set(ConstantValues p, String field) throws Exception {
        Field f = ConstantValues.class.getDeclaredField(field);
        f.setAccessible(true);
        f.setInt(p, 9000);
    }

    public static void main(String... args) throws Exception {
        ConstantValues p = new ConstantValues();
        set(p, "fieldInit");
        set(p, "instanceInit");
        set(p, "constructor");
        System.out.println(p.fieldInit + " " + p.instanceInit + " " + p.constructor);
    }
}
On my machine, it prints:
42 9000 9000
In other words, even though we had overwritten the "fieldInit" field, we don’t observe its new value. More confusingly, the other two variables seem to be happily updated. The answer is that the other two fields are blank final fields, and the first field is a constant variable. If you look into the generated bytecode for the class above, then:
$ javap -c -v -p ConstantValues.class
...
final int fieldInit;
descriptor: I
flags: ACC_FINAL
ConstantValue: int 42 <---- oh...
final int instanceInit;
descriptor: I
flags: ACC_FINAL
final int constructor;
descriptor: I
flags: ACC_FINAL
...
public static void main(java.lang.String...) throws java.lang.Exception;
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC, ACC_VARARGS
Code:
...
41: bipush 42 // <--- Oh wow, inlined fieldInit field
43: invokevirtual #18 // StringBuilder.append
46: ldc #19 // String " "
48: invokevirtual #20 // StringBuilder.append
51: aload_1
52: getfield #3 // Field instanceInit:I
55: invokevirtual #18 // StringBuilder.append
58: ldc #19 // String " "
60: invokevirtual #20 // StringBuilder.append
63: aload_1
64: getfield #4 // Field constructor:I
67: invokevirtual #18 // StringBuilder.append
70: invokevirtual #21 // StringBuilder.toString
73: invokevirtual #22 // System.out.println
No wonder we do not see the update to the "fieldInit" field: javac itself inlined its value at the use site, and there is no chance the JVM would double back and rewrite the bytecode to reflect something else.
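The field itself is still overwritten; only the reads that javac compiled are frozen. A minimal sketch of that, assuming the running JDK still permits reflective writes to final instance fields (as the example above does). The "InlinedReadDemo" class and the "observe" helper are hypothetical names introduced here:

```java
import java.lang.reflect.Field;

public class InlinedReadDemo {
    final int fieldInit = 42; // constant variable, reads are inlined by javac

    // Hypothetical helper: overwrites the field, then reads it back in two ways.
    static String observe() {
        try {
            InlinedReadDemo p = new InlinedReadDemo();
            Field f = InlinedReadDemo.class.getDeclaredField("fieldInit");
            f.setAccessible(true);
            f.setInt(p, 9000);

            int direct = p.fieldInit;      // compiled as the constant 42
            int reflective = f.getInt(p);  // reads the actual field storage
            return direct + " " + reflective;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String... args) {
        System.out.println(observe());
    }
}
```

On a setup where the example above prints "42 9000 9000", this should print "42 9000": the direct read still sees the inlined constant, while the reflective read sees the updated field.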
This optimization is handled by the bytecode compiler itself. This has obvious performance benefits: no need for complicated analysis in the JIT compiler to make use of the constness of constant variables. But, as always, that comes with a cost. Besides the implications for binary compatibility (for example, what happens if we recompile the class with a new value?), which are briefly discussed in the relevant chapters of the JLS, this has interesting implications for low-level benchmarking. For example, when blindly trying to quantify whether the final modifier on an instance field gives a performance improvement for real classes, we might want to measure the most trivial thing:
import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class FinalInitBench {
    // Too lazy to actually build the example class with constructor that initializes
    // final fields, like we have in production code. No worries, we shall just model
    // this with naked fields. Right?

    final int fx = 42; // Compiler complains about initialization? Okay, put 42 right here!
    int x = 42;

    @Benchmark
    public int testFinal() {
        return fx;
    }

    @Benchmark
    public int test() {
        return x;
    }
}
Initializing the final field with its own initializer silently introduces an effect we are probably not after! Running this example benchmark with the "perfnorm" profiler right away to see the low-level performance counters, you get a spooky result: final field access is slightly better, and it produces fewer loads![1]
Benchmark Mode Cnt Score Error Units
FinalInitBench.test avgt 9 1.920 ± 0.002 ns/op
FinalInitBench.test:CPI avgt 3 0.291 ± 0.039 #/op
FinalInitBench.test:L1-dcache-loads avgt 3 11.136 ± 1.447 #/op
FinalInitBench.test:L1-dcache-stores avgt 3 3.042 ± 0.327 #/op
FinalInitBench.test:cycles avgt 3 7.316 ± 1.272 #/op
FinalInitBench.test:instructions avgt 3 25.178 ± 2.242 #/op
FinalInitBench.testFinal avgt 9 1.901 ± 0.001 ns/op
FinalInitBench.testFinal:CPI avgt 3 0.285 ± 0.004 #/op
FinalInitBench.testFinal:L1-dcache-loads avgt 3 9.077 ± 0.085 #/op <--- !
FinalInitBench.testFinal:L1-dcache-stores avgt 3 4.077 ± 0.752 #/op
FinalInitBench.testFinal:cycles avgt 3 7.142 ± 0.071 #/op
FinalInitBench.testFinal:instructions avgt 3 25.102 ± 0.422 #/op
This is because there is no field load in the generated code at all, and all we do is use the inlined constant from the incoming bytecode:
# test
...
1.02% 1.02% mov 0x10(%r10),%edx ; <--- get field x
2.50% 1.79% nop
1.79% 1.60% callq CONSUME
...
# testFinal
...
8.25% 8.21% mov $0x2a,%edx ; <--- just use inlined "42"
1.79% 0.56% nop
1.35% 1.19% callq CONSUME
...
Not a problem in itself, but the result would be different for blank final fields, which are more closely aligned with real-world usage. So, a less lazy version:
import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class FinalInitCnstrBench {
    final int fx; // blank final field, initialized in the constructor
    int x;

    public FinalInitCnstrBench() {
        this.fx = 42;
        this.x = 42;
    }

    @Benchmark
    public int testFinal() {
        return fx;
    }

    @Benchmark
    public int test() {
        return x;
    }
}
…produces more sensible results, where both tests produce equal performance:[2]
Benchmark Mode Cnt Score Error Units
FinalInitCnstrBench.test avgt 9 1.922 ± 0.003 ns/op
FinalInitCnstrBench.test:CPI avgt 3 0.289 ± 0.049 #/op
FinalInitCnstrBench.test:L1-dcache-loads avgt 3 11.171 ± 1.429 #/op
FinalInitCnstrBench.test:L1-dcache-stores avgt 3 3.042 ± 0.031 #/op
FinalInitCnstrBench.test:cycles avgt 3 7.301 ± 0.445 #/op
FinalInitCnstrBench.test:instructions avgt 3 25.235 ± 1.732 #/op
FinalInitCnstrBench.testFinal avgt 9 1.919 ± 0.002 ns/op
FinalInitCnstrBench.testFinal:CPI avgt 3 0.287 ± 0.014 #/op
FinalInitCnstrBench.testFinal:L1-dcache-loads avgt 3 11.170 ± 1.104 #/op
FinalInitCnstrBench.testFinal:L1-dcache-stores avgt 3 3.039 ± 0.864 #/op
FinalInitCnstrBench.testFinal:cycles avgt 3 7.278 ± 0.394 #/op
FinalInitCnstrBench.testFinal:instructions avgt 3 25.314 ± 0.588 #/op
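A constructor is not the only way to avoid the constant variable here. By the JLS definition quoted earlier, an initializer that is not a constant expression, or a field type other than a primitive or String, should also keep javac from inlining the reads. The sketch below reuses the reflection trick from the ConstantValues example to check that. The "NonConstantInitDemo" class and its helpers are hypothetical names, and it assumes the running JDK still permits reflective writes to final instance fields:

```java
import java.lang.reflect.Field;

public class NonConstantInitDemo {
    final int literal = 42;                        // constant variable
    final int computed = Integer.parseInt("42");   // initializer is a method call
    final Integer boxed = 42;                      // type is neither primitive nor String

    // Hypothetical helper: overwrites a final field through reflection.
    static void set(NonConstantInitDemo p, String field) {
        try {
            Field f = NonConstantInitDemo.class.getDeclaredField(field);
            f.setAccessible(true);
            f.set(p, 9000); // Field.set unwraps the Integer for the int fields
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: overwrites all three fields, then reads them directly.
    static String observe() {
        NonConstantInitDemo p = new NonConstantInitDemo();
        set(p, "literal");
        set(p, "computed");
        set(p, "boxed");
        return p.literal + " " + p.computed + " " + p.boxed;
    }

    public static void main(String... args) {
        System.out.println(observe());
    }
}
```

Only the read of "literal" is expected to keep returning 42; the other two fields are ordinary final fields as far as javac is concerned, so their reads go through real field loads.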
Observations
The constant propagation story in Java is complicated, and there are some interesting corner cases. Constant variables, which are treated specially by the bytecode compiler, are one of those corner cases. You are most likely to blow yourself up on this in low-level benchmarking, rather than when dealing with production code that initializes fields in constructors anyway. The need to capture and quantify these corner cases is one of the reasons why JMH has the "perfasm" and "perfnorm" profilers: they are there to make sense of the results.