About

"JVM Anatomy Park" is the on-going mini-post series, where every post is slated to take 5-10 minutes to read. As such, it goes deep for only a single topic, a single test, a single benchmark, a single observation. So, the evidence and discussion here are anecdotal, not actually reviewed for errors, consistency, writing style, syntactic and semantic errors, duplicates, or consistency. Use and/or trust this at your own risk.

Aleksey Shipilёv, Performance Geek @ Red Hat OpenJDK Team
Shout out at Twitter: @shipilev
Questions, comments, suggestions: aleksey@shipilev.net

Question

Are final instance fields ever trivially treated as constants?

Theory

If you read the Java Language Specification chapters that concern themselves with the base semantics of final variables, you will discover a spooky paragraph:

A constant variable is a final variable of primitive type or type String that is initialized with a constant expression (§15.28). Whether a variable is a constant variable or not may have implications with respect to class initialization (§12.4.1), binary compatibility (§13.1, §13.4.9), and definite assignment (§16 (Definite Assignment)).

— Java Language Specification
4.12.4

Brilliant! Is this observable in practice?
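Before the reflection experiment, it may help to see which declarations fall under that definition. The following is a minimal sketch, not taken from the specification: the class `ConstantKinds` and its members are hypothetical names chosen for illustration. It relies on the JLS rule that constant expressions of type String are interned, so a concatenation that javac folds at compile time is the very same object as the matching literal.

```java
// Hypothetical demo class; names are illustrative only.
class ConstantKinds {
    // Constant variable: final, type String, initialized with a constant expression.
    final String constant = "foo";

    // NOT a constant variable: a method call is not a constant expression,
    // even though the resulting value is also "foo".
    final String notConstant = "foo".trim();

    boolean constantIsFolded() {
        // `constant + "bar"` is itself a constant expression, so javac folds it
        // into the literal "foobar", which is interned.
        return (constant + "bar") == "foobar";
    }

    boolean nonConstantIsFolded() {
        // Concatenated at run time: a fresh String, not the interned literal.
        return (notConstant + "bar") == "foobar";
    }

    public static void main(String... args) {
        ConstantKinds k = new ConstantKinds();
        System.out.println(k.constantIsFolded() + " " + k.nonConstantIsFolded());
    }
}
```

The identity comparison is used here only as a probe for compile-time folding; it is not a recommendation to compare strings with `==`.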

Practice

Consider this code. What does it print?

import java.lang.reflect.Field;

public class ConstantValues {

    final int fieldInit = 42;
    final int instanceInit;
    final int constructor;

    {
        instanceInit = 42;
    }

    public ConstantValues() {
        constructor = 42;
    }

    static void set(ConstantValues p, String field) throws Exception {
        Field f = ConstantValues.class.getDeclaredField(field);
        f.setAccessible(true);
        f.setInt(p, 9000);
    }

    public static void main(String... args) throws Exception {
        ConstantValues p = new ConstantValues();

        set(p, "fieldInit");
        set(p, "instanceInit");
        set(p, "constructor");

        System.out.println(p.fieldInit + " " + p.instanceInit + " " + p.constructor);
    }

}

On my machine, it prints:

42 9000 9000

In other words, even though we overwrote the "fieldInit" field, we don’t observe its new value. More confusingly, the other two fields seem to be happily updated. The answer is that the other two fields are blank final fields, and the first field is a constant variable. If you look into the generated bytecode for the class above, you will see:

$ javap -c -v -p ConstantValues.class
...

final int fieldInit;
  descriptor: I
  flags: ACC_FINAL
  ConstantValue: int 42  <---- oh...

final int instanceInit;
  descriptor: I
  flags: ACC_FINAL

final int constructor;
  descriptor: I
  flags: ACC_FINAL

...
public static void main(java.lang.String...) throws java.lang.Exception;
  descriptor: ([Ljava/lang/String;)V
  flags: ACC_PUBLIC, ACC_STATIC, ACC_VARARGS
  Code:
     ...
     41: bipush        42   // <--- Oh wow, inlined fieldInit field
     43: invokevirtual #18  // StringBuilder.append
     46: ldc           #19  // String " "
     48: invokevirtual #20  // StringBuilder.append
     51: aload_1
     52: getfield      #3   // Field instanceInit:I
     55: invokevirtual #18  // StringBuilder.append
     58: ldc           #19  // String " "
     60: invokevirtual #20  // StringBuilder.append
     63: aload_1
     64: getfield      #4   // Field constructor:I
     67: invokevirtual #18  // StringBuilder.append
     70: invokevirtual #21  // StringBuilder.toString
     73: invokevirtual #22  // System.out.println

No wonder we do not see the update to the "fieldInit" field: javac itself inlined its value at the use site, and there is no chance the JVM would double back and rewrite the bytecode to reflect something else.
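One way to convince yourself that the field in the object really was overwritten, and that only the inlined read is stale, is to read it back reflectively. This is a minimal sketch with hypothetical names (`StaleRead`, `observe`); it assumes a JDK where the reflective write to a final instance field succeeds, as it did in the experiment above.

```java
import java.lang.reflect.Field;

// Hypothetical demo class; names are illustrative only.
class StaleRead {
    final int fieldInit = 42; // constant variable, gets a ConstantValue attribute

    // Returns { value seen by a plain read, value seen by a reflective read }.
    static int[] observe() throws Exception {
        StaleRead p = new StaleRead();

        Field f = StaleRead.class.getDeclaredField("fieldInit");
        f.setAccessible(true);
        f.setInt(p, 9000);              // overwrites the field in the object

        int plain = p.fieldInit;        // javac emits the constant here, not a getfield
        int reflective = f.getInt(p);   // reads the actual field contents
        return new int[] { plain, reflective };
    }

    public static void main(String... args) throws Exception {
        int[] r = observe();
        System.out.println(r[0] + " " + r[1]);
    }
}
```

Under that assumption, the plain read should still yield 42 while the reflective read yields 9000: the store happened, but the compiled use site never looks at the field, because the inlining was already done by javac.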

This optimization is handled by the bytecode compiler itself. It has obvious performance benefits: the JIT compiler needs no complicated analysis to exploit the constness of constant variables. But, as always, that comes at a cost. Besides the implications for binary compatibility (for example, what happens if we recompile the class with a new value?), which are briefly discussed in the relevant chapters of the JLS, this has interesting implications for low-level benchmarking. For example, when blindly trying to quantify whether the final modifier on an instance field improves performance in real classes, we might want to measure the most trivial thing:

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class FinalInitBench {
    // Too lazy to actually build the example class with constructor that initializes
    // final fields, like we have in production code. No worries, we shall just model
    // this with naked fields. Right?

    final int fx = 42;  // Compiler complains about initialization? Okay, put 42 right here!
          int x  = 42;

    @Benchmark
    public int testFinal() {
        return fx;
    }

    @Benchmark
    public int test() {
        return x;
    }
}

Initializing the final field with its own initializer silently introduces an effect we are probably not after! Run this example benchmark with the "perfnorm" profiler right away to see the low-level performance counters, and you get a spooky result: final field access is slightly better, and it produces fewer loads![1]

Benchmark                                  Mode  Cnt   Score    Error  Units
FinalInitBench.test                        avgt    9   1.920 ±  0.002  ns/op
FinalInitBench.test:CPI                    avgt    3   0.291 ±  0.039   #/op
FinalInitBench.test:L1-dcache-loads        avgt    3  11.136 ±  1.447   #/op
FinalInitBench.test:L1-dcache-stores       avgt    3   3.042 ±  0.327   #/op
FinalInitBench.test:cycles                 avgt    3   7.316 ±  1.272   #/op
FinalInitBench.test:instructions           avgt    3  25.178 ±  2.242   #/op

FinalInitBench.testFinal                   avgt    9   1.901 ±  0.001  ns/op
FinalInitBench.testFinal:CPI               avgt    3   0.285 ±  0.004   #/op
FinalInitBench.testFinal:L1-dcache-loads   avgt    3   9.077 ±  0.085   #/op  <--- !
FinalInitBench.testFinal:L1-dcache-stores  avgt    3   4.077 ±  0.752   #/op
FinalInitBench.testFinal:cycles            avgt    3   7.142 ±  0.071   #/op
FinalInitBench.testFinal:instructions      avgt    3  25.102 ±  0.422   #/op

This is because there is no field load in the generated code at all, and all we do is use the inlined constant from the incoming bytecode:

# test
...
1.02%    1.02%  mov    0x10(%r10),%edx ; <--- get field x
2.50%    1.79%  nop
1.79%    1.60%  callq  CONSUME
...

# testFinal
...
8.25%    8.21%  mov    $0x2a,%edx      ; <--- just use inlined "42"
1.79%    0.56%  nop
1.35%    1.19%  callq  CONSUME
...

Not a problem in itself, but the result would be different for blank final fields, which are more closely aligned with real-world usage. So, a less lazy version:

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class FinalInitCnstrBench {
    final int fx;
    int x;

    public FinalInitCnstrBench() {
        this.fx = 42;
        this.x = 42;
    }

    @Benchmark
    public int testFinal() {
        return fx;
    }

    @Benchmark
    public int test() {
        return x;
    }
}

…produces more sensible results, where both tests perform equally:[2]

Benchmark                                            Mode  Cnt   Score    Error  Units
FinalInitCnstrBench.test                             avgt    9   1.922 ±  0.003  ns/op
FinalInitCnstrBench.test:CPI                         avgt    3   0.289 ±  0.049   #/op
FinalInitCnstrBench.test:L1-dcache-loads             avgt    3  11.171 ±  1.429   #/op
FinalInitCnstrBench.test:L1-dcache-stores            avgt    3   3.042 ±  0.031   #/op
FinalInitCnstrBench.test:cycles                      avgt    3   7.301 ±  0.445   #/op
FinalInitCnstrBench.test:instructions                avgt    3  25.235 ±  1.732   #/op

FinalInitCnstrBench.testFinal                        avgt    9   1.919 ±  0.002  ns/op
FinalInitCnstrBench.testFinal:CPI                    avgt    3   0.287 ±  0.014   #/op
FinalInitCnstrBench.testFinal:L1-dcache-loads        avgt    3  11.170 ±  1.104   #/op
FinalInitCnstrBench.testFinal:L1-dcache-stores       avgt    3   3.039 ±  0.864   #/op
FinalInitCnstrBench.testFinal:cycles                 avgt    3   7.278 ±  0.394   #/op
FinalInitCnstrBench.testFinal:instructions           avgt    3  25.314 ±  0.588   #/op
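If adding a constructor is inconvenient, another way to keep a final field from becoming a constant variable is to initialize it with a non-constant expression. The sketch below is plain Java rather than a JMH benchmark, and the names (`NonConstantInit`, `folded`, `loaded`) are hypothetical. It only shows that reads of such a field go to the field itself; it is an assumption, not something measured here, that a benchmark written this way would behave like the constructor-based version above.

```java
import java.lang.reflect.Field;

// Hypothetical demo class; names are illustrative only.
class NonConstantInit {
    final int folded = 42;                      // constant variable: reads are inlined by javac
    final int loaded = Integer.parseInt("42");  // method call: not a constant expression

    int readFolded() { return folded; }
    int readLoaded() { return loaded; }

    // Overwrites both fields reflectively, then reads them through ordinary code.
    static int[] observe() throws Exception {
        NonConstantInit p = new NonConstantInit();
        for (String name : new String[] { "folded", "loaded" }) {
            Field f = NonConstantInit.class.getDeclaredField(name);
            f.setAccessible(true);
            f.setInt(p, 9000);
        }
        return new int[] { p.readFolded(), p.readLoaded() };
    }

    public static void main(String... args) throws Exception {
        int[] r = observe();
        System.out.println(r[0] + " " + r[1]);
    }
}
```

On a JDK that still permits the reflective write, `readFolded()` should keep returning 42 while `readLoaded()` picks up 9000, which indicates that only the latter performs a real field load.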

Observations

The constant propagation story in Java is complicated, and there are some interesting corner cases. Constant variables, which the bytecode compiler treats specially, are one of those corner cases. You are most likely to blow yourself up on this in low-level benchmarking, rather than in production code, which initializes fields in constructors anyway. The need to capture and quantify these corner cases is one of the reasons why JMH has the "perfasm" and "perfnorm" profilers: they are there to make sense of the results.


1. Actually it produces one less load-store pair too, which is the side effect of better register allocation.
2. Really, more sensible in the way just-in-time compilers should work, which is the theme for the next post I was writing before discovering my experiments were toasted because of this pitfall.