Aleksey Shipilёv, JVM/Performance Geek,
Shout out at Twitter: @shipilev
Questions, comments, suggestions: aleksey@shipilev.net
Thanks to Richard Startin, Alex Blewitt, and others for reviews, edits and helpful suggestions!
1. Introduction
How much memory does a Java object take? It is a recurring question. In the absence of an accessible sizeof operator,[1] people are left to wonder about the footprint impact on their code, and/or resort to urban legends and tales from the wizards. In this post, we shall try to peek inside Java objects and see what lies beneath. Once we do this, many tricks around object footprint become apparent, some runtime footprint quirks get explained, and some low-level performance behavior hopefully becomes clearer.
This post is rather long, so you might want to consider reading it in pieces. The chapters should be more or less independent, and you can get back to reading them after stepping away for a while. In contrast to other posts, this one was not very thoroughly reviewed before posting, and it will be updated and fixed up as people read it and identify mistakes, omissions, or have more questions. Use and/or trust this at your own risk.
2. Deeper Design and Implementation Questions (DDIQ)
In some sections, you might see sidebars with more discussion of design/implementation questions. These are not guaranteed to answer all the questions, but they do try to answer the most frequent ones. The answers there are based on my understanding, so they might be inaccurate, incomplete, or both. If you wonder about something related to this post, send me an email, and maybe that will yield another DDIQ sidebar. Think of these as the "audience questions".
3. Methodology Considerations
This post assumes the Hotspot JVM, the default JVM in OpenJDK and its derivatives. If you don’t know which JVM you are running, you are most probably running Hotspot.
3.1. Tools
To do this properly, we need tools. When we acquire the tools, it is important to understand what tools can and cannot do.
- Heap dumps. It might be enticing to dump the Java heap and inspect it. That seems to hinge on the belief that a heap dump is a low-level representation of the runtime heap. Unfortunately, it is not: it is a lie, pardon, a fantasy reconstructed (by the GC itself, no less) from the actual Java heap. If you look at the HPROF data format, you would see how high-level it actually is: it does not talk about field offsets, it does not say anything about the headers directly; the only consolation is having the object size there, which is also a lie. Heap dumps are great for inspecting whole graphs of objects and their internal connectivity, but they are too coarse to inspect the objects themselves.
- Measuring free or allocated memory via MXBeans. We can, of course, allocate multiple objects and see how much memory they took. With enough objects allocated, we can smooth out the outliers caused by TLAB allocation (and retirement), spurious allocations in background threads, etc. This does not, however, give us any fidelity in looking into the object internals: we can only observe the apparent sizes of the objects. This is a fine way to do research, but you would need to properly formulate and test hypotheses to arrive at a sensible object model that explains every result. (A sketch of this approach follows after this list.)
- Diagnostic JVM flags. But wait, since the JVM itself is responsible for creating the objects, surely it knows the object layout, and we "only" need to get it from there. -XX:+PrintFieldLayout would be our friend here. Unfortunately, that flag is only available in debug JVM versions.[2]
- Tools that poke into object internals. With some luck, taking Class.getDeclaredFields and asking for Unsafe.objectFieldOffset gives you an idea where the field resides. This runs into multiple caveats: first, it hacks into most classes with Reflection, which might be prohibited; second, Unsafe.objectFieldOffset does not formally answer with the offset, but rather with some "cookie" that can then be passed to other Unsafe methods.[3] That said, it "usually works", so unless we do critically important things, it is fine to hack in. Some tools, notably JOL, do this for us.
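As promised in the MXBeans item above, here is a minimal sketch of the allocate-and-measure approach. It is only an estimate of the apparent object size (the class name and object count are arbitrary, and the result includes the Object[] holding the references):
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class ApparentSizeEstimate {
    static Object[] keep; // hold the references so objects survive until we measure

    public static void main(String[] args) {
        final int count = 1_000_000;
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();

        System.gc();
        long before = bean.getHeapMemoryUsage().getUsed();

        keep = new Object[count];
        for (int i = 0; i < count; i++) {
            keep[i] = new Object();
        }

        long after = bean.getHeapMemoryUsage().getUsed();
        // Rough estimate: includes the reference array itself and any background noise.
        System.out.println("~" + (after - before) / count + " bytes per instance");
    }
}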
In this post, we shall be using JOL, as we want to see the finer structure of Java objects. For our needs, we are good with JOL-CLI bundle, available here:
$ wget https://repo.maven.apache.org/maven2/org/openjdk/jol/jol-cli/0.10/jol-cli-0.10-full.jar -O jol-cli.jar
$ java -jar jol-cli.jar
Usage: jol-cli.jar <mode> [optional arguments]*
Available modes:
internals: Show the object internals: field layout and default contents, object header
...
For object targets, we would try to use the various JDK classes themselves, where possible. This makes the whole thing easily verifiable, as you would only need the JOL CLI JAR and your favorite JDK installation to run the tests. In more complicated cases, we would go to the JOL Samples that cover some of the things here. As a last resort, we would use custom example classes.
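The same information is also available programmatically through the JOL API; a minimal sketch, assuming the org.openjdk.jol:jol-core artifact is on the classpath:
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

public class JolApiExample {
    public static void main(String[] args) {
        // Print what JOL detected about the VM: bitness, compressed references, alignment.
        System.out.println(VM.current().details());
        // Print the field layout of java.lang.Integer, same as "jol-cli internals".
        System.out.println(ClassLayout.parseClass(Integer.class).toPrintable());
    }
}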
If you prefer something more hands-on, you can play with the entire collection of JOL Samples instead of reading this post ;)
3.2. JDKs
The most ubiquitous JDK version deployed in the world is still JDK 8. Therefore, we would be using it here as well, so that the findings in this post are immediately usable. There are no substantial changes in field layout strategies up until JDK 15, which we will talk about in later sections. The layout of JDK classes themselves might change too, so we would still try to target classes that are the same in all JDKs. Additionally, we would need both x86_32 and x86_64 binaries at some point.
It is easier for me to just use my own binaries for this purpose:
$ curl https://builds.shipilev.net/openjdk-jdk8/openjdk-jdk8-latest-linux-x86_64-release.tar.xz | tar xJf -; mv j2sdk-image jdk8-64
$ curl https://builds.shipilev.net/openjdk-jdk8/openjdk-jdk8-latest-linux-x86-release.tar.xz | tar xJf -; mv j2sdk-image jdk8-32
$ curl https://builds.shipilev.net/openjdk-jdk/openjdk-jdk-latest-linux-x86_64-release.tar.xz | tar xJf -; mv jdk jdk15-64
$ jdk8-64/bin/java -version
openjdk version "1.8.0-builds.shipilev.net-openjdk-jdk8-b51-20200410"
OpenJDK Runtime Environment (build 1.8.0-builds.shipilev.net-openjdk-jdk8-b51-20200410-b51)
OpenJDK 64-Bit Server VM (build 25.71-b51, mixed mode)
$ jdk8-32/bin/java -version
openjdk version "1.8.0-builds.shipilev.net-openjdk-jdk8-b51-20200410"
OpenJDK Runtime Environment (build 1.8.0-builds.shipilev.net-openjdk-jdk8-b51-20200410-b51)
OpenJDK Server VM (build 25.71-b51, mixed mode)
$ jdk15-64/bin/java -version
openjdk version "15-testing" 2020-09-15
OpenJDK Runtime Environment (build 15-testing+0-builds.shipilev.net-openjdk-jdk-b1214-20200410)
OpenJDK 64-Bit Server VM (build 15-testing+0-builds.shipilev.net-openjdk-jdk-b1214-20200410, mixed mode, sharing)
4. Data Types And Their Representation
We need to start with some basics. In just about every JOL "internals" run, you would see this output (it would be omitted in future invocations for brevity):
$ jdk8-64/bin/java -jar jol-cli.jar internals java.lang.Object
...
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
It means that Java references take 4 bytes (compressed references are enabled), boolean/byte take 1 byte, char/short take 2 bytes, int/float take 4 bytes, and double/long take 8 bytes. They take the same space when stored as array elements.
Why does it matter? It matters because the Java Language Specification does not say anything about the data representation, it only says what values those types accept. It is possible, in principle, to allocate 8 bytes for all primitives, as long as the math over them follows the specification. In current Hotspot, almost all data types match their value domain exactly, except for boolean. int, for example, is specified to support values from -2147483648 to 2147483647, which fits a 4-byte signed representation exactly.
As said above, there is one oddity, and that is boolean. In principle, its value domain contains only two values, true and false, so it can be represented with 1 bit. All boolean fields and array elements still take 1 full byte, and that is for two reasons: the Java Memory Model guarantees the absence of word tearing for individual fields/elements, which is hard to provide with 1-bit booleans; and field offsets are addressed as memory, that is, in bytes, which makes addressing boolean fields awkward. So, taking 1 byte per boolean is a practical compromise here.
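To check this from code, here is a small sketch using the JOL API (assuming jol-core is on the classpath; the exact number depends on the header size of the particular configuration):
import org.openjdk.jol.info.ClassLayout;

public class BooleanArrayFootprint {
    public static void main(String[] args) {
        // Each boolean element takes a full byte.
        long size = ClassLayout.parseInstance(new boolean[1000]).instanceSize();
        System.out.println("boolean[1000] takes " + size + " bytes");
        // On the 64-bit, compressed-references JDK 8 setup used in this post, this
        // should print 1016: a 16-byte array header plus 1000 one-byte elements.
    }
}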
5. Mark Word
Moving on to the actual object structure. Let us start from the very basic example of java.lang.Object. JOL would print this:
$ jdk8-64/bin/java -jar jol-cli.jar internals java.lang.Object
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
Instantiated the sample instance via default constructor.
java.lang.Object object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) 00 10 00 00 # (not mark word)
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
It shows that the first 12 bytes are the object header. Unfortunately, it does not resolve its internal structure in greater detail, so we need to dive into the Hotspot source code to figure this out. In there, you would notice that the object header consists of two parts: the mark word and the class word. The class word carries the information about the object’s type: it links to the native structure that describes the class. We will talk about that part in the next section. The rest of the metadata is carried in the mark word.
There are several uses for the mark word:
- Storing the metadata (forwarding and object age) for moving GCs.
- Storing the identity hash code.
- Storing the locking information.
Note that every single object out there has to have a mark word, because it handles the things common to every Java object. This is also why it takes the very first slot in the object’s internal structure: the VM needs to access it very quickly on time-sensitive code paths, for example during stop-the-world GC. Understanding the use cases for the mark word highlights the lower bound on the space it takes.
5.1. Storing Forwarding Data for Moving GCs
When GCs need to move the object, they need to record the new location for the object, at least temporarily. The mark word would encode this for the GC code to coordinate the relocation and update-references work. This locks the mark word to be as wide as the Java reference representation. Due to the way compressed references are implemented in Hotspot, this reference is always uncompressed, so it is as wide as a machine pointer.
This, in turn, defines the minimum amount of memory the mark word takes in that implementation: 4 bytes for 32-bit platforms, and 8 bytes for 64-bit platforms.
We cannot, unfortunately, show the mark words that carry GC forwardings from the Java application (and JOL is a Java application), because either we are running with stop-the-world GC and they are already gone by the time we unblock from the pause, or concurrent GC barriers prevent us from seeing the old objects.
5.2. Storing Object Ages for GCs
We can, however, demonstrate the object age bits!
$ jdk8-32/bin/java -cp jol-samples.jar org.openjdk.jol.samples.JOLSample_19_Promotion
# Running 32-bit HotSpot VM.
Fresh object is at d2d6c0f8
*** Move 1, object is at d31104a0
(object header) 09 00 00 00 (00001001 00000000 00000000 00000000)
^^^^
*** Move 2, object is at d3398028
(object header) 11 00 00 00 (00010001 00000000 00000000 00000000)
^^^^
*** Move 3, object is at d3109688
(object header) 19 00 00 00 (00011001 00000000 00000000 00000000)
^^^^
*** Move 4, object is at d43c9250
(object header) 21 00 00 00 (00100001 00000000 00000000 00000000)
^^^^
*** Move 5, object is at d41453f0
(object header) 29 00 00 00 (00101001 00000000 00000000 00000000)
^^^^
*** Move 6, object is at d6350028
(object header) 31 00 00 00 (00110001 00000000 00000000 00000000)
^^^^
*** Move 7, object is at a760b638
(object header) 31 00 00 00 (00110001 00000000 00000000 00000000)
^^^^
Notice how a few bits count upwards with every move. That is the recorded object age. It curiously stops at 6 after 7 moves. This fits the default setting of InitialTenuringThreshold=7. If you increase that setting, the object will experience more moves until it reaches the old generation.
5.3. Identity Hash Code
Every Java object has a hash code. When there is no user definition for it, then identity hash code would be used.[4] Since identity hash code should not change after computed for the given object, we need to store it somewhere. In Hotspot, it is stored right in the mark word of the target object. Depending on the precision that identity hash code accepts, it may require as much as 4 bytes to store. Since mark word is already at least 4 bytes long due to the reasons from the last section, the space is available.
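As a quick illustration of that stability, a small sketch in plain Java, no JOL involved:
public class IdentityHashStability {
    public static void main(String[] args) {
        Object o = new Object();
        int before = System.identityHashCode(o); // computed lazily, then stored in the mark word
        System.gc();                             // the object may be moved by the GC...
        int after = System.identityHashCode(o);  // ...but the stored value travels with it
        System.out.println(before == after);     // always prints "true"
    }
}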
The changes in the mark word caused by the identity hash code can be seen clearly with the relevant JOLSample_15_IdentityHashCode. Running it with the 64-bit VM:
$ jdk8-64/bin/java -cp jol-samples.jar org.openjdk.jol.samples.JOLSample_15_IdentityHashCode
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
**** Fresh object
org.openjdk.jol.samples.JOLSample_15_IdentityHashCode$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 88 55 0d 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
hashCode: 5ccddd20
**** After identityHashCode()
org.openjdk.jol.samples.JOLSample_15_IdentityHashCode$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 20 dd cd
4 4 (object header) 5c 00 00 00
8 4 (object header) 88 55 0d 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Notice that the hash code value is 5ccddd20. You can spot it in the object header now: 01 20 dd cd 5c. 01 is the mark word tag, and the rest is the identity hash code written in little-endian. And we still have 3 bytes to spare! But that is only possible because we have a large-ish mark word. What happens if we run the 32-bit VM, where the entire mark word is just 4 bytes?
This is what happens:
$ jdk8-32/bin/java -cp jol-samples.jar org.openjdk.jol.samples.JOLSample_15_IdentityHashCode
# Running 32-bit HotSpot VM.
**** Fresh object
org.openjdk.jol.samples.JOLSample_15_IdentityHashCode$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) c0 ab 6b a3
Instance size: 8 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
hashCode: 12ddf17
**** After identityHashCode()
org.openjdk.jol.samples.JOLSample_15_IdentityHashCode$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 81 8b ef 96
4 4 (object header) c0 ab 6b a3
Instance size: 8 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
It is obvious that the object header has changed. But it takes a keen eye to see where the 12ddf17 hash code actually is. What you see in the header is the identity hash code shifted "right by one". So, one of the bits ends up in the first byte, yielding 81, and the rest transforms into 12ddf17 >> 1 = 96ef8b. Notice that this reduces the domain for the identity hash code from 32 bits to "just" 25 bits.
5.4. Locking Data
Java synchronization employs a sophisticated state machine. Since every Java object can be synchronized on, the locking state needs to be associated with every Java object. The mark word holds most of that state.
Different parts of those locking transitions can be seen in the object header. For example, when a Java lock is biased towards a particular thread, we need to record the information about that lock near the relevant object. This is captured by the relevant JOLSample_13_BiasedLocking example:
$ jdk8-64/bin/java -cp jol-samples.jar org.openjdk.jol.samples.JOLSample_13_BiasedLocking
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
**** Fresh object
org.openjdk.jol.samples.JOLSample_13_BiasedLocking$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 # No lock
4 4 (object header) 00 00 00 00
8 4 (object header) c0 07 08 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
**** With the lock
org.openjdk.jol.samples.JOLSample_13_BiasedLocking$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 b0 00 80 # Biased lock
4 4 (object header) b8 7f 00 00 # Biased lock
8 4 (object header) c0 07 08 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
**** After the lock
org.openjdk.jol.samples.JOLSample_13_BiasedLocking$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 b0 00 80 # Biased lock
4 4 (object header) b8 7f 00 00 # Biased lock
8 4 (object header) c0 07 08 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Note how we recorded the native pointer to the lock descriptor in the header: b0 00 80 b8 7f. That lock is now biased towards the thread pointed to by that native pointer.
A similar thing happens when we lock without the bias; see the JOLSample_14_FatLocking example:
$ jdk8-64/bin/java -cp jol-samples.jar org.openjdk.jol.samples.JOLSample_14_FatLocking
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
**** Fresh object
org.openjdk.jol.samples.JOLSample_14_FatLocking$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # No lock
4 4 (object header) 00 00 00 00
8 4 (object header) c0 07 08 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
**** Before the lock
org.openjdk.jol.samples.JOLSample_14_FatLocking$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 78 19 57 1a # Lightweight lock
4 4 (object header) 85 7f 00 00
8 4 (object header) c0 07 08 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
**** With the lock
org.openjdk.jol.samples.JOLSample_14_FatLocking$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 0a 4b 00 b4 # Heavyweight lock
4 4 (object header) 84 7f 00 00
8 4 (object header) c0 07 08 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
**** After the lock
org.openjdk.jol.samples.JOLSample_14_FatLocking$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 0a 4b 00 b4 # Heavyweight lock
4 4 (object header) 84 7f 00 00
8 4 (object header) c0 07 08 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
**** After System.gc()
org.openjdk.jol.samples.JOLSample_14_FatLocking$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 09 00 00 00 # Lock recycled
4 4 (object header) 00 00 00 00
8 4 (object header) c0 07 08 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Here, we see the usual lifecycle of a lock: at first, the object has no lock recorded; then it is acquired by another thread and a (lightweight) synchronization lock is installed; then the main thread contends on it, inflating it; then the locking information still references the inflated lock after everyone has unlocked. Finally, at some later point the lock is deflated, and the object drops its association with it.
5.5. Observation: Identity Hashcode Disables Biased Locking
But what if we need to store the identity hash code while biased locking is in effect? Simple: the identity hash code takes precedence, and biased locking gets disabled for that object/class. This can be seen with the relevant example, JOLSample_26_IHC_BL_Conflict:
$ jdk8-64/bin/java -cp jol-samples.jar org.openjdk.jol.samples.JOLSample_26_IHC_BL_Conflict
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
**** Fresh object
org.openjdk.jol.samples.JOLSample_26_IHC_BL_Conflict$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 # No lock
4 4 (object header) 00 00 00 00
8 4 (object header) f8 00 01 f8
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
**** With the lock
org.openjdk.jol.samples.JOLSample_26_IHC_BL_Conflict$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 b0 00 20 # Biased lock
4 4 (object header) e5 7f 00 00 # Biased lock
8 4 (object header) f8 00 01 f8
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
**** After the lock
org.openjdk.jol.samples.JOLSample_26_IHC_BL_Conflict$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 b0 00 20 # Biased lock
4 4 (object header) e5 7f 00 00 # Biased lock
8 4 (object header) f8 00 01 f8
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
hashCode: 65ae6ba4
**** After the hashcode
org.openjdk.jol.samples.JOLSample_26_IHC_BL_Conflict$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 a4 6b ae # Hashcode
4 4 (object header) 65 00 00 00 # Hashcode
8 4 (object header) f8 00 01 f8
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
**** With the second lock
org.openjdk.jol.samples.JOLSample_26_IHC_BL_Conflict$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 50 f9 b8 29 # Lightweight lock
4 4 (object header) e5 7f 00 00 # Lightweight lock
8 4 (object header) f8 00 01 f8
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
**** After the second lock
org.openjdk.jol.samples.JOLSample_26_IHC_BL_Conflict$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 a4 6b ae # Hashcode
4 4 (object header) 65 00 00 00 # Hashcode
8 4 (object header) f8 00 01 f8
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
In this example, biased locking works on a fresh object, but the moment we ask for its hashCode, we end up computing its identity hash code (since there is no override for Object.hashCode), which installs the computed value in the mark word. Subsequent locks can only displace the identity hash code value temporarily; it comes back as soon as the (non-biased) lock is released. Since there is no way to store the biased locking information in the mark word anymore, biased locking does not work for that object from this moment on.
5.6. Observation: 32-bit VMs Improve Footprint
Since the mark word size depends on the target bitness, it is conceivable that 32-bit VMs take less space per object, even without (reference) fields involved. This can be demonstrated by inspecting the plain Object layout on 32-bit and 64-bit VMs:
$ jdk8-64/bin/java -jar jol-cli.jar internals java.lang.Object
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
Instantiated the sample instance via default constructor.
java.lang.Object object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) 00 10 00 00 # Class word (compressed)
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
$ jdk8-32/bin/java -jar jol-cli.jar internals java.lang.Object
# Running 32-bit HotSpot VM.
Instantiated the sample instance via default constructor.
java.lang.Object object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 48 51 2b a3 # Class word
Instance size: 8 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
Here, the 64-bit VM header takes 8 (mark word) + 4 (class word) = 12 bytes, whereas the 32-bit VM header takes 4 (mark word) + 4 (class word) = 8 bytes. With object alignment by 8 bytes, these get rounded up to 16 and 8 bytes, respectively. On this small object, the space savings are 2x!
6. Class Word
From the native machine perspective, every object is just a bunch of bytes. There are cases where we want to know the type of the object we are dealing with at runtime. A non-exhaustive list of cases where it is needed:
- Runtime type checks.
- Determining the object size.
- Figuring out the target for virtual/interface calls.
Class words can also be compressed. Even though class pointers are not Java heap references, they can still enjoy similar optimization.[5]
6.1. Runtime Type Checks
Java is a type-safe language, so it needs runtime type checking on many paths. The class word carries the data about the actual type of the object we have, which allows compilers to emit runtime type checks. The efficiency of those runtime checks depends on the shape the type metadata takes.
If metadata is encoded in a simple form, compilers can even inline those checks straight into the code stream. In Hotspot, the class word holds the native pointer to the VM Klass instance, which carries lots of metainformation, including the superclasses it extends, the interfaces it implements, etc. It also carries the Java mirror, which is the associated instance of java.lang.Class. This indirection allows treating java.lang.Class instances as regular objects and moving them without updating every single class word during GC: java.lang.Class can move, while Klass stays at the same location all the time.
6.2. Determining The Object Size
Determining the object size takes a similar route. In contrast to runtime type checks, which do not always know the type of the object statically, allocation does know the size of the allocated object more or less precisely: it is defined by the constructor used, the array initializer used, etc. So, in those cases, reaching through the class word is not needed.
But there are cases in the native code (most notably, garbage collectors) that want to walk the parsable heap with code like:
HeapWord* cur = heap_start;
while (cur < heap_used) {
  object o = (object)cur;   // treat the current address as an object
  do_object(o);             // visit it
  cur = cur + o->size();    // step over it: this is where the object size is needed
}
For that to work, native code needs to know the size of the current (untyped!) object, and hopefully know it fast. So, for native code, it very much matters how the class metadata is arranged. In Hotspot, we can reach through the class word to the layout helper, which gives us information about object sizes.
6.3. Figuring Out The Target Of Virtual/Interface Call
When runtime needs to invoke the virtual/interface method on the object instance, it needs to determine where the target method is. While most of the time that can be optimized, there are cases where we need to do the actual dispatch. The performance of that dispatch also depends on how far away the class metadata is, so this cannot be neglected.
6.4. Observation: Compressed References Affect Object Header Footprint
Similarly to the observation about mark word size depending on JVM bitness, we can also expect that the compressed references mode affects object sizes, even without reference fields involved. To demonstrate that, let’s take java.lang.Integer on two heap sizes, small (1 GB) and large (64 GB). These heap sizes have compressed references turned on and off by default, respectively, which also means compressed class pointers are on and off by default.
$ jdk8-64/bin/java -Xmx1g -jar jol-cli.jar internals java.lang.Integer
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
Instantiated the sample instance via public java.lang.Integer(int)
java.lang.Integer object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) de 21 00 20 # Class word
12 4 int Integer.value 0
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
$ jdk8-64/bin/java -Xmx64g -jar jol-cli.jar internals java.lang.Integer
# Running 64-bit HotSpot VM.
Instantiated the sample instance via public java.lang.Integer(int)
java.lang.Integer object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) 40 69 25 ad # Class word
12 4 (object header) e5 7f 00 00 # (uncompressed)
16 4 int Integer.value 0
20 4 (loss due to the next object alignment)
Instance size: 24 bytes # AHHHHHHH....
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Here, in the VM with the 1 GB heap, the object header takes 8 (mark word) + 4 (class word) = 12 bytes, whereas with the 64 GB heap the header takes 8 (mark word) + 8 (class word) = 16 bytes. If there were no fields, both would round up to 16 bytes due to object alignment by 8. But since there is an int field, in the 64 GB case it has to be allocated past the 16-byte header, which requires another 8-byte slab, taking 24 bytes in total.
7. Header: Array Length
Arrays come with another little piece of metadata: array length. Since the object type only encodes the array element type, we need to store the array length somewhere else.
This can be seen with the relevant JOLSample_25_ArrayAlignment:
$ jdk8-64/bin/java -cp jol-samples.jar org.openjdk.jol.samples.JOLSample_25_ArrayAlignment
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
[J object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) d8 0c 00 00 # Class word
12 4 (object header) 00 00 00 00 # Array length
16 0 long [J.<elements> N/A
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
...
[B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) 68 07 00 00 # Class word
12 4 (object header) 00 00 00 00 # Array length
16 0 byte [B.<elements> N/A
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
[B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) 68 07 00 00 # Class word
12 4 (object header) 01 00 00 00 # Array length
16 1 byte [B.<elements> N/A
17 7 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total
[B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) 68 07 00 00 # Class word
12 4 (object header) 02 00 00 00 # Array length
16 2 byte [B.<elements> N/A
18 6 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 6 bytes external = 6 bytes total
[B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) 68 07 00 00 # Class word
12 4 (object header) 03 00 00 00 # Array length
16 3 byte [B.<elements> N/A
19 5 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 5 bytes external = 5 bytes total
...
[B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) 68 07 00 00 # Class word
12 4 (object header) 08 00 00 00 # Array length
16 8 byte [B.<elements> N/A
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
There is a slot at +12 that carries the array length. As we allocate byte[] arrays of 0..8 elements, that slot keeps changing. Carrying the array length with the array instance helps to compute its actual size for object walkers (as we have seen in the previous section for regular objects), and also enables efficient range checks, since the array length is very close by.
7.1. Observation: Array Base Is Aligned
The example above glossed over an important quirk in array layout, hidden by lucky alignment in the default 64-bit mode. If we run with a large heap (or disable compressed references explicitly) to disturb that alignment:
$ jdk8-64/bin/java -Xmx64g -cp jol-samples.jar org.openjdk.jol.samples.JOLSample_25_ArrayAlignment
# Running 64-bit HotSpot VM.
[J object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) d8 8c b0 a4 # Class word
12 4 (object header) 98 7f 00 00 # Class word
16 4 (object header) 00 00 00 00 # Array length
20 4 (alignment/padding gap)
24 0 long [J.<elements> N/A
Instance size: 24 bytes
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
...
[B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 00 00 00 00 # Mark word
8 4 (object header) 68 87 b0 a4 # Class word
12 4 (object header) 98 7f 00 00 # Class word
16 4 (object header) 05 00 00 00 # Array length
20 4 (alignment/padding gap)
24 5 byte [B.<elements> N/A
29 3 (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 4 bytes internal + 3 bytes external = 7 bytes total
...
…or run with 32-bit binaries:
$ jdk8-32/bin/java -cp jol-samples.jar org.openjdk.jol.samples.JOLSample_25_ArrayAlignment
# Running 32-bit HotSpot VM.
[J object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 88 47 1b a3 # Class word
8 4 (object header) 00 00 00 00 # Array length
12 4 (alignment/padding gap)
16 0 long [J.<elements> N/A
Instance size: 16 bytes
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
[B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 # Mark word
4 4 (object header) 58 44 1b a3 # Class word
8 4 (object header) 05 00 00 00 # Array length
12 5 byte [B.<elements> N/A
17 7 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total
The array base is aligned by the machine word size, due to an implementation quirk. Arrays with elements larger than the machine word are aligned even more aggressively; we will see more of this when talking about field alignments. All this means that arrays might take more space than we naively expect.
8. Object Alignment
Up to this point, we have glossed over the actual need for object alignment, taking the 8-byte alignment for granted. Why is it 8 bytes?
There are several considerations that make this alignment practical.
First, we sometimes need to atomically update the mark word, which puts constraints on the addresses at which mark words can reside. For an 8-byte mark word that needs a full-width update (for example, installing the forwarding pointer), the word needs to be aligned by 8. Since the mark word is the first slot in the object, the entire object needs to be aligned by 8.
Second, the same thing applies to atomic accesses to volatile longs/doubles, which have to be read and written indivisibly. Even without the volatile modifier, we have to accept the possibility of atomic access with use-site volatility, e.g. via VarHandles. Therefore, we are better off accepting that every field has to be naturally aligned. If we align the object externally by 8, then aligning the fields internally by 8/4/2 bytes does not break the absolute alignment.
Alignment by 8 bytes is not always a waste, though, as it enables compressed references beyond 4 GB heap. Alignment by 4 bytes would allow "only" 16 GB heap with compressed references, compared to 32 GB allowed by alignment by 8 bytes. In fact, some are increasing the object alignment to 16 bytes, in order to stretch the area where compressed references work.
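As a quick arithmetic sketch of that trade-off: a compressed reference is a 32-bit value scaled by the object alignment, so the reachable heap is 2^32 times the alignment (the class below is just illustrative arithmetic):
public class CompressedRefsReach {
    public static void main(String[] args) {
        for (long alignment : new long[] { 4, 8, 16 }) {
            // 2^32 distinct encodable values, each addressing an aligned location.
            long maxHeapBytes = (1L << 32) * alignment;
            System.out.println(alignment + "-byte alignment: " +
                    (maxHeapBytes >> 30) + " GB reachable with compressed references");
        }
        // Prints 16 GB, 32 GB, 64 GB, matching the numbers above.
    }
}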
In Hotspot, the alignment is technically part of the object itself: if we round up all object sizes to a multiple of 8, then we naturally get the alignment shadow at the end of some objects. Allocating an object whose size is a multiple of 8 bytes does not break alignment, so if we start allocating from a properly aligned base (and we do), all objects are guaranteed to be aligned.
Let’s take for example java.util.ArrayList:
$ jdk8-64/bin/java -Xmx1g -jar jol-cli.jar internals java.util.ArrayList
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
java.util.ArrayList object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 46 2e 00 20
12 4 int AbstractList.modCount 0
16 4 int ArrayList.size 0
20 4 java.lang.Object[] ArrayList.elementData []
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
…and the same thing with -XX:ObjectAlignmentInBytes=16:
$ jdk8-64/bin/java -Xmx1g -XX:ObjectAlignmentInBytes=16 -jar jol-cli.jar internals java.util.ArrayList
# Running 64-bit HotSpot VM.
# Using compressed oop with 4-bit shift.
# Using compressed klass with 4-bit shift.
# Objects are 16 bytes aligned.
Instantiated the sample instance via default constructor.
java.util.ArrayList object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 93 2e 00 20
12 4 int AbstractList.modCount 0
16 4 int ArrayList.size 0
20 4 java.lang.Object[] ArrayList.elementData []
24 8 (loss due to the next object alignment)
Instance size: 32 bytes
With 8-byte alignment, ArrayList takes exactly 24 bytes, since that is a multiple of 8. With 16-byte alignment, we get the alignment shadow: 8 bytes are lost at the end of the object to maintain the alignment for the next one.
8.1. Observation: Hiding Fields in Alignment Shadow
This immediately leads to one tangible observation: if there is an alignment shadow in the object, we can hide new fields there, without increasing the apparent size of the object! Compare the example of java.lang.Object:
$ jdk8-64/bin/java -jar jol-cli.jar internals java.lang.Object
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
java.lang.Object object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) a8 0e 00 00
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
…and java.lang.Integer:
$ jdk8-64/bin/java -jar jol-cli.jar internals java.lang.Integer
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via public java.lang.Integer(int)
java.lang.Integer object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) f0 0e 01 00
12 4 int Integer.value 0
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
Object had an alignment shadow of 4 bytes, which the Integer.value field gladly took. In the end, the sizes of Object and Integer ended up being the same in that VM configuration.
8.2. Observation: Blowing Up Instance Sizes by Adding Small Fields
There is an opposite caveat to this story. Suppose we have an object with a zero-length alignment shadow:
public class A {
int a1;
}
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . A
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 28 b8 0f 00
12 4 int A.a1 0
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
What happens if we add a boolean field to it?
public class B {
int b1;
boolean b2; // takes 1 byte, right?
}
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . B
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 28 b8 0f 00
12 4 int B.b1 0
16 1 boolean B.b2 false
17 7 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total
Here, we only needed one lousy byte to allocate the field, but since we need to satisfy the alignment requirements for the objects themselves, we ended up adding an entire 8-byte slab! The small consolation is that fitting more fields into the remaining 7 bytes of the shadow would not increase the apparent object size.
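For illustration, a sketch of that consolation (the class name is made up; on the same 64-bit JDK 8 setup, the extra booleans should pack into the shadow, and jol-cli should still report a 24-byte instance):
public class MoreBooleans {
    int b1;     // +12, as in B above
    boolean b2; // +16, the byte that forced the extra 8-byte slab
    boolean b3; // +17, hides in the alignment shadow for free
    boolean b4; // +18, also free; instance size should stay at 24 bytes
}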
9. Field Alignments
We have touched on this topic in the previous section when we were talking about the object alignments.
Many architectures dislike unaligned accesses, with different levels of animosity. On many, unaligned accesses carry a performance penalty. On some, an unaligned access raises a machine exception. Then the Java Memory Model comes in and requires atomic accesses to fields and array elements, at the very least when those fields are volatile.
This forces most implementations to align fields to their natural alignment. The object alignment by 8 bytes guarantees that offset 0 is aligned by 8 bytes, the largest natural alignment among all the types we have. So, we "only" need to lay out the fields within the object at their natural alignment. This can be clearly seen with java.lang.Long:
$ jdk8-64/bin/java -jar jol-cli.jar internals java.lang.Long
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via public java.lang.Long(long)
java.lang.Long object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 18 11 01 00
12 4 (alignment/padding gap)
16 8 long Long.value 0
Instance size: 24 bytes
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
Here, long value was placed at +16, because it would make the field aligned by 8. Note there is a gap before the field!
9.1. Observation: Hiding Fields in Field Alignment Gaps
Foreshadowing the discussion about field packing a bit: the existence of these field alignment gaps allows us to hide fields there. For example, adding another int field to a long-bearing class:
public class LongIntCarrier {
long value;
int somethingElse;
}
…would end up laid out like this:
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . LongIntCarrier
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
LongIntCarrier object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 28 b8 0f 00
12 4 int LongIntCarrier.somethingElse 0
16 8 long LongIntCarrier.value 0
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
Compare with the java.lang.Long layout: they take the same total instance space, and that is because the new int field took the alignment gap.
10. Field Packing
When multiple fields are present, a new task appears: how to distribute the fields within the object? This is where the field layouter comes in. Its job is to make sure every field is allocated at its natural alignment, and, hopefully, that the object is as densely packed as possible. How exactly that is achieved is heavily implementation-dependent. For all we know, the field "packer" could just put all fields in their declaration order, padding each field to its natural alignment. That would waste a lot of memory, though.
Consider this class:
public class FieldPacking {
boolean b;
long l;
char c;
int i;
}
The naive field packer would do this:
$ <32-bit simulation>
FieldPacking object internals:
OFFSET SIZE TYPE DESCRIPTION
0 4 (object header)
4 4 (object header)
8 1 boolean FieldPacking.b
9 7 (alignment/padding gap)
16 8 long FieldPacking.l
24 2 char FieldPacking.c
26 2 (alignment/padding gap)
28 4 int FieldPacking.i
Instance size: 32 bytes
…while a smarter one would do:
$ jdk8-32/bin/java -jar jol-cli.jar internals -cp . FieldPacking
# Running 32-bit HotSpot VM.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
FieldPacking object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 68 91 6f a3
8 8 long FieldPacking.l 0
16 4 int FieldPacking.i 0
20 2 char FieldPacking.c
22 1 boolean FieldPacking.b false
23 1 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 1 bytes external = 1 bytes total
…thus saving 8 bytes per object instance.
Anyhow, we can derive two immediate observations from this.
10.1. Observation: Field Declaration Order != Field Layout Order
First of all, given the field declaration order:
public class FieldOrder {
boolean firstField;
long secondField;
char thirdField;
int fourthField;
}
…we are not guaranteed to get the same order in memory. The field packer would routinely rearrange fields to minimize footprint:
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . FieldOrder
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
FieldOrder object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 28 b8 0f 00
12 4 int FieldOrder.fourthField 0
16 8 long FieldOrder.secondField 0
24 2 char FieldOrder.thirdField
26 1 boolean FieldOrder.firstField false
27 5 (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 5 bytes external = 5 bytes total
Note how the layouter laid out fields by their data type size: the long field was aligned at +16 first; the int field was supposed to go at +24, but the layouter discovered a gap before the long field where it could be tucked in, so it went to +12; then char got its natural alignment at +24, and boolean took the slot at +26.
Field packing is a major caveat when you consider interaction with foreign/raw functions that expect fields at particular offsets. The field offsets depend on what the field packer does (does it compact fields, and how exactly does it do so?) and on its starting conditions (bitness, compressed references mode, object alignment, etc.).
Java code that uses sun.misc.Unsafe to gain access to the fields has to read the field offsets at runtime, to capture the actual layout of the given execution. Assuming the fields are at the same offsets as they were in a debugging session is a hard-to-diagnose source of bugs.
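A minimal sketch of the safe pattern, capturing the offset at runtime rather than hard-coding it (the Holder class and its field are made up; this assumes sun.misc.Unsafe is reachable via the usual theUnsafe reflection hack, as it is on JDK 8):
import sun.misc.Unsafe;
import java.lang.reflect.Field;

public class OffsetAtRuntime {
    static class Holder { volatile int value; }

    static final Unsafe U;
    static final long VALUE_OFFSET;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
            // Capture the offset in *this* execution; never bake it into the code.
            VALUE_OFFSET = U.objectFieldOffset(Holder.class.getDeclaredField("value"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        U.putIntVolatile(h, VALUE_OFFSET, 42);
        System.out.println(U.getIntVolatile(h, VALUE_OFFSET)); // prints 42
    }
}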
10.2. Observation: C-style Padding Is Unreliable
When False Sharing mitigation techniques are involved, people resort to padding the critical fields in order to isolate them in their own cache lines. The most frequently used way is to introduce dummy field declarations around the protected field. And, since typing out declarations is tedious, people understandably resort to using the largest data type. So, to protect a contended byte field, you could see this done:
public class LongPadding {
long l01, l02, l03, l04, l05, l06, l07, l08; // 64 bytes
byte pleaseHelpMe;
long l11, l12, l13, l14, l15, l16, l17, l18; // 64 bytes
}
You would expect the pleaseHelpMe field to be squeezed between two large blocks of longs. Unfortunately, the field packer does not think so:
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . LongPadding
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
LongPadding object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 28 b8 0f 00
12 1 byte LongPadding.pleaseHelpMe 0 # WHOOPS.
13 3 (alignment/padding gap)
16 8 long LongPadding.l01 0
24 8 long LongPadding.l02 0
32 8 long LongPadding.l03 0
40 8 long LongPadding.l04 0
48 8 long LongPadding.l05 0
56 8 long LongPadding.l06 0
64 8 long LongPadding.l07 0
72 8 long LongPadding.l08 0
80 8 long LongPadding.l11 0
88 8 long LongPadding.l12 0
96 8 long LongPadding.l13 0
104 8 long LongPadding.l14 0
112 8 long LongPadding.l15 0
120 8 long LongPadding.l16 0
128 8 long LongPadding.l17 0
136 8 long LongPadding.l18 0
Instance size: 144 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
You could suggest padding with byte fields then? That would depend on the implementation detail that the field packer goes through fields of the same width/type in declaration order, but at least it would somewhat work:
public class BytePadding {
byte p000, p001, p002, p003, p004, p005, p006, p007;
byte p008, p009, p010, p011, p012, p013, p014, p015;
byte p016, p017, p018, p019, p020, p021, p022, p023;
byte p024, p025, p026, p027, p028, p029, p030, p031;
byte p032, p033, p034, p035, p036, p037, p038, p039;
byte p040, p041, p042, p043, p044, p045, p046, p047;
byte p048, p049, p050, p051, p052, p053, p054, p055;
byte p056, p057, p058, p059, p060, p061, p062, p063;
byte pleaseHelpMe;
byte p100, p101, p102, p103, p104, p105, p106, p107;
byte p108, p109, p110, p111, p112, p113, p114, p115;
byte p116, p117, p118, p119, p120, p121, p122, p123;
byte p124, p125, p126, p127, p128, p129, p130, p131;
byte p132, p133, p134, p135, p136, p137, p138, p139;
byte p140, p141, p142, p143, p144, p145, p146, p147;
byte p148, p149, p150, p151, p152, p153, p154, p155;
byte p156, p157, p158, p159, p160, p161, p162, p163;
}
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . BytePadding
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
BytePadding object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 28 b8 0f 00
12 1 byte BytePadding.p000 0
13 1 byte BytePadding.p001 0
...
74 1 byte BytePadding.p062 0
75 1 byte BytePadding.p063 0
76 1 byte BytePadding.pleaseHelpMe 0 # Good
77 1 byte BytePadding.p100 0
78 1 byte BytePadding.p101 0
...
139 1 byte BytePadding.p162 0
140 1 byte BytePadding.p163 0
141 3 (loss due to the next object alignment)
Instance size: 144 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
…unless you need to protect something of a different type:
public class BytePaddingHetero {
byte p000, p001, p002, p003, p004, p005, p006, p007;
byte p008, p009, p010, p011, p012, p013, p014, p015;
byte p016, p017, p018, p019, p020, p021, p022, p023;
byte p024, p025, p026, p027, p028, p029, p030, p031;
byte p032, p033, p034, p035, p036, p037, p038, p039;
byte p040, p041, p042, p043, p044, p045, p046, p047;
byte p048, p049, p050, p051, p052, p053, p054, p055;
byte p056, p057, p058, p059, p060, p061, p062, p063;
byte pleaseHelpMe;
int pleaseHelpMeToo; // pretty please!
byte p100, p101, p102, p103, p104, p105, p106, p107;
byte p108, p109, p110, p111, p112, p113, p114, p115;
byte p116, p117, p118, p119, p120, p121, p122, p123;
byte p124, p125, p126, p127, p128, p129, p130, p131;
byte p132, p133, p134, p135, p136, p137, p138, p139;
byte p140, p141, p142, p143, p144, p145, p146, p147;
byte p148, p149, p150, p151, p152, p153, p154, p155;
byte p156, p157, p158, p159, p160, p161, p162, p163;
}
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . BytePaddingHetero
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
BytePaddingHetero object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 28 b8 0f 00
12 4 int BytePaddingHetero.pleaseHelpMeToo 0 # WHOOPS.
16 1 byte BytePaddingHetero.p000 0
17 1 byte BytePaddingHetero.p001 0
...
78 1 byte BytePaddingHetero.p062 0
79 1 byte BytePaddingHetero.p063 0
80 1 byte BytePaddingHetero.pleaseHelpMe 0 # Good.
81 1 byte BytePaddingHetero.p100 0
82 1 byte BytePaddingHetero.p101 0
...
143 1 byte BytePaddingHetero.p162 0
144 1 byte BytePaddingHetero.p163 0
145 7 (loss due to the next object alignment)
Instance size: 152 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total
10.3. @Contended
This endless whack-a-mole in very performance-sensitive parts of the JDK library was mitigated by introducing the private @Contended annotation. It is used sparingly throughout the JDK, for example in java.lang.Thread for carrying the thread-local random generator state:
public class Thread implements Runnable {
...
// The following three initially uninitialized fields are exclusively
// managed by class java.util.concurrent.ThreadLocalRandom. These
// fields are used to build the high-performance PRNGs in the
// concurrent code, and we can not risk accidental false sharing.
// Hence, the fields are isolated with @Contended.
/** The current seed for a ThreadLocalRandom */
@jdk.internal.vm.annotation.Contended("tlr")
long threadLocalRandomSeed;
/** Probe hash value; nonzero if threadLocalRandomSeed initialized */
@jdk.internal.vm.annotation.Contended("tlr")
int threadLocalRandomProbe;
/** Secondary seed isolated from public ThreadLocalRandom sequence */
@jdk.internal.vm.annotation.Contended("tlr")
int threadLocalRandomSecondarySeed;
...
}
…which makes them treated specially by the field layouter code:
$ jdk8-64/bin/java -jar jol-cli.jar internals java.lang.Thread
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
Instantiated the sample instance via default constructor.
java.lang.Thread object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 48 69 00 00
12 4 int Thread.priority 5
16 8 long Thread.eetop 0
...
96 4 j.l.Object Thread.blockerLock (object)
100 4 j.l.UEH Thread.uncaughtExceptionHandler null
104 128 (alignment/padding gap)
232 8 long Thread.threadLocalRandomSeed 0
240 4 int Thread.threadLocalRandomProbe 0
244 4 int Thread.threadLocalRandomSecondarySeed 0
248 128 (loss due to the next object alignment)
Instance size: 376 bytes
Space losses: 129 bytes internal + 128 bytes external = 257 bytes total
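For completeness, here is a hedged sketch of applying the same annotation to user code on the JDK 8 toolchain used in this post: there the annotation lives at sun.misc.Contended, and the JVM ignores it for non-JDK classes unless -XX:-RestrictContended is specified; the class below is made up for illustration.
public class PaddedCounter {
    // Isolated into its own cache line(s) by the field layouter,
    // but only when the JVM runs with -XX:-RestrictContended.
    @sun.misc.Contended
    volatile long contendedValue;

    long plainValue; // laid out as usual
}
Running jol-cli internals on such a class with and without -XX:-RestrictContended should show the padding around the annotated field appear and disappear.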
There are ways to achieve this effect without relying on internal annotations, by piggybacking on other implementation details, which we shall discuss next.
11. Field Layout Across The Hierarchy
A special consideration needs to be given to laying out fields across the class hierarchy. Suppose we have these classes:
public class Hierarchy {
static class A {
int a;
}
static class B extends A {
int b;
}
static class C extends A {
int c;
}
}
The layouts of these classes would be like this:
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . Hierarchy\$A
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
Hierarchy$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 28 b8 0f 00
12 4 int A.a 0
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
Hierarchy$B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 08 ba 0f 00
12 4 int A.a 0
16 4 int B.b 0
20 4 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Hierarchy$C object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 08 ba 0f 00
12 4 int A.a 0
16 4 int C.c 0
20 4 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Note: all classes agree on where the superclass field A.a is. This allows blind casts to A from any subtype, and then accessing the a field there, without looking back at the actual type of the object. That is, ((A)o).a would always go to the same offset, regardless of whether we are dealing with an instance of A, B, or C.
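To make the point concrete, here is a small hedged sketch (a hypothetical helper, not part of the example above; it assumes it is compiled alongside Hierarchy in the same package). The accessor compiles down to a load from a fixed offset and works identically for any subtype:

public class HierarchyAccess {
    // A.a sits at the same offset in A, B, and C, so this load does not need to
    // consult the actual runtime type of the object.
    static int readA(Hierarchy.A o) {
        return o.a;
    }

    public static void main(String[] args) {
        System.out.println(readA(new Hierarchy.A()));
        System.out.println(readA(new Hierarchy.B()));
        System.out.println(readA(new Hierarchy.C()));
    }
}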
This looks as if superclass fields are always taken care of first. Does it mean superclass fields always come first in the layout? That is an implementation detail: prior to JDK 15, the answer is "yes"; starting with JDK 15, the answer is "no". We shall quantify that with a few observations.
11.1. Superclass Gaps
Prior to JDK 15, the field layouter only worked locally, on the fields declared in the current class. This means that if there were superclass gaps that subclass fields could take, they would not be taken. Let’s split the prior LongIntCarrier
example into subclasses:
public class LongIntCarrierSubs {
static class A {
long value;
}
static class B extends A {
int somethingElse;
}
}
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . LongIntCarrierSubs\$B
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
Instantiated the sample instance via default constructor.
LongIntCarrierSubs$B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 08 ba 0f 00
12 4 (alignment/padding gap)
16 8 long A.value 0
24 4 int B.somethingElse 0
28 4 (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 4 bytes internal + 4 bytes external = 8 bytes total
Note there is the same gap we have seen before, caused by long alignment. Theoretically, B.somethingElse could have taken it, but a field layouter implementation quirk makes that impossible. Therefore, the fields of B are laid out after the fields of A, and we waste 8 bytes.
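If you would rather check this from code than eyeball the CLI output, JOL also has a library API. A minimal sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath and the class is compiled alongside LongIntCarrierSubs:

import org.openjdk.jol.info.ClassLayout;

public class LayoutPrinter {
    public static void main(String[] args) {
        // Prints the same field-by-field table as "jol-cli internals", but from
        // within the running process, so the actual VM settings (compressed
        // references, object alignment) are in effect.
        ClassLayout layout = ClassLayout.parseClass(LongIntCarrierSubs.B.class);
        System.out.println(layout.toPrintable());
        System.out.println("Instance size: " + layout.instanceSize() + " bytes");
    }
}

Comparing the printed layouts across JDK versions is a quick way to see whether the layouter you run on still leaves this gap.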
11.2. Hierarchy Gaps
Another quirk prior to JDK 15 is that the field layouter rounded the field blocks up to an integer number of reference-size units, which made the subclass field block start at a much farther offset. This is most visible with something that carries a few very small fields:
public class ThreeBooleanStooges {
static class A {
boolean a;
}
static class B extends A {
boolean b;
}
static class C extends B {
boolean c;
}
}
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . ThreeBooleanStooges\$A
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . ThreeBooleanStooges\$B
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . ThreeBooleanStooges\$C
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
ThreeBooleanStooges$A object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 28 b8 0f 00
12 1 boolean A.a false
13 3 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
ThreeBooleanStooges$B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 08 ba 0f 00
12 1 boolean A.a false
13 3 (alignment/padding gap)
16 1 boolean B.b false
17 7 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 3 bytes internal + 7 bytes external = 10 bytes total
ThreeBooleanStooges$C object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) e8 bb 0f 00
12 1 boolean A.a false
13 3 (alignment/padding gap)
16 1 boolean B.b false
17 3 (alignment/padding gap)
20 1 boolean C.c false
21 3 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 6 bytes internal + 3 bytes external = 9 bytes total
The loss is very substantial! We waste 3 bytes per class in the hierarchy, and then might lose even more when object alignment kicks in.
It is even worse on larger heaps and/or without compressed references:
$ jdk8-64/bin/java -Xmx64g -jar jol-cli.jar internals -cp . ThreeBooleanStooges\$C
# Running 64-bit HotSpot VM.
Instantiated the sample instance via default constructor.
ThreeBooleanStooges$C object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) b0 89 aa 37
12 4 (object header) b0 7f 00 00
16 1 boolean A.a false
17 7 (alignment/padding gap)
24 1 boolean B.b false
25 7 (alignment/padding gap)
32 1 boolean C.c false
33 7 (loss due to the next object alignment)
Instance size: 40 bytes
Space losses: 14 bytes internal + 7 bytes external = 21 bytes total
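To put these numbers in perspective, here is a rough back-of-the-envelope sketch (the per-instance sizes are simply taken from the two listings above; the "ideal" figure assumes perfect packing of a 12-byte header plus three booleans into 16 aligned bytes):

public class HierarchyGapCost {
    public static void main(String[] args) {
        long instances = 10_000_000L;

        long compressed   = 24 * instances;  // layout with compressed references
        long uncompressed = 40 * instances;  // layout with -Xmx64g (no compressed refs)
        long ideal        = 16 * instances;  // hypothetical perfect packing

        System.out.printf("compressed refs:    %d MB%n", compressed   / 1_000_000);
        System.out.printf("no compressed refs: %d MB%n", uncompressed / 1_000_000);
        System.out.printf("ideal packing:      %d MB%n", ideal        / 1_000_000);
    }
}

For ten million instances that is 240 MB vs. 400 MB vs. 160 MB: the layout quirks alone account for a noticeable chunk of heap.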
11.3. Observation: Hierarchy Tower Padding Trick
This implementation peculiarity allows constructing a rather weird padding trick that is more resilient than the C-style padding.
public class HierarchyLongPadding {
static class Pad1 {
long l01, l02, l03, l04, l05, l06, l07, l08;
}
static class Carrier extends Pad1 {
byte pleaseHelpMe;
}
static class Pad2 extends Carrier {
long l11, l12, l13, l14, l15, l16, l17, l18;
}
static class UsableObject extends Pad2 {}
}
…yields:
$ jdk8-64/bin/java -jar jol-cli.jar internals -cp . HierarchyLongPadding\$UsableObject
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
Instantiated the sample instance via default constructor.
HierarchyLongPadding$UsableObject object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) c8 bd 0f 00
12 4 (alignment/padding gap)
16 8 long Pad1.l01 0
24 8 long Pad1.l02 0
32 8 long Pad1.l03 0
40 8 long Pad1.l04 0
48 8 long Pad1.l05 0
56 8 long Pad1.l06 0
64 8 long Pad1.l07 0
72 8 long Pad1.l08 0
80 1 byte Carrier.pleaseHelpMe 0
81 7 (alignment/padding gap)
88 8 long Pad2.l11 0
96 8 long Pad2.l12 0
104 8 long Pad2.l13 0
112 8 long Pad2.l14 0
120 8 long Pad2.l15 0
128 8 long Pad2.l16 0
136 8 long Pad2.l17 0
144 8 long Pad2.l18 0
Instance size: 152 bytes
Space losses: 11 bytes internal + 0 bytes external = 11 bytes total
See, we squeeze the field we want to protect between two classes, exploiting a freaky implementation detail!
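As a usage illustration (purely hypothetical, with made-up names; this is the pre-JDK 15 pattern, not something the example above prescribes), one might protect a single hot counter field from false sharing with its heap neighbors like this:

public class PaddedCounter {
    static class PadLeft                { long l01, l02, l03, l04, l05, l06, l07, l08; }
    static class Value extends PadLeft  { volatile long value; }  // the protected field
    static class PadRight extends Value { long r01, r02, r03, r04, r05, r06, r07, r08; }

    static final class Counter extends PadRight {
        long get()       { return value; }
        void add(long x) { value += x; }   // non-atomic increment, illustration only
    }
}

The hot value field ends up sandwiched between two cache-line-sized runs of padding, in the same way Carrier.pleaseHelpMe is above.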
11.4. Super/Hierarchy Gaps in Java 15+
Now we turn to JDK 15 and its overhaul of the field layout strategy. There, both superclass and hierarchy gaps are closed. Running our previous examples reveals this:
$ jdk15-64/bin/java -jar jol-cli.jar internals -cp . LongIntCarrierSubs\$B
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
Instantiated the sample instance via default constructor.
LongIntCarrierSubs$B object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 4c 7d 17 00
12 4 int B.somethingElse 0
16 8 long A.value 0
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
Finally, B.somethingElse took the alignment gap before the superclass field A.value.
Hierarchy gaps are also gone:
$ jdk15-64/bin/java -jar jol-cli.jar internals -cp . ThreeBooleanStooges\$C
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
Instantiated the sample instance via default constructor.
ThreeBooleanStooges$C object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 90 7d 17 00
12 1 boolean A.a false
13 1 boolean B.b false
14 1 boolean C.c false
15 1 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 1 bytes external = 1 bytes total
Perfect!
11.5. Observation: Hierarchy Tower Padding Trick Collapse in JDK 15
Unfortunately, this collapses the naive hierarchy padding trick that relied on implementation quirks! See:
$ jdk15-64/bin/java -jar jol-cli.jar internals -cp . HierarchyLongPadding\$UsableObject
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
Instantiated the sample instance via default constructor.
HierarchyLongPadding$UsableObject object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 08 7c 17 00
12 1 byte Carrier.pleaseHelpMe 0 # WHOOPS
13 3 (alignment/padding gap)
16 8 long Pad1.l01 0
24 8 long Pad1.l02 0
32 8 long Pad1.l03 0
40 8 long Pad1.l04 0
48 8 long Pad1.l05 0
56 8 long Pad1.l06 0
64 8 long Pad1.l07 0
72 8 long Pad1.l08 0
80 8 long Pad2.l11 0
88 8 long Pad2.l12 0
96 8 long Pad2.l13 0
104 8 long Pad2.l14 0
112 8 long Pad2.l15 0
120 8 long Pad2.l16 0
128 8 long Pad2.l17 0
136 8 long Pad2.l18 0
Instance size: 144 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
Now that pleaseHelpMe is allowed to take the gaps in the superclasses, the field layouter pulls it out. Whoops.
The way out I see is to pad with the smallest data type:
public class HierarchyBytePadding {
static class Pad1 {
byte p000, p001, p002, p003, p004, p005, p006, p007;
byte p008, p009, p010, p011, p012, p013, p014, p015;
byte p016, p017, p018, p019, p020, p021, p022, p023;
byte p024, p025, p026, p027, p028, p029, p030, p031;
byte p032, p033, p034, p035, p036, p037, p038, p039;
byte p040, p041, p042, p043, p044, p045, p046, p047;
byte p048, p049, p050, p051, p052, p053, p054, p055;
byte p056, p057, p058, p059, p060, p061, p062, p063;
}
static class Carrier extends Pad1 {
byte pleaseHelpMe;
}
static class Pad2 extends Carrier {
byte p100, p101, p102, p103, p104, p105, p106, p107;
byte p108, p109, p110, p111, p112, p113, p114, p115;
byte p116, p117, p118, p119, p120, p121, p122, p123;
byte p124, p125, p126, p127, p128, p129, p130, p131;
byte p132, p133, p134, p135, p136, p137, p138, p139;
byte p140, p141, p142, p143, p144, p145, p146, p147;
byte p148, p149, p150, p151, p152, p153, p154, p155;
byte p156, p157, p158, p159, p160, p161, p162, p163;
}
static class UsableObject extends Pad2 {}
}
…which fills out all the gaps, not letting our protected field float around:
$ jdk15-64/bin/java -jar jol-cli.jar internals -cp . HierarchyBytePadding\$UsableObject
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
Instantiated the sample instance via default constructor.
HierarchyBytePadding$UsableObject object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00
4 4 (object header) 00 00 00 00
8 4 (object header) 08 7c 17 00
12 1 byte Pad1.p000 0
13 1 byte Pad1.p001 0
...
74 1 byte Pad1.p062 0
75 1 byte Pad1.p063 0
76 1 byte Carrier.pleaseHelpMe 0 # GOOD
77 1 byte Pad2.p100 0
78 1 byte Pad2.p101 0
...
139 1 byte Pad2.p162 0
140 1 byte Pad2.p163 0
141 3 (loss due to the next object alignment)
Instance size: 144 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
In fact, that is what JMH does now.
This still relies on the implementation detail that the fields of Pad1 would be handled first and plug whatever holes there are in the superclasses.
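Since the trick hinges on layouter behavior, it might be prudent to verify the layout on every JDK you run with. A minimal sketch using JOL's library API (assuming the org.openjdk.jol:jol-core dependency and compilation alongside HierarchyBytePadding; the 64-byte threshold is just a rough heuristic for "past the first padding tower"):

import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.info.FieldLayout;

public class PaddingSanityCheck {
    public static void main(String[] args) {
        ClassLayout layout = ClassLayout.parseClass(HierarchyBytePadding.UsableObject.class);
        for (FieldLayout f : layout.fields()) {
            if (f.name().equals("pleaseHelpMe")) {
                // With 64 one-byte pads on each side, the protected field should
                // land at least 64 bytes past the object start.
                if (f.offset() < 64) {
                    throw new IllegalStateException("Padding collapsed: offset = " + f.offset());
                }
                System.out.println("pleaseHelpMe at offset " + f.offset() + ", OK");
            }
        }
    }
}

Running this as part of a build or a startup self-check would catch a future layouter change before it silently degrades the padding.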
12. Conclusion
The Java object internals story is complicated and full of static and dynamic trade-offs. The size of a Java object can change depending on internal factors, like JVM bitness and JVM feature set, and on runtime configuration, like heap size, compressed references mode, and the GC used.
Looking at the footprint story from the JVM side, it becomes clear that compressed references play an extensive role in it. Even when no reference fields are involved, they affect whether the class word is compressed. The mark word would get more compact in 32-bit VMs, so that would also improve the footprint. (That is not even mentioning that VM-native pointers and machine-word-wide types would become much narrower.)
From the Java (developer) perspective, knowing about object internals allows hiding fields in the object alignment shadow and in field alignment gaps, without exploding the apparent instance size. On the other hand, adding just a single little field may balloon the instance size up considerably, and explaining why that happened inevitably involves reasoning about the finer object structure.
Last, but not least, tricking the field layouter into putting the fields in a particular order is quite hard and depends on implementation quirks. Those tricks are still usable, but some are safer to rely on than others. They need additional verification for every JDK update you run with, anyway. You should definitely re-verify what you do when running on JDK 15 and later.