What Heap Dumps Are Lying To You About

Aleksey Shipilёv, @shipilev, aleksey@shipilev.net

Note: this post is also available in ePUB and mobi.

This is an updated version of the Russian post I wrote roughly a year ago, well before Java Object Layout was available under OpenJDK, when the draft I had on GitHub was still in its infancy. The rationale for this post is to give the "runtime guy" perspective on every single blog post about Java object sizes and layouts.

Part I. Myths

Myth 0. You can figure out the object size once and for all

In reality, it depends on a number of things:
Target VM: what exactly are you running with? HotSpot? JRockit? J9? (ahem) Dalvik?
Platform bitness: reference sizes are probably different, and even the basic types may take a different number of bytes.
Potential and actual optimizations, à la object inlining, scalarized fields, padding, etc.

Myth 1. It is enough to sum up the field sizes to get the instance size

In reality, VMs are free to choose the representation for the basic types. VMs are even forced to widen the representation beyond what the value domain requires, in order to maintain basic language semantics (e.g. read/write atomicity) or to fit the underlying hardware requirements (e.g. always do aligned accesses). Moreover, the objects themselves also need to be aligned to keep the field alignments intact. And then there are object headers…

Myth 2. It is enough to sum up the field sizes, PLUS the well-known header size

In reality, this data has a questionable "shelf life", because VMs are free to choose the object representation, including the header format. For example, current HotSpot can have header sizes from 8 to 16 bytes under different conditions. JRockit mostly has 8-byte headers. The HotSpot numbers will change once HotSpot adopts the same object header format (that requires significant rework of the locking scheme, though).

Myth 3. Any sensible tool will show the correct instance size

In reality, it depends. HPROF will dump the instance size as calculated in Myth 1. That means every single tool using HPROF as its source data (e.g. jhat, Eclipse MAT, VisualVM, YourKit) is doomed to under- or overestimate instance sizes, because it is forced to guess.

The moral here: we need online tools, which will give the instance size info on the spot.

Part II. Reality

The significant trouble for offline tools is the HPROF format itself. HPROF is so "VM agnostic" that you don’t even know whether you have been running a 32- or 64-bit VM (yeah, there is "ID size", but is it really the reference size?). Good luck figuring out whether compressed references were enabled or not. Forget about field layout info and object headers too: this info is completely lost. You only have the raw bits of instance field data, with the class data describing what particular bits mean there. Hence, tools parsing HPROF need to guess the runtime layout of the objects.
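To make the "ID size" remark concrete, here is a minimal sketch of reading the fixed HPROF file header: a NUL-terminated version string, a 4-byte identifier size, and an 8-byte timestamp, all big-endian. The class and method names (`HprofHeaderSketch`, `parse`) are made up for this illustration and belong to no real tool. Note that this is all the header tells you: nothing about compressed references, header sizes, or field offsets.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of any HPROF-reading tool.
final class HprofHeaderSketch {

    /** The three fields of the fixed HPROF file header. */
    static final class Header {
        final String version;       // e.g. "JAVA PROFILE 1.0.2"
        final int idSize;           // size of identifiers in the dump, in bytes
        final long timestampMillis; // dump time, milliseconds since the epoch

        Header(String version, int idSize, long timestampMillis) {
            this.version = version;
            this.idSize = idSize;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Parses the header from the first bytes of a dump (ByteBuffer is big-endian by default). */
    static Header parse(ByteBuffer buf) {
        StringBuilder version = new StringBuilder();
        // The version string is ASCII and terminated by a single zero byte.
        for (byte b = buf.get(); b != 0; b = buf.get()) {
            version.append((char) b);
        }
        int idSize = buf.getInt();
        long timestamp = buf.getLong();
        return new Header(version.toString(), idSize, timestamp);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: HprofHeaderSketch <heap-dump-file>");
            return;
        }
        byte[] head = new byte[64]; // more than enough for the header
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            new DataInputStream(in).readFully(head);
        }
        Header h = parse(ByteBuffer.wrap(head));
        System.out.println(h.version + ", ID size = " + h.idSize + " bytes");
    }
}
```

An ID size of 8 only says that identifiers in the dump are 8 bytes wide; whether the references in the live heap were 4 or 8 bytes is still a guess.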

Offline tools shootout

Let’s illustrate this last point. Make up some class, create a single instance of it, take a heap dump, and open the dump in various tools. Here’s our specimen for today:

public class Main {

    private static Object obj;

    public static void main(String[] args) throws Exception {
        obj = new D();
        System.in.read();
    }

    static class A {
        long a;
    }

    static class B extends A {
        boolean b;
    }

    static class C extends B {
        int c;
    }

    static class D extends C {
        boolean d;
    }
}

Now we can take a few heap dumps in various modes. For the sake of completeness, here is the exact environment in which the data was gathered. For today’s experiment, I downloaded the latest and greatest revisions of the tools available, as well as some recent JDKs:

Linux x86_64, 3.12+
HotSpot 24.0-b56, JDK 7u40
JRockit R28.2.5-20-152429-1.6.0_37-20120927-1915-linux-*, JDK 6u37
Visual VM 1.3.6
Eclipse Memory Analyzer Tool 1.3.0.20130517
YourKit Profiler 2013 (build 13066), evaluation license

This is what we get:

`Main$D` instance size, in bytes, as reported by each tool:
VM	Bitness	CompRef	VisualVM	Eclipse MAT	YourKit
HotSpot (7u40)	x86	no (N/A)	22	32	32
HotSpot (7u40)	x86_64	yes (default)	30	32	40
HotSpot (7u40)	x86_64	no	30	48	48
JRockit (6u37)	x86	no (N/A)	22	32	32
JRockit (6u37)	x86_64	yes (default)	30	32	40
JRockit (6u37)	x86_64	no	30	48	48

You can easily see that the tools disagree with each other on a number of things, and that is because the tools are forced to guess. In order to get the true runtime object layout, we need some heavy artillery. You can actually use Unsafe to get the field layout in the object, read the field values there, deduce the basic type lengths, object alignment, etc. We already did that in Java Object Layout (JOL), which significantly shortens this post.
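The Unsafe trick can be sketched in a few lines. This is only an illustration of the idea, not how JOL is implemented. It assumes a HotSpot-like JDK that still exposes `sun.misc.Unsafe` with `objectFieldOffset` (newer JDKs deprecate it and print warnings), and the class name `FieldOffsetsSketch` is made up for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;
import sun.misc.Unsafe;

// Illustration only; assumes sun.misc.Unsafe is available on the running VM.
final class FieldOffsetsSketch {

    // Same hierarchy as the specimen above.
    static class A { long a; }
    static class B extends A { boolean b; }
    static class C extends B { int c; }
    static class D extends C { boolean d; }

    private static Unsafe unsafe() {
        try {
            // The usual reflective backdoor to the singleton instance.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible on this VM", e);
        }
    }

    /** Maps "Class.field" to the field offset in bytes, walking up the hierarchy. */
    static Map<String, Long> offsets(Class<?> klass) {
        Unsafe u = unsafe();
        Map<String, Long> result = new LinkedHashMap<>();
        for (Class<?> c = klass; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    result.put(c.getSimpleName() + "." + f.getName(), u.objectFieldOffset(f));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The printed offsets depend on the VM, its bitness, and its flags.
        offsets(D.class).forEach((name, off) -> System.out.println(off + "\t" + name));
    }
}
```

The printed offsets are whatever the running VM actually chose, which is exactly the information a heap dump cannot carry.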

Exploring specimens with JOL

Let us start with HotSpot. You can see different things there:

HotSpot has 8-byte, 12-byte, and 16-byte object headers. This is because the header contains two parts: the markword (metainfo about the object) and the classword (the reference to the class). The markword takes up 4 bytes in 32-bit mode and 8 bytes in 64-bit mode. The classword is "just" a reference, and hence it can be compressed in 64-bit mode.
The objects are aligned by 8 bytes. This is required to maintain the field alignments, although it is pessimistic. The object itself has a trailing gap where nothing else can be allocated, since the next object should also be aligned by 8 bytes.
C.c is aligned by 4 bytes, and A.a is aligned by 8 bytes. There is a substantial number of gaps between the fields because of these alignments.

My research on how we can improve this layout scheme can be found here.

Running 32-bit HotSpot VM.
Objects are 8 bytes aligned.

Main.D object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0     8         (object header)                N/A
      8     8    long A.a                            N/A
     16     1 boolean B.b                            N/A
     17     3         (alignment/padding gap)        N/A
     20     4     int C.c                            N/A
     24     1 boolean D.d                            N/A
     25     7         (loss due to the next object alignment)
Instance size: 32 bytes

Running 64-bit HotSpot VM.
Objects are 8 bytes aligned.

Main.D object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0    16         (object header)                N/A
     16     8    long A.a                            N/A
     24     1 boolean B.b                            N/A
     25     7         (alignment/padding gap)        N/A
     32     4     int C.c                            N/A
     36     4         (alignment/padding gap)        N/A
     40     1 boolean D.d                            N/A
     41     7         (loss due to the next object alignment)
Instance size: 48 bytes

Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.

Main.D object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0    12         (object header)                N/A
     12     4         (alignment/padding gap)        N/A
     16     8    long A.a                            N/A
     24     1 boolean B.b                            N/A
     25     3         (alignment/padding gap)        N/A
     28     4     int C.c                            N/A
     32     1 boolean D.d                            N/A
     33     7         (loss due to the next object alignment)
Instance size: 40 bytes

And this is JRockit:

Note that it has the same header size in all VM modes. That will confuse tools a lot.
Other than that, it suffers from the same losses as HotSpot.

Running 32-bit JRockit (experimental) VM.
Objects are 8 bytes aligned.

Main.D object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0     8         (object header)                N/A
      8     8    long A.a                            N/A
     16     1 boolean B.b                            N/A
     17     3         (alignment/padding gap)        N/A
     20     4     int C.c                            N/A
     24     1 boolean D.d                            N/A
     25     7         (loss due to the next object alignment)
Instance size: 32 bytes

Running 64-bit JRockit (experimental) VM.
Using compressed references with 0-bit shift.
Objects are 8 bytes aligned.

Main.D object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0     8         (object header)                N/A
      8     8    long A.a                            N/A
     16     1 boolean B.b                            N/A
     17     3         (alignment/padding gap)        N/A
     20     4     int C.c                            N/A
     24     1 boolean D.d                            N/A
     25     7         (loss due to the next object alignment)
Instance size: 32 bytes

Running 64-bit JRockit (experimental) VM.
Objects are 8 bytes aligned.

Main.D object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0     8         (object header)                N/A
      8     8    long A.a                            N/A
     16     1 boolean B.b                            N/A
     17     3         (alignment/padding gap)        N/A
     20     4     int C.c                            N/A
     24     1 boolean D.d                            N/A
     25     7         (loss due to the next object alignment)
Instance size: 32 bytes
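The six dumps above can be reproduced with a very small model. This is a sketch inferred from the printed layouts, not the actual layout code of either VM. It assumes each field is aligned to its own size, each class's field block starts at a "block alignment" boundary (the reference size on HotSpot, none on JRockit), and the whole object is aligned to 8 bytes. The names (`LayoutModelSketch`, `instanceSize`) are hypothetical.

```java
// Simplified model inferred from the JOL dumps above; not the VMs' real algorithm.
final class LayoutModelSketch {

    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    /**
     * @param headerSize         object header size, bytes
     * @param blockAlign         alignment where each class's fields start (1 = none)
     * @param objectAlign        alignment of the whole object
     * @param fieldSizesPerClass field sizes per class, superclass first
     */
    static int instanceSize(int headerSize, int blockAlign, int objectAlign,
                            int[][] fieldSizesPerClass) {
        int offset = headerSize;
        for (int[] classFields : fieldSizesPerClass) {
            offset = alignUp(offset, blockAlign);        // start of this class's block
            for (int size : classFields) {
                offset = alignUp(offset, size) + size;   // each field aligned to its size
            }
        }
        return alignUp(offset, objectAlign);             // loss due to next object alignment
    }

    public static void main(String[] args) {
        // Main.D: A.a (long), B.b (boolean), C.c (int), D.d (boolean)
        int[][] d = { {8}, {1}, {4}, {1} };
        System.out.println(instanceSize(8, 4, 8, d));   // 32: HotSpot, 32-bit
        System.out.println(instanceSize(16, 8, 8, d));  // 48: HotSpot, 64-bit
        System.out.println(instanceSize(12, 4, 8, d));  // 40: HotSpot, 64-bit, compressed refs
        System.out.println(instanceSize(8, 1, 8, d));   // 32: JRockit, all three modes
    }
}
```

The model handles only one field per class, or fields already listed in the order the VM would place them. It is enough to show that the instance size falls out of header size and alignment rules that HPROF does not record.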

Tools shootout (updated)

Now that we have the reference point, we can update the table:

`Main$D` instance size, in bytes, as reported by each tool:
VM	Bitness	CompRef	VisualVM	Eclipse MAT	YourKit	JOL
HotSpot (7u40)	x86	no (N/A)	22 `lies`	32	32	32
HotSpot (7u40)	x86_64	yes (default)	30 `lies`	32 `lies`	40	40
HotSpot (7u40)	x86_64	no	30 `lies`	48	48	48
JRockit (6u37)	x86	no (N/A)	22 `lies`	32	32	32
JRockit (6u37)	x86_64	yes (default)	30 `lies`	32	40 `lies`	32
JRockit (6u37)	x86_64	no	30 `lies`	48 `lies`	48 `lies`	32

We can see there are no ideal tools for the job. And don’t get me wrong: both MAT and YourKit are really trying! This is not a failure of the tools; it is a failure of the VM protocol, which cannot possibly communicate the layout information to the tools reliably. When the tools run inside the VM themselves, they can get the precise layout info, just like JOL did.

Part III. Going Deeper

Sad part

Layout gaps large enough to confuse tools are actually quite rare. Most of the time VMs are very good at packing the field data densely, so a simple sum over the field sizes is enough to account for the instance sizes somewhat accurately. However, this might change at any moment. Consider, for example, a class which uses @Contended:

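// JDK 8: import sun.misc.Contended;
// outside the JDK's own classes it only takes effect with -XX:-RestrictContended
public static class ContendedTest {
    @Contended  private int int1;
                private int int2;
}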

The very effect of @Contended is to isolate the int1 field from all other fields. This yields a drastically different layout:

Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.

ContendedTest  object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0    12         (object header)                N/A
     12     4     int ContendedTest.int2             N/A
     16   128         (alignment/padding gap)        N/A
    144     4     int ContendedTest.int1             N/A
    148   128         (alignment/padding gap)        N/A
    276     4         (loss due to the next object alignment)
Instance size: 280 bytes

…and all the HPROF-based tools would happily report something around 24 bytes for the instance, while the actual size is more than ten times larger! There is no way for HPROF to convey this kind of info.
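The arithmetic behind those two numbers can be sketched as follows. The 12-byte header and the 128-byte padding are read off the dump above (128 is also HotSpot's default for -XX:ContendedPaddingWidth). The "summed" estimate is only a plausible guess at what a field-summing tool would compute, and the class and method names are made up for this example.

```java
// Worked arithmetic for the ContendedTest dump above; names are hypothetical.
final class ContendedSizeSketch {

    static final int OBJECT_ALIGNMENT = 8;
    static final int CONTENDED_PADDING = 128; // as seen in the dump above

    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    /** What a field-summing tool might report: header plus raw field sizes, aligned. */
    static int summedSize(int headerSize, int... fieldSizes) {
        int size = headerSize;
        for (int f : fieldSizes) {
            size += f;
        }
        return alignUp(size, OBJECT_ALIGNMENT);
    }

    /** Size when the contended fields are padded on both sides, as in the dump. */
    static int contendedSize(int headerSize, int plainBytes, int contendedBytes) {
        int end = headerSize + plainBytes
                + CONTENDED_PADDING + contendedBytes + CONTENDED_PADDING;
        return alignUp(end, OBJECT_ALIGNMENT);
    }

    public static void main(String[] args) {
        System.out.println(summedSize(12, 4, 4));     // 24
        System.out.println(contendedSize(12, 4, 4));  // 280
    }
}
```

Nothing in the field list distinguishes the two cases, so a tool that only sees the fields has no reason to pick the second number.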

Happy part

But in the end, we can sometimes use this knowledge to our own benefit. We can add fields ninja-style to existing classes without increasing the perceived instance sizes! Like this:

 public class A {
     boolean a;
 }

 public class NinjaA {
     boolean a;
     boolean b, c, d; // can add these
 }

No increase in instance size, "hiding" the fields in the alignment shadow:

Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.

A object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0    12         (object header)                N/A
     12     1 boolean A.a                            N/A
     13     3         (loss due to the next object alignment)
Instance size: 16 bytes

Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.

NinjaA object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0    12         (object header)                N/A
     12     1 boolean NinjaA.a                       N/A
     13     1 boolean NinjaA.b                       N/A
     14     1 boolean NinjaA.c                       N/A
     15     1 boolean NinjaA.d                       N/A
Instance size: 16 bytes
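How many bytes fit into the alignment shadow is simple arithmetic, sketched below. It assumes the fields are packed densely right after the header, as in the two dumps above, and that the added fields need no alignment of their own (true for boolean and byte). `AlignmentShadowSketch` and `spareBytes` are names invented for this example.

```java
// Worked arithmetic for the "ninja fields" dumps above; names are hypothetical.
final class AlignmentShadowSketch {

    /** Bytes lost to the next object's alignment, i.e. room for extra 1-byte fields. */
    static int spareBytes(int headerSize, int fieldBytes, int objectAlignment) {
        int used = headerSize + fieldBytes;
        int aligned = (used + objectAlignment - 1) / objectAlignment * objectAlignment;
        return aligned - used;
    }

    public static void main(String[] args) {
        System.out.println(spareBytes(12, 1, 8)); // 3: class A, room for b, c, d
        System.out.println(spareBytes(12, 4, 8)); // 0: class NinjaA, shadow used up
    }
}
```

With a different header size (say, 16 bytes without compressed references) the spare room changes, so the trick is only as portable as the layout itself.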

Part IV. Epilogue

TL;DR:

Most of the HPROF-based tools have problems deducing the actual instance footprint; the specially crafted example in this article shows a >25% difference between the actual and the estimated instance size, which can lead the analysis in the wrong direction. However, cases like that are rare, and most analyses should be fine, especially when dealing with gigabytes' worth of memleaks.
There were talks about getting HPROF fixed up, but they wound down with the bottom line "we just need a better format", because tools already have reservations about what HPROF actually means. JEP, anyone?
Online tools are the best way to figure out the actual instance footprint. Use JOL: run it from the command line, embed it into your projects. Don’t you ever, ever guess the object layouts by looking at heap dumps.
If you are interested in more examples, you may want to look through the runnable Code Samples shipped with JOL itself.