Very large heaps in Java

I’ve been wanting to write up a juicy post on how we deal with very large heaps in Java to reduce GC pauses. Unfortunately, I keep getting sidetracked while pulling the data together. The latest bump in the road is due to a JVM bug of sorts.

Backstory: Todd Lipcon’s Twitter post pointed me to the JVM option -XX:PrintFLSStatistics=1, which prints some good information about heap fragmentation. He was even kind enough to provide the Python and R scripts! I figured that it would be a few minutes of fiddling and I’d have some good data for a post. No such luck. Our JVM GC/heap options are -XX:+UseConcMarkSweepGC -Xms65g -Xmx65g. When -XX:PrintFLSStatistics=1 is added to these, the following output is seen:

Statistics for BinaryTreeDictionary:
Total Free Space: -1824684952
Max   Chunk Size: -1824684952
Number of Blocks: 1
Av.  Block  Size: -1824684952
Tree      Height: 1

A few seconds of digging into the HotSpot source reveals:

void BinaryTreeDictionary::reportStatistics() const {
  gclog_or_tty->print("Statistics for BinaryTreeDictionary:\n");
  size_t totalSize = totalChunkSize(debug_only(NULL));
  size_t    freeBlocks = numFreeBlocks();
  gclog_or_tty->print("Total Free Space: %d\n", totalSize);
  gclog_or_tty->print("Max   Chunk Size: %d\n", maxChunkSize());
  gclog_or_tty->print("Number of Blocks: %d\n", freeBlocks);
  if (freeBlocks > 0) {
    gclog_or_tty->print("Av.  Block  Size: %d\n", totalSize/freeBlocks);
  }
  gclog_or_tty->print("Tree      Height: %d\n", treeHeight());
}

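in hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/binaryTreeDictionary.cpp. (“%d” just doesn’t cut it with a “long”’s worth of data: these values are size_t, which is 64 bits wide on a 64-bit JVM, so printing them as a 32-bit int produces the bogus negative numbers above.) I filed a HotSpot bug, so hopefully it will be fixed in a release in the not-too-distant future.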
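To make the truncation concrete, here is a minimal Java sketch of what probably happens to those numbers. It rests on an assumption: passing a size_t to “%d” is formally undefined behavior in C++, but on a typical 64-bit platform it ends up showing only the low 32 bits of the value, read as a signed int. The class and method names are hypothetical and are not part of HotSpot.

```java
// Hypothetical demo class; none of these names come from HotSpot.
final class FlsTruncationDemo {

    // Assumption: a 64-bit value printed with "%d" shows up as its low
    // 32 bits read as a signed int. The narrowing cast models that.
    static int asPrintedByPercentD(long realValue) {
        return (int) realValue;
    }

    // The low 32 bits of the real value, recovered from a logged number.
    static long low32Bits(int logged) {
        return Integer.toUnsignedLong(logged);
    }

    // One possible real value, given a guess for the lost high 32 bits.
    static long candidate(int logged, long highBits) {
        return (highBits << 32) | low32Bits(logged);
    }

    public static void main(String[] args) {
        int logged = -1824684952; // "Total Free Space" from the output above
        System.out.println("low 32 bits: " + low32Bits(logged));
        for (long high = 0; high < 3; high++) {
            long c = candidate(logged, high);
            // Every candidate maps back to the same logged number.
            System.out.println(c + " -> " + asPrintedByPercentD(c));
        }
    }
}
```

Under that assumption, the low 32 bits behind -1824684952 are 2470282344, but the high bits are gone, so the log alone cannot say which candidate is the real value. That is why the statistics are not usable as printed.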

I can work around this, but it has delayed the juicy blog post. Stay tuned!