Week 8

This document summarizes the work being done in week 8.

1 Examining the Old LGC

The single largest source of issues in this project stems from the way the JVM bootstraps: first, a class loader implemented in native code loads a set of 'boot' classes; then a new ClassLoader is created to load classes from the classpath, and so on. These boot class files are hard to instrument, both because much of the surrounding code is native and because the JVM is not yet fully functional at that point; the missing hashCode I ran into earlier is one example.

The issue is that many important Java classes (most of the java.* and javax.* classes, for example) are part of the boot class path. They are used so commonly that they must be instrumented (or so I thought). This creates a dilemma.

Now, while examining the old LGC, I noticed that it does not instrument these boot class files at all, only the 'new' class files that were created to test the old LGC. In other words, consider the following source:

import java.util.ArrayList;

public class AC {
  ArrayList a = new ArrayList();
  public AC() {
    a.add(new Integer(4));
  }
}
Currently, this will only create an instrumented version of AC.class. However, if we chose to instrument the files on the boot class path as well, we would get instrumented versions of Object, AbstractList, AbstractCollection, ArrayList, and Integer.

Here, I realized it would be worth testing how much the boot class path objects actually contribute to memory usage (as opposed to the 'new' classes we define) in a typical Java application. A Java agent is the quickest way to measure this, so I wrote one.
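As a rough sketch of what such an agent can look like (the class and method names here are illustrative, not the actual agent I wrote): a premain method receives the Instrumentation instance, and each tracked object's size is attributed to either the boot or the application class loader. Classes loaded by the boot loader report getClassLoader() == null.

```java
import java.lang.instrument.Instrumentation;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative agent sketch; launch with -javaagent:agent.jar.
public class SizeByLoaderAgent {
  private static volatile Instrumentation inst;
  private static final AtomicLong bootBytes = new AtomicLong();
  private static final AtomicLong appBytes = new AtomicLong();

  public static void premain(String args, Instrumentation i) {
    inst = i;
  }

  // Classes loaded by the boot class loader report a null loader.
  public static String bucketFor(ClassLoader loader) {
    return loader == null ? "boot" : "application";
  }

  // Called (e.g. from injected constructor code) for each new object:
  // attribute its approximate size to the loader of its class.
  public static void record(Object o) {
    long size = inst.getObjectSize(o);
    if (o.getClass().getClassLoader() == null) {
      bootBytes.addAndGet(size);
    } else {
      appBytes.addAndGet(size);
    }
  }

  public static long bootTotal() { return bootBytes.get(); }
  public static long appTotal()  { return appBytes.get(); }
}
```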

2 Measurement

The metric I used was simple: the total size of objects whose classes were loaded by the boot class loader, versus the total size of objects whose classes were loaded by the application class loader, over a set of SPECjvm2008 benchmark groups.

The sizes are in an arbitrary unit, since they are measured using java.lang.instrument.Instrumentation's getObjectSize method, which is documented as: "Returns an implementation-specific approximation of the amount of storage consumed by the specified object. The result may include some or all of the object's overhead, and thus is useful for comparison within an implementation but not between implementations."


Benchmark Group   Boot Classloader   Application Classloader
---------------   ----------------   -----------------------
compiler              78167744.0             48779272.0
crypto               768946040.0             19801432.0
xml                  717591800.0             31597320.0
scimark.small          3317592.0             12958232.0

In conclusion: assuming the benchmark is representative of typical usage, it is not a good idea to ignore the classes loaded by the boot class loader.

3 Profiler Improvements

The profiler that I came up with after week 6 had quite a few shortcomings. I fixed several of them, and sped it up as well.

3.1 Tracking Object Allocation and Use

Earlier, I kept a list of reachableObjects, appending to it on every allocation, and I had issues with the hashCode of several boot classes. The remedy was System.identityHashCode, which returns the default identity-based hash code for an object regardless of whether its class overrides hashCode, and which is always available. (Note that it is not strictly guaranteed to be unique per object, but collisions are rare in practice.)
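A small illustration of why identityHashCode sidesteps the problem (the OnlyIdentity class below is made up for this example, standing in for a boot class whose hashCode is unusable): the identity hash works even when the object's own hashCode cannot be called safely, and it is stable for the lifetime of the object.

```java
// Illustrative only: a class whose hashCode cannot be called,
// standing in for a half-initialized boot class.
public class IdentityHashDemo {
  static class OnlyIdentity {
    @Override
    public int hashCode() {
      throw new IllegalStateException("hashCode not usable yet");
    }
  }

  public static void main(String[] args) {
    OnlyIdentity o = new OnlyIdentity();
    // o.hashCode() would throw, but the identity hash is always
    // available and stable for the lifetime of the object.
    int h1 = System.identityHashCode(o);
    int h2 = System.identityHashCode(o);
    System.out.println(h1 == h2);  // true
  }
}
```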

This allowed me to change my model to consist of three things:

  • Map, identityHashCode → lastUseTime
  • Map, identityHashCode → collectionTime
  • ReferenceQueue, with WeakReferences

The profiler runs in the following way (square bracket at start indicates what thread it is on):

  • [MAIN] Set up the instrumentation by modifying methods and constructors.
  • [APPLICATION-DEPENDENT] On each allocation, a new entry is created in the maps above, and a WeakReference is registered with our ReferenceQueue. This can be found in ObjStore.allocateObject. It also checks whether a GC is required.
  • [APPLICATION-DEPENDENT] On each method call, the identityHashCode is used to look up the object and update its lastUse in the map above. This can be found in ObjStore.useObject.
  • [COLLECTION] This thread polls the ReferenceQueue, and on getting an object, sets its collectionTime. This can be found at ObjStore.startCollectionWatch.
  • [SHUTDOWN-HOOK] This runs when the application exits and, for every object still alive, marks it "dead" (since the OS will reclaim the memory anyway). Finally, it is in charge of collecting all the data and writing it out in a JSON format. This can be found at ObjStore.lastCycle.
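The bookkeeping in the steps above can be sketched roughly as follows (a simplified stand-in for ObjStore with invented names, not the actual code). One subtlety: the WeakReferences themselves must be kept strongly reachable, or they would never be enqueued.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of the bookkeeping described above; names are invented.
public class ObjStoreSketch {
  // A WeakReference that remembers the identity hash of its referent,
  // so the object can still be identified after it has been collected.
  static final class TrackedRef extends WeakReference<Object> {
    final int id;
    TrackedRef(Object o, ReferenceQueue<Object> q) {
      super(o, q);
      this.id = System.identityHashCode(o);
    }
  }

  final Map<Integer, Long> lastUseTime = new ConcurrentHashMap<>();
  final Map<Integer, Long> collectionTime = new ConcurrentHashMap<>();
  final ReferenceQueue<Object> queue = new ReferenceQueue<>();
  // Keeps the TrackedRefs themselves reachable until they are drained.
  final Set<TrackedRef> refs = ConcurrentHashMap.newKeySet();

  // Called from instrumented constructors.
  public void allocateObject(Object o) {
    lastUseTime.put(System.identityHashCode(o), System.nanoTime());
    refs.add(new TrackedRef(o, queue));  // enqueued once o is unreachable
  }

  // Called from instrumented method bodies.
  public void useObject(Object o) {
    lastUseTime.put(System.identityHashCode(o), System.nanoTime());
  }

  // Polled by the collection-watch thread.
  public void drainQueue() {
    TrackedRef ref;
    while ((ref = (TrackedRef) queue.poll()) != null) {
      collectionTime.put(ref.id, System.nanoTime());
      refs.remove(ref);
    }
  }
}
```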

3.2 Tracking Object Deallocation

The use of a ReferenceQueue made this much more efficient than last time. Instead of iterating over all reachable objects to see which ones had recently been garbage collected, I stored each object inside a WeakReference linked to a ReferenceQueue. As soon as an object becomes unreachable in the eyes of the GC, its reference is added to the ReferenceQueue.

3.3 Limiting the Classes Profiled

As discussed above, a lot of the "slowness" of the profiler may stem from the fact that it is instrumenting a large number of (boot) classes.

A large number of classes instrumented means that more objects are tracked, which leads to:

  • Larger number of GC attempts (main source of slowdown)
  • Larger data structure size that the profiler needs to maintain

But then again, as the table above shows, profiling boot classes might be important. Thus, I settled on a compromise: the user can choose which classes to profile.
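One way such a filter can be wired into a ClassFileTransformer is to match class names against user-supplied substrings (a sketch with made-up names; the actual matching logic in the profiler may differ). Note that internal class names use '/' as a separator, e.g. "java/util/ArrayList".

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

// Sketch of substring-based class filtering; names are illustrative.
public class FilteringTransformer implements ClassFileTransformer {
  private final String[] infixes;

  public FilteringTransformer(String... infixes) {
    this.infixes = infixes;
  }

  // Instrument only classes whose internal name contains one of
  // the configured substrings.
  public static boolean shouldInstrument(String className, String[] infixes) {
    for (String infix : infixes) {
      if (className.contains(infix)) {
        return true;
      }
    }
    return false;
  }

  @Override
  public byte[] transform(ClassLoader loader, String className,
                          Class<?> classBeingRedefined,
                          ProtectionDomain protectionDomain,
                          byte[] classfileBuffer) {
    if (!shouldInstrument(className, infixes)) {
      return null;  // null tells the JVM to leave the class unchanged
    }
    return instrument(classfileBuffer);
  }

  private byte[] instrument(byte[] original) {
    // Placeholder: the real profiler rewrites the bytecode here.
    return original;
  }
}
```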

3.4 JSON Output

The profiler now outputs JSON logs for easy post-processing. One issue with this is that the logs are written only at the end, and writing them is often very slow (there may be hundreds of thousands of objects, and millions of lines in the JSON file).

Here is a sample of what the JSON logs look like:

    "stack": [
    "drag": 256,
    "class": "C"
    "stack": [
    "drag": 1280,
    "class": "B"
    "stack": [
    "drag": 960,
    "class": "java.net.URLClassLoader$1"

4 Speed Comparison

I ran small benchmarks from SPECjvm2008 to compare the speed of instrumented and uninstrumented execution.


Benchmark (duration, infix)                   With Profiler (ops/m)   Without Profiler (ops/m)
-------------------------------------------   ---------------------   ------------------------
compress (5s, infix:`spec`)                                    1.14                      94.3
startup.helloworld (5s, infix:`java`)                          63.5                     238.0
xml.validation (1s, infix:`spec.benchmark`)                   66.95                     72.52
scimark.fft.small (1s, infix:`spec`)                          63.18                    207.17

5 TabUseProf Results

I also profiled my browser usage, with and without the TabUseProf browser extension. There seems to be a clear advantage to using the extension: the memory saved is around 750 MB.

The usage was measured on Firefox Nightly 60, with around 50 tabs open (with one pinned tab), for half an hour.

5.1 Without Extension


5.2 With Extension


Author: Milind Luthra

Emacs 25.3.1 (Org mode 8.2.10)