Week 3

This document summarizes work done/being done in week 3.

1 Implementation + Issues

A major part of the week was spent trying to come up with an implementation for the instrumentation.

I explored two of the approaches discussed last week: JVMTI and a Java Agent. Whichever approach I used needed to provide the following:

  • Callback on object allocation.
  • Callback on object access (both field accesses and instance method calls).
  • Callback on object deallocation, or alternatively a way to traverse the reachable heap.

1.1 JVMTI

I decided to use JVMTI as my first attempt at profiling, because it provides a way to traverse the heap based on the reachable set. This would let me compare an object's reachability with its last-use information to work out which objects/allocation sites are the main problem.

I did not find a JVMTI callback for ordinary object allocation (the VMObjectAlloc event only covers allocations that are not detectable by other instrumentation mechanisms), so I decided to track calls to the "<init>" method (the name the JVM uses internally for every constructor).

However, I could not find a way in JVMTI to determine, once a method has been entered, which object it was invoked on, so it is rather hard to track which object was created. This is a problem for both object allocation and instance method calls (when we want to track object access).

To deal with this, the way ahead seemed to be instrumenting the bytecode (redefining classes at runtime). I even found an example to this effect in the JDK's demo code, which tracked only object allocation. However, I did not find any C/C++ library for instrumenting bytecode, and doing it by hand is not an easy task. So I decided to switch to a Java Agent.

The code which I wrote is, as usual, available on my repository.

1.2 Java Agent

The main takeaway from the last attempt was that, at some point, I would need to instrument bytecode. For this, I decided to use the ASM library.
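As a rough illustration of where bytecode instrumentation hooks in, here is a minimal agent skeleton that uses only the JDK's java.lang.instrument API. The class names (ProfilerAgent, PassThroughTransformer) and the package filter are assumptions made for this sketch, not code from my repository, and the actual rewriting step is left as a placeholder.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

/**
 * Hypothetical agent entry point (illustrative name). In a real agent jar
 * this class would normally be declared public in its own source file.
 */
final class ProfilerAgent {
    /** Invoked by the JVM before the application's main() when -javaagent is used. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new PassThroughTransformer());
    }
}

/** Sees the bytes of each class as it is loaded; currently changes nothing. */
final class PassThroughTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (!shouldInstrument(className)) {
            return null; // null tells the JVM to keep the class unchanged
        }
        // Placeholder: a bytecode library such as ASM would rewrite
        // classfileBuffer here and return the modified bytes.
        return null;
    }

    /** Assumed filter: skip JDK classes so the profiler does not instrument itself. */
    static boolean shouldInstrument(String className) {
        if (className == null) {
            return false; // some generated classes have no name
        }
        // Class names arrive in internal form, with '/' as the separator.
        return !(className.startsWith("java/")
                || className.startsWith("javax/")
                || className.startsWith("sun/")
                || className.startsWith("jdk/"));
    }
}
```

An agent like this is packaged in a jar whose manifest names the entry class in a Premain-Class attribute, and is enabled by passing -javaagent with the jar path when starting the JVM.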

Briefly, the idea was to do the following:

  • The JVM is a stack machine. Between the "new" opcode and the "invokespecial" of the "<init>" method, I insert a call to a method which records the object creation. Using "dup" after "new" gives a reference to the newly created object, something which is not possible without instrumentation.
  • Similarly, the "invokevirtual" opcode takes the objectref from the operand stack, beneath any method arguments. We duplicate this reference and pass it to our recording method to record a use of the object before "invokevirtual" executes.
  • Finally, deallocation: unlike JVMTI, Java Agents written in Java do not offer a deallocation callback or a way to trace the heap. Instead, we rely on PhantomReferences: for each object registered at allocation, we keep a PhantomReference to it. A PhantomReference is enqueued on its ReferenceQueue once the garbage collector determines that its referent is no longer reachable, so we can do any postmortem bookkeeping at that point.
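The PhantomReference part of this plan can be sketched with the JDK alone. The following is a minimal sketch, not the code from my repository: DeallocationTracker, Tracked, recordAllocation and the string label are all hypothetical names, and the label stands in for whatever allocation-site information the instrumented code would pass along.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker; every name in this class is illustrative. */
final class DeallocationTracker {
    /** Carries a label, because a phantom reference never gives its referent back. */
    static final class Tracked extends PhantomReference<Object> {
        final String label;

        Tracked(Object referent, ReferenceQueue<Object> queue, String label) {
            super(referent, queue);
            this.label = label;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // The reference objects themselves must stay strongly reachable,
    // otherwise they can be collected and are never enqueued.
    private final Set<Tracked> live = ConcurrentHashMap.newKeySet();

    /** Would be called from the instrumented allocation site. */
    void recordAllocation(Object obj, String label) {
        live.add(new Tracked(obj, queue, label));
    }

    /**
     * Non-blocking: returns the labels of tracked objects that the collector
     * has found unreachable since the previous call.
     */
    List<String> drainCollected() {
        List<String> collected = new ArrayList<>();
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Tracked tracked = (Tracked) ref;
            live.remove(tracked);
            collected.add(tracked.label);
        }
        return collected;
    }

    /** Number of tracked objects not yet reported as collected. */
    int liveCount() {
        return live.size();
    }
}
```

When references show up in the queue depends entirely on when the garbage collector runs, so drainCollected() would have to be polled periodically (or from a dedicated thread), and the times it reports are only upper bounds on when an object actually became unreachable.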

With this idea in mind, I worked mainly on the first part, for which the code is available online. I ran into several problems, some of which I have still not been able to solve, partly because I do not know the exact mechanics of class loading in the JVM, and partly because I am using ASM for the first time.

The code which I wrote is, as usual, available on my repository.

Author: Milind Luthra
