Week 6

This document summarizes the work being done in week 6. This week, I completed the profiler I had been working on for the past few weeks. This document describes the structure and use of the profiler, and the issues I faced while making it, as well as the issues in the profiler itself.

1 Structure

The basic idea behind how the profiler works is simple. First, it tracks object creation/deletion, and object access. It partitions the run of the application into several "cycles" (cycles change when a certain number of bytes are allocated by the JVM, for instance, the default cycle size of 1000 bytes).

The profiler checks for the cycle where the object is last accessed, and then compares it with the cycle till where the object is reachable, and then calculates "drag". A higher drag means that we should collect that object using a liveness based collector rather than a reachability based one.

$$ drag = objectSize \times cycleSize \times (lastReachableCycle - lastLiveCycle) $$

A library called ByteBuddy is used to transform the bytecode of classes being loaded at runtime. The Advice API is used to insert some code into the entry and exit of each method and constructor.

fig1.png

1.1 Object Allocation/Destruction

Object allocation is tracked by transforming the constructors. In every constructor, the following code is inserted on the exit (by the time this is defined)

store.allocateObject(new ObjData(this,
                                 inst.getObjectSize(this),
                                 store,
                                 Thread.currentThread().getStackTrace()));

The ObjData stores a WeakReference to the object, as well as its size and a stack trace (to track down what line we created the object so later we can decide where to use liveness based collection).

Object death is tracked in a somewhat more involved way. At the end of each cycle, we call System.gc(), and then iterate through the entire list of reachableObjects. The objects which have been newly GC'd have the internal reference in their WeakReference set to null. We find such objects, and move them to unreachableObjects. We can also print drag and other information about these objects to the screen.

1.2 Intercepting Methods

There are two sorts of accesses that we need to track. The first, and far more common access to obj is through obj.method(). To track this, we have used something similar to the constructor instrumentation given above. At the end of every method call, we insert code to find out if we are holding a reference to obj in our reachableObjects, and then we update the lastUsedCycle for the corresponding ObjData by calling the objectUsed() method.

The second, less common access that we need to track is access to the instance members of the obj, corresponding to the java bytecode opcode getfield (eg a = obj.member) and to the opcode putfield (eg obj.member = a). Right now, I am not tracking these accesses - see the problems section below.

1.3 Speeding Up (Multithreading)

Currently, the entire instrumentation code runs on the main thread. This has adverse impact on the perfomance of the program, thus I moved a lot of code to a separate thread. Initially, I was calling allocateObject and objectUsed() both on the main thread, and each time I called those methods, I checked if the allocationSinceLastCycle had exceeded cycleSize, and in that case, invoked nextCycle, which is a time consuming method, since it loops over all reachable objects.

I have shifted the calls to the nextCycle method to another thread which is started by the premain function. This means that the loop can go on in parallel to program execution till the program tries to allocate/use an object. This has an adverse effect, that if the JVM does not schedule this thread often enough, then we might exceed the cycleSize by a large amount before calling nextCycle. However, running it on a few examples, this does not seem to be the case.

I used the synchronized keyword to control access to the store, which will otherwise be read/written at the same time when we are running nextCycle and cause a ConcurrentModificationException.

2 Usage Instructions

See README.

To enable/disable/change what is printed in this, please take a look at prettyPrintObjectDeath in ObjStore.

To change cycle size, please look at cycleSize in ObjStore.

3 Problems I Faced

3.1 Non-Existence of hashCode (?)

Look at the following code, which I am using to update the last use information for an object obj (called at method exit, as described above).

for (ObjData objData: store.getReachableObjects()) {
    if (objData.getThiz().get() == obj) {
        objData.objectUsed();
        break;
    }
}

This is rather inefficient, as it loops over all reachable objects. A much easier and faster solution, at a glance, would be to use a Map with the key as something unique to the object (this is most usually the hashCode of the object) and the value as the ObjData corresponding to that. We cannot use a reference to the object, as that would mean preventing its collection.

However, the hashCode method seems to fail for several bootstrap classes at the initial load time, thus, this method cannot be used, and instead, I have used the inefficient method.

Possible Solution: Use a try/catch block to skip over the classes which have this issue. The number of such classes is pretty low, and what's more, application classes are not a part of this.

3.2 Instance Variable Accesses

Unlike method accesses, instance variable accesses are much more difficult to track. The actual accesses are not so hard, however, obtaining a reference to the corresponding object is the hard part. Currently, the library I am using does not support this functionality, and it is not trivial to implement it either.

Thus, I am simply ignoring the instance variable accesses (in most cases, instance variables have private access only, so it should not matter so much).

3.3 StackMap

StackMaps are basically constructs added to Java from Java 6 onwards that make bytecode verification at runtime faster. A more detailed explanation can be found here. Basically, to manipulate the bytecode, we also have to change some information in the StackMapTable, but only in the case where we are using instructions which involve jumps etc.

In essence, any instruction like if/else, for, try/catch will be impossible to use with this unless we change the StackMapTable, which is not easy. Thus, I currently require my test programs to be compiled to Java 5.

4 Problems in Profiler

4.1 Speed

The profiler is rather slow. There is around a 20-60 times slowdown when we are running the profiler as opposed to when we are running just the program, caused mainly due to us calling System.gc() whenever we increment the cycle. A slowdown like this is not feasible for the profiler, since it runs with the application.

4.2 Manual Tweaking of Cycle Size

The cycle size needs to be set manually for each different application we are running. If we have a very large cycle size, then the accuracy is hampered, since many objects will be created and collected in the same cycle, thus not allowing us to pinpoint where we need to use liveness based garbage collection. If we have a very small cycle size, it will be very slow because we will gc too often.

Author: Milind Luthra

Emacs 25.3.1 (Org mode 8.2.10)

Validate