Week 9

This document summarizes the work done in week 9.

1 Measurements

This week, I ran the profiler against proper benchmarks with longer run times, and then tried to improve its speed as much as I could based on the results. The table and graphs below summarize the results. They are similar to last time's, but each benchmark now runs for 240 s (as recommended in the SPEC guide) instead of 1-5 s.

fig1.png

The second graph has a logarithmic y-axis, so `compress (spec.)` is actually around a hundred times slower with the profiler.

fig2.png

Benchmark                                 With Profiler (ops/m)  Without Profiler (ops/m)
compiler.compiler (spec.)                                  4.26                    346.36
xml.validation (spec.)                                   359.51                    394.39
scimark.fft.small (java.,spec.,com.sun.)                  44.82                    200.11
xml.transform (spec.,java.)                               29.59                    161.39
mpegaudio (spec.)                                         67.25                     65.52
compress (spec.,java.)                                     1.02                     93.22
compress (spec.)                                           1.09                     98.22
compress (java.)                                          88.69                    109.83

2 Speed Improvements

Since some of the benchmarks showed a large, unacceptable penalty, I used VisualVM (supplied with the JDK) to take snapshots of a run and find the function consuming the most CPU time. The culprit turned out to be useObject. This function is invoked every time an object is accessed, and since most Java programs consist largely of object accesses, it adds up: around 70% of total CPU time in the compress benchmark.

However, since I only care about the last access to an object, it is not necessary to record every access. Further, the notion of time I am using is the total number of bytes allocated so far, so I do not even need to record a use more than once between allocations, because time has not advanced in that interval.

Concretely, in benchmarks like compress, the same set of 25-30 objects is accessed thousands or hundreds of thousands of times without any new objects being allocated in that interval, so their accesses need to be recorded only once. Keeping this in mind, I came up with several improvements. Note that the variable usage is incremented each time we use an object, and is reset to zero when we finally record all the uses in some interval.

  • Recording only when I need to: instead of immediately setting the lastUsedTime for each object as soon as it is used, I just store its hashcode in a collection and record the usage at the end, that is, whenever usage exceeds usageLimit and some new object has been allocated (time has advanced).
  • Discarding redundant records: the collection I am using is a Set, so even if one object is used a thousand times, it is recorded only once, since duplicates are automatically discarded.
  • Checking lazily: the above conditions are checked only once every usageSkip object uses; otherwise, usage is simply incremented and nothing else happens.
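The improvements above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the names useObject, usage, usageSkip, and usageLimit come from the report, but the field layout and flush logic here are assumptions for the sake of a self-contained example.

```java
import java.util.HashSet;
import java.util.Set;

public class ObjStoreSketch {
    static final int usageSkip = 16;    // check the flush conditions once per 16 uses
    static final int usageLimit = 1000; // flush pending records past this many uses

    static long bytesAllocated = 0;     // the profiler's notion of "time"
    static long lastFlushTime = 0;      // time of the previous flush
    static int usage = 0;               // uses since the last flush
    static final Set<Integer> pending = new HashSet<>(); // deduplicated hashcodes

    // Called on every object access; most calls only bump a counter.
    static void useObject(Object o) {
        pending.add(System.identityHashCode(o)); // Set discards duplicates
        usage++;
        if (usage % usageSkip != 0) return;      // cheap fast path
        // Flush only if enough uses piled up AND time actually advanced,
        // i.e. something was allocated since the last flush.
        if (usage >= usageLimit && bytesAllocated > lastFlushTime) {
            flush();
        }
    }

    static void flush() {
        // The real profiler would set lastUsedTime for each recorded
        // object here; this sketch just clears the bookkeeping.
        pending.clear();
        usage = 0;
        lastFlushTime = bytesAllocated;
    }

    public static void main(String[] args) {
        Object a = new Object(), b = new Object();
        for (int i = 0; i < 100_000; i++) { useObject(a); useObject(b); }
        // No allocation happened, so nothing was flushed and each object
        // appears in the pending set exactly once despite 200k uses.
        System.out.println(pending.size()); // 2
    }
}
```

The point of the sketch is the fast path: in the common case a call to useObject does one Set insertion and one increment, and the more expensive flush only runs when both thresholds are crossed.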

The tradeoff is a loss of the granularity the profiler offers. The values for usageSkip and usageLimit can be set in the ObjStore.java file. There is another variable, cycleSize, which controls how often we call the GC; larger values make the profiler faster but less accurate.

3 Usability Improvements and Feature Addition

One of the small usability issues last time was that the JSON logs could contain thousands of entries or more, which made writing them to the filesystem an expensive and tedious operation. Moreover, we mostly care about the objects with a large drag, so I now store the dead objects in a priority queue ordered by their net drag (plus lag). The size of the priority queue is controlled by the variable jsonSize inside ObjStore.java.
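A bounded priority queue of this kind can be sketched as below. The name jsonSize is taken from the report; the DeadObject record and the eviction logic are hypothetical stand-ins for whatever the real ObjStore keeps per object.

```java
import java.util.PriorityQueue;

public class DragQueueSketch {
    static final int jsonSize = 3; // cap on how many objects reach the JSON log

    record DeadObject(String name, long drag) {}

    // Min-heap on drag: the root is the smallest drag kept so far, so a
    // new object only displaces the root when its drag is larger.
    static final PriorityQueue<DeadObject> topDrag =
            new PriorityQueue<>((x, y) -> Long.compare(x.drag(), y.drag()));

    static void recordDead(DeadObject d) {
        if (topDrag.size() < jsonSize) {
            topDrag.add(d);
        } else if (topDrag.peek().drag() < d.drag()) {
            topDrag.poll(); // evict the lowest-drag object kept so far
            topDrag.add(d);
        }
    }

    public static void main(String[] args) {
        recordDead(new DeadObject("a", 10));
        recordDead(new DeadObject("b", 500));
        recordDead(new DeadObject("c", 40));
        recordDead(new DeadObject("d", 300)); // evicts "a" (drag 10)
        System.out.println(topDrag.size()); // 3
    }
}
```

With this scheme the log size is fixed at jsonSize regardless of how many objects die, and only the highest-drag objects are ever serialized.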

I also implemented lag, as discussed in this paper.

Author: Milind Luthra

Emacs 25.3.1 (Org mode 8.2.10)
