Week 0

This document summarizes work done/being done in week 0.

1 Reading

1.1 DONE Improving Liveness Based GC In Java

One of the issues with this GC is that it's slow because it needs to scan the object graph again and again: unlike normal mark and sweep, which needs only one scan of the object graph.

Consider the example below for instance.

fig1.png

In this image, the black lines represent the object graph, the red dashed lines repesent the access graph for the first three roots, and the green dashed lines represent the access graph for the last root.

In this case, we need to scan the entire graph four times. If we end up at an unvisited node, we traverse the entire subtree, just like a usual DFS. But if we end up at a visited node – one that has been visited as a part of a different access graph, we still need to traverse its entire subtree. This is because there might be descendants not covered by the previous access graph(s).

For instance, consider the nodes E or H, which will not be covered in the first three searches, even though their ancestors are, and will be uncovered only in the fourth search.

What this means is that we may need to scan the entire graph #roots times.

I had a preliminary method to to reduce the number of mark steps needed. I'm trying to find out all the flaws that will probably exist in this, since this is just an idea as of now.

[Runtime]

Take the union of all the subgraphs attached to the root set. This takes one traversal of the entire object graph, since each node need not be visited more than once. This will give us the object graph (nodes and edges). The nodes can be uniquely identified by the hashCode (not really, but most good implementations of the method mean that there will mostly not be a collision). The edges will have a label, which will be the name of the property used to access them, that is, if obj1.prop1 = obj2, then the edge from (obj1, obj2) will have the label prop1.

To complete this graph, add a node OverallRoot and add edges for all n (OverallRoot, Rn) with label K where Rn is an object of the root set and K is an unused label.

[Compile Time]

At each point, instead of having access graphs for the rootset, have only one access graph for the OverallRoot.

So, an access graph like this:

fig2.png

Becomes something like this:

fig3.png

Note that we can apply NFA minimization at this point. A basic look around shows that this is by no means a trivial/short job, but it is better to make a time trade-off here than at runtime.

[Runtime]

The final step is another mark step with our object graph and overall access graph. This proceeds the same as the current mark step – we need to visit every node once. So, finally, we need to do two mark steps instead of #roots mark steps.

The main motivation for this method came from an earlier paper I read which was about cycle collection (detailed here) which had a similar problem which they solved by taking a transitive closure of their subgraphs.

Here is a paper containing some results related to NFA minimization (I basically read the introduction which was enough for me to be aware of the issue with NFA minimization).

1.3 TODO Heap Reference Analysis (I plan to read this maybe this week + the next)

2 Implementation

Right now, I'm following along the MMTk part of the JikesRVM user guide and the tutorial alongsideso I can get a feel for writing a new GC.

Author: Milind Luthra

Emacs 25.3.1 (Org mode 8.2.10)

Validate