Building JBurgAnnotation Subclasses
Overview
Objective
Previous JBurg implementations used a general-purpose JBurgAnnotation object to compute information
about i-nodes; these annotation objects had a variable-sized Vector of children, and they allocated
int arrays to track cost and rule information.
Once the annotation was allocated, the BURM immediately and exhaustively computed the cost and rule
numbers for every possible reduction -- including reductions that could never be used in context.
For example, given the grammar:
stat = FOO(foo);
stat = BAR(bar);
foo = ID: veryExpensiveCostFunction();
bar = ID;
The BURM called veryExpensiveCostComputation() on all ID nodes, not just the ID nodes that were subtrees
of a FOO node.
The rebuilt JBurgAnnotation subclasses have the following major improvements:
- For fixed-arity patterns, subtree information is stored in fixed fields.
- Costs are only cached for pattern matches with subtrees.
- Cost and rule information is computed on-demand (and, in the case of costs,
subsequently cached if appropriate). The ID subtree of a BAR node in the
example above will never be asked the for result of veryExpensiveCostComputation().
Anatomy of an Annotation
This simple annotation class comes from the pattern ADD(expression, expression)
class JBurgAnnotation_ADD_2 extends JBurgSpecializedAnnotation
{
// Fixed-arity storage for the node's two subtrees.
private JBurgAnnotation subtree0;
private JBurgAnnotation subtree1;
// The cached cost of an expression.
private int cachedCostFor_expression = -1;
public int getCost( int goalState)
{
switch( goalState )
{
case __expression_NT:
{
if ( cachedCostFor_expression == -1 )
{
// Pattern-match rules evaluated out-of-line
cachedCostFor_expression = getCostForRule_78105738(goalState);
}
return(cachedCostFor_expression);
}
}
return(Integer.MAX_VALUE);
}
public int getRule( int goalState)
{
int rule = -1;
int bestCost = Integer.MAX_VALUE;
switch( goalState )
{
case __expression_NT:
{
int currentCost = getCostForRule_78105738(goalState);
if ( ( bestCost > currentCost ) )
{
rule = 1;
}
break;
}
}
return(rule);
}
public int getArity( )
{
return(2);
}
}
The rest of the class is essentially bookeeping code, except for the cost functions:
private int getCostForRule_78105738( int goalState)
{
return(normalizeCost((long)1 + (long) (((long)this.getNthChild(1).getCost(__expression_NT)) + ((long)this.getNthChild(0).getCost(__expression_NT))) ));
}
normalizeCost(long) does overflow-safe addition.
Selecting a JBurgAnnotation Subclass
In the simple cases, selecting an i-node's annotation class is done by checking the node's operator and arity:
public JBurgAnnotation getJBurgAnnotation( TestINode node)
{
switch( node.getOperator() )
{
case ADD:
{
if ( node.getArity() == 2 )
return(new JBurgAnnotation_ADD_2(node));
break;
}
case INT:
{
if ( node.getArity() == 0 )
return(new JBurgAnnotation_INT_0(node));
break;
}
}
throw new IllegalStateException("No annotation match for " + node);
}
This fails if the i-node can have fixed-arity and variable-arity rules that can overlap:
stmt = FOO(bar, bar);
stmt = FOO(bar, bar+);
Fortunately, JBurg's pattern matchers have been always able to disambiguate these patterns,
so the only cost is using a variable-arity subtree store where a fixed-arity store might
have sufficed.
Implementation
Sorting the Patterns
The first thing that JBurg does is to sort the patterns by operator and by arity. As noted
above, variable-arity patterns and fixed-arity patterns that encroach on the variable-arity
patterns' space all get lumped into a common annotation; fixed-arity patterns with arity
less than the smallest viable variable-arity match get their own fixed-arity annotation.
This is done by dumping the rules into an associative store, JBurgGenerator.RulesByOperatorAndArity.
RulesByOperatorAndArity sorts the rules into their proper operator/arity buckets and returns an Iterable
view of the annotations for each opcode.
Generating the Annotations
JBurg 1.9 supports both the old, generalized annotation system and the new one.
This is because the C++ target hasn't been converted to use the new system.
JBurgGenerator.generate() selects the proper regime based on the code emitter's capabilities.
For specialized annotations, the BURG calls emitCompressedAnnotations(PrintStream) to emit
the specialized annotations, bypassing calls to emit computeCostMatrix() etc.
emitCompressedAnnotations
emitCompressedAnnotations() calls the RulesByOperatorAndArity routines that yield the buckets
of rules for each annotation, and calls emitAnnotation() to emit the class.
emitCompressedAnnotations() then walks the buckets again to emit getJBurgAnnotation().
emitAnnotation
Preparation for Semantic Analysis
emitAnnotation() first constructs a ClosureGraph of all nonterminal states reachable from
the rules in the current rule set.
Next it factors out common subtrees in the pattern; this factoring turned out to be not as
beneficial as hoped, but it still contributes a little bit to readability so it's still in.
Note that the immediate children of the annotation's own i-node are never factored; the
arity check guarantees they're present.
Once all the rules have been processed and added to the closure graph, we construct a merged
set of rules and closures. Rules and closures implement the JBurgProduction interface, which
has the methods required for further semantic analysis of the merged set.
Interesting Information in the Merged Set of Productions
The most interesting question a production can answer is whether its cost is constant.
For a rule, this is decidable by simply looking at the cost term: it's going to be one of:
- An integer literal, which is always constant.
- A cost function call, which is never constant.
- An identifier, which may be constant (at compiler-compile time) if it references
a JBurg.Constant manifest constant.
A closure has a constant cost if its own cost is constant and the cost of its antecedent
is constant. Closures are used in many contexts, so it's necessary to have the set of
viable productions at the particular call site to decide this. This is the principal
reason emitAnnotation forms this collection of productions.
Compiler-Compile Time Dynamnic Programming
Once we can decide if a given production is constant, we can also decide if an entire
collection of productions all have a constant cost; and since we can perform computations
with a constant cost at compiler-compile time, we can decide which of this collection of
productions has the least cost. Thus we can precompute the cost (and rule) for a surprisingly
large number of annotations.
Emitting the Annotation's getCost() and getRule() Methods
If the dynamic programming can't be performed at compiler-compile time,
then the BURG emits logic of this general format:
currentCost = costForCurrentAlternative; // may be a constant, an identifier, or a function call
if ( currentCost < bestCostSoFar )
{
bestCostSoFar = currentCost;
// rule = currentAlternative's rule, if this is a rule computation
}
This logic can be optimized several ways:
- If there is only one production, and the cost isn't cached, just return it.
- If a cost is constant, it can be used directly instead of indirectly via currentCost.
- The first alternative in a cost computation doesn't need to check its cost against bestCostSoFar.
- The last alternative in a rule computation doesn't need to propagate its cost into bestCostSoFar.
- If the cost is to be cached, store directly into the cache variable instead of bestCostSoFar.
Emitting the Other Bits of the Annotation
getArity()
If the annotation is fixed-arity, then getArity() returns a constant value.
If the annotation may be variable-arity, then getArity() returns the size of the fixed-size prefix
(if any), plus the number of children in the variable-sized child collection.
getNthChild()
If the annotation is fixed-arity, then the indexes are mapped to the corresponding field, which is returned.
If the annotation is variable-arity, then the indexes are mapped either to a field holding a child that appears
in the fixed-arity prefix, or to a member of the variable-sized child collection.
Why does a variable-arity node use both methods? Good question. Probably shouldn't.
Caching Cost Function Results
Cost functions' costs are cached in the annotation, to conform to its contract to only call the function once.
These costs are saved in fields, and accessed by helper methods that check the fields and call the cost functions
if the fields haven't been initialized.
Emitting Pattern-Matching Functions
The pattern-matching logic that was formerly in computeCostMatrix() moves to a set of helper methods, one helper
for each pattern-matching rule. These methods could be inlined but separating them out helps readability.