Implementing enhanced for loops in Bytecode
While implementing Owlscript I didnt want to crowd the grammar by having two different types of control flow which only differed by their syntax. This is the case in C-like languages were for loops serve more as syntactic sugaring of the while loop construct. Simultaneously, I've grown quite fond of the the "enhanced for loops" (foreach) found in modern C++ and Java and wanted a similar construct for iterating over lists and strings in owlscript.
I decided to take the "best of both" approach by implementing "regular" while loops which are familiar to any programmer for normal looping tasks:
let counter := 0;
while (counter < 10) {
println counter;
counter++;
}
And additionally adding a foreach loop to allow for quickly iterating over the built-in list and string types:
for (i in [ 1 .. 5 ]) {
println i+i;
}
While syntactically simple in appearance, there is actually alot going on "behing the scenes" in the truest meaning of syntactic sugaring. Todays post is going to talk about the process of encapsulating the traversal mechanics of owlscripts built-in list type and how it is translated to bytecode by the compiler.
A Quick Recap of Looping
As a control structure loops are made up of two main parts: The test expression and the loop body. The test expression is used to determine If the loop body should be executed. The last statement of the loop body is an unconditional jump back to the test expression. If the test expression fails, execution jumps to the first statement after the loop body.
1) Evaluate the test expression
2) If the test expression evaluates to false, we jump to the first statement after the loop body, exiting the loop.
3) if the result of the test expression is true, execute the statements in the loops body
4) jump to the test Expression
5) repeat from step one.
Encapsulating Iteration
Iteration is the process of visiting every element of a collection in an order determined by the collection type. In order to perform this traversal we need the ability to know a) what our current position in the collection is, b) what the next position in the collection is and c) what the last position in the sequence is. With these three pieces of information, we can traverse any sequence using the following idom:
let itr = list.first();
while (itr != list.end()) {
process(itr.get());
itr.next();
}
By separating the traversal from the container, It doesn't matter if the underlying implementation is an array or a linked list, the same strategy holds:
int a[] = {1,2,3,4,5};
int i = 0, len = 5;
while (i < len) {
printf("%d\n", a[i]);
i++;
}
node* it = head;
while (it != NULL) {
printf("%d\n", it->info);
it = it->next;
}
This is the idea underlying the iterator pattern.
Initialization
We initialize an iterator object - in the array case, an integer to index into the array, for link lists, a pointer to a node.
Test for Completion
the test expression is a bounds check on the iterator objects current possition.
Processing Elements
If the current possition is valid, we process the current position accessed by the iterator object.
Traversal
Finally we move to the next position in the sequence before executing our test expression again.
The Game Plan
The overall goal is to generate code which performs a user supplied block of code for every element of a list.
It helps to think of how we would implement it using only the constructs currently available to us in owlscript to help guide our instruction selection. In essence, we our compiler to read in this block of code:
for (x of [1 .. 5]) {
println x;
}
and output the bytecode as if had read this block of code:
{
let idx := 0;
let list := [1 .. 5];
while (idx < list.size()) {
let i := list[idx];
println i;
idx++;
}
}
To the AST I've added two new node types, one statement type and one expression type. The new statement is named FOREACH_STMT, and the expression type is ITERATOR_EXPR. A FOREACH_STMT node's left child points to an ITERATOR_EXPR, and it's right child is a BLOCK_STMT. An ITERATOR_EXPR node has an ID_EXPR for a left child, which is the name to be bound to the list item in the loop body, and its right child is a list or string.
With the addition of the new AST node types, the example from above parses to the following AST:
[FOREACH_STMT] for
[ITERATOR_EXPR] of
[ID_EXPR] x
[LISTCON_EXPR] [
[RANGE_EXPR] ..
[CONST_EXPR] 1
[CONST_EXPR] 5
[BLOCK_STMT] {
[PRINT_STMT] println
[ID_EXPR] x
To aid in our iteration, we will introduce two "invisible" variables to the environment. Their names are "__clti" - the current list to iterate and "__itr" the iterator index. These variables are called "invisible" because they are not declared by nor require the interaction of the programmer.
Translating to Bytecode
As can be seen from the transliteration above, each foreach loop takes place in its own block scope, and so the first instructions performed are those to open a new scope.
void ByteCodeGenerator::emitForeach(astnode* n) {
//foreach loops take place in there own block scope
emit(Instruction(entblk));
symTable.openFunctionScope(n->token.getString(), -1);
Once in the new scope we lookup the addresses allocated during the resolution phase for the "hidden" variables for the index counter and list alias.
int IDX = symTable.lookup("itr").addr;
int SEQ = symTable.lookup("clti").addr;
// set up Iterator object
astnode* itexpr = n->left;
emitIterator(itexpr, IDX, SEQ);
This code handles ITERATOR_EXPR nodes, and is responsible for initializing the iterator index ("__itr") to 0, and aliasing the list to iterate with __clti.
void ByteCodeGenerator::emitIterator(astnode* n, int IDX, int SEQ) {
//sets current index to 0
emit(Instruction(ldconst, StackItem(0.0)));
emit(Instruction(ldaddr, IDX));
emit(Instruction(stlocal));
//evaluate list expression and assign the result a temp name
//this way if passed a list constructor the list only gets built once.
//as we use this temporary name to refer to the list moving forward.
genExpression(n->right, false);
emit(Instruction(ldaddr, SEQ));
emit(Instruction(stlocal));
}
Once the iterator has been initalized we are ready to begin writing the actual looping instructions. We begin by getting the current code point and storing it in P1 so we can jump back to this location later, as this is the address of the test expression. Our test expression compares the current index position to the size of the list.
// loop test expr
int P1 = skipEmit(0);
emit(Instruction(ldlocal, IDX)); //current index into list
emit(Instruction(ldlocal, SEQ)); //current list to iterate
emit(Instruction(list_len)); // obtain its length
emit(Instruction(binop, VM_LT)); //more to go?
Once we've written the test expression we need to perform a little book keeping. We need to once again record the current code position as L1, the start of loop body. This time though we also want skip a space, as we need to backpatch the branch from the test expression - but only once we know where it should jump to!
int L1 = skipEmit(0);
skipEmit(1);
emit(Instruction(ldlocal, SEQ)); //current list were iterating
emit(Instruction(ldlocal, IDX)); // index of current position
emit(Instruction(ldidx)); // get data at that index
emit(Instruction(ldaddr, symTable.lookup(itexpr->left->token.getString()).addr)); //address of runtime iterator
emit(Instruction(stlocal)); //store data to iterator
At the beginning of each iteration we assign the value at the current list position to the name supplied by the user to the ITERATOR_EXPR. Next, the user supplied portion of the loop body is emitted.
//whatever code user wants to perform
genStatement(n->right, false);
After processing the user code, we move the iterator one place forward and jump back to the test expression for the next iteration.
//get us ready for next run through
emit(Instruction(ldlocal, IDX)); //get value of current index
emit(Instruction(incr)); //increment it by 1
emit(Instruction(ldaddr, IDX)); //save new index
emit(Instruction(stlocal));
emit(Instruction(jump, P1)); //jump back up to test expression
And now we are finally ready to backpatch the branch-false instruction which dictates whether to fall through the loop body or jump over the loop entirely.
//backpatch test-expression failure branch
int L2 = skipEmit(0);
skipTo(L1);
emit(Instruction(brf, L2));
restore();
And lastly when exiting the loop we need to close the scope opened when initialzing the for loop.
//close scope we opened for loop
emit(Instruction(retblk));
symTable.closeScope();
}
-
Implementing enhanced for loops in Bytecode
-
Top-Down Deletion for Red/Black Trees
-
Implementing Closures in Bytecode VMs: Heap Allocated Activation Records & Access Links
-
Pascal & Bernoulli & Floyd: Triangles
-
A Quick tour of MGCLex
-
Compiling Regular Expressions for "The VM Approach"
-
Composable Linked Digraphs: An efficient NFA Data Structure for Thompsons Construction
-
Improving the Space Efficiency of Suffix Arrays
-
Augmenting B+ Trees For Order Statistics
-
Top-Down AST Construction of Regular Expressions with Recursive Descent
Leave A Comment