Repeating yourself: Compiling While Loops to P-Code

mgcadmin 15-02-2025

In my previous post on code generation we got our feet wet with compiling some basic expressions to p-code. In todays post I'm going to pick up where we left off and jump right in with control statements. In particular, I'll be covering how to generate p-code for while loops.

By adding conditional iteration we gain a considerable amount of computational power when combined with our ability to assign names to variables and evaluate expressions. At the end of this post, we will have everything in place to be able to compile (from an ast) a program such as the following iterative fibonacci algortihm down to p-code:

let prev := 0;
let curr := 1;
let next := 0;
while (next < 100) {
       next := prev + curr;
       prev := curr;
       curr := next;
       println next;
}

In addition to the code generation methods and p machine instructions covered in my previous post on p-code generation, the facilitation of these higher level constructs require that todays post cover several new p-machine instructions in addition to some additional code generation procedures to support them. If you havent yet read my earlier post on p-code generation I suggest giving that a read first. We've got a bit of ground work to cover before we can start actually compiling control flow statements, so lets get to it.

Jumps and Labels

There are two types of jump instructions in the p-machine: unconditional jumps, and conditional jumps. Uncoditional jumps are the direct equivelant to a GOTO, in that they are supplied an address to jump to, which they jump to no matter what. Conditional jumps,on the other hand only jump to the provided address if, and only if, the value at the top of the is false. These Instructions are provided the mnemonics JMP and JNE, respectively.

JMP <target address> - unconditional jump
JNE <target address> - conditional jump

Labels are used to mark the entrance point, or "target" for a jump. They give us something to aim at in that they occupy an address in teh code, but they are a "do nothing" instruction, they exist to say "you want the location after me". And that actually comes in more handy than you might think. The syntax for a label is as follows:

LAB "label_name"

We require all label names to be unique. We cant have just one label "procedure_exit_point" and use it for all of our procedure exit points or we'd end up always jumping to the same position after any function call. This is more than likely not what we intended. By requiring every label to be unique we thus gurantee that every label only references one location.

Another thing to keep in mind is that these labels are meant for use by the compiler, not necessarily for being read by humans, so they dont need to be descriptively named, like "procedure_exit_point", a simple text counter will do:

string makeLabel() {
       static int label_num = 0;
       return "LBL" + to_string(++label_num);
}

Now when we have a block of code that we want to jump into, we preface it with a label instruction. The process of creating a label and emitting a label is handled with the aptly named emitLabel() procedure, which handles both the generation of the unique name, and the emitting of the label in one tidy package.

string emitLabel() {
      string label = makeLabel();
      emit(LAB, makeString(label));
      return label; 
}

We return the generated label so that we know how to be able to find the label again. It's hard to look for something when you don't know what it looks like! The target address of the jump instructions is the address of the label, which we find with the getLabelAddr(string) method:

int getLabelAddr(string label) {
     int i = 0;
     while (i < codepage.size()) {
           if (codepage[i].instruction == LAB && label == toStdString(codepage[i].operand))
                  break;
           i++;
     }
      return i;
}

So far so good, nothing too difficulet, but the one fly in the ointment is when we need to make a reference to a piece of code that hasnt been generated yet. How do we label a block of code that doesnt yet exist? This is accomplished with a technique called "backpatching", which I will explain in the next section.

Backpatching: Handling forward references with grace.

To be able to handle forward references, we're going to need to introduce a few instructions to augment our emit() instruction. Specefically, we need a way to "save a space for later" while we are emitting instructions, as well as a way to move the position in our output to which we emit our instructions. This will be accomplished by using the skipEmit(num spaces), backup(int addr), and restore() instructions. In addition to these three instructions, we also introduce a variable to track the highest position written to thus far, as that will be used by restore() to resume the normal flow of output after performing a backpatching operation.

The skipEmit() instruction is provided a number of spaces to skip in the output, and returns the address of the first skipped position.

int skipEmit(int spaces) {
      int old = cPos;
      cPos += spaces;
      if (highCI < cPos) highCI = cPos;
      return old;
}

The if guard at the end to check if the new position is the furthest yet reached is also added to our emit() instructions:

void emit(Inst op, Value operand) {
      codepage[cPos++] = Instruction(op, operand);
      if (highCI < cPos) highCI = cPos;
}

void emit(Inst inst) {
      codepage[cPos++] = Instruction(inst);
      if (highCI < cPos) highCI = cPos;
 }

The backup() instruction is used to reposition the output point to a previously skipped index, or otherwise saved position in the output, and restore() being used to resume normal output after backpatching, as mentioned previously.

void backup(int addr) {
       cPos = addr;
}
void resotre() {
      cPos = highCI;
}

Now that we have the tools needed, lets take a stab at implementing one of the easier constructs: while loops.

Implementing while loops with jumps

The while loop is one of the most basic of control structures. It also happens to be, along with subroutines and function calls, one of the higher level constructs which eliminated the use of raw GOTO statements. To understand how to implement a while loop like the following, well begin by taking a look at the AST produced by parsing it.

while (x < 10) do begin
      println fact(x);
      x := x + 1;
end

A while statement can be broken into two fundamental pieces: The 'test' expression, which is used to evaluate the condition upon which the looping is predicated, and the loop 'body' which is the list of statements to execute should the test expression evaluate to true. In our generated AST, a while statement is a STMT_NODE with the left most child being the test expression, in this case a RELOP_EXPR. The second child pointer points to the loop body.

[WHILE_STMT] [TK_WHILE, while]                                                                
  [RELOP_EXPR] [TK_LT, <]                   <-- This is the test expression        
      [ID_EXPR] [TK_ID, x]                                                                            
      [CONST_EXPR] [TK_NUM, 10]                                                         
  [PRINT_STMT] [TK_PRINT, println]   <--- loop body starts           
      [FUNC_EXPR] [TK_ID, fact]                                                                                                
       [ID_EXPR] [TK_ID, x]                                                                                                       
    [EXPR_STMT] [TK_ID, x]                <--- this is the second statement of the loop body
       [ASSIGN_EXPR] [TK_ASSIGN, :=]
          [ID_EXPR] [TK_ID, x]
          [BINOP_EXPR] [TK_ADD, +]
             [ID_EXPR] [TK_ID, x]
             [CONST_EXPR] [TK_NUM, 1]

Taking what we know thus far we can begin to develop an execution plan for while loops, using a series of steps.

1) Evaluate the test expression

2) If the test expression evaluates to false, we jump to the first statement after the loop body, exiting the loop.

3) if the result of the test expression is true, execute the statements in the loops body and jump to the test Expression

4) repeat from step one.

While it may seem straight forward, care must be taken when generating the p-code to perform the steps in the correct order. As an example, it isnt until the second half of step 3 where we learn the first thing that we should: label the position of the test expression so it can be jumped back to later. Had we followed the steps verbatim we would lose the jump position.

As such, begin with a call to emitLabel() to denote the position of the test expression and then proceed to call genCode() on the subtree which represents it. We know that during the flow of execution our next course of action is dependant on the result of the test expression, and so should be a conditional jump.

void genWhileStmt(ASTNode* node, bool isAddr) {
       string test_label = emitLabel();
       genCode(node->child[0], isAddr);
       int skipPos = skipEmit(1);

Because we dont yet have a place for it to jump too, we save this space to backpatch the conditional jump instruction into later by using the skipEmit() procedure we implemented earlier. This may seem a bit adhoc, but everything comes together in a moment, i promise.

Next, we call the genCode() procedure on subtree for the loop body, and follow that up with an unconditional jump to the address of the test_expression

       genCode(node->child[1], isAddr);
       emit(JMP, makeInt(getLabelAddr(test_label)));

.Now, we emit a label to mark the first location after the loop body, This is where we would jump to should the test expression evaluate to false. We then call the backup() procedure to the position we saved earlier, and emit our conditional jump expression to the address of the exit label.

       string exit_label = emitLabel();
       backup(skipPos);
       emit(JPC, makeInt(getLabelAddr(exit_label)));
       restore();
}

A call to restore() places us back at the normal output position, and we have ourselves a while loop!

Let's take a look at the generated P-Code:

0: ( LDA, 5000, 0)   ;  push address of x
1: ( LDC, 0, 0)      ;  push 0
2: ( STN, 0, 0)      ;  store 0 to address of x
3: ( LAB, L0, 0)     ;  test expression
4: ( LOD, 5000, 0)   ;  push x
5: ( LDC, 6, 0)      ;  push 6
6: (  LT, 0, 0)      ;  x < 6 ?
7: ( JPC, 16, 0)     ;  jump to end if not
8: ( LOD, 5000, 0)   ;  push x;
9: (PRINT, 0, 0)     ;  print x;
10: ( LDA, 5000, 0)  ;  push address of x
11: ( LOD, 5000, 0)  ;  push value of x
12: ( LDC, 1, 0)     ;  push 1
13: ( ADD, 0, 0)     ;  add 1 to x
14: ( STO, 0, 0)     ;  store in address of x
15: ( JMP, 3, 0)     ;  jump to test expression
16: (HALT, 0, 0)     ;  All done.

That's all for today, but stick around because in my next post we're going to tackle If/else statements. Until then, Happy Hacking!

MaxGCoding.com

For the love of programming

Repeating yourself: Compiling While Loops to P-Code

Jumps and Labels

Backpatching: Handling forward references with grace.

Implementing while loops with jumps

Leave A Comment

Procedural Map Generation with Binary Space Partitioning

Exact Match String Searching: The Knuth-Morris-Pratt Algorithm

The visitor pattern: OOP takes on the Expression Problem

Improving mgcLisp's define syntax

Separating the Men from the Boys, Knuth Style

Reducing rotations during deletion from AVL trees

Parsing Lisp: From Data To Code

Swiz Heaps: A deterministic modification of Randomized Meldable Heaps

Let's talk Eval/Apply

BST Deletion: Removal By Merge