Compiling Procedures to P-Code

Having previously covered conditional statements and loops, there is only one topic left to cover in my series on compiling control flow constructs to pcode, and of course I have saved the best for last. In todays post we're going to talk about compiling user defined procedures and function calls. We'll once again be introducing new p-code instructions, as well as covering the calling sequence for procedures on our P-MachineVM.

At the end of the post we'll be able to use what we've covered in the previous two posts in combination with todays material to take the AST of a program like the following and produce p-code to execute it on a p-machine.

def fib(var m) {
     if (m < 2) {
          return 1;
     }
     return fib(m-1)+fib(m-2);
}

let i := 1;
while (i <= 10) {
    println fib(i);
    i := i + 1;
}

As usual we will start from the AST level and work our way down to p-code. We've got a bit of theory to cover as well in this one. We need to know a bit more about the inner workings of the P-Machine then we did in the previous two posts, but nothing crazy, so let's begin.

A word on terminology

Procedures, Functions, Subroutines, Methods... depending on the programming language, the author, time period of writing, etc. all of these names have at some point been used in the literature to refer to the same thing: A named block of code. Some authors choose to distinguish between procedures and functions such that a function will return a value while procedures do not, though this is far from a universal truth. One distinguishing charachteristic amongst them is that methods in particular are understood to be procedures or functions that are also class members in object oriented languages. 

Regardless of any differences, real or perceived, when it comes to generating code all of the above names can be used interchangably because it is done exactly the same, whether the function returns a value, or a procedure doesnt, or a method is a class member, the actions performed and instructions generated are the same. With this being the case, I will be using the names interchangeably throughout the post.

P-Machine Instructions For Procedures

In addition to the previously introduced branch and label instructions, additional p-machine instructions are required to support procedures and perform function calls. Procedure calls are one of the most complicated operations requiring changes to the stack, the saving of registers, and changing of pointers. The following list is the instructions used in performing procedure calls on the P-Machine VM:

MST - Mark Stack, allocates space on the stack for the new stack frame
INC - increments the top of stack pointer, essentially reserving space on the stack.
CAL - Call procedure - sets the return address in the stack frame and increments the instruction pointer to the address of the procedure
ENT - Enter Procedure - Has no side effect, simply marks the beginning of the procedure
RET - Return From Procedure - cleans up stack by popping the stack frame and locals from the stack and leaving the result (if there is one, null value if not) on top of the stack.

When a function call is performed a series of actions termed the calling sequence must take place. The calling sequence is setup so that when instructions are executed in the correct order, then the runtime environmnet is in the correct state for them to complete successfully. This involves (but isnt limited to) allocating space for and creating a new stack frame, allocating space for parameters and local variables, passing parameters to the function and returning values upon completion. 

 

Some of the instructions in the calling sequence are executed in the context of calling the procedure while others are executed in the context of the procedure being called. For this reason, the calling sequence is divided between the callers responsibilities and the callee's.

The Caller

The first instruction used in a procedure call on the p-machine is the MST instruction. MST increments the stack pointer five spaces to allocate the new stack frame on the call stack. This leaves us with slots for a return value, the static link, dynamic link, return address, and new top of the stack. Once this is done, any arguments to the procedure are pushed on to the stack, followed by incrementing the stack pointer to accomodate any local variables for the procedure with the INC instruction.

The P-Machine Stack Frame[1]

The VM is now in a state where it is ready for the CAL instruction. The CAL instruction sets the return address in the stack frame, and performs a jump to the procedures code body by setting the instruction pointer to the address of the ENT instruction. At this point in execution the local context is now that of the callee. The the enclosing scope is the local context of the caller.

The Callee

The ENT instruction does nothing to change the state of the VM, with the exception of when the instruction pointer increments, the next instruction is the first instructrion of the procedures body. For all intents and purposes, ENT is a special purpose LBL that is reserved for use with procedures.

When leaving a procedure, the RET instruction is executed. While the mnemonic RET is short for "RETURN FROM PROCEDURE", in reality it is the dual of the MST instruction. The RET instruction is responsible for popping the current stack frame from the stack, resetting the instruction pointer to the return adress, and resetting the base pointer to the enclosing scopes bp, as well as positioning any return value at the top of the stack.

Generating Code For Procedure Definitions

At an AST level, procedures are parsed into a tree with a Function Definition Statement Node as the root, Its first child pointer being the parameter list, and the second child pointer being the body of the code.

def fib(var n) {
     if (n < 2) {
         return 1;
     }
     return fib(n-2)+fib(n-1);
}

[FUNC_DEF_STMT] [TK_ID, fib]
   [LET_STMT] [TK_ID, n]
   [IF_STMT] [TK_IF, if]
    [RELOP_EXPR] [TK_LTE, <=]
     [ID_EXPR] [TK_ID, n]
     [CONST_EXPR] [TK_NUM, 1]
    [RETURN_STMT] [TK_RETURN, return]
     [CONST_EXPR] [TK_NUM, 1]
  [RETURN_STMT] [TK_RETURN, return]
    [BINOP_EXPR] [TK_ADD, +]
      [FUNC_EXPR] [TK_ID, fib]
       [BINOP_EXPR] [TK_SUB, -]
        [ID_EXPR] [TK_ID, n]
        [CONST_EXPR] [TK_NUM, 2]
      [FUNC_EXPR] [TK_ID, fib]
       [BINOP_EXPR] [TK_SUB, -]
        [ID_EXPR] [TK_ID, n]
        [CONST_EXPR] [TK_NUM, 1]

Code generation for procedures begins by creating a new scope in the symbol table for the procedure being created, so that offsets for memory addresses of any local variables or parameters can be correctly computed. We then immeditely use skipEmit() to save a position for backpatching a JMP passed the procedure.

Every procedure is prefaced with an uncodtional jump over the entire procedure. This is so that the code of a procedure is only executed when explicitly jumped to it's ENT instruction.

void genFunctionDefinition(ASTNode* node, bool isAddr) {
        st.openScope(node->data.strval);
        int s1 = skipEmit(2);
        emit(ENT, makeString(node->data.strval));
        genCode(node->child[1], isAddr);
        emit(RET);
        int c1 = skipEmit(0);
        backup(s1);
        emit(JMP, makeInt(c1));
        emit(INC, makeInt(st.scopeSize(node->data.strval)));
        restore();
        st.closeScope();
}

As you may have guessed, we emit() an ENT with the name of procedure as its operand before calling genCode() for the procedures body. After emiting the RET instruction we call skipEmit() with a value of 0 to get the current position. We use this value as the target for the unconditional jump from the beginning which we is now backpatched into place.

0: (  JMP,     21,     0)   
1: (  INC,     1,     0)    
2: (  ENT,     fibo,     0) 
3: (  LOD,     0,     0)    
4: (  LDC,     1,     0)    
5: (  LTE,     (nil),     0)
6: (  JPC,     9,     0)    
7: (  LDC,     1,     0)    
8: (  JMP,     20,     0)   
9: (  MST,     (nil),     0)
10: (  LDP,     0,     0)   
11: (  LDC,     2,     0)
12: (  SUB,     (nil),     0)
13: (  CAL,     2,     0)
14: (  MST,     (nil),     0)
15: (  LDP,     0,     0)
16: (  LDC,     1,     0)
17: (  SUB,     (nil),     0)
18: (  CAL,     2,     0)
19: (  ADD,     (nil),     0)
20: (  RET,     (nil),     0)

Having generated the code for the procedure, we can now turn our attention to generating the code to call it.

Generating Code For Procedure Calls

Generating the code for the actual function calls themselves are much less involved: We need to issue the MST instruction, place any arguments to the procedure on the stack, followed by the CAL instruction. Procedure calls are represented in the AST by a FUNC_EXPR node, with the function name as an attribute and any parameters located on it's first child pointer. As can be seen, our procedure has one parameter: the constant 6. 

println fib(6);

[PRINT_STMT] [TK_PRINT, println]
    [FUNC_EXPR] [TK_ID, fib]
     [CONST_EXPR] [TK_NUM, 6]

As previously mentioned, we begin by emiting the Mark Stack (MST) instruction. Parameters are linked in a statement list by their next pointers, and so we traverse the list calling genCodeNS() on each parameter. Once the code for the parameters has been generated, we emit the CAL instruction with the procedures address as it's operand.

void genFunctionCall(ASTNode* node, bool isAddr) {
        emit(MST);
        ASTNode* t = node->child[1];
        genparam = true;
        while (t != nullptr) {
             genCodeNS(t, isAddr);
             t = t->next;
        }
        genparam = false;
        emit(CAL, makeInt(getFunctionAddr(node->data.strval)));
}
21: (  MST,     (nil),     0)
22: (  LDC,     6,     0)
23: (  CAL,     1,     0)
24: (PRINT,     (nil),     0)
25: ( HALT,     (nil),     0)

Well, that's what I got for today. Until next time, Happy Hacking!

[1] Pemberton, Steven "Pascal Implementation", 


Leave A Comment