The Interpreter Pattern: Implementing Interpreters the OOP way
When implementing any type of programming language, be it a enterprise scale systems language like Java, a small Domain Specific Language and practically anything in between there is a myriad of design decisions to be made that will influence the overall design of the parser, interpreter, and just about everything other part of the compiler. And with good reason: compilers/interpreters are pretty complex systems. Like most complex systems, an interpreter can be broken down into a series of interacting parts. Object Oriented Programming provides us with a convenient abstraction for implementing those interacting pieces.
My preferred language, C++ is a multiparadigm language which due to its design inherited a whole bunch of baggage from its progenitor: the venerable C programming language. One side effect of all that C baggage however is that design patterns developed with C in mind, a decidedly non-OOP language, can be readily implemented in C++ often with little change to the code required. One of those C design patterns is the tagged union style of composing dynamic objects - which can and very often is used to build Abstract Syntax Trees. The tagged union design pattern has seen successful use for decades of compiler writers using both C and C++ as well as in other languages which facilitate this particular paradigm.
The Procedural Approach
For the uninitiated, the tagged union object building design pattern looks something like this
enum NodeKind {
STMT_NODE, EXPR_NODE
};
enum ExprType {
ID_EXPR, CONST_EXPR, BINOP_EXPR
};
enum StmtType {
PRINT_STMT, WHILE_STMT, LET_STMT
};
struct ASTNode {
NodeKind kind;
union { ExprType expr; StmtType stmt; } type;
struct { char* str; int line_num; Symbol sym; } Attributes;
ASTNode* child[3];
};
There is one "base" struct, whos type and properties are modified and determined through the use of flags, often an enumerated type as shown above. Code which processes these objects often make frequent use of switch statements to guide execution, branching conditionally based on the value of those enumerated types.
Don't get me wrong, this post is not meant to be a critique or attack on using tagged unions - I use this design pattern quite frequently. It's just that it is a bit dated. It is also very much NOT an object oriented design. On on hand its very simple: set a few flags, assign a few properties, and you have your node. This same simplicity is also why reading code written in such a style will take much more effort to determine the intention of that code when reading it.
The interpreter pattern is an Object Oriented approach popularized in the "Gang of Four" book and is itself cobbled together from various other design patterns. Instead of using a single node type, the interpreter pattern makes use of the composite design pattern to replaces the above enumerated types with a class hierarchy. Lets take a gander at what that looks like.
Utilizing the Composite Pattern to Implement AST's via Class Hierarchies
To start with, we need some base types from which two derive the rest of our nodes. As can be seen from the enumeration we have two basic types of nodes, from which to derive further specialized types: Expressions, and statements. Statements are executed for the side effects they produce, such as to display output, or declare a loop, and thus are not expected to produce a result directly. For this reason, we can use a void method for execute().
Expressions on the other hand are evaluated for their value, and so should return the result of the given expressions evaluation. For the time being, that will be an integer. Binary Expressions are internal nodes, and perform an operation on the values obtained by evaluating their two children
class Statement {
public:
virtual void execute(Context& cxt) = 0;
};
class Expression {
public:
virtual int evaluate(Context& cxt) = 0;
};
class BinaryExpression : public Expression {
public:
virtual Expression* left() = 0;
virtual Expression* right() = 0;
};
Both statements and expressions expect to be passed a reference to a context while being executed or evaluated. Context is synonymous with "environment", and is where any named values are saved,
Side Quest: Context
For lack of a better word, A context object is a symbol table which binds names to values, The only two operations a context object must support is add(id, value) and lookup(id). Delete can be useful, but not so much so as to demand its implementation, Speaking of implementation, I'll leave that as your choice, for simplicity sake I used an unordered linked list. Hash tables and search trees are two data structures which see very common use for this task
class Context {
private:
struct node {
string key;
int value;
node* next;
node(string k, int v, node* n) {
key = k; value = v; next = n;
}
};
node* head;
int count;
public:
Context() {
count = 0;
head = nullptr;
}
void add(string id, int value) {
for (node* it = headl; it != nullptr; it = it->next) {
if (id == it->key) {
it->value = value;
return;
}
}
head = new node(id, value, head);
count++;
}
int lookup(string id) {
for (node* it = head; it != nullptr; it = it->next)
if (id == it->key)
return it->value;
return -1;
}
};
It is important that add() first check if the value being assigned is present in the context and update it if so, only creating a new binding in the case that no binding for the given name already exists.
And now back to our main presentation.
A Class Hierarchy for Abstract Syntax Trees
In the scope of our previous scheme, we represent statements and expressions of our grammar using the following interfaces to replace the previous enumerated types
class Statement {
public:
virtual void execute(Context& cxt) = 0;
};
class Expression {
public:
virtual int evaluate(Context& cxt) = 0;
};
To represent a print statement using the previous scheme we would first create an ASTNode object, set the node kind flag to STMT_NODE and then set the statement type flag to the PRINT_STMT. Now, we will create a PrintStatement class which implements our Statement interface
class PrintStatement : public Statement {
private:
Expression* expr;
public:
PrintStatement(Expression* expression) : expr(expression) {
}
void execute(Context& cxt) {
cout<<expr->evaluate(cxt)<<endl;
}
};
You will immediately notice that this Object looks much less like a traditional node in a tree structure than the our previous ASTNode example did. The Interpreter pattern allows us to implement far more specialized objects. A print statement outputs the result of an expression, and this exactly what our PrintStatement node does. Speaking of Expressions, lets take a look at how they are implemented, as they tend to be a bit more interesting.
Const expressions and ID expressions are used to represent constant values, and named variables respectively. As such they are Leaf nodes, and have no child pointers. a Const Expression stores a constant value, and an ID expression stores the name of the variable it represents
class ConstExpression : public Expression {
private:
int _cval;
public:
ConstExpression(int cval) {
_cval = cval;
}
int evaluate(Context& cxt) {
return _cval;
}
int getValue() {
return _cval;
}
};
class IDExpression : public Expression {
private:
string _id;
public:
IDExpression(string label) {
_id = label;
}
int evaluate(Context& cxt) {
return cxt.lookup(_id);
}
string getID() {
return _id;
}
};
A Binary Expression node is used to represent a binary operator, and thus is is an internal node, with a left and right child of Type expression. When a Binary Expression nodes evaluate() method is called, it first calls the evaluate method of both of its child nodes so that it can then apply the operator it represents to them
class BinaryExpression : public Expression {
public:
virtual Expression* left() = 0;
virtual Expression* right() = 0;
};
class PlusExpression : public BinaryExpression {
private:
Expression* _left;
Expression* _right;
public:
PlusExpression(Expression* l, Expression* r) {
_left = l; _right = r;
}
Expression* left() {
return _left;
}
Expression* right() {
return _right;
}
int evaluate(Context& cxt) {
return _left->evaluate(cxt) + _right->evaluate(cxt);
}
};
class MinusExpression : public BinaryExpression {
private:
Expression* _left;
Expression* _right;
public:
MinusExpression(Expression* l, Expression* r) {
_left = l; _right = r;
}
Expression* left() {
return _left;
}
Expression* right() {
return _right;
}
int evaluate(Context& cxt) {
return _left->evaluate(cxt) - _right->evaluate(cxt);
}
};
class MultExpression : public BinaryExpression {
private:
Expression* _left;
Expression* _right;
public:
MultExpression(Expression* l, Expression* r) {
_left = l; _right = r;
}
Expression* left() {
return _left;
}
Expression* right() {
return _right;
}
int evaluate(Context& cxt) {
return _left->evaluate(cxt) * _right->evaluate(cxt);
}
};
class AssignExpression : public BinaryExpression {
private:
IDExpression* _left;
Expression* _right;
public:
AssignExpression(IDExpression* l, Expression* r) {
_left = l; _right = r;
}
Expression* left() {
return _left;
}
Expression* right() {
return _right;
}
int evaluate(Context& cxt) {
cxt.add(_left->getID(), _right->evaluate(cxt));
return cxt.lookup(_left->getID());
}
};
In addition to the arithmetic operators, we can also use our BinaryExpression type to implement the relational operators, allowing us to implement Conditional Expressions:
class LessThanExpression : public BinaryExpression {
private:
Expression* _left;
Expression* _right;
public:
LessThanExpression(Expression* l, Expression* r);
Expression* left();
Expression* right();
int evaluate(Context& cxt);
};
LessThanExpression::LessThanExpression(Expression* l, Expression* r) {
_left = l; _right = r;
}
Expression* LessThanExpression::left() {
return _left;
}
Expression* LessThanExpression::right() {
return _right;
}
int LessThanExpression::evaluate(Context& cxt) {
cout<<"< less than >"<<endl;
int lhs = _left->evaluate(cxt);
int rhs = _right->evaluate(cxt);
return lhs < rhs;
}
class EqualToExpression : public BinaryExpression {
private:
Expression* _left;
Expression* _right;
public:
EqualToExpression(Expression* l, Expression* r);
Expression* left();
Expression* right();
int evaluate(Context& cxt);
};
EqualToExpression::EqualToExpression(Expression* l, Expression* r) {
_left = l; _right = r;
}
Expression* EqualToExpression::left() {
return _left;
}
Expression* EqualToExpression::right() {
return _right;
}
int EqualToExpression::evaluate(Context& cxt) {
cout<<"< equal to > "<<endl;
return _left->evaluate(cxt) == _right->evaluate(cxt);
}
As the name would imply, the interpreter pattern is designed to make interpreting statements and expressions easier. This is accomplished by calling the root nodes execute or evaluate method.
int main(int argc, char* argv[]) {
Context cxt;
cxt.add("t1", 24);
Expression* asExpr = new AssignExpression(
new IDExpression("val"),
new PlusExpression(
new IDExpression("t1"),
new PlusExpression(
new ConstExpression(13),
new ConstExpression(29)))
);
Statement* ps = new PrintStatement(new IDExpression("val"));
asExpr->evaluate(cxt);
ps->execute(cxt);
return 0;
}
Take note of the structure of the constructors within constructors - starting to look pretty tree like, huh? It's not a coincidence. Binary expression nodes make up internal nodes. Statements represent the root nodes of their respective subtrees, and IDExpressions and ConstExpressions are Leaf nodes. StatementList's are a type of statement which has a dynamic number of children, and are used to implement sequences of statements
class StatementList : public Statement {
private:
std::list<Statement*> _statememts;
public:
StatementList(std::list<Statement*>& statements);
StatementList();
StatementList& addStatement(Statement* stmt);
void execute(Context& cxt);
};
StatementList::StatementList(std::list<Statement*>& statements) {
_statememts = statements;
}
StatementList::StatementList() {
}
StatementList& StatementList::addStatement(Statement* stmt) {
_statememts.push_back(stmt);
return *this;
}
void StatementList::execute(Context& cxt) {
for (Statement* stmt : _statememts) {
stmt->execute(cxt);
}
}
The StatementList type can be combined with Conditional Expressions to easily implement the functionality of a while loop
class WhileStatement : public Statement {
private:
Expression* _conditionExpr;
StatementList* _loopBody;
public:
WhileStatement(Expression* cond, StatementList* stmts);
void execute(Context& cxt);
};
WhileStatement::WhileStatement(Expression* cond, StatementList* stmts) {
_conditionExpr = cond;
_loopBody = stmts;
}
void WhileStatement::execute(Context& cxt) {
bool cond = _conditionExpr->evaluate(cxt);
while (cond) {
_loopBody->execute(cxt);
cond = _conditionExpr->evaluate(cxt);
}
}
Another useful Statement is the Expression Statement, which as the name would imply, is a statement which when executed, evaluates an expressions. This may sound a bit strange at first, but is actually a powerful abstraction
class ExprStatement : public Statement {
private:
Expression* _expr;
public:
ExprStatement(Expression* _expr);
void execute(Context& cxt);
};
ExprStatement::ExprStatement(Expression* _expr) {
_expr = _expr;
}
void ExprStatement::execute(Context& cxt) {
_expr->evaluate(cxt);
}
Using these more complex statements together with the expressions from above we can perform more intricate computations. The following example calculates and prints the first N fibonacci numbers:
void fibonacci(int n) {
Context* cxt = new Context();
std::list<Statement*> initStmts = {
new ExprStatement(
new AssignExpression(
new IDExpression("prev"),
new ConstExpression(0)) //prev = 0;
),
new ExprStatement(
new AssignExpression(
new IDExpression("curr"),
new ConstExpression(1)) //curr = 1;
)
};
Expression* loopConditionExpr = new LessThanExpression(
new IDExpression("i"),
new ConstExpression(n)); // i < n
list<Statement*> loopBodyStmts = {
new ExprStatement(
new AssignExpression(
new IDExpression("fib"),
new AddOpExpression(
new IDExpression("prev"),
new IDExpression("curr"))) //fib = prev + curr
),
new PrintStatement(
new IDExpression("fib") //print fib
),
new ExprStatement(
new AssignExpression(
new IDExpression("i"),
new AddOpExpression(
new IDExpression("i"),
new ConstExpression(1))) //i = i + 1;
),
new ExprStatement(
new AssignExpression(
new IDExpression("prev"),
new IDExpression("curr")) //prev = curr;
),
new ExprStatement(
new AssignExpression(
new IDExpression("curr"),
new IDExpression("fib")) //curr = fib
)
};
StatementList* init = new StatementList(initStmts);
StatementList* loopBody = new StatementList(loopBodyStmts);
Statement* loop = new WhileStatement(loopConditionExpr,
loopBody);
init->execute(cxt);
loop->execute(cxt);
}
It's important to note that the interpreter pattern does not specify a way of constructing the syntax tree, only it's structure and behavior are defined. Some type of parser or other utility for constructing the tree must still be utilized - but that is a topic for separate post. Until then, Happy Hacking.
Leave A Comment