MGCBasic
MGCBasic is my implementation of the BASIC programming language. Written in C++ and sporting an REPL-like BASIC environment reminiscent of that found on the Radio Shack TRS-80. All of the Tiny BASIC syntax, with the exception of single line functions has been implemented, with additional well known BASIC constructs, such as for loops.
Having grown from the ashes of my project to implement PL/0 for my stack machine project, the lexical analyzer and parser for MGCBasic share some of the code from my PL/0 compiler. While the PL/0 project petered out as I was increasingly busy with work at the time, the desire of implementing a programming language never left me. Despite providing the early influence, this project ultimately bears little resemblance to my PL/0 compiler.
During the course of writing 2 recent blog posts, the first on implementing a regular expressions engine with Non-deterministic Finite Automatons, and the other on evaluating algebraic equations, I decided to give implementing a programming language another shot. Further reading on the subject offered me some key insights which influenced the direction and decision making process of the project.
Moments of Insight
There were some key insights that coalesced to ultimately make this project work:
- All programming languages are abstractions of a problem in an effort to make solving that problem easier. BASIC is a fairly thin abstraction layer over assembly language, making its implementation rather straightforward. This is because BASIC (mostly) lacks block structuring and scope.
- A key/value map ADT and an eval() procedure will bring you a decent way towards a working BASIC interpreter - both of which I had already developed several robust implementations of each, and thus had on hand.
- Because Tiny BASIC is not structured, each line of code is mostly "independent" of the other lines. You can parse it and interpret it one line at time. And with only a few minor exceptions, the lines are then executed sequentially. The nature of BASIC syntax means that with the exception of algebraic expressions (which are handled differently via the eval() procedure) the Abstract Syntax Tree is more like an Abstract Linked List.*
- And finally: an interpreter that works is a heck of a lot better than a compiler that doesn't. Its always helpful to learn to crawl before you run. While this isn't a technical insight, it persuaded me to try implementing a BASIC interpreter before reviving my struggling pascal compiler project.
My initial plan was to implement a recursive descent parser. After looking at the BNF grammar for Tiny BASIC that is provided on it's Wikipedia page, and in combination with insight number 3 from above, I went with a somewhat different strategy.
MGCBasic is for all intents and purposes a tree walking interpreter, even if that tree happens to be an array of linked lists. When it comes to evaluating mathematical expressions, I used dijkstra's two stack algorithm which I discussed from my post on evaluating algebraic expressions.
I wanted to use as much code as I could that I had previously developed for examples on my blog where ever it was applicable. In addition to the eval() procedure mentioned above, the symbol tables used during lexing and parsing, as well as storing runtime variables is from my post on iterable hash tables.
Additionally, line management for the REPL and interpreter is performed by the iterative AVL tree discussed on my blog. Small portions of the lexing and parsing methods borrow code from my articles on PL/0 and stack machines.
Any data structure is a linked list with the right attitude*
Getting MGCBasic
MGCBasic is free and opensource, the code of which is made available under the MIT license on github at https://github.com/maxgoren/MGCBasic and has been built and tested on Windows, MacOS*, and Linux*.
A precompiled windows binary made from a snapshot of the development branch is available here.
Requires C++11 compliant compiler(tested with clang++ and g++)*
Current Features:
This project is still in its early stages and should not be considered "stable", nor should this list be considered "complete".C
- Run .bas source files from the command line or in the REPL.
- REPL - this is more like an old school basic environment as you'd find on the TRS-80, etc. than a modern read-eval-print-loop
- evaluate infix math expressions, expressions can intermix variable names with numbers (gasp!)
- assign integers to variable names using 'let var := ' (yes, assignment is done with :=, my implementation my choice :D)
- variables can be updated without using 'let' once they've been initialized(declared variables must be initialized to a value at time of declaration).
- Automatic line numbering in REPL. Line numbers increments by 5, but you can specify whatever numbering scheme you wish.
- print statements can print strings in single quotes, or variable names. every print statement includes the new line automatically.
- Loops are performed via goto. Eat your heart out Dijkstra! (Just kidding. his shunting yard algorithm is crucial to the eval() method).
Using the REPL
While I call it a REPL, it really is a BASIC environment like what you find on an early micro computer. If those environments are considered REPL's are a matter of debate. Regardless, some of the features of the REPL are:
- interactive development using built in BASIC environment.
- load source from file using ".load filename"
- can edit previously entered line by entering new line with same line number.
- if you omit line numbers, they will be assigned automatically, starting at 10 and incrementing by 5. Don't ask why they don't start at 5.
- .list will show you the program entered so far, including line numbers.
- .run executes the program you've entered or loaded
- .done to exit the REPL
Below is an example session showing the REPL in action, you can see that lines can be entered "out of order", and that omitted line numbers will be generated automatically.
max@MaxGorenLaptop:~/mgcpl$ ./mgcpl
MGCBasic 0.1b (c) 2023 maxgcoding.com
-------------------------------------
repl> let a:= 2;
repl> 10 let a := 2;
repl> let b := 3;
repl> print c;
repl> 25 let c := a + b;
repl> .list
10 let a := 2;
20 let b := 3;
25 let c := a + b;
30 print c;
repl> .run
5
repl> .quit
repl> .done
max@MaxGorenLaptop:/mnt/c/Users/mgoren/Desktop/pmpc/mgcpl$
Loading source from a pre-written file from the REPL:
max@MaxGorenLaptop:/mnt/c/Users/mgoren/Desktop/pmpc/mgcpl$ ./mgcbasic
MGCBasic 0.1b (c) 2023 maxgcoding.com
repl> .load fibonacci.bas
fibonacci.bas:
repl> .list
10 let prev := 0;
15 let curr := 1;
20 let next := curr + prev;
25 print prev;
30 print curr;
35 prev := curr;
40 curr := next;
45 if (next < 50) then
50 print next;
55 next := curr + prev;
60 goto 35;
65 end
repl> .run
0
1
1
2
3
5
8
13
21
34
repl> .done
max@MaxGorenLaptop:/mnt/c/Users/mgoren/Desktop/pmpc/mgcpl$
Embedding MGCBasic in Other Projects
An example of supplying a program as a vector of strings. This makes it possible to use MGCBasic as an embedded scripting language for your projects.
#include <iostream>
#include <vector>
#include "mgcpl_interpreter.hpp"
using namespace std;
int main() {
vector<string> program = {
"10 let x := 10;",
"20 let y := 5;",
"30 let z := x + y;",
"40 if (z < 21) then",
"45 z := z + 1;",
"50 print z; ",
"60 goto 40;",
"70 end;",
"80 print 'done.';"
};
MGCBasic mgc;
mgc.runProgram(program);
return 0;
}
max@MaxGorenLaptop:~/mgcpl$ g++ -g mgcpl.cpp
max@MaxGorenLaptop:~/mgcpl$ ./a.out
16
17
18
19
20
done.
max@MaxGorenLaptop:~/mgcpl$
Debugging
MGCBasic has some built in debugging facilities available from the REPL. .tokens will output the AST generated by the parser. .symbols shows all of the variables that have been created and their values.
max@MaxGorenLaptop:/mnt/c/Users/mgoren/Desktop/pmpc/mgcbasic$ ./mgcbasic
MGCBasic 0.1b (c) 2023 maxgcoding.com
repl> let a := 1;
repl> .run
repl> .tokens
<[ Token: NUM, line no: 10 , index: 0 - 2, String: 10 ]>
<[ Token: LETSYM, line no: 10, index: 3 - 6, String: let ]>
<[ Token: IDSYM, line no: 10, index: 7 - 8, String: a ]>
<[ Token: ASSIGNSYM, line no: 10, index: 9 - 11, String: := ]>
<[ Token: NUM, line no: 10, index: 12 - 13, String: 1 ]>
<[ Token: SEMICOLON, line no: 10, index: 13 - 14, String: ; ]>
repl> .symbols
a: 1
repl> .exit
max@MaxGorenLaptop:/mnt/c/Users/mgoren/Desktop/pmpc/mgcbasic$
And with that I just want to say...
"She might be primitive, But I'm still Damn Proud!"
-
Parsing Array Subscript Operators with Recursive Descent
-
Map & Filter in Scheme & C
-
Persistent Symbol Tables for Lexical Scoping
-
The Festival of 1 + n + f(n-1) Lights
-
The Heart of Pratt Parsing: Top-Down Operator Precedence
-
Compiling expressions to P-Code by AST Traversal
-
Ternary Search Tries: String Specific Ordered Symbol Tables
-
Digital Search Trees
-
Lossless Compression Part III: Huffman Coding
-
Lossless Compression Part II: The LZ77 Algorithm
Leave A Comment