MGCLex is a "Lexer Generator" in the spirit of Flex or ScanGen. Input is in the form of a specification file containing a list of Token-rule pairs, 1 per line. Each pair consists of a regular expression pattern and an identifier for the pattern. MGCLex
I've mentioned before that when it comes to the implementation of regular expression matching, the conversation as it appears in the literature tends to begin with NFA's, gives a quick run down of Thompsons Construction, and ends at DFA's with
When it comes to implementing Finite State Automata picking a data structure with which to model the machine is an interesting problem. On the one hand the choice is obvious: Finite state machines be them deterministic or NFAs are generally viewed as d
Suffix Arrays are a neat data structure, to say the least. They allow us to perform efficient substring searchs, "keyword in context" searches, in addition to enabling us to compute the longest common prefix of a string in linear time. Suffix arrays ar
B+ trees are used heavily to implement index structures for many of the leading RDBMS vendors, traditionally for externally stored data. As computer architectures have evolved B+ trees are increasingly finding use as in-memory data structures
-
A Quick tour of MGCLex
-
Compiling Regular Expressions for "The VM Approach"
-
Composable Linked Digraphs: An efficient NFA Data Structure for Thompsons Construction
-
Improving the Space Efficiency of Suffix Arrays
-
Augmenting B+ Trees For Order Statistics
-
Top-Down AST Construction of Regular Expressions with Recursive Descent
-
Balanced Deletion for in-memory B+ Trees
-
Building an AST from a Regular Expression Bottom-up
-
The Aho, Sethi, Ullman Direct DFA Construction Part 2: Building the DFA from the Followpos Table
-
The Aho, Sethi, Ullman Direct DFA Construction, Part 1: Constructing the Followpos Table