YARE YARE


  • Regular Expression Engine


Mentors :

  • Shashank Balaji & Tirthankar Mazumder

Mentees :

  • 5-6


We will be writing a blazing fast 🚀 regular expression engine in C++20. We will implement the Thompson NFA algorithm to achieve speeds hitherto undreamt of.

If there is time, we can look into benchmarking our engine against the standard library regex engine, and look into implementing ECMAScript compatible regular expressions.

Resources:
- https://swtch.com/~rsc/regexp/regexp1.html
- https://kean.blog/post/lets-build-regex
- https://262.ecma-international.org/12.0/#sec-regexp-regular-expression-objects
Prerequisites:
Being enthusiastic about C++. In your proposal, mention the following:
1. Previous experience in programming, apart from CS 101 (in any language, preferably C++).
2. Motivation for studying basic automata theory and parsing.
3. Previous experience in working with medium-sized codebases (if any, not mandatory).
4. Some basic knowledge of regular expressions.
5. Anything else you feel is worth mentioning.

Tentative Timeline :

Week Work
Week 1 Read up on the relevant theory (basic parsing, basic automata theory, etc.)
Week 2 Implement the parser and AST.
Week 3-4 Implement the "lego blocks" for the NFA.
Week 5 Construct the NFA from the AST using the "lego blocks".
Week 6 Brainstorm the Runner API and implement it.
Week 7 Integrate the NFA class with the Runner class.
Week 8 Integrate the Google Test framework into the project, write tests for the engine and make them pass.