I appreciate how wonderfully simple and dependency free this is. More often than not people just want to write a compiler of sorts without bison or yacc, or LLVM, and just convert expressions into assembler or vm-like instructions that _just run_. This is a great starting point for that, and I wish I had something like it 10 years ago.
(Crenshaw's Let's Build a Compiler is an excellent source too if you want to go one step lower and go standard library free, focusing only on function call stacks: https://compilers.iecc.com/crenshaw/)
I’d argue that you should start with doing an interpreter and push it as long as you can.
But once you’re ready to generate code, use llvm or something like that because you are going to hit very problematic roadblocks early in your effort that will kill your interest in compilers otherwise.
Some examples are: ensuring you can compare expressions for equality, ensuring that changed expressions don’t violate dominance, SSA conversion, avoiding infinite loops when analyzing cfgs. If you don’t start with knowledge of several things like this, you are looking at rewriting your compiler a dozen times. That is a great way to learn about compilers, but perhaps not the first compiler you make.
That's right. Crenshaw's LBAC can make you an interpreter too. LBAC's magic lies in single token read ahead, single character identifiers, and single character digits. It parses its BNF grammar only with recursive function calls (no stacks, no tables for operator precedence, etc). The final language you build is almost useless, but it is done without a standard library (eg. resizable strings, resizable arrays, stacks, queues, associative containers), and serves as an introduction to get one hooked on more advanced compiler theory like SSA. I love OP's work. I consider OP's implementation (and similarities) to be a modern day LBAC. It's just that LBAC is one of those things I'd take to the grave. I have never found such a concise introductory text that produces a working final example in such delightful piecewise prose.
That's great advance in the general, modern case.. however, I want to write small compilers for an archaic minicomputer where no llvm or similar exists or can exist, and for that purpose articles like this are inspirational and very useful.
Shameless plug for another relatively small compiler written in Python with no dependencies: http://cwerg.org
Not to take away from the author or you, but we had things like this 10 years ago https://github.com/ymyzk/tinyc/tree/master/tinyc
Unless I am mistaken, this uses LLVM?
I'm the author of ymyzk/tinyc and was surprised to see my project mentioned here after more than 10 years. Since the README is written in Japanese, let me answer in English here. The project has two backends: one for NASM (x86) and another for LLVM IR.
It can. It can also output x86 nasm, here's the relevant part.
https://github.com/ymyzk/tinyc/blob/master/tinyc/generator/n...
But if otherwise anyone want to try with Yacc/Lex I have an online interpreter/editor with more than 250 examples to try/study at https://mingodad.github.io/parsertl-playground/playground/ and just included a grammar for tinycompiler there (select "Tinycompiler parser" from "Examples" then click "Parse" to see a parse tree for the content in "Input source").
lovely from-scratch implementation, without the magic and complexity of lex and yacc. brings back memories from 30+ years ago when I implemented a subset of C in C using lex and yacc in a compiler course.
in your blog, i would have liked to see the BNF grammar (without having to peek into the code).
now, heres your next assignment :) - do the same for functional programming (lambda calc) or logic programming (prolog). That would round out the education on languages and compilers for computation.