The Most Expensive Crappy Toy Compiler Ever Built?
You've probably seen the stream of posts. Anthropic let AI agents running Opus 4.6 build a Rust-based C compiler. It cost 20 000 USD in API fees, which is nothing compared to the cost of having real compiler engineers build an industrial-strength, optimizing, retargetable C compiler. What previously took hundreds of engineers decades to create can now be done easily by AI. And so on. And so forth. But is that really true?
What the agents actually produced is a crappy toy. I hesitate even to call it a C compiler because it ignores so many parts of the C standard that it almost feels like a crime against computer science. In low-level programming, "looks like it's working" is often worse than "not working at all" because it introduces silent bugs.
I ran some very basic tests (see the screenshots). The compiler completely ignored the const keyword. It didn't mind me defining the same variable multiple times in a row with different types. Compiling with -O3 produced the exact same binary as with -O0. The optimization passes are also toy-level compared to GCC's.
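To make the first two findings concrete, here is the kind of code I mean. These are illustrative snippets, not the exact files from my screenshots, but any conforming C compiler is required to diagnose both of them:

const int limit = 10;

void clobber(void) {
    limit = 42;     /* constraint violation: assigning to a const-qualified object */
}

int x = 1;
double x = 2.0;     /* constraint violation: the same identifier redefined with an incompatible type */

A compiler that silently accepts code like this isn't being lenient; it's skipping checks the standard obliges it to perform.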
These are just a few random findings. It's not that a few finishing touches are missing; the problems with this compiler are massive. What we have here is a happy-path compiler at best, and the happy path is the easy part. Passing the GCC torture tests with a 99% success rate sounds impressive, particularly if you go by the name. But the torture suite is essentially a pile of valid C programs that must compile and run correctly; it says very little about whether a compiler rejects invalid code or implements the rest of the standard. That makes it precisely the test suite to focus on if you have these kinds of problems and still want to pass a lot of tests.
The whole story is a masterclass in marketing for agentic workflows, of course. It might also be a masterclass in agentic orchestration, but it's a failure in compiler engineering. Sure, this might be the worst this compiler will ever be, as they always say, but to me, it proves that AI isn't anywhere near ready to replace compiler engineers. A cheap but crappy alternative is nice for a random startup's web app demo, but not for a compiler.
Another take on this is that the AI had to cheat by using GCC as an oracle to pull this off. Apparently it wasn't enough that Opus 4.6 has surely been trained on numerous compiler books, compiler source code, compiler-theory lecture notes, and so on. It hasn't actually "mastered" the C standard; it's performing a high-speed trial-and-error search until its output matches that of a human-built tool. One genuinely tricky part of building a C compiler front end is reading the standard text and translating it into code. The agents did nothing of the kind.
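For anyone unfamiliar with the pattern, here is a minimal sketch of what "GCC as an oracle" amounts to in practice. The name toycc, the file test.c, and the flags are my assumptions for illustration, not details from Anthropic's setup:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Reference: GCC is simply trusted to be correct. */
    if (system("gcc -O2 -o ref test.c") != 0) return 1;
    /* Candidate: the compiler under test (hypothetical name). */
    if (system("toycc -O2 -o cand test.c") != 0) return 1;

    /* Run both binaries and capture their output. */
    if (system("./ref > ref.out 2>&1") != 0) return 1;
    if (system("./cand > cand.out 2>&1") != 0) return 1;

    /* The oracle check: any difference from GCC counts as a failure. */
    int diff = system("diff -q ref.out cand.out");
    puts(diff == 0 ? "PASS: outputs match" : "FAIL: outputs differ");
    return diff == 0 ? 0 : 1;
}

Loop that over thousands of test programs and you can grind your way to a high pass rate without ever engaging with the standard's text.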
Could the agents have created a C compiler from the standard document alone, without ever having been trained on compiler source code? With the current LLM architecture, the answer is almost certainly no. It feels like we are watching the Dunning-Kruger effect play out in AI agents. I admit it's an interesting experiment, and a well-executed one, but it doesn't deserve the flood of "wow" reactions it's getting right now.