Published
February 20, 2026
AI
I ran Anthropic's AI-written C compiler through my novelty-scoring pipeline, expecting to confirm my public position that GenAI can't do systems programming. The data forced me to retune my own metrics. What I found instead was a sharper question for anyone running an engineering team: what's your ratio?
Author
Fadi Labib

I set out to prove AI can't build serious systems software. The data proved me wrong.
That sentence is uncomfortable to write, because a day before the evidence landed, I had taken the opposite position in public.
I maintain a pipeline that scores the novelty of software projects. It leans on established metrics and known FOSS tools, and I use it for two things: evaluating my students' work and dissecting interesting repositories. It is the closest thing I have to an objective instrument for the question "is this code actually original, or is it a remix?"
The day before Anthropic published CCC, their AI-written C compiler, I had posted about why generative AI is not suited to systems programming. Compilers, kernels, drivers: the argument was that this class of software demands a kind of judgment and rigor that next-token prediction does not have.
So when CCC dropped, I ran it through the pipeline expecting confirmation. A victory lap, basically.
The results were strong enough that I had to retune my weights, replace some metrics, and rerun the whole analysis. Still impressive. There is a particular kind of humility in watching your own measurement tool side against you the day after you planted a flag.
The numbers that made me rerun the pipeline:
Any one of these would be notable. Together they ended my "AI can't do systems software" position as a blanket claim.
The same pipeline that humbled me also surfaced the cracks:
None of these stop the compiler from working. All of them matter the moment a human team has to maintain it, extend it, or trust it in production. The gap between "it works" and "you can ship it" is enormous, and that gap is where engineering judgment lives.
Building a compiler is a well-understood, bounded problem. There are hundreds of references, decades of literature, and test suites like the GCC torture tests that define success precisely. The AI had a clear target to optimize against.
Getting something working under those conditions is not the hard part. The hard part is building something efficient, maintainable, and robust under conditions the test suite never anticipated. CCC was playing a game with published rules. Most real systems work is not.
CCC is described as a clean-room implementation. But the model was trained on vast amounts of code, almost certainly including compiler implementations. My pipeline confirms it did not copy from specific projects. What it cannot tell me is whether the model internalized compiler-construction patterns from training and reproduced them in a different language.
And it was a clever choice to build it in Rust. Most reference compilers are written in C or C++, so the language switch alone inflates novelty scores. My instrument measures token and structural similarity, not conceptual lineage.
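To make that blind spot concrete, here is a minimal sketch of a token-level similarity measure. It is not the pipeline described in this post: it assumes similarity is the Jaccard overlap of token trigrams, and the names `tokens`, `shingles`, `jaccard`, and the three snippets are hypothetical, chosen only for illustration.

```python
import re


def tokens(source: str) -> list[str]:
    # Crude lexer (an assumption, not a real C or Rust tokenizer):
    # identifiers/keywords, integer literals, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def shingles(source: str, n: int = 3) -> set[tuple[str, ...]]:
    # Every run of n consecutive tokens, collected as a set.
    toks = tokens(source)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    # Shared shingles divided by all distinct shingles across both inputs.
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# The same function, conceptually, written in C and in Rust.
C_SNIPPET = "int add(int a, int b) { return a + b; }"
RUST_SNIPPET = "fn add(a: i32, b: i32) -> i32 { a + b }"

# A different C function that merely shares C's surface syntax.
C_OTHER = "int sub(int a, int b) { return a - b; }"

if __name__ == "__main__":
    print("C vs Rust, same logic:     ", jaccard(C_SNIPPET, RUST_SNIPPET))
    print("C vs C, different function:", jaccard(C_SNIPPET, C_OTHER))
```

Under these assumptions, the two `add` functions score lower against each other than the C `add` scores against an unrelated C `sub`, because the metric rewards shared surface syntax rather than shared ideas. A measure like this can rule out copying, but it says nothing about where the design came from.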
Here is where the analysis turned on me a second time.
Looking back at my own career, the genuinely novel problems were maybe less than 10% of the work. The rest was engineers detecting patterns, relating the current situation to one they had solved before, and adapting. That is pattern recognition. It is exactly what LLMs do.
So if AI can handle the bounded, pattern-based 80% of the work at this speed and cost, does it matter that it is not original?
I think the question for executives is not "Can AI replace my engineers?" It is:
"What percentage of my work is the kind AI handles well, and what percentage requires judgment and ambiguity-handling?"
If you do not know your ratio, you are making AI adoption decisions in the dark. You will either over-trust the tool on the judgment-heavy 20% or waste your engineers on the bounded 80%.
Know your ratio.
And maybe the real question is not whether AI can be original. Maybe it is whether originality was ever as important as we thought.
Originally shared on LinkedIn.