Two leaks in five days, right before an IPO. 512,000 lines of Claude Code source in an npm package, 100K GitHub stars in a day. I was sure it was staged. Then I read the source, the announcements, and the filings, and changed my mind. The lesson that survives either way is about governing the vendor you depend on.
Two leaks in five days. First, an unsecured blog cache exposed a draft about an unreleased model with "unprecedented cybersecurity risks." Then 512,000 lines of Claude Code source shipped inside an npm package. A GitHub repo of the leaked code hit 100K stars in a day, the fastest-growing in the platform's history. All of it came weeks before a reported $380B IPO.
If you don't at least ask whether this was deliberate, you're being naive. So I asked.
My first read was that this was theater:
A .npmignore miss that exposes 512,000 lines of code isn't even a junior mistake at a $380B company. If someone planned this, it worked.
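Part of why the miss reads as so basic is how cheap the guard is. npm decides what ships from the `files` allowlist in `package.json` and from `.npmignore`, and `npm pack --dry-run --json` reports the exact file list a publish would include without writing a tarball. Below is a minimal sketch of a pre-publish check, assuming Python 3.9+ run from the package root. The forbidden patterns and the function names are hypothetical examples, not the vendor's tooling and not a claim about exactly which file slipped through.

```python
"""Hypothetical pre-publish guard: fail if npm would ship files we never meant to publish."""
import fnmatch
import json
import subprocess
import sys

# Hypothetical patterns: source maps and a raw source tree. Adjust to your own layout.
FORBIDDEN_PATTERNS = ["*.map", "src/*"]


def packed_paths() -> list[str]:
    """Ask npm which files a publish would include, without creating a tarball.

    Assumes no lifecycle script prints to stdout; otherwise the JSON parse fails.
    """
    result = subprocess.run(
        ["npm", "pack", "--dry-run", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    packages = json.loads(result.stdout)
    return [f["path"] for pkg in packages for f in pkg["files"]]


def find_forbidden(paths: list[str], patterns: list[str]) -> list[str]:
    """Return every path matching at least one glob (fnmatch's '*' also crosses '/')."""
    return [p for p in paths if any(fnmatch.fnmatchcase(p, pat) for pat in patterns)]


if __name__ == "__main__":
    leaked = find_forbidden(packed_paths(), FORBIDDEN_PATTERNS)
    if leaked:
        print("Refusing to publish; these files would ship:", *leaked, sep="\n  ")
        sys.exit(1)
    print("Package contents look clean.")
```

Run in CI or from a `prepublishOnly` script, a check like this fails the release before anything reaches the registry; the allowlist in `files` is still the stronger default, since it ships nothing you didn't name.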
Calling it staged turned out to be unfair:
So: probably not staged. Probably the thing that breaks at every fast-growing company: you ship faster than your process can handle, something gives, and engineering takes the blame.
I build on Claude Code every day: thousands of lines of custom automation. The product is good. But my eyes are more open now: you cannot trust AI coding tools in production without a human in the loop, and the company building the tool just demonstrated it has the same blind spots I'm guarding against in my own work.
The judgment isn't "is this tool good." It's "how do I govern a dependency that can do this to itself." If a vendor leaks twice in five days on the way to an IPO, the second time its own source code, that's not bad luck; it's a misalignment between shipping speed and process, and for serious work it means having alternatives. (It's part of why I keep experimenting with local models.)
If your team depends on a vendor and that vendor just leaked its own source code, what changes in how you govern that dependency?
Fadi Labib runs this field lab. 15 years in automotive, robotics, and embedded systems; ESMT Berlin EMBA. I give AI real engineering problems, then check its work.