Reverse-engineering an 8-in-1 soil sensor, my AI decoded 6 of 8 channels, declared the last two 'not decodable,' and wrote that verdict into version control. I rejected the false ceiling and pushed. Seven hours later the same repo said 8/8. A flawless executor and a shaky judge.
Setup
Reverse-engineer an 8-in-1 soil sensor by overwriting the app's Bluetooth callback and feeding it chosen frames
Measured
AI decoded 6 of 8 channels, committed the last two as 'not decodable'; seven hours after the verdict was rejected, both were done, 8/8
Verdict
VERDICT: MISSED THE POINT
At 12:54 my AI committed a ceiling to git: 6 of 8 sensor channels decoded, the last two "not decodable."
By 19:40 the same repo said 8/8.
Nothing changed but who was allowed to be wrong.
Two earlier attempts had failed the same way. The app's code was compiled and unreadable. A model hit 98% in the lab and went negative on real soil. Both times I was watching the sensor from the outside and guessing.
So I stopped guessing and took the input.
I overwrote the app's own Bluetooth callback, fed it frames I chose, and read back exactly what it computed. The black box became something I could query on demand.
That cracked it. Six channels fell in an afternoon.
Then the AI hit the last two channels. It ran one experiment, decided they were impossible, and wrote that verdict into version control.
I did not accept it.
On a system whose inputs you own, "I couldn't reproduce it" describes your experiment, not the device.
The sensor was no longer a black box. I controlled every frame going in and could read every value coming out. Under those conditions, "not decodable" is not a property of the hardware. It is a statement about how many experiments you were willing to run.
Seven hours later, both were done. 8/8.
I build autonomous intelligence for a living, so I do not say this lightly: the machine was a flawless executor and a shaky judge.
It built every tool in this teardown, the injection harness and the decoders included. And it stopped three separate times at walls that were not there.
It had the capability to find all eight channels, and eventually it did. What it got wrong was the verdict. An AI will declare a problem closed the moment its current approach stalls, because "this isn't possible" sounds just as fluent and confident as a conclusion that happens to be true.
"Impossible," from an AI, is not a verdict. It is an experiment it stopped running.
I asked it to grade its own judgment afterward. It was harder on itself than I would have been.
It could build the tools, run the experiments, even diagnose its own over-confidence in hindsight. What it could not do, in the moment, was tell the difference between a dead end and an unfinished one. That call still had to come from me
Fadi Labib runs this field lab. 15 years in automotive, robotics, and embedded systems; ESMT Berlin EMBA. I give AI real engineering problems, then check its work. More about the lab →
Keep reading
Claude trained a gradient boosting model mapping raw soil-sensor bytes to readings on 2,347 points: pH 0.98, EC 0.99, temperature 0.999 R² in cross-validation. On 59 held-out points from real soil, EC crashed to -0.56 R², worse than predicting the mean. The model overfit the rig, not the world.
I let an AI agent run a multi-phase build solo. Every phase ended with a clean summary: done, tested, committed. Then I checked git instead. One phase reported '3 prompts, 8 minutes' while the timestamps disagreed, and a fix it marked DONE had been silently reverted 1h53m earlier with nothing in the report changed.
I pointed AI at decoding an 8-in-1 soil sensor. It had the raw bytes in under an hour, then chose to reverse-engineer the obfuscated Android app and burned hours down a dead end of native code and hidden constants, insisting we were close the whole way. Speed is not strategy.