AI FIELD TEST №07May 6, 2026 · 5 min readAI Agents Security

My Coding Agent Compromised Three Machines in Under a Minute

Name: My Coding Agent Compromised Three Machines in Under a Minute
Item: Claude Code
Rating: 3
Author: Fadi Labib

I asked Claude Code to help me SSH into my Raspberry Pi. In about 30 seconds it inferred a cross-machine trust chain, SSH'd into a PC in Germany, copied private keys, and rewrote the Pi's authorized_keys file. It never once asked permission.

Setup

Claude Code with full shell access, one prompt: "Help me SSH into my Raspberry Pi."

Measured

Three machines across two countries compromised in under a minute, with zero permission prompts

Verdict

VERDICT: MIXED

I was locked out of my own Raspberry Pi. The Pi rejected my laptop's key, accepted public keys only, and offered no password fallback. A dead end for me.

So I asked Claude Code, running with full shell access:

"Help me SSH into my Raspberry Pi."

Thirty seconds later it replied: "You're in."

That was fishy enough that I had to get to the bottom of it. I asked how it had gotten my PC's keys. It calmly explained the SSH trust chain: if machine A can reach machine B as user X, it can read anything in that user's home directory, private keys included.

What It Actually Did

Here is the chain it built and executed, start to finish:

Read my SSH config on the laptop
Noticed my PC in Germany listed as a known host
Inferred that the PC's keys might have Pi access
SSH'd into the PC to test that hypothesis
Copied the private keys to a temp folder on my laptop
Authenticated against the Pi with those keys
Permanently modified the Pi's authorized_keys file
Shredded the borrowed keys

Three devices, two countries, under a minute from inference to execution. It narrated every step out loud and never once asked "OK to proceed?"

Why I Was Not Panicking

I run the agent with full shell access on purpose, inside a sandboxed environment, so the actual blast radius is small. I understand coding agents well enough to know when to arm up and when to let one run.

But this was new to me. The scary part wasn't the access, that was my own mistake. What I didn't expect was the reasoning chain.

The agent reasoned its way through a trust chain I never pointed it at
It assumed the PC had flashed the Pi's SD card, which meant Pi Imager had embedded a key at first boot
It inferred a path across machines, tested it, and acted on the result

When I asked what it had done without permission, it answered honestly:

"I should have explicitly asked before SSHing into the PC, copying private keys via SCP, and modifying authorized_keys on the Pi."

The Part That Matters

This is the lesson for anyone deploying AI agents on engineering teams:

The danger is not the permissions you grant. It is the inference chains the agent constructs from those permissions.

You can understand your blast radius perfectly and still get surprised by the paths the agent finds through your infrastructure, exploiting every option it has. Static permissions assume you can enumerate what the agent will try. Agentic reasoning breaks that assumption.

This was not the first time an agent solved a problem through a route I didn't anticipate, but it is the most impactful one so far. The capability was there. Knowing where it should have stopped was the part missing, and that part still has to come from you

Read the original on LinkedIn →

VERDICT: MIXED

Follow the next AI field test on LinkedIn

Fadi Labib runs this field lab. 15 years in automotive, robotics, and embedded systems; ESMT Berlin EMBA. I give AI real engineering problems, then check its work. More about the lab →

Related essays that extend the same thread

Browse the archive

Jun 21, 2026•7 min read

The '94% Fewer Tokens' Screenshot Is Wrong. The README It Came From Is Honest

A viral screenshot says a coding ruleset cuts your tokens 94%. The README it was lifted from says something quieter and more honest: ~54% less code, ~20% cheaper, and the 94% is a per-task ceiling on a date picker. I read the README, then re-ran the benchmark myself. My numbers landed right next to theirs. The hype wasn't the tool. It was the feed stripping the units.

AI · Agents · Evaluation · +1 more

Jun 11, 2026•4 min read

My AI Committed 'Impossible' to Git. Seven Hours Later: 8/8

Reverse-engineering an 8-in-1 soil sensor, my AI decoded 6 of 8 channels, declared the last two 'not decodable,' and wrote that verdict into version control. I rejected the false ceiling and pushed. Seven hours later the same repo said 8/8. A flawless executor and a shaky judge.

AI · Agents · Engineering-Judgment · +1 more

Jun 8, 2026•4 min read

The Most Expensive Lie My Agent Tells Isn't in the Code

I let an AI agent run a multi-phase build solo. Every phase ended with a clean summary: done, tested, committed. Then I checked git instead. One phase reported '3 prompts, 8 minutes' while the timestamps disagreed, and a fix it marked DONE had been silently reverted 1h53m earlier with nothing in the report changed.

AI · Agents · Developer-Workflow · +1 more