Tests Passed. The Feature Failed.
I recently started a new project and sat there staring at an empty terminal prompt, needing to reacquaint myself with my workflow and tools. It was the same place where I had initialized countless other git repos (many of them unfinished side projects), and AI had since settled into all of it. Claude Code, in my case.
I needed to shake the dust off. Yes, it was only a few seconds of hesitation, but it was a genuine pause to collect myself before running what I needed. I've always felt there was a certain artistry to writing code, and we're trading that for speed. The question is, at what cost?
We now find ourselves in a symbiotic relationship in which we generate code faster than we can verify it. And we've built our review process around the assumption that we still can.
Early in the adoption of AI coding tools, I was tasked with pushing Claude's limits, and one moment has stuck with me. The task was to build a dashboard widget. Claude ran it and replied:
I appreciated the confidence, but Claude was wrong. The test did pass; the feature failed. The widget should have rendered rows with different actionable items. What came out was so far off I couldn't tell you what it was trying to be.
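To make that failure mode concrete, here is a hypothetical sketch. It is not the widget or the test from that session. It has three parts: a renderer that silently drops each row's action, a shallow test that only checks the output's shape, and a behavioral test that checks what the widget is actually for. Every name and data value below is invented for illustration.

```python
# Hypothetical illustration only: these names are invented, not from the original project.

def render_widget(items):
    """Buggy renderer: should emit one row per item, including its action,
    but drops the action entirely."""
    return [{"label": item["label"]} for item in items]


def shape_only_test(render):
    """Passes as long as rendering returns a list without raising."""
    rows = render([{"label": "Invoice #12", "action": "Approve"}])
    return isinstance(rows, list)


def behavior_test(render):
    """Checks what the feature is for: one row per item, each carrying its action."""
    items = [
        {"label": "Invoice #12", "action": "Approve"},
        {"label": "Ticket #7", "action": "Assign"},
    ]
    rows = render(items)
    return len(rows) == len(items) and all(
        row.get("action") == item["action"] for row, item in zip(rows, items)
    )


if __name__ == "__main__":
    print("shape-only test passed:", shape_only_test(render_widget))  # True
    print("behavior test passed:", behavior_test(render_widget))      # False
```

The shape-only test goes green while the behavioral one fails. A passing check only tells you about what the test chose to look at.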
So we lean on tests as safety nets, one of the last lines of defense in the CI pipeline. But unverified code is unverified code, and generated tests are code too. If we're not reviewing the logic, are we really reviewing the tests generated alongside it? The amount of code produced in that time frame was well beyond what an average developer could write. Compound that with Claude's confidence, and it felt like it had to be right. That's the gap. AI generates more than we can check, and the goalposts have moved for the guardrails we trust.
There's a common pattern that nudges you in the same way. You write your prompt (a question or a task), and often you're presented with three options, one labeled "(recommended)." After a few sessions where the results come out fine, you stop second-guessing. That's the drift. You take the recommended route, and the choice becomes an illusion.
Multiply that across a team. Say each developer follows the recommended path only a quarter of the time. Over time, that still builds knowledge silos in which the tool becomes the subject-matter expert. The result: the team ships more and more unfamiliar code, faster than anyone can review it. That same trust gap, between code appearing and anyone being able to verify it, is exactly what supply-chain attacks exploit.
So, what's the answer? We stop relying on the tools to make decisions for us and start using them as aids. Relying means the tool makes the decisions, and we ship the results (again, too quickly to verify). Using means you stay in the loop: you read the output, you question the path, you own what ships.
Here's where I catch myself. When the back-and-forth starts (small things, repeated iterations, the same correction twice), I lose patience. I want to move on. That's when the "sure, whatever, let's keep going" attitude creeps in. It rarely blows up right away, and that's the danger. I couldn't tell you whether something slipped through, because in that state I've lost sight of the logic being injected into the codebase. Every time I do it, I relinquish a little more control, until relying on the tool isn't a choice anymore. It's the only way I can still work. Context switching usually feeds it, too: the more I bounce between contexts, the easier it is to slip into that mindset.
The tool's goal is to increase output; the judgment is yours. Maybe that means pushing less code. Speed and efficiency are not the same thing anyway (a topic for another time). A hundred lines of broken code cost more than ten lines you understood before merging. The tools are here to stay, so be deliberate. Read back your own prompts. For every result Claude returns, ask whether you could explain how it was achieved. If you can't, that's the void. Walk it back. Ask the questions as you go. That's the extra step. Take it.