No, the sun does not shine out of Claude’s backside
30 Jan 2026

As of today, Opus 4.5 is the best coding model I've used. That is not praise by vibes; it comes after building libraries and utilities that fixed problems I could not solve with the tools I was using before.
The progress is impressive.
However, it is not all sunshine and rainbows, whatever social media and YouTube would have you believe.

To be fair, none of this is unique to Claude. It is simply the model I use daily, so it is the one I know best.
I keep seeing people brag about running Claude in a loop twenty-four seven, as if it were working shifts in a sweatshop.
Are they producing apps? Yes.
Are tasks being completed? Yes.
Is the quality good? Maybe.
When I built my version of the Ralph loop, I added an inspect-and-adapt step. Without that, what people are calling autonomy is just waterfall development with extra steps. But even with that pause to stop and inspect, the output still wasn't good enough.
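For the curious, the loop itself is nothing exotic. Here is a minimal sketch of the shape, not my actual tooling: it assumes Claude Code's non-interactive `claude -p` invocation, and a plain `input()` prompt stands in for the inspect-and-adapt pause.

```python
import subprocess

PROMPT = "Pick the next task from PLAN.md, implement it, and update PLAN.md."


def run_iteration(prompt: str) -> str:
    # Hand the standing prompt to the agent in non-interactive (print) mode.
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
    )
    return result.stdout


while True:
    print(run_iteration(PROMPT))

    # The inspect-and-adapt step: read the diff, run the code, then decide
    # whether the loop continues, the standing prompt changes, or it stops.
    decision = input("continue / adjust / stop? ").strip().lower()
    if decision == "stop":
        break
    if decision == "adjust":
        PROMPT = input("new standing prompt: ").strip()
```

Strip out that pause and you are back to fire-and-forget, which is exactly the problem.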
People like me started adding hooks, skills, validators, linting, compiler checks, and tests just to coax Claude into producing code that does not immediately fall over.
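To make the "validators" part concrete, here is the kind of gate I mean: a small script that refuses to let an iteration through until the linter and the tests are clean. It is a sketch, not a prescription; ruff and pytest are stand-ins for whatever your project already uses.

```python
import subprocess
import sys

# Checks an agent iteration must pass before its output is even worth reviewing.
# ruff and pytest are illustrative; substitute your project's own linter and tests.
CHECKS = [
    ["ruff", "check", "."],
    ["pytest", "--quiet"],
]


def gate() -> int:
    for command in CHECKS:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"gate failed: {' '.join(command)}")
            return result.returncode
    print("gate passed")
    return 0


if __name__ == "__main__":
    sys.exit(gate())
```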
But none of that guarantees quality, good design, or a good product. It solves an immediate need. It patches a leak. It does not fix the abstraction; in fact, more often than not, it leaves you with a leaky one.
Everyone has an opinion on how to avoid this, including me, and I will share my methods later. The reality is that nobody, now or ever, has a process they would bet their house on to guarantee success.
The boring solution
So the solution is boring. Pay attention. Stay involved.
Be the pig, not the chicken.
Also, use another AI
Alongside Claude, I use another frontier model, Codex 5.2, purely as a reviewer.
So far, Claude has not produced code that Codex considered correct on the first attempt. Often it takes many iterations.
When Codex finds an issue, I ask it to create a failing test. I do this on purpose, and I do not automate it, even though I could.
The friction matters.
I read the issue, copy it, and hand it back to Claude to fix. Watching how Codex pins down a problem, and how Claude reasons its way to a fix for the failing test, has been one of the best lessons I've had.
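To show what that handoff looks like, here is a hypothetical example of the kind of test I ask for. The pagination bug is invented for illustration; the point is that the review arrives as a failing test rather than a paragraph of prose, so Claude has something executable to argue with.

```python
# Hypothetical example of the failing test a reviewer model might hand back.
# paginate() is a stand-in for the code under review, written here with the
# reported bug baked in so the test actually fails when run with pytest.


def paginate(items, page_size):
    # Buggy implementation: integer division drops the final partial page.
    full_pages = len(items) // page_size
    return [items[i * page_size:(i + 1) * page_size] for i in range(full_pages)]


def test_paginate_keeps_the_final_partial_page():
    items = list(range(10))

    pages = paginate(items, page_size=4)

    # 10 items at 4 per page should give pages of 4, 4, and 2.
    # The reported bug: the trailing page of 2 is silently dropped.
    assert pages == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```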
And by paying attention, I mean that when Claude starts going off-piste and over-engineering a solution, I stop it and give feedback.
Yes, I am a backseat driver. That's fine. AI isn't going to kick me out of the car just yet.