17 Comments
Chris Tottman's avatar

Golden as always Jonas - thanks for sharing

MetaphysicalCells's avatar

Thanks for sharing 👍

Melanie Goodman's avatar

This is the thing I keep bumping into too: nearly all the effort goes into making the agent smarter, and almost none into how a person and the agent muddle through real work together.

Jonas Braadbaart's avatar

The jagged frontier that keeps shifting as we puddle along :)

Joel Salinas's avatar

The evolution from the chatboxes that once blew our minds and were the coolest thing online is real. Great post!

Sharyph's avatar

This is really helpful... we structured... The shift from chatboxes to true collaborative surfaces is where the real friction is right now.

John Brewton's avatar

Augmentation beats automation every time judgment is involved.

Jonas Braadbaart's avatar

For any high-value task really - research, innovation, creative expressions

AI models are compression engines: they are great at synthesizing the world that was, but should not be trusted when it comes to decisions about the world that will be.

H. Floyd's avatar

The tax-prep agent automated 7,000 returns at 97%. But the 3% it got wrong is where the real design problem lives. Did it know those returns were wrong? If it didn't, no amount of transparency or control in the interface would have caught them.

The first question for any agentic product is not how to design the handoff. It is how to detect when a handoff is needed. The agent will not tell you it is about to make a mistake. You have to build that detection separately.

Jonas Braadbaart's avatar

True. This is one of the common threads throughout the post, and it touches on all key design decisions - trust, boundaries, control and learning.

Agents - like humans - are not infallible, and because they are non-deterministic systems, this should always be top of mind when designing agentic products

I thought it was clear from the post, but perhaps too implicitly 😆

H. Floyd's avatar

Yeah I think it did come through. I suppose my point is that detection feels even more central than handoff design.

The handoff only works if the system can recognise when it is entering a risky state. Without that, the user is not supervising the agent so much as reviewing an output after the important failure signal has already been missed.

To me that feels like the layer where agentic products either become trustworthy or quietly brittle.

Jonas Braadbaart's avatar

Claude Opus 4.8 has allegedly been a huge step up in terms of introspection capabilities

Haven't tried it yet, but for me this falls mostly in the domain of the model builders, not the product builders

Your harness and design should assume the model will make a mistake roughly 1 out of 3 times (according to the latest data, across all tasks from easy to really hard), but detecting the mistakes should be done by the models themselves, since they are the most powerful AI systems

Probably should have linked this older post of mine on guardrails: https://metacircuits.substack.com/p/rogue-agents-and-what-to-do-about

H. Floyd's avatar

I agree that stronger models should absolutely be used inside the detection loop. A verifier model, critic pass, uncertainty signal, or second-agent review can all be useful.

The distinction I’m trying to make is slightly different though. I wouldn’t want the product to rely on the same model that may have made the mistake to be the sole authority on whether a mistake has happened.

To me, “detection” is a product and harness responsibility, even if models are one of the tools inside it. The product still has to decide what counts as a risky state, which actions need deterministic checks, when to escalate, what evidence must be shown, and where the blast radius needs to be limited.

So I agree with you that the most capable AI systems should help detect errors. I just think that strengthens the case for an explicit detection layer rather than replacing it. The model can be a sensor in the system, but the harness decides when that signal is enough to pause, verify, or hand off.
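
A rough way to picture "the model is a sensor, the harness decides" is the Python sketch below. It is only an illustration under stated assumptions: every name in it (`Action`, `Route`, `route`, the two checks, the 10,000 limit and the 0.8 threshold) is hypothetical and comes neither from the post nor from any real library. Deterministic checks run first and can force a handoff on their own. The model's self-reported confidence is only consulted afterwards, and a low value triggers extra verification rather than being trusted as the final word.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Route(Enum):
    PROCEED = "proceed"      # let the agent continue
    VERIFY = "verify"        # run an extra verification pass first
    HAND_OFF = "hand_off"    # pause and bring in a human


@dataclass
class Action:
    name: str
    amount: float            # hypothetical monetary impact of the action
    reversible: bool
    model_confidence: float  # model's own signal in [0, 1]; one sensor among several


# A deterministic check returns a reason string when it fires, else None.
Check = Callable[[Action], Optional[str]]


def irreversible_check(action: Action) -> Optional[str]:
    return None if action.reversible else "action cannot be undone"


def amount_check(action: Action, limit: float = 10_000.0) -> Optional[str]:
    # The limit is an arbitrary illustrative value.
    return "amount above limit" if action.amount > limit else None


CHECKS: List[Check] = [irreversible_check, amount_check]


def route(action: Action, checks: List[Check],
          verify_below: float = 0.8) -> Tuple[Route, List[str]]:
    """Decide what happens next and return the evidence for that decision."""
    reasons = []
    for check in checks:
        reason = check(action)
        if reason is not None:
            reasons.append(reason)

    # Deterministic checks win regardless of how confident the model is.
    if reasons:
        return Route.HAND_OFF, reasons

    # The model's signal can only ask for more scrutiny, never waive a check.
    if action.model_confidence < verify_below:
        return Route.VERIFY, ["model confidence below threshold"]

    return Route.PROCEED, []
```

The list of reasons returned alongside the route is the "evidence" a reviewer would see, so a handoff arrives with something concrete attached rather than a bare request for approval.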

Jonas Braadbaart's avatar

The verification agent will have the same issues the action agent has - and will still make mistakes, as a non-deterministic entity

As argued in the post, decision classification, interruptibility, transparency and clear communication should ensure humans can close the verification gap effectively while maintaining full control

H. Floyd's avatar

I agree a verification agent has failure modes too. I’m not arguing for “agent A checks agent B” as a complete solution.

My point is more that the verification gap cannot be closed by human control alone unless the product has already done some upstream work to identify where control is needed.

Decision classification is itself part of the detection problem. So are interruptibility and transparency. The product still has to decide which states are risky, which checks are deterministic, when a model-based verifier is enough, when it needs another signal, and when the human should be brought in.

So I think we agree that the verifier cannot be treated as infallible. I’d just frame the solution less as “humans close the gap” and more as “the harness narrows and surfaces the gap so the human has something concrete to close.”

Gabe Michael's avatar

Spot on Jonas.