74% on isolated tasks, 3.75% on real freelance work - that gap terrifies me. I've been comparing Claude Code vs Codex for 2 months, and it's exactly this. Both tools crush benchmarks. Neither handles the chaos of actual development.
The context switching between coding, debugging, error handling, and context recovery - that's where they fail. Benchmarks test one thing. Real work tests fourteen.
Curious what you think the next frontier is. Is it task integration, or do we need a fundamentally different approach?
The supervision tax was an unexpected stat: just a 1% net productivity boost from AI once you factor in the time spent on corrections.
Thinking about it, that matches my own use cases. Still, I guess it cuts down the boring work, so I can focus more on thinking and evaluating.
I completely agree that the important meta-skill is now effective delegation, and that skill itself keeps shifting by the day with new model and tool launches.
https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026