59% More Code. 30% More Failures. The Toolchain Bottleneck Is Here.
Neil Kakkar’s “How I’m Productive with Claude Code” describes a pattern that mirrors my own experience with agentic coding. Theory of Constraints, applied to a dev workflow: fix one bottleneck, the next one appears. PRs got effortless, so rebuild speed became the constraint. Rebuilds got instant, so parallelism became the constraint. Each solved problem revealed the next.
The verification bottleneck is measurable
CircleCI’s 2026 State of Software Delivery report analysed 28 million CI/CD workflows across thousands of engineering teams. The headline: throughput is up 59% year over year, the biggest increase they’ve ever recorded. AI-generated code is flooding pipelines.
But main branch success rates dropped to 70.8%, a five-year low, well short of the report's 90% benchmark for healthy teams.
Nearly 3 out of 10 merges into production are failing. Recovery times climbed 13% to 72 minutes on average. Feature branch throughput rose 15%, but main branch throughput for the median team declined 7%.
Thoughtworks published a response [2] with a line that stuck with me: most organisations’ test suites were designed for a world where humans wrote code at human speed. They were never built to validate the volume and velocity of AI-generated changes.
Teams can generate code faster than their infrastructure can verify it.
The accidental agent infrastructure
Over the past few years, the JavaScript ecosystem has been on a quiet Rust rewrite binge. SWC replaced Babel. Turbopack is coming for Webpack. Biome is replacing Prettier and ESLint. Oxlint landed months ago, 50-100x faster than ESLint on real codebases. Rspack, Lightning CSS. Bun (built in Zig, not Rust, but the same instinct) now serves as Anthropic’s execution layer [3] after they acquired it in December 2025.
The original pitch for all of these was developer experience. You don’t want to wait 8 seconds for a lint pass.
But none of these teams planned for what came next: their tools now form the tight inner loop of agent-assisted development.
You run the linter a few times a day. Maybe on save, maybe before commit. The speed difference between 200ms and 8 seconds is annoying but survivable.
An agent runs in a loop: write, lint, fix, lint, test, fix, lint, test. Over and over. That loop might execute dozens of times in a single task. The difference between 200ms and 8 seconds becomes the difference between a 2-minute task and a 15-minute one. Multiply that across every task in a session and you’re looking at a different capability ceiling.
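A back-of-the-envelope sketch makes the compounding concrete. The iteration count and per-iteration model time here are assumptions for illustration, not measurements:

```typescript
// Hypothetical loop model: each iteration is some non-lint work (model
// thinking, file edits) plus one lint pass.
function taskMinutes(iterations: number, lintMs: number, otherMs: number): number {
  return (iterations * (lintMs + otherMs)) / 60_000;
}

// Assume 100 write→lint→fix cycles and 1s of non-lint work per cycle.
const fast = taskMinutes(100, 200, 1_000);   // 200ms lint → 2 minutes total
const slow = taskMinutes(100, 8_000, 1_000); // 8s lint → 15 minutes total
```

Same model, same prompts; the only variable is lint latency.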
Linters as verification oracles
For agents, a linter is a verification oracle. The agent can’t tell if its code is well-structured by reading its own output. It won’t catch its own unused imports or type errors through self-review. A linter catches them on every pass, in the same place, with the same result.
Pass or fail, exact locations, often auto-fixes. That kind of unambiguous signal is gold for an autonomous loop. The agent doesn’t reason about whether the code is clean. It runs oxlint and gets a definitive answer.
Speed is only half the linter story. Configuration is the other half. Many teams ran ESLint with minimal rules: a handful of style preferences, maybe an Airbnb preset they never updated. Code review and developer judgment covered everything the linter didn't. That worked when humans wrote the code. Agents have no such judgment. Every rule you didn't enable is a category of error the agent will repeat in a loop, with nothing to stop it.
Documentation tells agents what to do. Linters force them to. [1] If a rule matters, it should be a mechanical check, not a line in a README the agent may or may not read. You can go further: write lint error messages as remediation instructions. When the agent hits a failure, the message itself teaches it how to fix the problem in the same pass it found it. The agent doesn’t need to search for context or guess at intent. The linter hands it the fix.
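You can do this with today's tools. Here is a sketch using ESLint's built-in no-restricted-syntax rule, which accepts a custom message per pattern; the selector, the message, and the ./lib/logger path are made-up examples you would adapt to your codebase:

```javascript
// eslint.config.js (flat config) — illustrative, not a drop-in config.
export default [
  {
    rules: {
      "no-restricted-syntax": [
        "error",
        {
          selector: "CallExpression[callee.object.name='console']",
          // The message is a remediation instruction the agent can act on
          // directly, not just a description of the problem.
          message:
            "Do not call console.* in app code. Import { logger } from './lib/logger' and use logger.info/warn/error instead.",
        },
      ],
    },
  },
];
```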
The Rust-based linters made a design choice that looked like a developer preference at the time: opinionated defaults, strict out of the box. Biome ships with formatting and linting opinions baked in. Oxlint enables far more rules by default than ESLint ever did. That strictness is an agent infrastructure decision, whether anyone intended it or not. More rules means more signal per loop iteration. A loose linter gives an agent a green light on code that a senior engineer would flag in review. A strict one catches it on the first pass, before the agent builds three more features on top of it.
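For example, oxlint lets you promote entire rule categories at once in .oxlintrc.json. The category names below match oxlint's published rule groups, but treat the exact config keys as something to verify against the current docs:

```json
{
  "categories": {
    "correctness": "error",
    "suspicious": "error",
    "perf": "warn"
  }
}
```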
When you can run the linter in 200ms, you run it on every pass. Agent output quality goes up without touching the model or the prompts.
You can see this playing out in CircleCI’s numbers. The top 5% of teams, the ones who scaled both code creation and code delivery, grew main branch throughput 26% while feature branch activity surged 85%. CircleCI attributes that to strong validation systems and fast feedback loops. These teams invested in verification infrastructure before agents made it urgent, and now they’re pulling away.
The Anthropic signal
The CircleCI numbers put Anthropic’s Bun acquisition in sharper focus. Jarred Sumner wrote in his announcement that if most new code is going to be written, tested, and deployed by AI agents, “the runtime and tooling around that code become way more important.” He also noted that the GitHub username with the most merged PRs in Bun’s repo is now a Claude Code bot.
The company generating more agent-written code than anyone else in the world needs the fastest possible execution environment to validate that code. If your agent writes code and the test suite takes 30 seconds to run, you’ve lost the tight loop that makes agentic coding work. Anthropic saw the verification bottleneck coming and bought the runtime to close it.
Vite 8.0 and the Rolldown shift
Two weeks ago, Vite 8.0 shipped, replacing both esbuild and Rollup with a single Rust-based bundler, Rolldown. The result: 10-30x faster production builds with full plugin compatibility.
Linear reported their production build times dropping from 46 seconds to 6 seconds.
In agent workflows, builds run on repeat. Every verification step, every preview, every test run. If your agent was spending 46 seconds per build and now spends 6, it can do more in a given context window.
Vite’s team also positioned Vite 8 as the entry point to an end-to-end toolchain, with Rolldown for bundling and Oxc for compilation underneath. Rust from parsing to minifying, with consistent behaviour across the whole pipeline.
Next.js 16.2: when “accidental” becomes intentional
Ten days after Vite 8, Next.js 16.2 dropped. The headline numbers fit the pattern: 400% faster dev server startup, 50% faster rendering, 200+ Turbopack bug fixes.
But look at the AI-specific release notes. [4] Next.js 16.2 ships AGENTS.md in create-next-app. Browser log forwarding for agent-powered debugging. A dev server lock file so parallel agents don’t collide on ports. Experimental Agent DevTools that give AI agents terminal access to React DevTools and Next.js diagnostics.
Vercel’s team started investing in speed for the same reasons everyone else did. Developers hate waiting. Now they’re building tooling that exists for agents, not developers. Vercel is turning “accidental agent infrastructure” into a deliberate product strategy.
The pattern
The Theory of Constraints pattern plays out at the individual level: each constraint you remove reveals the next. Look at CircleCI’s 28 million workflows and you see the same pattern at industry scale: teams accelerated code creation by 59%, and verification became the chokepoint. The top 5% of teams were ready for it. The rest are watching their main branch success rates crater.
I think this changes how we evaluate toolchains going forward. “How fast is this in a tight loop?” is an agent capability question. Oxlint at 50x faster than ESLint makes agent workflows practical that were too slow to bother with before.
What I’d actually do about it
If I inherited a codebase running agents today, the first thing I’d do is crank up the linter. Not just swap ESLint for oxlint or Biome, but turn on the rules. Most teams I’ve seen run a handful of style rules and call it done. That was fine when a human with judgment was writing every line. An agent has no judgment. Every rule you left off is a class of mistake it will make on repeat, without hesitation, forever.
Enable everything. Unused imports, explicit return types, no floating promises, consistent error handling. The stuff you always meant to enforce but never got around to because the team was “already good about it.” The team isn’t writing the code anymore. Turn on the rules, let the linter be the quality bar, and let the agent bounce off it until the code is clean. A strict linter isn’t friction for an agent. It’s a guardrail that costs 200ms per pass.
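As a concrete starting point, here is a sketch of those categories as mechanical checks. The rule names are real typescript-eslint rules, but the overall shape is illustrative, and the strictTypeChecked preset requires type-aware parser options in your project:

```javascript
// eslint.config.js — sketch; assumes typescript-eslint with type-aware linting.
import tseslint from "typescript-eslint";

export default tseslint.config(
  ...tseslint.configs.strictTypeChecked,
  {
    rules: {
      "@typescript-eslint/no-unused-vars": "error",
      "@typescript-eslint/explicit-function-return-type": "error",
      "@typescript-eslint/no-floating-promises": "error",
      // Part of keeping error handling consistent: only throw Error objects.
      "@typescript-eslint/only-throw-error": "error",
    },
  },
);
```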
While you’re at it, look at e18e. The ecosystem performance initiative has been cleaning up the JavaScript dependency graph, replacing bloated packages with lighter alternatives, cutting install times, shrinking node_modules. That might have felt like a nice-to-have when you were the one waiting for npm install. When an agent is spinning up environments and running installs in a loop, every unnecessary dependency is time and surface area you’re paying for on every iteration. Leaner dependency trees mean faster installs, faster builds, fewer things to go wrong.
Speed matters for another reason: agents are time-blind. In Anthropic’s C compiler project, Claude burned hours running full test suites when a 1% random sample would have caught the same bugs. Agents have no model of diminishing returns. If the slow path exists, they take it. If pnpm test runs 4,000 tests in 90 seconds and you also have a test:fast script that runs 200 in 3 seconds, the agent will run the full suite on every iteration unless you tell it not to. You have to build the fast path and make it the default.
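In package.json terms, the fast path might look like the sketch below. Vitest's --changed flag runs only tests related to changed files; the tests/smoke path is a made-up example of a curated subset:

```json
{
  "scripts": {
    "test": "vitest run",
    "test:fast": "vitest run --changed",
    "test:smoke": "vitest run tests/smoke"
  }
}
```

Then tell the agent, in AGENTS.md or CLAUDE.md, to run test:fast on every iteration and reserve the full suite for final verification.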
The same logic applies everywhere in the toolchain. Get your dev server restart under a second. Use Vitest instead of Jest. Treat CI speed as an agent capability multiplier.
“Make your tools faster” has been good advice for a decade. Most teams didn’t have a strong enough reason to prioritise it. Agents are that reason. CircleCI’s numbers already show which teams took it seriously and which ones didn’t.
Teams solved code generation. Now they’re staring at verification. The top 5% were ready. I’m curious how long it takes the rest to catch up, and what breaks in the meantime.