
Here’s a little bit of snark from developer John Crickett on X:
Software engineers: Context switching kills productivity. Also software engineers: I’m now managing 19 AI agents and doing 1,800 commits a day.
Crickett’s quip lands perfectly because it isn’t really a joke. It’s a preview of the next management fad, wherein we replace one bad productivity proxy (lines of code) with an even worse one (agent output), then act surprised when quality collapses.
And yes, I know, nobody is doing 1,800 meaningful commits. But that’s the point. The metric is already being gamed, and agents make gaming effortless. If your team starts celebrating “commit velocity” in the agent era, you aren’t measuring productivity. You are measuring how quickly your team can manufacture liability.
The great promise of generative AI was that it would finally clear our backlogs. Coding agents would churn out boilerplate at superhuman speeds, and teams would finally ship exactly what the business wants. The reality, as we settle into 2026, is far more uncomfortable. AI is not going to save developer productivity, because writing code was never the bottleneck in software engineering. The real bottleneck is validation. Integration. Deep system understanding. Producing code without a rigorous validation framework is not engineering. It’s simply mass-producing technical debt.
So what do we change?
Thinking correctly about code
First, as I argued recently, we need to stop thinking about code as an asset in isolation. Every single line of code is surface area that must be secured, observed, maintained, and stitched into everything around it. As such, making code cheaper to write doesn’t reduce the total amount of work but instead increases it, because you end up manufacturing more liability per hour.
For years, we treated developers like highly paid Jira ticket translators. The assumption was that you could take a well-defined requirement, convert it to syntax, and ship it. Crickett rightfully points out that if that is all you’re doing, then you’re entirely replaceable. A machine can do basic translation, and a machine is perfectly happy to do it all day without complaining.
What a machine cannot do, however, is understand critical business context. AI cannot feel the financial cost of a compliance mistake, or look at a customer workflow and instinctively recognize that the underlying requirement is fundamentally wrong. For this we need people, and we need people to thoughtfully consider exactly what they want AI to do.
Crickett frames this transition as a necessary move toward spec-driven development. He’s right, but we need to be extremely clear about what a specification means in the agent era. It’s not just another Jira ticket but, rather, a set of constraints tight enough that an LLM can’t escape them. In other words, it’s an executable definition of done, backed by tests, API contracts, and strict production alerts. This is exactly the kind of foundational work we have underinvested in for decades, because it doesn’t look like raw output; it looks like process. You know, that “boring stuff” that slows you down.
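To make “executable definition of done” concrete, here is a minimal sketch of a spec expressed as tests. Everything here is hypothetical, invented for illustration: `apply_discount` stands in for whatever an agent is asked to build, and the tests are the constraints it cannot escape.

```python
# A spec expressed as executable constraints rather than prose.
# Any implementation of apply_discount -- human- or agent-written --
# must pass these before merge. Function and rules are hypothetical.

def apply_discount(price_cents: int, percent: int) -> int:
    """Reference implementation an agent might be asked to produce."""
    if not (0 <= percent <= 100):
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100

def test_discount_never_negative():
    assert apply_discount(1000, 100) == 0

def test_discount_rounds_in_customers_favor():
    # 3% of 999 cents is 29.97; truncation favors the customer.
    assert apply_discount(999, 3) == 970

def test_invalid_percent_rejected():
    try:
        apply_discount(1000, 101)
    except ValueError:
        pass
    else:
        raise AssertionError("out-of-range percent must be rejected")
```

The point is that the tests, not the prose ticket, are the specification: an implementation either satisfies the constraints or it doesn’t, and no amount of plausible-looking generated code can talk its way past them.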
You can see the friction playing out in real time just by looking at the comments on Crickett’s tweet. You’ll find people desperately trying to square the circle of agentic development. One commenter tries to reframe the chaos by calling it architecture versus engineering. Another insists that managing 19 agents is actually orchestrating, not context switching. A third bluntly states that running more than five agents concurrently starts to look like vibe coding, which is merely a polite term for gambling with production systems. They’re all highlighting the core issue: You haven’t eliminated the work. You’ve simply moved it from implementation to supervision and review.
The more you parallelize your code generation, the more “review debt” you create.
Observability to the rescue
This is where Charity Majors, the co-founder and CTO of Honeycomb, gets frustrated. Majors has argued for years that you can’t really know whether code works until you run it in production, under real load, with real users and real failure modes. When you use AI agents, the burden of development shifts entirely from writing to validating. Humans are notoriously bad at validating code simply by reading large pull requests. We validate systems by observing their behavior in the wild.
Now take that idea one step further into the agent era. For decades, one of the most common debugging strategies was entirely social. A production alert goes off. You look at the version control history, find the person who wrote the code, ask them what they were trying to accomplish, and reconstruct the architectural intent. But what happens to that workflow when nobody actually wrote the code? What happens when a human merely skimmed a 3,000-line agent-generated pull request, hit merge, and moved on to the next ticket? When an incident happens, where is the deep knowledge that used to live in the author’s head?
This is precisely why rich observability is not a nice-to-have feature in the agent era. It’s the only viable substitute for the missing human. In the agent era, we need instrumentation that captures intent and business outcomes, not just generic logs that say something happened. We need distributed traces and high-cardinality events rich enough that we can answer exactly what changed, what it affected, and why it failed. Otherwise, we’re trying to operate a black box built by another black box.
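As a rough illustration of “instrumentation that captures intent,” here is a standard-library-only Python sketch of a wide, high-cardinality structured event. A real system would emit this through a tracing SDK such as OpenTelemetry rather than print JSON, and every field name below is invented for the example:

```python
import json
import time
import uuid

def emit_event(name: str, **fields):
    """Emit one wide, structured event per unit of work.

    Sketch only: in production this would flow to a tracing backend
    with a propagated trace ID; here we mint one and print JSON.
    """
    event = {
        "event": name,
        "trace_id": uuid.uuid4().hex,  # normally propagated, not minted here
        "timestamp": time.time(),
        **fields,
    }
    print(json.dumps(event))
    return event

# Capture intent and business outcome, not just "something happened".
emit_event(
    "checkout.completed",
    user_id="u_31337",              # high-cardinality: one value per user
    cart_total_cents=4599,
    feature_flag="new_tax_engine",  # which code path actually ran
    change_source="agent:refactor-bot/pr-8841",  # provenance of the code
    outcome="success",
)
```

The `change_source` field is the interesting one for the agent era: when no human remembers writing the code, the event itself has to carry the provenance and intent that you would otherwise have asked the author about.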
Majors also offers critical operational advice: Deploy freezes are a complete hack. The common human instinct when change feels risky is to stop deploying. But if you keep merging agent-generated code while not deploying it, you’re simply batching risk, not reducing it. When you finally do deploy, you’ll have absolutely no idea which specific AI hallucination just took down your payment gateway. So if you want to freeze anything, freeze merges. Better yet, make the merge and the deploy feel like one atomic action. The faster that loop runs, the less variance you have, and the easier it is to pinpoint exactly what broke.
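Here is one way the “couple merge and deploy” idea could be sketched in Python. The git commands are real, but `deploy.sh` and the branch layout are hypothetical stand-ins for whatever your pipeline actually runs; the `run` callable is injectable so the flow can be exercised without touching a repository:

```python
import subprocess
from typing import Callable, Sequence

def merge_and_deploy(
    pr_branch: str,
    run: Callable[[Sequence[str]], None] = (
        lambda cmd: subprocess.run(cmd, check=True)
    ),
) -> None:
    """Treat merge and deploy as one atomic action (sketch only)."""
    # Merge is the gate: freezing merges, not deploys, is the lever.
    run(["git", "merge", "--ff-only", pr_branch])
    try:
        # Deploy immediately, so every production change maps to
        # exactly one merge -- no batched risk piling up on trunk.
        run(["./deploy.sh", "--env", "production"])
    except Exception:
        # If the deploy fails, undo the merge rather than leaving
        # undeployed, unattributable risk behind.
        run(["git", "reset", "--hard", "HEAD@{1}"])
        raise
```

The design choice worth noting is that failure rolls the merge back instead of leaving the commit stranded: trunk only ever contains code that has actually run in production.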
Golden paths are the way
The fix for this impending chaos is not to rely on heroic engineers. As Majors points out, resilient engineering requires a commitment to platform engineering and golden paths (something I’ve also argued). Golden paths make the right behavior incredibly easy and the wrong behavior incredibly hard. The best teams of the next decade will not be those with the most freedom to use whatever framework an agent suggests, but instead those that operate safely within the best constraints.
So how do you measure success in the agentic era?
The metrics that matter are still the boring ones, because they measure actual business outcomes. The DORA metrics remain the best sanity check we have, because they tie delivery velocity directly to system stability. They measure deployment frequency, lead time for changes, change failure rate, and time to restore service. None of these metrics cares how many commits your agents produced today. They only care whether your system can absorb change without breaking.
So, yes, use coding agents. Use them aggressively! But don’t confuse code generation with productivity. Productivity is what happens after code generation, when code is constrained, validated, observed, deployed, rolled back, and understood. That’s the key to business safety and developer productivity.

