According to Ars Technica, AI coding agents from companies like OpenAI, Anthropic, and Google can now work on software projects for hours, writing complete apps and running tests under human supervision. These systems are built on large language models (LLMs) fine-tuned and trained with reinforcement learning, but they face a critical limitation known as “context,” which is essentially the model’s short-term memory. To manage this, agents use tricks like context compression and multi-agent architectures, where a lead “orchestrator” coordinates specialized subagents. A July 2025 study by METR found that experienced open-source developers actually took 19% longer to complete tasks when using AI tools, despite feeling faster. The core takeaway is that these agents are not autonomous magic; they are complex, computationally expensive wrappers that require careful human guidance to avoid pitfalls like “vibe coding” and spiraling technical debt.
The Agent Illusion
Here’s the thing: when you ask Claude Code to build a feature, you’re not talking to one super-smart AI. You’re talking to a manager. The system uses a supervising LLM that interprets your prompt, then farms out subtasks to other LLMs that can actually use tools—like writing files or running shell commands. Anthropic’s docs call this pattern “gather context, take action, verify work, repeat.” It’s a clever orchestration, but it’s also a Rube Goldberg machine of API calls. And it’s brittle. The supervising agent can interrupt tasks, but it’s all happening inside a constrained sandbox, whether that’s a cloud container or your own carefully permissioned local machine. This isn’t general intelligence; it’s a very fancy, very specific automation script.
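To make the pattern concrete, here’s a minimal sketch of that gather-act-verify loop in Python. Every name in it (the planner stub, the choice of pytest as the “verify work” step) is an illustrative placeholder under my own assumptions, not any vendor’s actual orchestration code.

```python
# Minimal sketch of the orchestrator pattern: a supervising loop that
# gathers context, takes an action through a tool, verifies the result,
# and repeats. All names here are illustrative placeholders, not a real API.
import subprocess
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: str
    history: list[str] = field(default_factory=list)  # running transcript


def plan_next_step(state: AgentState) -> dict:
    """Stand-in for the supervising LLM call: decide the next tool action."""
    # A real agent would send state.goal + state.history to a model and
    # parse a structured tool call out of the response.
    return {"tool": "run_tests", "args": []}


def run_tool(action: dict) -> str:
    """Subagent / tool layer: the part that actually touches files and shells."""
    if action["tool"] == "run_tests":
        result = subprocess.run(
            ["python", "-m", "pytest", "-q", *action["args"]],
            capture_output=True, text=True,
        )
        return result.stdout + result.stderr
    raise ValueError(f"unknown tool: {action['tool']}")


def verified_done(observation: str) -> bool:
    """Verification step: a deliberately crude check that the tests passed."""
    return "failed" not in observation.lower()


def orchestrate(goal: str, max_steps: int = 10) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):          # gather context ...
        action = plan_next_step(state)  # ... decide an action ...
        observation = run_tool(action)  # ... take it ...
        state.history.append(observation)
        if verified_done(observation):  # ... verify, then repeat
            break
    return state
```

The shape is the point: the supervising loop never edits files or runs shells itself; it only decides, delegates to the tool layer, and checks what came back.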
The Context Crunch
This is the biggest bottleneck, and it’s a doozy. Every LLM has a context window—a limit on how much text (code, conversation history, its own “reasoning” tokens) it can consider at once. And it’s not just a hard limit; it suffers from “context rot.” The more you stuff in, the worse the model gets at remembering the important bits from the beginning. Processing a huge prompt also gets brutally expensive: self-attention compares every token against every other token, so compute scales quadratically with context length. So, what do the agent builders do? They compress. They periodically summarize the history, throwing away details to stay under the limit. Basically, the agent “forgets” chunks of what it was doing and has to re-orient itself by reading notes it left behind. This is why files like AGENTS.md or CLAUDE.md have become essential—they’re sticky notes for an AI with chronic amnesia.
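Here’s roughly what that compression step looks like, sketched in Python. The token counting and the summarize() call are deliberately crude placeholders of my own; a real agent would use its tokenizer and make another model call to do the condensing.

```python
import os


def rough_token_count(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer


def summarize(turns: list[str]) -> str:
    """Placeholder: in practice this is another (usually cheaper) LLM call."""
    return f"[summary of {len(turns)} earlier turns, details discarded]"


def compress_history(history: list[str], budget: int, keep_recent: int = 5) -> list[str]:
    """If the transcript is over budget, collapse older turns into one summary."""
    total = sum(rough_token_count(t) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent  # whatever was in `older` is gone for good


def reload_sticky_notes(paths=("AGENTS.md", "CLAUDE.md")) -> str:
    """After compression, re-read the project's memory files to re-orient."""
    notes = [open(p).read() for p in paths if os.path.exists(p)]
    return "\n\n".join(notes)
```

Whatever lived in those older turns is unrecoverable; the only durable memory is whatever the agent bothered to write down in those markdown files.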
The Token Bill
Let’s talk cost. Running this multi-agent circus burns tokens like a jet engine burns fuel. Anthropic’s documentation notes that a basic agent interaction uses about 4x more tokens than a simple chat. A multi-agent system? Try 15x more. That cost has to be justified by the value of the task. And the savings tricks get weird: the agent will write a Python script to parse a file rather than load the whole file into its context, just to spare tokens. It’s a strange meta-optimization. The economic model only works for high-value problems. For everyday bug fixes, it’s like using a satellite to check your driveway for mail.
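The arithmetic is easy to run yourself. Using the article’s 4x and 15x multipliers, a quick sketch (with an assumed, purely illustrative per-token price, not any provider’s actual rate) shows how the same session scales:

```python
# Back-of-envelope cost comparison using the article's multipliers:
# an agent burns ~4x the tokens of a plain chat, a multi-agent system ~15x.
# The per-token price is an assumed figure for illustration only.
PRICE_PER_MILLION_TOKENS = 10.0  # USD, assumption


def session_cost(chat_tokens: int, multiplier: float) -> float:
    return chat_tokens * multiplier * PRICE_PER_MILLION_TOKENS / 1_000_000


chat = session_cost(50_000, 1)    # plain chat session
agent = session_cost(50_000, 4)   # single agent, ~4x the tokens
swarm = session_cost(50_000, 15)  # multi-agent system, ~15x the tokens
print(f"chat ${chat:.2f}  agent ${agent:.2f}  multi-agent ${swarm:.2f}")
```

The gap between the first and last columns is the whole argument: 15x only pays off when the task itself is worth far more than the tokens.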
The Human Responsibility
This is where the rubber meets the road. The article drives home a critical point: “vibe coding” is a one-way ticket to technical debt hell. You can’t just paste AI output into production and hope. As researcher Simon Willison argues, the value now lies in contributing “code that is proven to work.” The human’s job is to architect, plan, and verify. Even Anthropic’s own best practices say you should make the agent read the relevant code and produce a plan before it writes a single line. Without that, LLMs reach for the quick, hacky solution that satisfies the immediate prompt but creates a mess later. And that METR study is a huge reality check: if seasoned developers are 19% slower with AI on codebases they know well, what hope does a novice have? The tool isn’t a substitute for skill; it’s a lever that amplifies both good and bad judgment. You still need to know what good code looks like, whether you’re shipping a web app or the software running on an industrial panel PC. The machine can’t give you that wisdom. Not yet, anyway.
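If you want that “proven to work” discipline in code form, the simplest version is a gate that refuses to keep any AI-generated change unless the project’s own test suite passes. A rough sketch, with hypothetical apply/revert callables standing in for your actual patch or VCS workflow:

```python
# Sketch of a "prove it works" gate: AI-generated changes are only kept if
# the project's test suite passes. apply_patch/revert_patch are hypothetical
# stand-ins supplied by the caller, not part of any real tool.
import subprocess


def tests_pass() -> bool:
    """Run the test suite; treat a zero exit code as the proof of work."""
    return subprocess.run(["python", "-m", "pytest", "-q"]).returncode == 0


def accept_if_proven(apply_patch, revert_patch) -> bool:
    apply_patch()          # stage the AI's proposed change
    if tests_pass():
        return True        # keep it: "code that is proven to work"
    revert_patch()         # otherwise throw it away, not into production
    return False
```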
