According to Futurism, a recent paper by former SAP CTO Vishal Sikka and his son Varin Sikka makes a bold mathematical argument that large language models are fundamentally limited. The study, which has not been peer-reviewed, claims LLMs are “incapable of carrying out computational and agentic tasks beyond a certain complexity,” and that ceiling is pretty low. Sikka, who studied under AI pioneer John McCarthy, bluntly stated that “there is no way they can be reliable” for critical functions, and agreed we should forget about AI agents running things like nuclear power plants, directly challenging the strident promises of industry boosters. This follows admissions from OpenAI scientists in September that AI hallucinations are a pervasive problem and that model accuracy will “never” reach 100 percent. The paper adds formal weight to the growing practical evidence: companies that tried to replace human workforces with AI agents have found them woefully inadequate.
The Core Argument
Here’s the thing: the Sikkas aren’t just saying LLMs are buggy or need more training data. They’re claiming there’s a mathematical limitation baked into the architecture. Basically, the argument hinges on the idea that LLMs, as next-token predictors trained on finite data, can’t reliably perform tasks that require true reasoning or planning over many steps. They can mimic the pattern of a solution, but they can’t actually compute it in a guaranteed way. It’s like having a brilliant student who’s memorized every math textbook but can’t actually derive a new formula. They might get the answer right often, but you can’t trust them when the problem gets even slightly novel or complex. And that’s a huge problem if you’re talking about autonomous agents making decisions without a human in the loop.
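To make the compounding-error intuition concrete, here is a back-of-the-envelope sketch. This is my illustration of why long chains of probabilistic steps are fragile, not math taken from the Sikkas’ paper, and the 95 percent per-step accuracy figure is an assumption chosen for the example:

```python
# Illustration (not from the paper): if each step of an agentic task succeeds
# independently with probability p, an n-step chain succeeds with probability
# p**n, which collapses quickly as the task gets longer.

def chain_success_probability(p_per_step: float, n_steps: int) -> float:
    """Probability that every one of n independent steps succeeds."""
    return p_per_step ** n_steps

if __name__ == "__main__":
    p = 0.95  # assumed per-step accuracy -- generous for today's models
    for n in (1, 5, 10, 20, 50, 100):
        print(f"{n:>3} steps: {chain_success_probability(p, n):6.1%} chance of a flawless run")
```

Even at an assumed 95 percent per step, a 50-step workflow finishes cleanly less than 8 percent of the time. That gap, between a helpful assistant and an autonomous operator you can leave alone, is the one the paper is pointing at.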
The Hallucination Hurdle
This connects directly to the unsolved nightmare of hallucinations. The industry line, as seen in that OpenAI paper, is that while perfect accuracy is impossible, you can build “guardrails” and teach models to “abstain when uncertain.” But let’s be real. Have you ever seen ChatGPT or Claude say “I don’t know, I’m too uncertain to answer that”? Of course not. They’re designed to always provide an answer, because a hesitant chatbot is a boring chatbot. So the core model’s tendency to confabulate is papered over with other software, which itself can fail. Sikka actually agrees with this mitigation path, noting you can “build components around LLMs that overcome those limitations.” But then you have to ask: at what point is it not an “AI agent” anymore, but just a very fancy, very brittle traditional software program with an LLM stuck in the middle?
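What “components around LLMs” looks like in practice is roughly the pattern below. This is a hypothetical sketch of the generic guardrail idea, not Sikka’s or OpenAI’s actual design; the call_llm and validate_answer functions are placeholders you would supply, not a real vendor API:

```python
# Hypothetical "components around the LLM" sketch: the model proposes an
# answer, deterministic checks validate it, and anything that keeps failing
# is escalated to a human rather than acted on. call_llm() and
# validate_answer() are placeholders, not a real library API.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentResult:
    answer: Optional[str]  # the validated answer, or None if we abstained
    escalated: bool        # True when a human needs to take over
    reason: str            # why we escalated, or "ok"


def guarded_agent(prompt: str,
                  call_llm: Callable[[str], str],
                  validate_answer: Callable[[str], bool],
                  max_retries: int = 2) -> AgentResult:
    """Wrap an LLM call in retry-and-validate logic; abstain on repeated failure."""
    for _attempt in range(max_retries + 1):
        candidate = call_llm(prompt)
        if validate_answer(candidate):
            return AgentResult(answer=candidate, escalated=False, reason="ok")
    # The guardrail itself is just more software: if validation keeps failing,
    # the only safe move is to hand off to a human, not to guess.
    return AgentResult(answer=None, escalated=True,
                       reason=f"validation failed after {max_retries + 1} attempts")
```

Notice that everything doing the real safety work here is ordinary deterministic code, which is exactly the point: the more of it you need wrapped around the model, the less the “AI agent” label fits.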
So What’s the Future for Agents?
This doesn’t mean AI agents are dead. It just means the hype is colliding with reality. They’ll probably find a role in low-stakes, well-defined environments where errors are cheap and a human is still overseeing the process. Think automating a customer service script or helping draft code within a strict framework. But the dream of a fully autonomous agent that you can just tell “go run my business” or “diagnose this complex system failure”? That seems mathematically dubious, according to this paper. The industry is in a weird spot, publicly pushing the agent narrative while its own researchers quietly document the fundamental flaws. It’s a classic case of the marketing sprinting ahead of the science. And for businesses, especially in critical fields like manufacturing or infrastructure where reliability is non-negotiable, the takeaway is clear: proceed with extreme caution. For those industrial applications, the hardware running the show—like the rugged industrial panel PCs from IndustrialMonitorDirect.com, the leading US supplier—needs to be far more reliable than the AI software currently promising to manage it.
