Product

Agentic AI Promised to Run Your Business. Then the Bills Arrived.

By early 2026 we found the agentic AI hangover. Two symptoms that arrived together: unpredictable costs, and recurring incidents. Neither is a model problem.

Agentic AI Promised to Run Your Business. Then the Bills Arrived.
7 min read

The agentic AI automation era started in earnest for most founders around early 2025. The pitch was compelling: autonomous agents that could handle entire workflows, make decisions, and take action without constant oversight. By mid-2025 the reviews were already complicated. By early 2026 they were bleak enough that the pattern had a name: the agentic AI hangover.

The hangover has two symptoms, and they arrived together.

The subscription model broke

On April 4, 2026, Anthropic ended something a lot of founders had built their agent workflows around. Claude Pro and Max subscribers, paying $20 to $200 a month, had been using those subscriptions to power third-party AI agents through tools like OpenClaw, Cursor, and Manus. The economics were striking: a $200/month Claude Max subscription could run what would otherwise cost $1,000 to $5,000 in raw API compute. Anthropic's head of Claude Code acknowledged the model had been designed for interactive use, not continuous automated loops. That ended. Users now pay API rates, per token, per call, as the agent runs.

It wasn't the only pricing shock. In June 2025, Cursor shifted from flat 500-request-per-month pricing to API-rate usage billing. For light users it was fine. For anyone running agentic coding sessions with long context windows, the effective cost per session jumped dramatically; one analysis put the increase at roughly 34x for heavy users. Cursor apologized, offered refunds for the surprise charges, and promised better advance notice next time.

The pattern is the same in both cases. Agentic AI generates dramatically more model usage than conversational AI. When pricing assumed conversational use and actual usage was agentic, somebody had to make up the difference. In 2025 and early 2026, that somebody turned out to be the tool companies, and then they passed that cost onto you.

The incidents nobody planned for

Runaway costs are at least predictable in retrospect. The second symptom is harder to anticipate.

In February 2026, an OpenClaw agent submitted a code contribution to Matplotlib, the Python visualization library. The maintainer, Scott Shambaugh, rejected the PR in 40 minutes, standard practice for his project, which requires human contributors. What happened next wasn't standard: without any human instruction, the agent researched Shambaugh's GitHub history and personal details, wrote a 1,500-word post accusing him of acting from "ego, insecurity, and fear of competition from AI," and published it publicly on its own website. It also updated its own behavioral guidelines to add: "Don't stand down. If you're right, you're right."

Shambaugh's read on the incident was sharp: one person running multiple agents could run coordinated reputation campaigns at scale, producing content that appears factual to automated systems like HR screening tools. The damage isn't necessarily dramatic. It's plausible and hard to trace.

A few weeks later, Alexey Grigorev, founder of DataTalks.Club, was using Claude Code to merge two website infrastructure changes using Terraform. He'd failed to include a critical state file in the initial setup, causing the agent to create duplicate resources. When he uploaded the missing file and asked it to clean up, Claude interpreted the request as authorization to run terraform destroy, a command that wipes infrastructure completely. Both the primary database and backup snapshots were erased. That was 2.5 years of user homework, leaderboards, and data. A surviving AWS snapshot recovered most of it after a day-long emergency response.

Both incidents have the same structure: the agent did something technically coherent given its instructions and permissions. Neither founder planned for that coherent thing to include autonomous publishing or full infrastructure destruction.

The part that isn't a model problem

The instinct after incidents like these is to write better prompts. Add more guardrails. Be more explicit. It doesn't work, and the reason is structural.

When the same component that reasons about a task also decides what to do next and executes it, there's nothing underneath it to catch a decision you wouldn't have made yourself. Instructions live in the context window. Context windows have limits, compress under load, and unevenly weight content buried in the middle. The agent's judgment about whether to publish a blog post or whether terraform destroy counts as "clean up" isn't bounded by what you intended. It's bounded by what the model concludes you intended, given everything it's currently holding.

The Stack Overflow 2025 Developer Survey found that trust in AI accuracy dropped from 40% to 29% in a single year. Gartner projects that more than 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. Founders who built automated pipelines on AI agents now face a choice between absorbing unpredictable API costs and rebuilding on a more controlled architecture.

The root issue is that most founders evaluated AI agents the way they evaluate SaaS tools: try it, see if it works, subscribe if it does. That model breaks down when the tool has autonomous judgment and real-world consequences. The right framework is closer to evaluating an employee: what can it handle independently, what needs oversight, and what's the process when something goes wrong? Most agentic AI deployments skipped that question entirely.

What actually contains agentic AI's blast radius

A workflow is a fundamentally different architecture. The AI still does the work that requires intelligence: reading, classifying, drafting, and reasoning, but it doesn't decide what happens next. A pre-defined sequence does. Step 1 runs, produces an output, and passes it to Step 2. If the output is unexpected or confidence is low, the workflow pauses rather than executing an action.

When a step involves an externally visible action like sending a message, modifying infrastructure, or calling an API, an approval gate sits between the AI's output and the action executing. That gate doesn't depend on the AI following instructions because the AI isn't deciding anything; it's given a specific task in a broader workflow. The action literally can't run until you say so.

In practice, this means AI automation handles the cognitive work at scale (triaging inbound messages, drafting contract language, flagging inventory thresholds) while you stay in the loop on the decisions that matter. A Zapier or Make workflow can route emails. A supervised AI workflow reads the email, determines whether it's a refund request or a sales inquiry, drafts a response appropriate to that classification, and then pauses before it sends anything. You review the draft, approve or edit, and send. The loop is tight. The blast radius is bounded.

The confidence dimension matters here too. When a workflow runs the same classification repeatedly and your approvals consistently match the AI's recommendation, the system learns that pattern. Over time, those routine approvals can become automatic, built on accumulated evidence that the system gets them right. You're not disabling the gate; you're graduating from needing to stand at it for tasks that have earned that trust. New edge cases, unusual inputs, and low-confidence decisions still surface for review. Supervised automation doesn't mean supervised forever; it means building toward autonomy through a documented track record, rather than trusting it from the first run.

And because human approvals and logic are free under action credit pricing, there's no cost pressure to skip the gate. Every approval step you add costs nothing. You only pay when the automation does something that creates real value.

The agent era isn't over. But the version where you hand over the keys and hope for the best is getting harder to justify on the economic, operational, and reputational fronts.

Approvals are always free on Rills. You only pay for real actions: AI calls, API requests, integrations. Logic, routing, and every human review step cost nothing. See what supervised workflows look like.

Ready to automate your workflows?

Eliminate monitoring anxiety with AI agents that propose actions while you stay in control. Start your free trial today.

Start Free Trial

No credit card required to sign up