The Promise vs. Reality of Agentic AI
Developers of agentic AI have been making grand claims, envisioning autonomous systems that can handle everything from booking flights to monitoring competitors in real time and managing entire procurement cycles—all without a human pressing a button. While the underlying models (large language models, reasoning engines, and multi‑step planning systems) have advanced remarkably, the infrastructure required to make these agents work reliably at scale remains woefully inadequate.
Gartner recently projected that over 40% of agentic AI projects will be canceled before the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. This statistic is striking, especially given that many industry observers expected autonomous agents to finally herald AI’s coming‑of‑age. Yet the cancellation rate should not surprise anyone who has witnessed the undeniable limitations these agents exhibit in real‑world deployments. Most observers assume the problem lies with the quality of the AI models themselves—but that assumption is only partly correct. The deeper issue is infrastructural.
Why the Web Resists Agents
Consider what a capable agent actually needs. Accessing a website and receiving a response is just the beginning. The agent must then translate that response into structured, usable data—and it must do so consistently, in real time, and at a scale that makes the whole exercise worthwhile. Given the current state of the web, this is a daunting task.
Take online retail platforms as an example. There is no technical reason why an independent AI agent could not compare prices, availability, and shipping policies across multiple vendors and then choose the option that best suits a user’s preferences. However, those very platforms depend on that information not being readily accessible. To maintain their competitive advantage, they increasingly employ personalized results, sponsored placements, and urgency cues to shape user behavior and tip the scales in their favor. Without access to clean, unbiased data, an AI agent cannot reliably identify the best option, let alone automate the choice.
The result is a web that works reasonably well for human browsing but systematically discourages automated access. The Web Openness Index, recently published by Oxylabs after scoring over 120 countries, provides clear evidence of this gap:
- Practical reachability—how well a site responds to standard automated HTTP requests—averages 83.4 out of 100 globally.
- Anti‑automation friction—the presence of CAPTCHAs, rate limiting, browser fingerprinting, and bot detection—scores only 62.8 (the lower the score, the more friction).
- Structured data interoperability—whether sites return data in machine‑friendly formats—drops even further to 60.3.
These 20‑plus‑point differences reflect a structural gap. Most sites technically respond to automated requests, but they simultaneously throw up barriers and return data in ways that are difficult for machines to parse. AI agents that depend on reliable, timely, structured information will often fall into that gap—and fail.
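The arithmetic behind those differences is simple to check. The short sketch below uses only the three averages quoted above; the dictionary keys are illustrative labels, not names taken from the index itself.

```python
# Global averages quoted above (0-100 scale). Key names are illustrative.
scores = {
    "practical_reachability": 83.4,
    "anti_automation_friction": 62.8,
    "structured_data_interoperability": 60.3,
}

baseline = scores["practical_reachability"]

# Distance between "the site answers a request" and each harder requirement.
gaps = {
    name: round(baseline - value, 1)
    for name, value in scores.items()
    if name != "practical_reachability"
}

for name, gap in gaps.items():
    print(f"{name}: {gap} points below reachability")
```

Both gaps come out above 20 points: 20.6 for anti‑automation friction and 23.1 for structured data interoperability.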
Data‑Starved AI Inside Organizations
Within corporate walls, agents face a different but related problem: a lack of usable data. The relevant information exists somewhere—in CRM databases, supply chain logs, customer support tickets—but it has not been cleaned, tagged, or structured in a way that an AI system can understand. Many organizations have spent years building dashboards for human analysts, yet they have not prepared their data for programmatic consumption. As a result, even the most sophisticated agent cannot access the context it needs.
The same applies to customer‑facing applications built on agentic systems. Without real‑time web data—current prices, live inventory levels, policy updates, market movements—agents have no choice but to reason based on a frozen version of the world. A customer asking for the best deal on a flight will receive an answer that is hours or days old, rendering the recommendation useless.
Latency is another critical dimension. An agent that eventually returns the correct answer is far less valuable than one that returns it fast enough to act on. In autonomous systems, tolerance for delay is even lower. A trading agent that cannot react within microseconds, or a supply chain agent that cannot reroute shipments within minutes, simply fails at its core task.
In every case, the constraint is the same: agents need context they can trust, and they are not getting it—not from their own organizational data, and not from the web.
Solving a Problem That Has Been Solved Before
It is easy to forget that this is not the first time the sheer volume of information has overwhelmed our capacity to process it. The early web provides an instructive parallel. At first, the World Wide Web held immense knowledge, but in its raw state it was nearly unusable. What made the difference was infrastructure built for scale: web crawlers that indexed pages, scrapers that compared prices across e‑commerce sites, and monitoring systems that tracked fraudulent ads and brand impersonation across thousands of domains. All of these innovations required the ability to collect public web data reliably and at scale.
A more recent example comes from the pro bono work of Debunk.org, a non‑profit combating online disinformation and fraud. Their investigation uncovered a large‑scale, multilingual scam operation targeting former fraud victims. The operation used over 50,000 ads, 459 domains, and more than 1,100 related web pages to reach an estimated 52 million people across Europe. Mapping an operation of that size was feasible only through systematic, automated collection of public web data.
Agentic AI needs an infrastructure of the same kind—except with even higher demands. Agents do more with data than any previous application. They need information that is structured, current, complete, and returned fast enough to support real‑time action.
The Three Cs of Reliable Agent Infrastructure
All of this is unlikely to happen organically. For online platforms, opening up to frictionless automated access means ceding control over discovery, ranking, and customer relationships. While this could ultimately benefit consumers and reshape business models, it is an immediate threat to short‑term revenue. Therefore, the infrastructure that makes agentic systems work reliably must be built independently.
Three requirements—the three Cs—stand out as essential:
Consistency: Agents that encounter unreliable data sources produce unreliable behavior. Unreliable behavior is the fastest route to project cancellations and executive disillusionment. Consistency means predictable response times, identical formatting across requests, and minimal downtime—attributes that most current websites do not guarantee.
Currency: Real‑time access to prices, inventory, availability, and policy changes is what separates an agent reasoning based on current facts from one reasoning based on stale assumptions. In most commercial contexts, the latter creates more problems than it solves, eroding user trust and leading to costly errors.
Compliance: Access that ignores accepted standards of fair use tends to provoke countermeasures that raise barriers for all automated systems. Any infrastructure worth building has to be sustainable not just technically but in practice, which means respecting website terms of service, avoiding aggressive scraping patterns, and collecting data ethically. Only then can agentic systems operate without triggering a constant arms race against bot detection.
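To make the three requirements concrete, here is a minimal sketch of a gate an agent could run before acting on a piece of web data. Everything specific in it is an assumption made for illustration: the `Observation` type, the field names in `REQUIRED_FIELDS`, the five‑minute freshness threshold, the `example-agent` user agent, and the `shop.example` URLs are all hypothetical. The compliance check covers only robots.txt rules, parsed with Python's standard `urllib.robotparser`; terms of service and request pacing would need separate handling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from urllib.robotparser import RobotFileParser

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"sku": str, "price": float, "in_stock": bool}


@dataclass
class Observation:
    url: str
    fetched_at: datetime  # timezone-aware timestamp of the fetch
    data: dict


def is_consistent(obs: Observation) -> bool:
    """Consistency: every required field is present with the expected type."""
    return all(
        isinstance(obs.data.get(name), expected)
        for name, expected in REQUIRED_FIELDS.items()
    )


def is_current(obs: Observation, now: datetime, max_age: timedelta) -> bool:
    """Currency: the data is no older than the allowed age."""
    return now - obs.fetched_at <= max_age


def is_compliant(obs: Observation, robots_txt: str, user_agent: str) -> bool:
    """Compliance (robots.txt only): the site's rules permit fetching this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, obs.url)


def failed_checks(
    obs: Observation,
    robots_txt: str,
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # assumed threshold
    user_agent: str = "example-agent",  # hypothetical agent name
) -> list[str]:
    """Return the names of the checks this observation fails, in a fixed order."""
    checks = {
        "consistency": is_consistent(obs),
        "currency": is_current(obs, now, max_age),
        "compliance": is_compliant(obs, robots_txt, user_agent),
    }
    return [name for name, passed in checks.items() if not passed]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /checkout/\n"
    now = datetime.now(timezone.utc)
    obs = Observation(
        url="https://shop.example/products/42",
        fetched_at=now - timedelta(minutes=2),
        data={"sku": "A1", "price": 9.99, "in_stock": True},
    )
    print(failed_checks(obs, robots, now))
```

An empty list means the observation passed all three checks. Returning the names of the failed checks, rather than a single boolean, lets the agent decide whether to re‑fetch, skip the source, or fall back to asking the user.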
The web was not designed for agents. Within organizations, the context agents need is often not easily accessible or even readily available. These are data quality problems that can be solved and infrastructure problems that we are actively solving. What we as a society ultimately need to decide is whether we are ready to welcome AI agents as autonomous helpers or if we will continue to erect barriers that keep them from reaching their full potential.