Every enterprise AI budget conversation right now follows the same arc. A bold number gets approved ($10 million, $20 million, more), agent infrastructure gets built, and then the executive team waits for the productivity transformation they were promised. What they get instead is something more frustrating: real capability and real potential, but results that feel uneven, brittle and harder to trust than expected.
The instinct is to blame the model. The model is rarely the problem.
The problem is that the architecture underneath the model was never designed for what enterprises actually need it to do—and most organizations don’t realize this until they’re already deep into deployment. There’s also a second problem hiding underneath the first: most of that agent infrastructure is running on data the enterprise doesn’t fully control.
The context window math no one is doing
When an AI agent processes a request, it works within a finite space called the context window. Everything the model needs to reason, retrieve and respond must fit inside that space. How that space gets allocated turns out to be the most important architectural decision in enterprise AI, and most organizations have left it to the defaults.
Dan Yarmoluk, founder of GraphifyMD, who teaches software engineering at the University of St. Thomas, has spent the last year mapping how context window capacity actually gets consumed in enterprise agent deployments. In a recent podcast conversation on AI & Data Horizons by EDB, he calls out a number that should concern every CIO with an active AI initiative.
“If you look at the breakdown of what agents are actually using with all this capacity: 25% are rules and constraints, 30% is orchestration overhead and then 30% is RAG chunks—which are probabilistic guesses—which leaves 15% in the tank for your domain.”
Read that again: In a typical enterprise agentic deployment, roughly 85% of the agent’s working capacity is consumed before your business’s actual knowledge enters the room.
That 15% is where the reasoning happens. That 15% is what you’re paying for.
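The arithmetic behind that figure can be sketched in a few lines. The percentages below are the ones Yarmoluk quotes; the 128,000-token window and the category names are assumptions chosen only for illustration, and real deployments will vary.

```python
# Shares quoted by Yarmoluk. The window size is an assumed example, not a
# property of any particular model.
CONTEXT_WINDOW_TOKENS = 128_000

SHARES_PERCENT = {
    "rules_and_constraints": 25,
    "orchestration_overhead": 30,
    "rag_chunks": 30,
}


def context_budget(window_tokens, shares_percent):
    """Return tokens per category plus what remains for domain knowledge."""
    budget = {
        name: window_tokens * percent // 100
        for name, percent in shares_percent.items()
    }
    # Whatever the other categories do not consume is left for the domain.
    budget["domain_knowledge"] = window_tokens - sum(budget.values())
    return budget


budget = context_budget(CONTEXT_WINDOW_TOKENS, SHARES_PERCENT)
domain_share = 100 * budget["domain_knowledge"] / CONTEXT_WINDOW_TOKENS

for name, tokens in budget.items():
    print(f"{name:24s}{tokens:>9,d} tokens")
print(f"domain share of window: {domain_share:.0f}%")
```

Under these assumptions, 19,200 of 128,000 tokens remain for domain knowledge. Replacing the quoted shares with figures measured from your own agent traces turns the same calculation into a baseline.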
Why retrieval isn’t enough
The dominant approach to giving agents access to enterprise knowledge is retrieval-augmented generation, or RAG. The model doesn’t hold all your information; instead, it retrieves relevant chunks at inference time, based on semantic similarity to the query.
RAG is a genuine advance and it works well for pattern-based questions for which “close enough” is an acceptable answer. But enterprise decisions—in clinical development, supply chain, financial modeling, regulatory compliance—rarely involve questions for which proximity to the right answer is sufficient.
The deeper issue is structural. RAG is a probabilistic system being asked to support deterministic reasoning. As Yarmoluk puts it, “I think we could say that more pattern matching on pattern-matching systems is not going to solve hallucinations or drift.”
When your team finds agents impressive at first, then hesitates to trust them, this is usually why. The model is capable. The information it’s reasoning from is a probabilistic approximation of what it should know.
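A toy example makes the distinction concrete. The sketch below uses a bag-of-words similarity as a stand-in for a learned embedding model, and every chunk, part number and quantity in it is hypothetical. It illustrates the general behavior of similarity search, not any specific RAG product.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; production systems use learned dense embeddings."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Rank chunks by similarity to the query and return the top k with scores."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical supply-chain chunks, invented for this illustration.
chunks = [
    "reorder point for part A100 is 500 units",
    "reorder point for part A200 is 120 units",
    "supplier lead time for part A100 is 14 days",
]

# Deterministic alternative: an exact lookup against a governed record.
reorder_points = {"A100": 500, "A200": 120}

# The query asks about a part that appears nowhere in the corpus.
results = retrieve("reorder point for part A300", chunks)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")

exact = reorder_points.get("A300")
print("exact lookup:", exact)
```

Similarity search returns its nearest neighbors, here the entries for A100 and A200, even though neither answers the question. The exact lookup returns nothing, which is the correct answer. A model handed the retrieved chunks has to notice the mismatch on its own.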
The architectural shift: Intelligence at the data layer
The answer isn’t a bigger context window, though that helps at the margins. It isn’t better prompting, though that matters too. The answer is moving domain knowledge closer to where inference actually happens: serving it from the operational data layer rather than reconstructing it at query time. That data layer needs to be one the enterprise owns, governs and can audit. Sovereignty isn’t a compliance checkbox; it’s an architectural requirement.
Yarmoluk has been developing an approach that treats enterprise knowledge as a structured artifact to be compressed and optimized for machine reasoning, not a corpus to be searched. Imagine a 63,000-word book distilled into a 20-kilobyte structured knowledge file: small enough to occupy almost none of the context window, precise enough that the model reasons from it rather than retrieves around it.
“Now it doesn’t have to talk about all the layers you already talked about,” Yarmoluk explains. “That is a direction that the agent or the query is going to go down…. We want to use the muscle for the right stuff.”
This is the architectural principle behind EDB Postgres® AI (EDB PG AI). Agents that operate from the operational data layer, where the system of record actually lives, don’t have to guess at domain context. They reason from it directly. The data tier becomes the intelligence tier. And because that tier is sovereign—running under the organization’s governance, on infrastructure it controls—the reasoning is auditable, the outputs are defensible and the risk is manageable.
Agent reliability is a data architecture problem that can be addressed right now, with the infrastructure decisions organizations are making today.
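To put the 63,000-word book and the 20-kilobyte knowledge file in context-window terms, a rough back-of-the-envelope estimate helps. The conversion ratios below are common rules of thumb for English text, not measurements, and the 128,000-token window is an assumed example. An actual tokenizer, especially on structured files, may give noticeably different counts.

```python
TOKENS_PER_WORD = 1.3    # assumed rule of thumb for English prose
BYTES_PER_TOKEN = 4      # assumed rule of thumb; structured text may differ
WINDOW_TOKENS = 128_000  # assumed example window size


def tokens_from_words(words):
    """Estimate the token count of prose from its word count."""
    return round(words * TOKENS_PER_WORD)


def tokens_from_bytes(size_bytes):
    """Estimate the token count of a file from its size in bytes."""
    return round(size_bytes / BYTES_PER_TOKEN)


book_tokens = tokens_from_words(63_000)     # the full book
file_tokens = tokens_from_bytes(20 * 1024)  # the 20 KB knowledge file

print(f"book: ~{book_tokens:,} tokens ({100 * book_tokens / WINDOW_TOKENS:.0f}% of window)")
print(f"file: ~{file_tokens:,} tokens ({100 * file_tokens / WINDOW_TOKENS:.0f}% of window)")
```

On these assumptions the full text would take roughly 64% of the window, while the distilled file would take about 4%.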
Three questions CIOs should be asking their teams this quarter
If you’re responsible for an enterprise AI deployment, context efficiency and data sovereignty deserve a board-level conversation.
1. What percentage of our context window is reaching domain knowledge? If your architecture team can’t answer this with a specific figure, you’re not managing your most critical AI resource. Establish the baseline before you expand the context window or add more agents.
2. Is our operational data layer agent-accessible, or are we serving copies? There’s a meaningful difference between an agent that reasons from your live system of record and one that retrieves from a vector index built from that system of record. The former is architecturally sound. The latter is two steps removed from ground truth.
3. Are we measuring inference cost per decision, or just token consumption? Token spend is the wrong optimization target. The right metric is what Yarmoluk calls “cost before tokens.” How much did it cost to produce an inference that was actually useful? An agent consuming more tokens to reach a confident, accurate answer may be more cost-efficient than one consuming fewer tokens to produce output that requires human review.
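The third question can be made concrete with a small calculation. This is one possible way to express cost per decision, not Yarmoluk’s own formula, and every figure in it (token price, request volumes, review cost, acceptance counts) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    tokens_per_request: int
    requests: int
    accepted_without_review: int   # outputs trusted and used as-is
    review_cost_per_output: float  # human cost for each output needing review


def cost_per_decision(run, price_per_1k_tokens):
    """Total cost (tokens plus human review) divided by decisions delivered."""
    token_cost = run.tokens_per_request * run.requests * price_per_1k_tokens / 1000
    reviewed = run.requests - run.accepted_without_review
    review_cost = reviewed * run.review_cost_per_output
    return (token_cost + review_cost) / run.requests


PRICE_PER_1K_TOKENS = 0.01  # hypothetical price in dollars

# Hypothetical agents: one spends more tokens but is trusted more often.
token_heavy = AgentRun(8_000, 1_000, accepted_without_review=950, review_cost_per_output=5.0)
token_light = AgentRun(3_000, 1_000, accepted_without_review=600, review_cost_per_output=5.0)

heavy_cost = cost_per_decision(token_heavy, PRICE_PER_1K_TOKENS)
light_cost = cost_per_decision(token_light, PRICE_PER_1K_TOKENS)

print(f"token-heavy agent: ${heavy_cost:.2f} per decision")
print(f"token-light agent: ${light_cost:.2f} per decision")
```

With these made-up inputs, the agent that uses more than twice the tokens costs about $0.33 per decision against roughly $2.03 for the cheaper-looking one, because human review dominates the total.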
The promise of enterprise AI is real. So is the gap between what organizations have built and what they need. The difference will be made by those who treat context efficiency, not model capability, as the variable they control. The organizations that get there first will be the ones that also treated data sovereignty not as a legal requirement to satisfy but as an architectural advantage to build on.







