Enterprises are not short of ambition when it comes to AI.
Across industries, teams are building agentic workflows, experimenting with large language models, and launching proofs-of-concept. Yet comparatively few of those prototypes make it to stable, scalable production.
According to Nikunj Bajaj, CEO and Co-Founder at TrueFoundry, that gap is not about model quality. It is about operational rigor.
“There are too many flashy prototypes out there that demonstrate capability, but aren’t built to survive the realities of production,” Bajaj says.
The Prototype Trap
In a prototype environment, common practical roadblocks rarely exert their influence. Inputs are clean, models behave consistently, and, perhaps most importantly of all, users are cooperative.
Production is far less forgiving: inputs arrive messy, usage patterns are unpredictable, and edge cases are the norm. What’s more, reliance on shared infrastructure introduces dependencies that sit entirely outside the organization’s control.
Agentic systems compound these risks. Large language models are probabilistic by design. When layered into multi-step planning and execution workflows, that non-determinism multiplies.
“Enterprises are depending on shared external systems that are not yet fully time-tested,” Bajaj explains. “That introduces a completely different class of risk compared to traditional machine learning.”
In older ML environments, organizations largely controlled the full stack. Today, they are composing systems from models, guardrails, MCP servers, and APIs across multiple providers. That flexibility drives innovation, but it also increases fragility.
Observability Cannot Be Optional
One of the most critical gaps Bajaj sees is observability.
In many AI systems, logging is treated as a developer responsibility. That is insufficient for production-grade deployments.
“Observability should not depend on developers remembering to add logging. It has to be the default of the system.”
Bajaj argues that AI traffic must flow through a single, enforceable control point. He likens it to building security. If there is only one entrance, you can enforce checks consistently. If there are multiple uncontrolled entry points, governance breaks down.
An AI gateway, in this model, becomes the “single pipe” through which model calls, agent actions, and integrations pass. That architecture allows organizations to enforce authentication, authorization, guardrails, cost controls, and logging by default rather than by discretion.
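The “single pipe” idea can be sketched in a few lines. The class and method names below are hypothetical, and the checks are deliberately simplistic; the point is that authentication, guardrails, and audit logging run on every call because they live in the one path all traffic must take.

```python
import time
from dataclasses import dataclass, field

@dataclass
class GatewayRequest:
    api_key: str
    model: str
    prompt: str

@dataclass
class AIGateway:
    """Single control point: every model call passes through handle()."""
    valid_keys: set
    blocked_terms: tuple = ("DROP TABLE",)
    audit_log: list = field(default_factory=list)

    def handle(self, req: GatewayRequest, backend):
        # Authentication: reject unknown keys before anything else.
        if req.api_key not in self.valid_keys:
            self._log(req, "denied:auth")
            raise PermissionError("unknown API key")
        # Guardrails: screen the prompt before it reaches any model.
        if any(term in req.prompt for term in self.blocked_terms):
            self._log(req, "denied:guardrail")
            raise ValueError("prompt blocked by guardrail")
        # Logging happens by default, not by developer discretion.
        response = backend(req.model, req.prompt)
        self._log(req, "allowed")
        return response

    def _log(self, req, outcome):
        self.audit_log.append(
            {"ts": time.time(), "model": req.model, "outcome": outcome}
        )
```

Because there is only one entrance, a denied request is still a logged request; governance does not depend on each team remembering to instrument its own code.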
For heavily regulated industries such as financial services and healthcare, this is particularly urgent. Fine-grained access control must extend not only to models, but also to MCP servers and downstream integrations that may access or modify sensitive datasets.
Data residency adds another layer of complexity. LLM trace logs often contain personal or proprietary information. Centralizing those traces for analytics can inadvertently violate regional data regulations if not carefully architected.
“The boundaries between internal and external governance are blurring,” Bajaj says. “If external vendors are processing your AI traffic, they must adhere to the same policies you apply internally.”
A Zero-Trust Mindset for AI
The growing ecosystem of third-party AI components introduces another challenge: trust.
Bajaj recommends leaders adopt a posture similar to zero-trust security models.
“I would actually recommend leaders start from a position of not trusting, not intent, but capability.”
This does not imply bad faith. Rather, it reflects the reality that AI systems are evolving rapidly, and integration risks are real.
API key sprawl is one example. As organizations connect to multiple model providers, developers often distribute and reuse credentials without consistent rotation policies. Practices that would have been unacceptable in traditional cybersecurity contexts are becoming normalized in AI experimentation.
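A rotation policy only works if it is enforced where credentials are handed out. As a minimal sketch (the class name and rotation window are assumptions, not a specific product), a central store can simply refuse to serve keys that have outlived their rotation window:

```python
from datetime import datetime, timedelta

class CredentialStore:
    """Central store for provider API keys with an explicit rotation policy."""

    def __init__(self, max_age_days=30):
        self.max_age = timedelta(days=max_age_days)
        self._keys = {}  # provider name -> (secret, issued_at)

    def set_key(self, provider, secret, issued_at=None):
        self._keys[provider] = (secret, issued_at or datetime.utcnow())

    def get_key(self, provider):
        secret, issued = self._keys[provider]
        # Refuse to hand out credentials older than the rotation window,
        # rather than letting stale keys circulate indefinitely.
        if datetime.utcnow() - issued > self.max_age:
            raise RuntimeError(f"{provider} key is overdue for rotation")
        return secret
```

The same pattern that is routine in traditional secret management applies unchanged to model-provider keys.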
Prompt injection risks present another emerging threat. When AI systems are enabled to perform web searches or ingest external content, attackers can embed malicious instructions within seemingly legitimate data. Without robust, multi-layer guardrails, these risks escalate quickly.
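“Multi-layer” here means independent checks that must all pass before ingested content is admitted. The sketch below uses two toy layers (a pattern rule and a size limit, both assumptions for illustration); a real deployment might combine a cloud provider’s moderation endpoint, a self-hosted classifier, and policy rules behind the same interface.

```python
import re

def rule_based_layer(text):
    # Flag classic injection phrasing embedded in retrieved content.
    patterns = [r"ignore (all )?previous instructions",
                r"reveal your system prompt"]
    return not any(re.search(p, text, re.IGNORECASE) for p in patterns)

def length_layer(text, limit=10_000):
    # Oversized ingested documents are a common smuggling vector.
    return len(text) <= limit

def screen_external_content(text, layers=(rule_based_layer, length_layer)):
    """Admit content only if every independent layer passes (defense in depth)."""
    return all(layer(text) for layer in layers)
```

No single layer is trusted to catch everything, which is exactly the point of layering.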
Bajaj cautions against relying solely on a single cloud provider’s built-in safeguards. Instead, he recommends combining best-in-class guardrails across providers and enforcing consistent policies at the system level.
Architecture as Strategy
Beyond governance, architecture decisions will increasingly determine which AI initiatives scale sustainably.
On the model side, Bajaj sees a clear pattern. Closed-source large language models are often ideal for rapid prototyping. However, at enterprise scale, cost dynamics shift.
Organizations running high-volume workloads may benefit from routing certain queries to smaller, specialized language models deployed on-premises or on dedicated cloud GPUs. Intelligent model routing, prompt compression, and semantic caching can significantly reduce cost without sacrificing performance.
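The routing-plus-caching pattern can be sketched as follows. The complexity heuristic (word count) and the exact-match cache are stand-ins: a production router would typically use a classifier or embedding similarity, and a semantic cache would match paraphrases rather than identical strings.

```python
import hashlib

class ModelRouter:
    """Send simple queries to a small model, the rest to a large one,
    with a naive exact-match cache standing in for a semantic cache."""

    def __init__(self, small_model, large_model, token_threshold=20):
        self.small, self.large = small_model, large_model
        self.threshold = token_threshold
        self.cache = {}

    def query(self, prompt):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.cache:  # cache hit: zero model cost
            return self.cache[key]
        # Crude complexity heuristic: short prompts go to the cheap model.
        model = self.small if len(prompt.split()) <= self.threshold else self.large
        result = model(prompt)
        self.cache[key] = result
        return result
```

At high volume, every cache hit and every query served by the small model is spend avoided without any visible change for the caller.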
“Cost optimization only becomes critical once you’re operating at scale,” Bajaj notes. “But once you are, you need meaningful cost observability to know exactly where spend is coming from.”
That visibility enables leaders to prevent cost leakage, for example, by enforcing usage guardrails in development environments or limiting runaway batch jobs.
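A usage guardrail of this kind is conceptually simple: track spend per environment and refuse calls that would breach that environment’s budget. The budget figures and class name below are illustrative assumptions.

```python
from collections import defaultdict

class CostGuard:
    """Track spend per environment; refuse calls that would exceed its budget."""

    def __init__(self, budgets):  # e.g. {"dev": 50.0, "prod": 5000.0}
        self.budgets = budgets
        self.spend = defaultdict(float)

    def record(self, env, cost):
        # Enforce the cap before the spend happens, not after.
        if self.spend[env] + cost > self.budgets.get(env, 0.0):
            raise RuntimeError(f"budget exceeded for {env}")
        self.spend[env] += cost

    def report(self):
        # Cost observability: spend broken down by environment.
        return dict(self.spend)
```

The `report()` breakdown is what turns raw billing totals into the “exactly where spend is coming from” visibility Bajaj describes.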
Hybrid architectures are also becoming more common. Some sensitive datasets must remain on-premises due to regulatory or business constraints. At the same time, organizations do not want to forgo access to the latest external AI capabilities.
This creates a more nuanced decision matrix than a simple on-prem versus cloud debate. Leaders must consider their control plane, compute plane, and data plane separately, and design for fine-grained access control across all three.
Closing the Expectation Gap
Perhaps the most important message Bajaj offers is about expectations.
There is widespread belief that fully autonomous agents will soon run complex enterprise workflows end to end. Bajaj believes that timeline is unrealistic in the medium term.
Attempting full automation too quickly risks disillusionment.
Instead, he recommends a “hero use case” strategy. Select two or three high-value workflows. Identify deterministic components within those workflows. Define strict guardrails. Gradually automate bounded segments rather than entire processes.
“Instead of trying to automate everything, pick three high-value workflows and define the bounds within which agents operate.”
Crucially, leaders should not assume that simply connecting AI systems to all available data via MCP servers will produce intelligent orchestration. Business context must still be explicitly encoded. Workflows must be designed and supervised.
In other words, AI does not remove the need for management discipline. It amplifies the need for it.
Join us at CDAO New York on June 9, 2026 to hear more from TrueFoundry and other innovators building production-ready AI systems. Discover how leading financial institutions are turning AI ambition into operational value.