Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    OpenAI co-founder Greg Brockman takes charge of product strategy

    May 17, 2026

    Marketing operating system Nectar Social raises $30M Series A led by Menlo

    May 17, 2026

    The haves and have nots of the AI gold rush

    May 17, 2026
    Facebook X (Twitter) Instagram
    Tech Pulse
    • OpenAI co-founder Greg Brockman takes charge of product strategy
    • Marketing operating system Nectar Social raises $30M Series A led by Menlo
    • The haves and have nots of the AI gold rush
    • Meridian Ventures launched $35M fund to back MBA-deferred founders
    • Lovable just backed a company that’s looking to bring vibe coding to hardware
    X (Twitter) Pinterest YouTube LinkedIn WhatsApp
    Techurz
    • Home
    • AI Systems
    • Cyber Reality
    • Future Tech
    • Disruption Lab
    • Signals
    • Tech Pulse
    Techurz
    Home - News - Confidence in agentic AI: Why eval infrastructure must come first
    News

    Confidence in agentic AI: Why eval infrastructure must come first

    TechurzBy TechurzJuly 2, 2025No Comments6 Mins Read
    Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
    Confidence in agentic AI: Why eval infrastructure must come first
    Share
    Facebook Twitter LinkedIn Pinterest Email

    As AI agents enter real-world deployment, organizations are under pressure to define where they belong, how to build them effectively, and how to operationalize them at scale. At VentureBeat’s Transform 2025, tech leaders gathered to talk about how they’re transforming their business with agents: Joanne Chen, general partner at Foundation Capital; Shailesh Nalawadi, VP of project management with Sendbird; Thys Waanders, SVP of AI transformation at Cognigy; and Shawn Malhotra, CTO, Rocket Companies.

    A few top agentic AI use cases

    “The initial attraction of any of these deployments for AI agents tends to be around saving human capital — the math is pretty straightforward,” Nalawadi said. “However, that undersells the transformational capability you get with AI agents.”

    At Rocket, AI agents have proven to be powerful tools in increasing website conversion.

    “We’ve found that with our agent-based experience, the conversational experience on the website, clients are three times more likely to convert when they come through that channel,” Malhotra said.

    But that’s just scratching the surface. For instance, a Rocket engineer built an agent in just two days to automate a highly specialized task: calculating transfer taxes during mortgage underwriting.

    “That two days of effort saved us a million dollars a year in expense,” Malhotra said. “In 2024, we saved more than a million team member hours, mostly off the back of our AI solutions. That’s not just saving expense. It’s also allowing our team members to focus their time on people making what is often the largest financial transaction of their life.”

    Agents are essentially supercharging individual team members. That million hours saved isn’t the entirety of someone’s job replicated many times. It’s fractions of the job that are things employees don’t enjoy doing, or weren’t adding value to the client. And that million hours saved gives Rocket the capacity to handle more business.

    “Some of our team members were able to handle 50% more clients last year than they were the year before,” Malhotra added. “It means we can have higher throughput, drive more business, and again, we see higher conversion rates because they’re spending the time understanding the client’s needs versus doing a lot of more rote work that the AI can do now.”

    Tackling agent complexity

    “Part of the journey for our engineering teams is moving from the mindset of software engineering – write once and test it and it runs and gives the same answer 1,000 times – to the more probabilistic approach, where you ask the same thing of an LLM and it gives different answers through some probability,” Nalawadi said. “A lot of it has been bringing people along. Not just software engineers, but product managers and UX designers.”

    What’s helped is that LLMs have come a long way, Waanders said. If they built something 18 months or two years ago, they really had to pick the right model, or the agent would not perform as expected. Now, he says, we’re now at a stage where most of the mainstream models behave very well. They’re more predictable. But today the challenge is combining models, ensuring responsiveness, orchestrating the right models in the right sequence and weaving in the right data.

    “We have customers that push tens of millions of conversations per year,” Waanders said. “If you automate, say, 30 million conversations in a year, how does that scale in the LLM world? That’s all stuff that we had to discover, simple stuff, from even getting the model availability with the cloud providers. Having enough quota with a ChatGPT model, for example. Those are all learnings that we had to go through, and our customers as well. It’s a brand-new world.”

    A layer above orchestrating the LLM is orchestrating a network of agents, Malhotra said. A conversational experience has a network of agents under the hood, and the orchestrator is deciding which agent to farm the request out to from those available.

    “If you play that forward and think about having hundreds or thousands of agents who are capable of different things, you get some really interesting technical problems,” he said. “It’s becoming a bigger problem, because latency and time matter. That agent routing is going to be a very interesting problem to solve over the coming years.”

    Tapping into vendor relationships

    Up to this point, the first step for most companies launching agentic AI has been building in-house, because specialized tools didn’t yet exist. But you can’t differentiate and create value by building generic LLM infrastructure or AI infrastructure, and you need specialized expertise to go beyond the initial build, and debug, iterate, and improve on what’s been built, as well as maintain the infrastructure.

    “Often we find the most successful conversations we have with prospective customers tend to be someone who’s already built something in-house,” Nalawadi said. “They quickly realize that getting to a 1.0 is okay, but as the world evolves and as the infrastructure evolves and as they need to swap out technology for something new, they don’t have the ability to orchestrate all these things.”

    Preparing for agentic AI complexity

    Theoretically, agentic AI will only grow in complexity — the number of agents in an organization will rise, and they’ll start learning from each other, and the number of use cases will explode. How can organizations prepare for the challenge?

    “It means that the checks and balances in your system will get stressed more,” Malhotra said. “For something that has a regulatory process, you have a human in the loop to make sure that someone is signing off on this. For critical internal processes or data access, do you have observability? Do you have the right alerting and monitoring so that if something goes wrong, you know it’s going wrong? It’s doubling down on your detection, understanding where you need a human in the loop, and then trusting that those processes are going to catch if something does go wrong. But because of the power it unlocks, you have to do it.”

    So how can you have confidence that an AI agent will behave reliably as it evolves?

    “That part is really difficult if you haven’t thought about it at the beginning,” Nalawadi said. “The short answer is, before you even start building it, you should have an eval infrastructure in place. Make sure you have a rigorous environment in which you know what good looks like, from an AI agent, and that you have this test set. Keep referring back to it as you make improvements. A very simplistic way of thinking about eval is that it’s the unit tests for your agentic system.”

    The problem is, it’s non-deterministic, Waanders added. Unit testing is critical, but the biggest challenge is you don’t know what you don’t know — what incorrect behaviors an agent could possibly display, how it might react in any given situation.

    “You can only find that out by simulating conversations at scale, by pushing it under thousands of different scenarios, and then analyzing how it holds up and how it reacts,” Waanders said.

    agentic confidence eval infrastructure
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleNuki Smart Lock review: This retrofit smart lock gives your old deadbolt some key new features
    Next Article Danny Boyle reveals why 28 Years Later was shot on iPhone, the ‘little 3D moments’ it made possible and why the future of cinema is immersive
    Techurz
    • Website

    Related Posts

    Opinion

    Zendesk acquires agentic customer service startup Forethought

    March 12, 2026
    Opinion

    In a vote of confidence for Meta’s Threads, Kalshi adds sharing feature

    March 10, 2026
    Opinion

    Blackstone backs Neysa in up to $1.2B financing as India pushes to build domestic AI infrastructure

    February 16, 2026
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    College social app Fizz expands into grocery delivery

    September 3, 20252,288 Views

    A Former Apple Luminary Sets Out to Create the Ultimate GPU Software

    September 25, 202516 Views

    The Reason Murderbot’s Tone Feels Off

    May 14, 202512 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews

    Subscribe to Updates

    Get the latest tech news from FooBar about tech, design and biz.

    Most Popular

    College social app Fizz expands into grocery delivery

    September 3, 20252,288 Views

    A Former Apple Luminary Sets Out to Create the Ultimate GPU Software

    September 25, 202516 Views

    The Reason Murderbot’s Tone Feels Off

    May 14, 202512 Views
    Our Picks

    OpenAI co-founder Greg Brockman takes charge of product strategy

    May 17, 2026

    Marketing operating system Nectar Social raises $30M Series A led by Menlo

    May 17, 2026

    The haves and have nots of the AI gold rush

    May 17, 2026

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms and Conditions
    • Disclaimer
    © 2026 techurz. Designed by Pro.

    Type above and press Enter to search. Press Esc to cancel.