Google’s new diffusion AI agent mimics human writing to improve enterprise research


Google researchers have developed a new framework for AI research agents that outperforms leading systems from rivals OpenAI, Perplexity and others on key benchmarks.

The new agent, called Test-Time Diffusion Deep Researcher (TTD-DR), is modeled on the way humans write: drafting, searching for information, and revising iteratively.

The system uses diffusion mechanisms and evolutionary algorithms to produce more comprehensive and accurate research on complex topics.

For enterprises, this framework could power a new generation of bespoke research assistants for high-value tasks that standard retrieval-augmented generation (RAG) systems struggle with, such as generating a competitive analysis or a market entry report.


According to the paper’s authors, these real-world business use cases were the primary target for the system.

The limits of current deep research agents

Deep research (DR) agents are designed to tackle complex queries that go beyond a simple search. They use large language models (LLMs) to plan and tools like web search to gather information, then synthesize the findings into a detailed report with the help of test-time scaling techniques such as chain-of-thought (CoT), best-of-N sampling, and Monte Carlo tree search.
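To make one of these test-time techniques concrete, here is a minimal sketch of best-of-N sampling. The generator and scorer are hypothetical stand-ins (a real agent would call an LLM and some form of judge), and none of the names below come from the paper.

```python
import random
from typing import Callable

def best_of_n(generate: Callable[[random.Random], str],
              score: Callable[[str], float],
              n: int = 4,
              seed: int = 0) -> str:
    """Draw n candidate answers and return the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Hypothetical stand-in for an LLM call: picks a random subset of topics.
def toy_generate(rng: random.Random) -> str:
    topics = ["market size", "competitors", "pricing", "regulation"]
    k = rng.randint(1, len(topics))
    return "; ".join(rng.sample(topics, k))

# Hypothetical stand-in for a judge: rewards answers that cover more topics.
def toy_score(answer: str) -> float:
    return float(len(answer.split("; ")))

if __name__ == "__main__":
    print(best_of_n(toy_generate, toy_score, n=8, seed=1))
```

The extra compute is spent at inference time: more samples raise the chance that at least one candidate scores well, but the candidates never inform one another.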

However, many of these systems have fundamental design limitations. Most publicly available DR agents apply test-time algorithms and tools without a structure that mirrors human cognitive behavior. Open-source agents often follow a rigid linear or parallel process of planning, searching, and generating content, making it difficult for the different phases of the research to interact with and correct each other.

Example of a linear research agent. Source: arXiv

This can cause the agent to lose the global context of the research and miss critical connections between different pieces of information.

As the paper’s authors note, “This indicates a fundamental limitation in current DR agent work and highlights the need for a more cohesive, purpose-built framework for DR agents that imitates or surpasses human research capabilities.”

A new approach inspired by human writing and diffusion

Unlike the linear process of most AI agents, human researchers work in an iterative manner. They typically start with a high-level plan, create an initial draft, and then engage in multiple revision cycles. During these revisions, they search for new information to strengthen their arguments and fill in gaps.

Google’s researchers observed that this human process could be emulated using a diffusion model augmented with a retrieval component. (Diffusion models are often used in image generation. They begin with a noisy image and gradually refine it until it becomes a detailed image.)

As the researchers explain, “In this analogy, a trained diffusion model initially generates a noisy draft, and the denoising module, aided by retrieval tools, revises this draft into higher-quality (or higher-resolution) outputs.”

TTD-DR is built on this blueprint. The framework treats the creation of a research report as a diffusion process, where an initial, “noisy” draft is progressively refined into a polished final report.

TTD-DR uses an iterative approach to refine its initial research plan. Source: arXiv

This is achieved through two core mechanisms. The first, which the researchers call “Denoising with Retrieval,” starts with a preliminary draft and iteratively improves it. In each step, the agent uses the current draft to formulate new search queries, retrieves external information, and integrates it to “denoise” the report by correcting inaccuracies and adding detail.
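The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the in-memory `CORPUS` stands in for web search, the `[unverified]` marker stands in for "noise," and every function name is hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the web: a real agent would call a search tool.
CORPUS: Dict[str, str] = {
    "market size": "Market size: figure taken from retrieved source A.",
    "competitors": "Competitors: list taken from retrieved source B.",
    "pricing": "Pricing: ranges taken from retrieved source C.",
}
NOISY = "[unverified] "  # marks a section that still needs evidence

def make_initial_draft(topics: List[str]) -> Dict[str, str]:
    """The 'noisy' draft: every section starts as an unverified stub."""
    return {t: NOISY + t for t in topics}

def formulate_queries(draft: Dict[str, str], asked: Set[str],
                      max_queries: int = 2) -> List[str]:
    """Use the current draft to decide what to search for next."""
    gaps = [t for t, text in draft.items()
            if text.startswith(NOISY) and t not in asked]
    return gaps[:max_queries]

def retrieve(query: str) -> str:
    """Return evidence for a query, or an empty string if none is found."""
    return CORPUS.get(query, "")

def denoise(draft: Dict[str, str], evidence: Dict[str, str]) -> Dict[str, str]:
    """Fold retrieved evidence into a revised copy of the draft."""
    revised = dict(draft)
    for topic, fact in evidence.items():
        if fact:
            revised[topic] = fact
    return revised

def denoise_with_retrieval(topics: List[str], max_steps: int = 5) -> Dict[str, str]:
    draft = make_initial_draft(topics)
    asked: Set[str] = set()
    for _ in range(max_steps):
        queries = formulate_queries(draft, asked)
        if not queries:  # nothing left to look up
            break
        asked.update(queries)
        evidence = {q: retrieve(q) for q in queries}
        draft = denoise(draft, evidence)
    return draft

if __name__ == "__main__":
    for section in denoise_with_retrieval(["market size", "pricing", "unknown topic"]).values():
        print(section)
```

The point of the structure is that the draft itself drives each search step, so retrieval always happens with the whole report in view rather than as a separate phase that runs once up front.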

The second mechanism, “Self-Evolution,” ensures that each component of the agent (the planner, the question generator, and the answer synthesizer) independently optimizes its own performance. In comments to VentureBeat, Rujun Han, research scientist at Google and co-author of the paper, explained that this component-level evolution is crucial because it makes the “report denoising more effective.” This is akin to an evolutionary process where each part of the system gets progressively better at its specific task, providing higher-quality context for the main revision process.

Each component in TTD-DR uses evolutionary algorithms to sample and refine multiple responses in parallel, then combines them into a final answer. Source: arXiv
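A rough sketch of that sample, refine, and combine pattern for a single component follows. It assumes a toy setting in which a "response" is a set of covered points and feedback is a list of missing ones. The point list, the critique and revision rules, and the union-based merge are all hypothetical simplifications; in a real agent each of these steps would be an LLM call.

```python
import random
from typing import List, Set

# Hypothetical checklist a good plan should cover.
KNOWN_POINTS = ["scope", "sources", "risks", "timeline", "costs"]

def sample_variants(rng: random.Random, n: int, k: int = 2) -> List[Set[str]]:
    """Sample n diverse starting responses, each covering k points."""
    return [set(rng.sample(KNOWN_POINTS, k)) for _ in range(n)]

def fitness(variant: Set[str]) -> float:
    """Toy fitness: the fraction of known points the variant covers."""
    return len(variant) / len(KNOWN_POINTS)

def critique(variant: Set[str]) -> List[str]:
    """Toy feedback: the points this variant is still missing."""
    return [p for p in KNOWN_POINTS if p not in variant]

def revise(variant: Set[str], feedback: List[str]) -> Set[str]:
    """Address one piece of feedback per revision step."""
    return variant | set(feedback[:1])

def merge(variants: List[Set[str]]) -> Set[str]:
    """Combine the refined variants into a single answer (here, a union)."""
    merged: Set[str] = set()
    for v in variants:
        merged |= v
    return merged

def self_evolve(n_variants: int = 3, n_steps: int = 2, seed: int = 0) -> Set[str]:
    rng = random.Random(seed)
    variants = sample_variants(rng, n_variants)
    for _ in range(n_steps):
        # Each variant is refined independently of the others.
        variants = [revise(v, critique(v)) for v in variants]
    return merge(variants)

if __name__ == "__main__":
    result = self_evolve()
    print(sorted(result), fitness(result))
```

Refining the variants independently preserves diversity until the final merge, which is what lets the combined answer cover more ground than any single sample.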

“The intricate interplay and synergistic combination of these two algorithms are crucial for achieving high-quality research outcomes,” the authors state. This iterative process directly results in reports that are not just more accurate, but also more logically coherent. As Han notes, since the model was evaluated on helpfulness, which includes fluency and coherence, the performance gains are a direct measure of its ability to produce well-structured business documents.

According to the paper, the resulting research companion is “capable of generating helpful and comprehensive reports for complex research questions across diverse industry domains, including finance, biomedical, recreation, and technology,” putting it in the same class as deep research products from OpenAI, Perplexity, and Grok.

TTD-DR in action

To build and test their framework, the researchers used Google’s Agent Development Kit (ADK), an extensible platform for orchestrating complex AI workflows, with Gemini 2.5 Pro as the core LLM (though other models can be substituted).

They benchmarked TTD-DR against leading commercial and open-source systems, including OpenAI Deep Research, Perplexity Deep Research, Grok DeepSearch, and the open-source GPT-Researcher.

The evaluation focused on two main areas. For generating long-form comprehensive reports, they used the DeepConsult benchmark, a collection of business and consulting-related prompts, alongside their own LongForm Research dataset. For answering multi-hop questions that require extensive search and reasoning, they tested the agent on challenging academic and real-world benchmarks like Humanity’s Last Exam (HLE) and GAIA.

The results showed TTD-DR consistently outperforming its competitors. In side-by-side comparisons with OpenAI Deep Research on long-form report generation, TTD-DR achieved win rates of 69.1% and 74.5% on two different datasets. It also surpassed OpenAI’s system on three separate benchmarks that required multi-hop reasoning to find concise answers, with performance gains of 4.8%, 7.7%, and 1.7%.

TTD-DR outperforms other deep research agents on key benchmarks. Source: arXiv

The future of test-time diffusion

While the current research focuses on text-based reports using web search, the framework is designed to be highly adaptable. Han confirmed that the team plans to extend the work to incorporate more tools for complex enterprise tasks.

A similar “test-time diffusion” process could be used to generate complex software code, create a detailed financial model, or design a multi-stage marketing campaign, where an initial “draft” of the project is iteratively refined with new information and feedback from various specialized tools.

“All of these tools can be naturally incorporated in our framework,” Han said, suggesting that this draft-centric approach could become a foundational architecture for a wide range of complex, multi-step AI agents.
