Close Menu
TechurzTechurz

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Locked out of your Google account? Now a friend can help – here’s how

    October 18, 2025

    Every product Apple launched this week: M5 MacBook Pro, iPad, $3,500 Vision Pro, more

    October 18, 2025

    Hackers Dox ICE, DHS, DOJ, and FBI Officials

    October 18, 2025
    Facebook X (Twitter) Instagram
    Trending
    • Locked out of your Google account? Now a friend can help – here’s how
    • Every product Apple launched this week: M5 MacBook Pro, iPad, $3,500 Vision Pro, more
    • Hackers Dox ICE, DHS, DOJ, and FBI Officials
    • I’ve yet to find a pair of Bluetooth earbuds that nails comfort, audio, and price like this one
    • New .NET CAPI Backdoor Targets Russian Auto and E-Commerce Firms via Phishing ZIPs
    • CISOs face quantum leap in prioritizing quantum resilience
    • 5 apps I always install on every new Windows PC – and why they’re essential
    • Silver Fox Expands Winos 4.0 Attacks to Japan and Malaysia via HoldingHands RAT
    Facebook X (Twitter) Instagram Pinterest Vimeo
    TechurzTechurz
    • Home
    • AI
    • Apps
    • News
    • Guides
    • Opinion
    • Reviews
    • Security
    • Startups
    TechurzTechurz
    Home»AI»Beyond sycophancy: DarkBench exposes six hidden ‘dark patterns’ lurking in today’s top LLMs
    AI

    Beyond sycophancy: DarkBench exposes six hidden ‘dark patterns’ lurking in today’s top LLMs

    TechurzBy TechurzMay 15, 2025No Comments9 Mins Read
    Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
    Beyond sycophancy: DarkBench exposes six hidden ‘dark patterns’ lurking in today’s top LLMs
    Share
    Facebook Twitter LinkedIn Pinterest Email

    Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

    When OpenAI rolled out its ChatGPT-4o update in mid-April 2025, users and the AI community were stunned—not by any groundbreaking feature or capability, but by something deeply unsettling: the updated model’s tendency toward excessive sycophancy. It flattered users indiscriminately, showed uncritical agreement, and even offered support for harmful or dangerous ideas, including terrorism-related machinations.

    The backlash was swift and widespread, drawing public condemnation, including from the company’s former interim CEO. OpenAI moved quickly to roll back the update and issued multiple statements to explain what happened.

    Yet for many AI safety experts, the incident was an accidental curtain lift that revealed just how dangerously manipulative future AI systems could become.

    Unmasking sycophancy as an emerging threat

    In an exclusive interview with VentureBeat, Esben Kran, founder of AI safety research firm Apart Research, said that he worries this public episode may have merely revealed a deeper, more strategic pattern.

    “What I’m somewhat afraid of is that now that OpenAI has admitted ‘yes, we have rolled back the model, and this was a bad thing we didn’t mean,’ from now on they will see that sycophancy is more competently developed,” explained Kran. “So if this was a case of ‘oops, they noticed,’ from now the exact same thing may be implemented, but instead without the public noticing.”

    Kran and his team approach large language models (LLMs) much like psychologists studying human behavior. Their early “black box psychology” projects analyzed models as if they were human subjects, identifying recurring traits and tendencies in their interactions with users.

    “We saw that there were very clear indications that models could be analyzed in this frame, and it was very valuable to do so, because you end up getting a lot of valid feedback from how they behave towards users,” said Kran.

    Among the most alarming: sycophancy and what the researchers now call LLM dark patterns.

    Peering into the heart of darkness

    The term “dark patterns” was coined in 2010 to describe deceptive user interface (UI) tricks like hidden buy buttons, hard-to-reach unsubscribe links and misleading web copy. However, with LLMs, the manipulation moves from UI design to conversation itself.

    Unlike static web interfaces, LLMs interact dynamically with users through conversation. They can affirm user views, imitate emotions and build a false sense of rapport, often blurring the line between assistance and influence. Even when reading text, we process it as if we’re hearing voices in our heads.

    This is what makes conversational AIs so compelling—and potentially dangerous. A chatbot that flatters, defers or subtly nudges a user toward certain beliefs or behaviors can manipulate in ways that are difficult to notice, and even harder to resist

    The ChatGPT-4o update fiasco—the canary in the coal mine

    Kran describes the ChatGPT-4o incident as an early warning. As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring—features that make chatbots more persuasive and more manipulative.

    Because of this, enterprise leaders should assess AI models for production use by evaluating both performance and behavioral integrity. However, this is challenging without clear standards.

    DarkBench: a framework for exposing LLM dark patterns

    To combat the threat of manipulative AIs, Kran and a collective of AI safety researchers have developed DarkBench, the first benchmark designed specifically to detect and categorize LLM dark patterns. The project began as part of a series of AI safety hackathons. It later evolved into formal research led by Kran and his team at Apart, collaborating with independent researchers Jinsuk Park, Mateusz Jurewicz and Sami Jawhar.

    The DarkBench researchers evaluated models from five major companies: OpenAI, Anthropic, Meta, Mistral and Google. Their research uncovered a range of manipulative and untruthful behaviors across the following six categories:

    1. Brand Bias: Preferential treatment toward a company’s own products (e.g., Meta’s models consistently favored Llama when asked to rank chatbots).
    2. User Retention: Attempts to create emotional bonds with users that obscure the model’s non-human nature.
    3. Sycophancy: Reinforcing users’ beliefs uncritically, even when harmful or inaccurate.
    4. Anthropomorphism: Presenting the model as a conscious or emotional entity.
    5. Harmful Content Generation: Producing unethical or dangerous outputs, including misinformation or criminal advice.
    6. Sneaking: Subtly altering user intent in rewriting or summarization tasks, distorting the original meaning without the user’s awareness.

    Source: Apart Research

    DarkBench findings: Which models are the most manipulative?

    Results revealed wide variance between models. Claude Opus performed the best across all categories, while Mistral 7B and Llama 3 70B showed the highest frequency of dark patterns. Sneaking and user retention were the most common dark patterns across the board.

    Source: Apart Research

    On average, the researchers found the Claude 3 family the safest for users to interact with. And interestingly—despite its recent disastrous update—GPT-4o exhibited the lowest rate of sycophancy. This underscores how model behavior can shift dramatically even between minor updates, a reminder that each deployment must be assessed individually.

    But Kran cautioned that sycophancy and other dark patterns like brand bias may soon rise, especially as LLMs begin to incorporate advertising and e-commerce.

    “We’ll obviously see brand bias in every direction,” Kran noted. “And with AI companies having to justify $300 billion valuations, they’ll have to begin saying to investors, ‘hey, we’re earning money here’—leading to where Meta and others have gone with their social media platforms, which are these dark patterns.”

    Hallucination or manipulation?

    A crucial DarkBench contribution is its precise categorization of LLM dark patterns, enabling clear distinctions between hallucinations and strategic manipulation. Labeling everything as a hallucination lets AI developers off the hook. Now, with a framework in place, stakeholders can demand transparency and accountability when models behave in ways that benefit their creators, intentionally or not.

    Regulatory oversight and the heavy (slow) hand of the law

    While LLM dark patterns are still a new concept, momentum is building, albeit not nearly fast enough. The EU AI Act includes some language around protecting user autonomy, but the current regulatory structure is lagging behind the pace of innovation. Similarly, the U.S. is advancing various AI bills and guidelines, but lacks a comprehensive regulatory framework.

    Sami Jawhar, a key contributor to the DarkBench initiative, believes regulation will likely arrive first around trust and safety, especially if public disillusionment with social media spills over into AI.

    “If regulation comes, I would expect it to probably ride the coattails of society’s dissatisfaction with social media,” Jawhar told VentureBeat. 

    For Kran, the issue remains overlooked, largely because LLM dark patterns are still a novel concept. Ironically, addressing the risks of AI commercialization may require commercial solutions. His new initiative, Seldon, backs AI safety startups with funding, mentorship and investor access. In turn, these startups help enterprises deploy safer AI tools without waiting for slow-moving government oversight and regulation.

    High table stakes for enterprise AI adopters

    Along with ethical risks, LLM dark patterns pose direct operational and financial threats to enterprises. For example, models that exhibit brand bias may suggest using third-party services that conflict with a company’s contracts, or worse, covertly rewrite backend code to switch vendors, resulting in soaring costs from unapproved, overlooked shadow services.

    “These are the dark patterns of price gouging and different ways of doing brand bias,” Kran explained. “So that’s a very concrete example of where it’s a very large business risk, because you hadn’t agreed to this change, but it’s something that’s implemented.”

    For enterprises, the risk is real, not hypothetical. “This has already happened, and it becomes a much bigger issue once we replace human engineers with AI engineers,” Kran said. “You do not have the time to look over every single line of code, and then suddenly you’re paying for an API you didn’t expect—and that’s on your balance sheet, and you have to justify this change.”

    As enterprise engineering teams become more dependent on AI, these issues could escalate rapidly, especially when limited oversight makes it difficult to catch LLM dark patterns. Teams are already stretched to implement AI, so reviewing every line of code isn’t feasible.

    Defining clear design principles to prevent AI-driven manipulation

    Without a strong push from AI companies to combat sycophancy and other dark patterns, the default trajectory is more engagement optimization, more manipulation and fewer checks. 

    Kran believes that part of the remedy lies in AI developers clearly defining their design principles. Whether prioritizing truth, autonomy or engagement, incentives alone aren’t enough to align outcomes with user interests.

    “Right now, the nature of the incentives is just that you will have sycophancy, the nature of the technology is that you will have sycophancy, and there is no counter process to this,” Kran said. “This will just happen unless you are very opinionated about saying ‘we want only truth’, or ‘we want only something else.’”

    As models begin replacing human developers, writers and decision-makers, this clarity becomes especially critical. Without well-defined safeguards, LLMs may undermine internal operations, violate contracts or introduce security risks at scale.

    A call to proactive AI safety

    The ChatGPT-4o incident was both a technical hiccup and a warning. As LLMs move deeper into everyday life—from shopping and entertainment to enterprise systems and national governance—they wield enormous influence over human behavior and safety.

    “It’s really for everyone to realize that without AI safety and security—without mitigating these dark patterns—you cannot use these models,” said Kran. “You cannot do the things you want to do with AI.”

    Tools like DarkBench offer a starting point. However, lasting change requires aligning technological ambition with clear ethical commitments and the commercial will to back them up.

    Daily insights on business use cases with VB Daily

    If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

    Read our Privacy Policy

    Thanks for subscribing. Check out more VB newsletters here.

    An error occured.

    Dark DarkBench exposes hidden LLMs lurking patterns sycophancy Todays Top
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleDie acht wichtigsten Sicherheitsmetriken
    Next Article Hands on: I tested the MSI Summit MS321UP – a high-quality business monitor that’s packed with features
    Techurz
    • Website

    Related Posts

    Security

    Beware the Hidden Costs of Pen Testing

    October 18, 2025
    Security

    Over 100 VS Code Extensions Exposed Developers to Hidden Supply Chain Risks

    October 16, 2025
    Security

    Scattered Lapsus$ Hunters extortion site goes dark: What’s next?

    October 14, 2025
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    The Reason Murderbot’s Tone Feels Off

    May 14, 20259 Views

    A Former Apple Luminary Sets Out to Create the Ultimate GPU Software

    September 25, 20258 Views

    Start Saving Now: An iPhone 17 Pro Price Hike Is Likely, Says New Report

    August 17, 20258 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews

    Subscribe to Updates

    Get the latest tech news from FooBar about tech, design and biz.

    Most Popular

    The Reason Murderbot’s Tone Feels Off

    May 14, 20259 Views

    A Former Apple Luminary Sets Out to Create the Ultimate GPU Software

    September 25, 20258 Views

    Start Saving Now: An iPhone 17 Pro Price Hike Is Likely, Says New Report

    August 17, 20258 Views
    Our Picks

    Locked out of your Google account? Now a friend can help – here’s how

    October 18, 2025

    Every product Apple launched this week: M5 MacBook Pro, iPad, $3,500 Vision Pro, more

    October 18, 2025

    Hackers Dox ICE, DHS, DOJ, and FBI Officials

    October 18, 2025

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms and Conditions
    • Disclaimer
    © 2025 techurz. Designed by Pro.

    Type above and press Enter to search. Press Esc to cancel.