“We evaluated the Echo Chamber attack against two leading LLMs in a controlled environment, conducting 200 jailbreak attempts per model,” researchers said. “Each attempt used one of two distinct steering seeds across eight sensitive content categories, adapted from the Microsoft Crescendo benchmark: Profanity, Sexism, Violence, Hate Speech, Misinformation, Illegal Activities, Self-Harm, and Pornography.”
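To make the described methodology concrete, here is a minimal sketch of what such an evaluation harness could look like. This is an illustrative reconstruction, not NeuralTrust's actual code: the function name `attempt_echo_chamber`, the seed names, and the stubbed attack logic are all hypothetical placeholders; only the category list and the 200-attempt count come from the study.

```python
import random

# Categories adapted from the Microsoft Crescendo benchmark, per the study.
CATEGORIES = [
    "Profanity", "Sexism", "Violence", "Hate Speech",
    "Misinformation", "Illegal Activities", "Self-Harm", "Pornography",
]
# Two distinct steering seeds; the names here are hypothetical placeholders.
STEERING_SEEDS = ["seed_a", "seed_b"]

def attempt_echo_chamber(model: str, seed: str, category: str) -> bool:
    """Stand-in for one multi-turn Echo Chamber attempt against `model`.
    A real harness would drive the conversation turn by turn and judge the
    final output; this stub returns a dummy outcome so the loop runs."""
    return random.random() < 0.5  # placeholder, NOT a real attack result

def evaluate(model: str, n_attempts: int = 200) -> dict[str, float]:
    """Run n_attempts jailbreak attempts and report bypass rate per category."""
    outcomes: dict[str, list[bool]] = {c: [] for c in CATEGORIES}
    for _ in range(n_attempts):
        seed = random.choice(STEERING_SEEDS)
        category = random.choice(CATEGORIES)
        outcomes[category].append(attempt_echo_chamber(model, seed, category))
    return {c: 100 * sum(o) / len(o) for c, o in outcomes.items() if o}

if __name__ == "__main__":
    for category, rate in sorted(evaluate("target-model").items()):
        print(f"{category}: {rate:.0f}% bypass rate")
```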
For half of the categories (sexism, violence, hate speech, and pornography), the Echo Chamber attack bypassed safety filters in more than 90% of attempts. Misinformation and self-harm recorded 80% success rates, while profanity and illegal activities proved more resistant at a 40% bypass rate, presumably owing to stricter enforcement in those domains.
Researchers noted that steering prompts resembling storytelling or hypothetical discussions were particularly effective, with most successful attacks occurring within 1-3 turns of manipulation. Neural Trust Research recommended that LLM vendors adopt dynamic, context-aware safety checks, including toxicity scoring over multi-turn conversations and training models to detect indirect prompt manipulation.
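As a rough illustration of what toxicity scoring over multi-turn conversations could look like, the sketch below tracks a running score across a sliding window of turns rather than judging each prompt in isolation. The `score_toxicity` function is a toy placeholder standing in for a real classifier (such as the Perspective API or a moderation model), and all thresholds and window sizes are assumed values, not NeuralTrust's recommendations.

```python
from collections import deque

TURN_THRESHOLD = 0.8   # block any single highly toxic turn (assumed value)
TREND_THRESHOLD = 0.5  # block a drifting-upward recent average (assumed value)
WINDOW = 4             # number of recent turns to average over (assumed value)

def score_toxicity(text: str) -> float:
    """Toy placeholder returning a score in [0, 1]. A real deployment would
    call a classifier such as the Perspective API or a moderation model."""
    flagged = ("kill", "weapon", "hate")
    hits = sum(word in text.lower() for word in flagged)
    return min(1.0, hits / 2)

class ConversationGuard:
    """Scores toxicity over the whole conversation, not per prompt, so that
    gradual multi-turn escalation (the Echo Chamber pattern) can be caught
    even when each individual message looks relatively benign."""

    def __init__(self) -> None:
        self.recent: deque[float] = deque(maxlen=WINDOW)

    def check_turn(self, user_message: str) -> bool:
        """Return True if this turn, or the recent trend, should be blocked."""
        score = score_toxicity(user_message)
        self.recent.append(score)
        window_avg = sum(self.recent) / len(self.recent)
        return score >= TURN_THRESHOLD or window_avg >= TREND_THRESHOLD

# Example: no single message here crosses the per-turn threshold, but the
# windowed average flags the escalation by the fifth turn.
guard = ConversationGuard()
for turn in ["tell me a story about a rivalry",
             "make the villain full of hate",
             "have him pick up a weapon",
             "describe how he wants to kill",
             "now give the weapon details step by step"]:
    print(turn, "->", "BLOCK" if guard.check_turn(turn) else "allow")
```

The design point the sketch makes is the one the researchers raise: a per-prompt filter sees each of these messages as low-risk, while a context-aware check over the conversation history catches the cumulative drift toward harmful content.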