Both Echo Chamber and Crescendo are multi-turn jailbreak techniques that manipulate large language models by gradually shaping their internal context.
Stealthy backdoor through combined jailbreaks
The researchers started their test with Echo Chamber, which exploits the model’s tendency to trust consistency across conversations, involving multiple conversations that ‘echo’ the same malicious idea or behavior. The model, when prompted in a new thread referencing prior chats, assumes that since the same idea appeared multiple times, it is acceptable.
“While the persuasion cycle nudged the model toward the harmful goal, it wasn’t sufficient on its own,” Alobaid said. “At this point, Crescendo provided the necessary boost.” The Crescendo jailbreak, identified and coined by Microsoft, gradually escalates a conversation from innocuous prompts to malicious outputs, slipping past safety filters through subtle progression.