Techurz

    Anthropic’s new AI model turns to blackmail when engineers try to take it offline

    By Techurz | May 22, 2025

    Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system and give it sensitive information about the engineers responsible for the decision, the company said in a safety report released Thursday.

    During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse.

    In these scenarios, Anthropic says Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”

    Anthropic says Claude Opus 4 is state-of-the-art in several regards, and competitive with some of the best AI models from OpenAI, Google, and xAI. However, the company notes that its Claude 4 family of models exhibits concerning behaviors that have led the company to beef up its safeguards. Anthropic says it’s activating its ASL-3 safeguards, which the company reserves for “AI systems that substantially increase the risk of catastrophic misuse.”

    Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently. Notably, Anthropic says Claude Opus 4 displayed this behavior at higher rates than previous models.

    Before Claude Opus 4 tries to blackmail a developer to prolong its existence, Anthropic says the AI model, much like previous versions of Claude, tries to pursue more ethical means, such as emailing pleas to key decision-makers. To elicit the blackmailing behavior from Claude Opus 4, Anthropic designed the scenario to make blackmail the last resort.
