    Synthetic data is the new AI gold rush, but critics call it ‘data laundering’

    By Techurz | August 14, 2025 | Updated: May 11, 2026


    AI development is moving at a rapid pace, but it risks running headlong into a wall. As websites increasingly place barriers on scraping (some of which are allegedly ignored), and as the remaining content is voraciously collected by scrapers to train AI models, concerns are growing that we may run out of usable training data.

    The industry’s answer? Synthetic data.

    “Recently in the industry, synthetic data has been talked about a lot,” said Sebastien Bubeck, a member of technical staff at OpenAI, in the company’s livestreamed release of GPT-5 last week. Bubeck stressed its importance for the future of AI models—an idea echoed by his boss, Sam Altman, who live-tweeted the event, saying he was “excited for much more to come.”

    The prospect of relying heavily on synthetic data hasn’t gone unnoticed by the creative industries. “I believe the main reason companies like OpenAI are having to rely more on synthetic data now is that they’ve run out of high-quality human created data to mine from the public facing internet,” says Reid Southern, a film concept artist and illustrator.

    Southern believes there’s another motive. “It further distances them from any copyrighted materials they’ve trained on that could land them in hot water.”

    For this reason, he has publicly called the practice “data laundering.” He argues that AI companies could train their models on copyrighted works, generate AI variations, then remove the originals from their datasets. They could then “claim their training set is ‘ethical’ because it didn’t technically train on the original image by their logic,” says Southern. “That’s why we call it data laundering, because in a sense, they’re attempting to clean the data and strip it of its copyright.” (OpenAI did not respond to Fast Company’s request for comment.)

    The issue is more nuanced, according to Felix Simon, an AI researcher at the University of Oxford. “In one sense, it doesn’t really remediate the original harm over which creators and AI firms squabble,” he says. “After all, synthetic data isn’t plucked from the ether but presumably created with models that have reportedly been trained with data from creators and copyright holders—often without their permission and without compensation.” From the perspective of societal justice, rights, and duties, “these rights holders still are owed something even with the use of synthetic data—be that compensation, acknowledgements, or both.”

    Ed Newton-Rex, founder of Fairly Trained—a non-profit certifying AI companies that respect creators’ intellectual property rights—shares Southern’s concerns. “I think synthetic data is a legitimately helpful way to augment your dataset,” he says. “If you’re training an AI model, it’s a way of increasing the coverage of your training data. And at a time when we’re butting up against the limits of legitimately accessible training data, it’s seen as a way to extend the usable life of that data.”

    Still, Newton-Rex acknowledges its darker side. “At the same time, I think unfortunately its effect is, at least in part, one of copyright laundering,” he says. “I think both are true.”

    He warns against taking AI firms’ promises at face value. “Synthetic data is not a panacea from the incredibly important copyright questions,” he says. “I think there tends to be so much of a feeling that synthetic data helps you, as an AI developer, get around copyright concerns.” That belief, he says, is wrong.

    The framing of synthetic data—and the way AI companies talk about model training—also helps them distance themselves from the individuals whose work they may be using. “The average listener, if they hear this model was trained on synthetic data, they’re bound to think, ‘Oh, right, okay. Well, this probably isn’t Ed Sheeran’s latest album, right?’” Newton-Rex says. “It further moves us away from an easy understanding of how these models are actually made, which is ultimately by exploiting people’s life’s work.”

    He compares it to plastic recycling, where a recycled container might once have been a toy, a car bumper, or something else entirely. “The fact these AI models mash all this stuff up and generate, quote-unquote, ‘new output’, does nothing to reduce their reliance on the original work.”

    For Newton-Rex, this is the critical takeaway: “Really the absolutely critical element here, and it’s just got to be remembered, is that even in a world of synthetic data, what’s happening is people’s work is being exploited in order to compete with them.”

