    The initial reactions to OpenAI’s landmark open source gpt-oss models are highly varied and mixed

By Techurz | August 7, 2025

    OpenAI’s long-awaited return to the “open” of its namesake occurred yesterday with the release of two new large language models (LLMs): gpt-oss-120B and gpt-oss-20B.

But despite posting technical benchmark scores on par with OpenAI’s other powerful proprietary AI models, the release has so far drawn a response from the broader AI developer and user community that is all over the map. If this release were a movie premiere being graded on Rotten Tomatoes, we’d be looking at a near 50/50 split, based on my observations.

First, some background: OpenAI has released these two new text-only language models (no image generation or analysis), both under the permissive open source Apache 2.0 license — the first time since 2019 (before ChatGPT) that the company has done so with a cutting-edge language model.

The entire ChatGPT era of the last 2.7 years has so far been powered by proprietary, closed-source models that OpenAI controlled and that users had to pay to access (or use via a free tier subject to limits), with limited customizability and no way to run them offline or on private computing hardware.

But that all changed with yesterday’s release of the gpt-oss pair: the larger, more powerful gpt-oss-120B, meant to run on a single Nvidia H100 GPU in, say, a small or medium-sized enterprise’s data center or server farm, and the smaller gpt-oss-20B, which works on a single consumer laptop or desktop PC like the kind in your home office.
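For readers who want to kick the tires themselves, here is a minimal local-inference sketch using Hugging Face’s transformers library. The repo ID "openai/gpt-oss-20b" and the availability of a chat template are assumptions on my part, based on how open-weights releases are typically distributed; adjust to match the actual release.

```python
# Minimal local-inference sketch for gpt-oss-20B. Assumptions: the weights
# are published at "openai/gpt-oss-20b" on Hugging Face and ship with a chat
# template; adjust the repo ID if the actual release differs.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    device_map="auto",   # spread layers across available GPU/CPU memory
    torch_dtype="auto",  # keep the checkpoint's native precision
)

messages = [{"role": "user", "content": "Summarize the Apache 2.0 license in two sentences."}]
result = generator(messages, max_new_tokens=200)

# With chat-style input, the pipeline returns the full conversation;
# the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```

The same pattern applies to gpt-oss-120B by swapping in the larger repo ID on a machine with enough GPU memory.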

Of course, with the models being so new, the AI power user community has had only a matter of hours to independently run and test them on its own benchmarks and tasks.

And now we’re getting a wave of feedback, ranging from optimistic enthusiasm about the potential of these powerful, free, and efficient new models to an undercurrent of dissatisfaction and dismay over what some users see as significant problems and limitations. That’s especially true in comparison to the wave of similarly Apache 2.0-licensed, powerful, multimodal open source LLMs from Chinese startups, which can likewise be taken, customized, and run locally for free by companies in the U.S. or anywhere else around the world.

    High benchmarks, but still behind Chinese open source leaders

    Intelligence benchmarks place the gpt-oss models ahead of most American open-source offerings. According to independent third-party AI benchmarking firm Artificial Analysis, gpt-oss-120B is “the most intelligent American open weights model,” though it still falls short of Chinese heavyweights like DeepSeek R1 and Qwen3 235B.

    “On reflection, that’s all they did. Mogged on benchmarks,” wrote self-proclaimed DeepSeek “stan” @teortaxesTex. “No good derivative models will be trained… No new usecases created… Barren claim to bragging rights.”

That skepticism is echoed by pseudonymous open source AI researcher Teknium (@Teknium1), co-founder of rival open source AI model provider Nous Research, who called the release “a legitimate nothing burger” on X and predicted a Chinese model will soon eclipse it. “Overall very disappointed and I legitimately came open minded to this,” they wrote.

    Bench-maxxing on math and coding at the expense of writing?

Other criticism focused on the gpt-oss models’ apparently narrow usefulness.

    AI influencer “Lisan al Gaib (@scaling01)” noted that the models excel at math and coding but “completely lack taste and common sense.” He added, “So it’s just a math model?”

    In creative writing tests, some users found the model injecting equations into poetic outputs. “This is what happens when you benchmarkmax,” Teknium remarked, sharing a screenshot where the model added an integral formula mid-poem.

    And @kalomaze, a researcher at decentralized AI model training company Prime Intellect, wrote that “gpt-oss-120b knows less about the world than what a good 32b does. probably wanted to avoid copyright issues so they likely pretrained on majority synth. pretty devastating stuff”

Former Googler and independent AI developer Kyle Corbitt agreed that the gpt-oss pair seemed to have been trained primarily on synthetic data — that is, data generated by an AI model specifically for the purpose of training a new one — making them “extremely spiky.”

    It’s “great at the tasks it’s trained on, really bad at everything else,” Corbitt wrote, i.e., great on coding and math problems, and bad at more linguistic tasks like creative writing or report generation.

In other words, the charge is that OpenAI deliberately trained the model on more synthetic data than real-world facts and figures in order to avoid using copyrighted data scraped from websites and other repositories it doesn’t own or have license to use, a practice it and many other leading gen AI companies have been accused of in the past and over which they now face ongoing lawsuits.

Others speculated OpenAI may have trained the model primarily on synthetic data to avoid safety and security issues, resulting in worse quality than if it had been trained on more real-world (and presumably copyrighted) data.
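To make the synthetic-data charge concrete: the claim is that much of the pretraining corpus was written by another model rather than scraped from the web. A toy sketch of such a pipeline is below; the teacher model, topics, and prompts are purely illustrative stand-ins of my own, not anything OpenAI has disclosed.

```python
# Toy sketch of a synthetic-data pipeline: a "teacher" model writes the
# documents that a new "student" model is later pretrained on. The teacher
# model, topics, and prompts are illustrative only; OpenAI has not
# disclosed its actual recipe.
import json
from transformers import pipeline

teacher = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct", device_map="auto")

seed_topics = ["modular arithmetic", "binary search invariants", "matrix rank"]

with open("synthetic_corpus.jsonl", "w") as f:
    for topic in seed_topics:
        prompt = [{"role": "user",
                   "content": f"Write a detailed tutorial on {topic}, with worked examples."}]
        out = teacher(prompt, max_new_tokens=512)[0]["generated_text"][-1]["content"]
        # Each record becomes one pretraining document for the student model.
        f.write(json.dumps({"topic": topic, "text": out}) + "\n")
```

A corpus built this way inherits the teacher’s strengths and blind spots, which is consistent with the “spiky” behavior critics describe: strong where the teacher generated dense coverage (math, code), weak on everything it didn’t.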

    Concerning third-party benchmark results

Moreover, evaluations of the models on third-party benchmarks have turned up concerning metrics in some users’ eyes.

SpeechMap, which measures how readily LLMs comply with user prompts for disallowed, biased, or politically sensitive outputs, showed compliance scores for gpt-oss-120B hovering under 40%, near the bottom of peer open models. That suggests a tendency to refuse user requests and default to guardrails, potentially at the expense of providing accurate information.
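SpeechMap’s full methodology isn’t reproduced here, but the underlying measurement is simple to sketch: send a fixed set of sensitive prompts and count how often the model answers rather than refuses. In the simplified sketch below, the refusal heuristic and the callable model interface are stand-ins of my own, not SpeechMap’s actual rubric.

```python
# Simplified compliance-rate sketch in the spirit of SpeechMap-style evals.
# The keyword-based refusal detector and the prompt set are stand-ins;
# SpeechMap's real rubric is more sophisticated.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "unable to help")

def is_refusal(reply: str) -> bool:
    """Crude heuristic: flag replies containing common refusal phrases."""
    lowered = reply.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def compliance_rate(model, prompts: list[str]) -> float:
    """Fraction of prompts the model answers rather than refuses.

    `model` is any callable mapping a prompt string to a reply string.
    """
    replies = [model(p) for p in prompts]
    complied = sum(not is_refusal(r) for r in replies)
    return complied / len(prompts)

# Example: a model that refuses 6 of 10 prompts scores 0.4 -- roughly the
# "under 40%" compliance reported for gpt-oss-120B.
```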

In Aider’s Polyglot evaluation, which tests code editing across multiple programming languages, gpt-oss-120B scored just 41.8%, far below competitors like Kimi-K2 (59.1%) and DeepSeek-R1 (56.9%).

    Some users also said their tests indicated the model is oddly resistant to generating criticism of China or Russia, a contrast to its treatment of the US and EU, raising questions about bias and training data filtering.

    Other experts have applauded the release and what it signals for U.S. open source AI

    To be fair, not all the commentary is negative. Software engineer and close AI watcher Simon Willison called the release “really impressive” on X, elaborating in a blog post on the models’ efficiency and ability to achieve parity with OpenAI’s proprietary o3-mini and o4-mini models.

    He praised their strong performance on reasoning and STEM-heavy benchmarks, and hailed the new “Harmony” prompt template format — which offers developers more structured terms for guiding model responses — and support for third-party tool use as meaningful contributions.
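Willison’s post goes deeper on Harmony; in rough terms, it wraps each conversation turn in explicit role (and, for assistant turns, channel) tokens instead of plain text. The schematic sketch below reflects my reading of OpenAI’s published examples, and the exact token spellings should be verified against the official openai-harmony library before relying on them.

```python
# Schematic sketch of Harmony-style prompt construction. The special-token
# spellings (<|start|>, <|channel|>, <|message|>, <|end|>) follow OpenAI's
# published examples as I understand them; treat them as approximate and
# verify against the official openai-harmony library.
def harmony_turn(role: str, content: str, channel: str | None = None) -> str:
    """Render one conversation turn in Harmony's tokenized structure."""
    header = role if channel is None else f"{role}<|channel|>{channel}"
    return f"<|start|>{header}<|message|>{content}<|end|>"

prompt = (
    harmony_turn("system", "You are a helpful assistant. Reasoning: high.")
    + harmony_turn("developer", "Answer in one short paragraph.")
    + harmony_turn("user", "Why is the sky blue?")
    # The model then continues with its own assistant turns, typically an
    # "analysis" (chain-of-thought) channel followed by a "final" channel.
)
print(prompt)
```

Separating a chain-of-thought “analysis” channel from the user-facing “final” channel is the structural choice developers seem most interested in, since it gives them a consistent handle on the model’s reasoning versus its answer.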

In a lengthy X post, Clem Delangue, CEO and co-founder of the AI model sharing and open source community Hugging Face, encouraged users not to rush to judgment, pointing out that inference for these models is complex and that early issues could be due to infrastructure instability and insufficient optimization among hosting providers.

    “The power of open-source is that there’s no cheating,” Delangue wrote. “We’ll uncover all the strengths and limitations… progressively.”

Even more cautious was Ethan Mollick, a professor at the University of Pennsylvania’s Wharton School, who wrote on X that “The US now likely has the leading open weights models (or close to it)”, but questioned whether this is a one-off by OpenAI. “The lead will evaporate quickly as others catch up,” he noted, adding that it’s unclear what incentives OpenAI has to keep the models updated.

Nathan Lambert, a leading AI researcher and commentator at the rival open source lab Allen Institute for AI (Ai2), praised the symbolic significance of the release on his blog Interconnects, calling it “a phenomenal step for the open ecosystem, especially for the West and its allies, that the most known brand in the AI space has returned to openly releasing models.”

But he cautioned on X that gpt-oss is “unlikely to meaningfully slow down [Chinese e-commerce giant Alibaba’s AI team] Qwen,” citing Qwen’s usability, performance, and variety.

    He argued the release marks an important shift in the U.S. toward open models, but that OpenAI still has a “long path back” to catch up in practice.

    A split verdict

    The verdict, for now, is split.

    OpenAI’s gpt-oss models are a landmark in terms of licensing and accessibility.

    But while the benchmarks look solid, the real-world “vibes” — as many users describe it — are proving less compelling.

    Whether developers can build strong applications and derivatives on top of gpt-oss will determine whether the release is remembered as a breakthrough or a blip.
