
    Nvidia launches fully open source transcription AI model Parakeet-TDT-0.6B-V2 on Hugging Face

    By Techurz | May 5, 2025 | 5 Mins Read

    Nvidia has become one of the most valuable companies in the world in recent years thanks to the stock market noticing how much demand there is for graphics processing units (GPUs), the powerful chips Nvidia makes that are used to render graphics in video games but also, increasingly, train AI large language and diffusion models.

    But Nvidia does far more than just make hardware, of course, and the software to run it. As the generative AI era wears on, the Santa Clara-based company has also been steadily releasing more and more of its own AI models — mostly open source and free for researchers and developers to take, download, modify and use commercially — and the latest among them is Parakeet-TDT-0.6B-v2, an automatic speech recognition (ASR) model that can, in the words of Hugging Face’s Vaibhav “VB” Srivastav, “transcribe 60 minutes of audio in 1 second [mind blown emoji].”

    This is the new generation of the Parakeet model Nvidia first unveiled in January 2024 and updated in April of that year. Version two currently tops the Hugging Face Open ASR Leaderboard with an average word error rate (WER, the number of word-level transcription errors divided by the number of words actually spoken) of just 6.05%.

    To put that in perspective, it nears proprietary transcription models such as OpenAI’s GPT-4o-transcribe (with a WER of 2.46% in English) and ElevenLabs Scribe (3.3%).
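For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed from a reference transcript and a model's output, using a standard word-level edit distance. The function name `word_error_rate` is illustrative, and the only text normalization applied here is lowercasing and whitespace splitting; leaderboards typically normalize text more carefully, so this is not the exact scoring procedure behind the figures quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of five spoken words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))
```

A WER of 6.05% therefore corresponds to roughly six word-level errors per hundred spoken words, averaged over the benchmark sets.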

    And it’s offering all this while remaining freely available under a commercially permissive Creative Commons CC-BY-4.0 license, making it an attractive proposition for commercial enterprises and indie developers looking to build speech recognition and transcription services into their paid applications.

    Table of contents
    1 Performance and benchmark standing
    2 Use cases and availability
    3 Access and deployment
    4 Training data and model development
    5 Evaluation and robustness
    6 Hardware compatibility and efficiency
    7 Ethical considerations and responsible use

    Performance and benchmark standing

    The model boasts 600 million parameters and leverages a combination of the FastConformer encoder and TDT decoder architectures.

    It is capable of transcribing an hour of audio in just one second, provided it’s running on Nvidia’s GPU-accelerated hardware.

    That speed is reported as an RTFx (inverse real-time factor, meaning seconds of audio processed per second of compute) of 3386.02 at a batch size of 128, placing it at the top of current ASR benchmarks maintained by Hugging Face.
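The "hour in one second" claim can be checked against the reported throughput figure with simple arithmetic. The sketch below assumes RTFx means audio duration divided by processing time, as on the Open ASR Leaderboard; the helper name `seconds_to_transcribe` is illustrative.

```python
def seconds_to_transcribe(audio_seconds: float, rtfx: float) -> float:
    """Processing time implied by a throughput of `rtfx` audio-seconds per second."""
    return audio_seconds / rtfx


RTFX = 3386.02           # figure reported in the article (batch size 128)
ONE_HOUR = 60 * 60       # seconds of audio

print(f"{seconds_to_transcribe(ONE_HOUR, RTFX):.2f} s")  # prints "1.06 s"
```

In other words, the benchmark figure implies a little over one second per hour of audio under the benchmark's batching conditions; single-file latency on other hardware may differ.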

    Use cases and availability

    Released globally on May 1, 2025, Parakeet-TDT-0.6B-v2 is aimed at developers, researchers, and industry teams building applications such as transcription services, voice assistants, subtitle generators, and conversational AI platforms.

    The model supports punctuation, capitalization, and detailed word-level timestamping, offering a full transcription package for a wide range of speech-to-text needs.
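Word-level timestamps are what make use cases such as subtitle generation straightforward. As a rough illustration, the sketch below groups timestamped words into SubRip (SRT) cues. The input format, a list of dictionaries with `word`, `start`, and `end` keys in seconds, is an assumption made for this example and may not match the model's actual output structure; `srt_time` and `words_to_srt` are hypothetical helper names.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group timestamped words into numbered SRT cues of up to `max_words` each."""
    cues = []
    for index in range(0, len(words), max_words):
        chunk = words[index:index + max_words]
        text = " ".join(w["word"] for w in chunk)
        start, end = chunk[0]["start"], chunk[-1]["end"]
        cue_number = index // max_words + 1
        cues.append(f"{cue_number}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Assumed input shape; the values are made up for illustration.
example_words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world.", "start": 0.5, "end": 0.9},
]
print(words_to_srt(example_words))
```

Because the model already emits punctuation and capitalization, the cue text needs no further post-processing in this sketch; a production subtitle tool would likely also split cues at sentence boundaries and pauses.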

    Access and deployment

    Developers can deploy the model using Nvidia’s NeMo toolkit. The setup process is compatible with Python and PyTorch, and the model can be used directly or fine-tuned for domain-specific tasks.
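As a rough idea of what this looks like in practice, here is a minimal sketch of loading the model through NeMo's ASR collection and transcribing a few files. It assumes NeMo is installed with its ASR extras (for example via `pip install nemo_toolkit[asr]`), that the model is published under the Hugging Face ID `nvidia/parakeet-tdt-0.6b-v2`, and that `ASRModel.from_pretrained` and `transcribe` behave as in recent NeMo releases; check the model card for the exact, current usage. The wrapper name `transcribe_files` is hypothetical.

```python
MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v2"  # assumed Hugging Face model ID


def transcribe_files(paths, model_name=MODEL_NAME):
    """Transcribe audio files with NeMo and return one text string per file."""
    # Imported lazily so this module can be loaded without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use, then loads it from the local cache.
    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    results = asr_model.transcribe(list(paths))

    # Recent NeMo versions return hypothesis objects with a `.text` attribute;
    # older versions return plain strings, so handle both.
    return [getattr(result, "text", result) for result in results]
```

Calling `transcribe_files(["meeting.wav"])` (a placeholder file name) would return a one-element list containing the transcript. Fine-tuning for a specific domain goes through NeMo's training configuration and is beyond the scope of this sketch.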

    The open-source license (CC-BY-4.0) also allows for commercial use, making it appealing to startups and enterprises alike.

    Training data and model development

    Parakeet-TDT-0.6B-v2 was trained on a diverse and large-scale corpus called the Granary dataset. This includes around 120,000 hours of English audio, composed of 10,000 hours of high-quality human-transcribed data and 110,000 hours of pseudo-labeled speech.

    Sources range from well-known datasets like LibriSpeech and Mozilla Common Voice to YouTube-Commons and Librilight.

    Nvidia plans to make the Granary dataset publicly available following its presentation at Interspeech 2025.

    Evaluation and robustness

    The model was evaluated across multiple English-language ASR benchmarks, including AMI, Earnings22, GigaSpeech, and SPGISpeech, and showed strong generalization performance. It remains robust under varied noise conditions and performs well even with telephony-style audio formats, with only modest degradation at lower signal-to-noise ratios.
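To make the signal-to-noise discussion concrete, the sketch below shows one common way to construct noisy test audio at a chosen SNR: scale a noise recording so that the speech-to-noise power ratio hits the target, then add it to the clean speech. This is a generic illustration, not Nvidia's evaluation procedure, and the helper names `rms` and `mix_at_snr` are hypothetical. Audio is represented as plain lists of float samples to keep the example dependency-free.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise must not be silent")
    # SNR(dB) = 20 * log10(rms_speech / rms_noise), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]
```

A robustness check would then transcribe the same utterances mixed at several SNR levels (for example 20, 10, and 0 dB) and compare the resulting word error rates against the clean baseline.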

    Hardware compatibility and efficiency

    Parakeet-TDT-0.6B-v2 is optimized for Nvidia GPU environments, supporting hardware such as the A100, H100, T4, and V100 boards.

    While high-end GPUs maximize performance, the model can still be loaded on systems with as little as 2GB of RAM, allowing for broader deployment scenarios.
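A back-of-the-envelope estimate helps put the 2GB figure in context. The sketch below computes the memory needed for the weights alone at a few numeric precisions, taking the 600 million parameter count from the article. It ignores activations, buffers, and framework overhead, and the precision in which the model is actually loaded is not stated here, so treat the numbers as rough lower bounds.

```python
PARAMETERS = 600_000_000  # parameter count reported in the article

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(parameters: int, bytes_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return parameters * bytes_per_param / 1e9


for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype}: {weight_memory_gb(PARAMETERS, size):.1f} GB")
```

By this rough estimate, 16-bit weights (about 1.2 GB) would fit within 2 GB while 32-bit weights (about 2.4 GB) would not; the article does not say which precision the 2GB figure assumes.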

    Ethical considerations and responsible use

    Nvidia notes that the model was developed without the use of personal data and adheres to its responsible AI framework.

    Although no specific measures were taken to mitigate demographic bias, the model passed internal quality standards and includes detailed documentation on its training process, dataset provenance, and privacy compliance.

    The release drew attention from the machine learning and open-source communities, especially after being publicly highlighted on social media. Commentators noted the model’s ability to outperform commercial ASR alternatives while remaining fully open source and commercially usable.

    Developers interested in trying the model can access it via Hugging Face or through Nvidia’s NeMo toolkit. Installation instructions, demo scripts, and integration guidance are readily available to facilitate experimentation and deployment.
