    The best AI for coding in 2025 (including a new winner – and what not to use)

    By Techurz | June 9, 2025 | 14 min read

    photo_Pawel/Getty Images

    I’ve been around technology long enough that very little excites me, and even less surprises me. But shortly after OpenAI’s ChatGPT was released, I asked it to write a WordPress plugin for my wife’s e-commerce site. When it did, and the plugin worked, I was indeed surprised.

    That was the beginning of my deep exploration into chatbots and AI-assisted programming. Since then, I’ve subjected 14 large language models (LLMs) to four real-world tests.

    Unfortunately, not all chatbots can code alike. It’s been a little over two years since that first test, and even now, four of the 14 LLMs I tested can’t create working plugins.

    The short version

    In this article, I’ll show you how each LLM performed against my tests. There are now five chatbots I recommend you use. 

    Two of them, ChatGPT Plus and Perplexity Pro, cost $20 per month each. The free versions of the same chatbots do well enough that you could probably get by without paying. Two other recommended products are from Google and Microsoft. Google’s Gemini Pro 2.5 is free, but you’re limited to so few queries that you really can’t use it without paying. 

    Microsoft has several Copilot licenses, which can get pricey, but I used the free version with surprisingly good results. The final one, Claude 4 Sonnet, is the free version of Claude. Oddly enough, the free version beat the paid-for version, so we’re not recommending Claude 4 Opus.

    But the rest, whether free or paid, are not so great. I won’t risk my programming projects with them or recommend that you do, until their performance improves.

    I’ve written lots about using AIs to help with programming. Unless it’s a small, simple project like my wife’s plugin, AIs can’t write entire apps or programs. But they excel at writing a few lines and are not bad at fixing code.

    Rather than repeat everything I’ve written, go ahead and read this article: How to use ChatGPT to write code.

    If you want to understand my coding tests, why I’ve chosen them, and why they’re relevant to this review of the 14 LLMs, read this article: How I test an AI chatbot’s coding ability.

    The AI coding leaderboard

    Let’s start with a comparative look at how the chatbots performed, as of this installment of our best-of roundup:

    [Chart: AI coding leaderboard comparing the chatbots' test results. Credit: David Gewirtz/ZDNET]

    Next, let’s look at each chatbot individually. I’m back up to discussing 14 chatbots, because we’re splitting out Claude 4 Sonnet and Claude 4 Opus as separate tests. GPT-4 is no longer included since OpenAI has sunsetted that LLM. Ready? Let’s go.

    ChatGPT Plus

    Pros

    • Passed all tests
    • Solid coding results
    • Mac app

    Cons

    • Hallucinations
    • No Windows app yet
    • Sometimes uncooperative

    • Price: $20/mo
    • LLM: GPT-4o, GPT-3.5
    • Desktop browser interface: Yes
    • Dedicated Mac app: Yes
    • Dedicated Windows app: No
    • Multi-factor authentication: Yes
    • Tests passed: 4 of 4

    ChatGPT Plus with GPT-4o passed all my tests. One of my favorite features is the availability of a dedicated app. When I test web programming, I have my browser set on one thing, my IDE open, and the ChatGPT Mac app running on a separate screen.

    Also: I put GPT-4o through my coding tests and it aced them – except for one weird result

    In addition, Logitech’s Prompt Builder, which can be activated with a mouse button, can be set up to use the upgraded GPT-4o and connect to your OpenAI account. That lets you run a prompt with a simple thumb tap, which is very convenient.

    The only thing I didn’t like was that one of my GPT-4o tests resulted in a dual-choice answer, and one of those answers was wrong. I’d rather it just gave me the correct answer. A quick test confirmed which answer would work, but the issue was still a bit annoying.

    Perplexity Pro

    Pros

    • Multiple LLMs
    • Search criteria displayed
    • Good sourcing

    Cons

    • Email-only login
    • No desktop app

    • Price: $20/mo
    • LLM: GPT-4o, Claude 3.5 Sonnet, Sonar Large, Claude 3 Opus, Llama 3.1 405B
    • Desktop browser interface: Yes
    • Dedicated Mac app: No
    • Dedicated Windows app: No
    • Multi-factor authentication: No
    • Tests passed: 4 of 4

    I seriously considered listing Perplexity Pro as the best overall AI chatbot for coding, but one failing kept it out of the top slot: how you log in. Perplexity doesn’t use a username/password or passkey and doesn’t have multi-factor authentication. All the tool does is email you a login PIN. The AI doesn’t have a separate desktop app, as ChatGPT does for Macs.

    What sets Perplexity apart from other tools is that it can run multiple LLMs. While you can’t set an LLM for a given session, you can easily go into the settings and choose the active model.

    Also: Can Perplexity Pro help you code? It aced my programming tests – thanks to GPT-4

    For programming, you’ll probably want to stick to GPT-4o, because that model aced all our tests. But it might be interesting to cross-check your code across the different LLMs. For example, if you have GPT-4o write some regular expression code, you might consider switching to a different LLM to see what that model thinks of the generated code.

    As we’ll see below, most LLMs are unreliable, so don’t take the results as gospel. However, you can use the results to check your original code. It’s sort of like an AI-driven code review.

    Just don’t forget to switch back to GPT-4o.
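
    A second model's opinion is one check; running the generated code against cases you wrote yourself is another. Here is a minimal sketch of that idea in Python. The pattern below is a hypothetical stand-in for LLM-generated regular expression code (it is not taken from my tests), and the case list and the check_pattern helper are likewise assumptions made for illustration.

```python
import re

# Hypothetical regex standing in for LLM-generated code.
# Intended meaning (an assumption): a whole number, optionally
# followed by a decimal point and one or two digits.
GENERATED_PATTERN = r"^\d+(\.\d{1,2})?$"

# Hand-written expectations: input text -> should the pattern accept it?
CASES = {
    "12": True,
    "12.5": True,
    "12.50": True,
    "12.505": False,
    "abc": False,
    "": False,
    ".50": False,
}


def check_pattern(pattern, cases):
    """Return a list of (text, expected, actual) for every mismatch."""
    compiled = re.compile(pattern)
    failures = []
    for text, expected in cases.items():
        # fullmatch requires the entire string to match the pattern.
        actual = compiled.fullmatch(text) is not None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


if __name__ == "__main__":
    for text, expected, actual in check_pattern(GENERATED_PATTERN, CASES):
        print(f"MISMATCH {text!r}: expected {expected}, got {actual}")
```

    An empty result only means the pattern agrees with the cases you thought of, so the value of the check depends on how well the cases cover the edge conditions you care about.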

    Google Gemini Pro 2.5

    • Price: Free for limited use, then token-based pricing
    • LLM: Gemini Pro 2.5
    • Desktop browser interface: Yes
    • Dedicated Mac app: No
    • Dedicated Windows app: No
    • Multi-factor authentication: Yes
    • Tests passed: 4 of 4

    The last time I looked at Gemini, it failed miserably. Not quite as bad as Copilot at the time, but bad. Gemini Pro 2.5, however, has performed quite admirably. My only real issue with it is access. I found myself cut off from the free version after only running two of the four tests.

    Also: Gemini Pro 2.5 is a stunningly capable coding assistant – and a big threat to ChatGPT

    I waited a day and then ran the third test, and got cut off again. Finally, on the third day, I ran my fourth test. Obviously, you can’t do any real programming if you can only ask one or two questions before being shut down. So, if you sign up with Gemini Pro 2.5, be aware that Google charges by tokens (basically, the amount of AI you use). That can make it quite difficult to predict your expenses.
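
    One way to make token-based charges a little more predictable is to do the arithmetic up front. The sketch below is a rough estimator, not Google's billing logic. It assumes separate per-million-token rates for input and output, which is a common arrangement for token-priced APIs. The rates shown are placeholders rather than real prices, and the TokenRates, estimate_cost, and estimate_monthly_cost names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Dollar price per one million tokens, supplied by the caller."""
    input_per_million: float
    output_per_million: float


def estimate_cost(rates, input_tokens, output_tokens):
    """Estimate the dollar cost of a single request."""
    input_cost = (input_tokens / 1_000_000) * rates.input_per_million
    output_cost = (output_tokens / 1_000_000) * rates.output_per_million
    return input_cost + output_cost


def estimate_monthly_cost(rates, requests_per_day, avg_input_tokens,
                          avg_output_tokens, days=30):
    """Scale the per-request estimate up to a month of usage."""
    per_request = estimate_cost(rates, avg_input_tokens, avg_output_tokens)
    return per_request * requests_per_day * days


# Placeholder rates for illustration only; look up the current
# price list before relying on any figure.
EXAMPLE_RATES = TokenRates(input_per_million=1.00, output_per_million=10.00)

if __name__ == "__main__":
    monthly = estimate_monthly_cost(EXAMPLE_RATES, requests_per_day=40,
                                    avg_input_tokens=2_000,
                                    avg_output_tokens=1_000)
    print(f"Estimated monthly cost: ${monthly:.2f}")
```

    The hard part in practice is guessing the average token counts, since long pasted source files and long generated answers both push the totals up quickly.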

    Microsoft Copilot

    • Price: Free for basic Copilot, or fees for other Copilot licenses
    • LLM: Undisclosed
    • Desktop browser interface: Yes
    • Dedicated Mac app: No
    • Dedicated Windows app: No
    • Multi-factor authentication: Yes
    • Tests passed: 4 of 4

    In all my previous analyses of Microsoft Copilot, the results were the worst of the LLMs. Copilot got nothing right. It was astonishing how bad it was. But I said then that, “The one positive thing is that Microsoft always learns from its mistakes. So, I’ll check back later and see if this result improves.”

    Also: I retested Microsoft Copilot’s AI coding skills in 2025 and now it’s got serious game

    And boy, did it ever. This time out, Microsoft passed all four of my tests. Even better, it did this with the free version of Copilot. Yes, Microsoft has many paid programs for Copilot, but if you want to give the AI a spin, head to Copilot and use it.

    Claude 4 Sonnet

    • Price: Free
    • LLM: Claude 4
    • Desktop browser interface: No
    • Dedicated Mac app: No
    • Dedicated Windows app: No
    • Multi-factor authentication: Yes
    • Tests passed: 4 of 4

    This is one of those times when AI implementations can be real head-scratchers. In our previous tests, Claude 4 Sonnet finished at the bottom of the barrel, failing all four of our tests. This time, however, Sonnet passed every test. So, what’s the head-scratcher? Opus, the paid version of the Claude 4 model, did not do as well: it failed half the tests.

    Also: Anthropic’s free Claude 4 Sonnet aced my coding tests – but its paid Opus model somehow didn’t

    So, yes. The free version worked like a champ. And the one you’re paying anywhere from $20 to $250 a month for, depending on the plan? Well, that one failed half of the tests. Go figure.

    Grok

    Pros

    • Different LLM than ChatGPT
    • Good descriptions
    • Free access

    Cons

    • Only available in browser mode
    • Free access likely only temporary

    • Price: Free (for now)
    • LLM: Grok-1
    • Desktop browser interface: Yes
    • Dedicated Mac app: No
    • Dedicated Windows app: No
    • Multi-factor authentication: Yes
    • Tests passed: 3 of 4

    I have to say, Grok surprised me. I guess I didn’t have high hopes for an LLM that appeared tacked on to the social network formerly known as Twitter. However, X is now owned by Elon Musk, and two of Musk’s companies, Tesla and SpaceX, have towering AI capabilities.

    It’s unclear how much Tesla and SpaceX AI DNA is in Grok, but we can assume there will likely be more work. As of now, Grok is the only LLM not based on OpenAI LLMs that made it into the recommended list.

    Also: X’s Grok did surprisingly well in my AI coding tests

    Grok did make one mistake, but it was a relatively minor one that a slightly more comprehensive prompt could easily remedy. Yes, it failed the test. But by passing the others and even doing an almost perfect job on the one it failed, Grok earned itself a spot as a contender.

    Stay tuned. This is an AI to watch.

    ChatGPT (free version)

    Cons

    • Prompt throttling
    • Could cut you off in the middle of whatever you’re working on

    • Price: Free
    • LLM: GPT-4o, GPT-3.5
    • Desktop browser interface: Yes
    • Dedicated Mac app: Yes
    • Dedicated Windows app: No
    • Multi-factor authentication: Yes
    • Tests passed: 3 of 4 in GPT-3.5 mode

    ChatGPT is available to anyone for free. While both the Plus and free versions support GPT-4o, which passed all my programming tests, the free app has limitations.

    OpenAI treats free ChatGPT users as if they’re in the cheap seats. If traffic is high or the servers are busy, ChatGPT will only make GPT-3.5 available to free users. The tool will also only allow you a certain number of queries before it downgrades or shuts you off.

    Also: How to use ChatGPT to write code – and my favorite trick to debug what it generates

    I’ve had several occasions when the free version of ChatGPT effectively told me I’d asked too many questions.

    ChatGPT is a great tool, as long as you don’t mind it shutting down. Even GPT-3.5 did better on the tests than all the other chatbots, and the test it failed was for a fairly obscure programming tool produced by a lone programmer in Australia.

    So, if budget is important to you and you can wait when you’re cut off, then use ChatGPT for free.

    Perplexity (free version)

    Pros

    • Free
    • Passed most tests
    • Range of research tools

    Cons

    • Limited to GPT-3.5
    • Throttles prompt results

    • Price: Free
    • LLM: GPT-3.5
    • Desktop browser interface: Yes
    • Dedicated Mac app: No
    • Dedicated Windows app: No
    • Multi-factor authentication: No
    • Tests passed: 3 of 4

    I’m threading a pretty fine needle here, but because Perplexity AI’s free version is based on GPT-3.5, the test results were measurably better than the other AI chatbots.

    Also: 5 reasons why I prefer Perplexity over every other AI chatbot

    From a programming perspective, that’s pretty much the whole story. However, from a research and organization perspective, my ZDNET colleague Steven Vaughan-Nichols prefers Perplexity over the other AIs.

    He likes how Perplexity provides more complete sources for research questions, cites its sources, organizes the replies, and offers questions for further searches.

    So, if you’re programming, but also working on other research, consider the free version of Perplexity.

    DeepSeek V3

    Pros

    • Free
    • Open source
    • Efficient resource utilization

    Cons

    • Weak general knowledge
    • Small ecosystem
    • Limited integrations

    • Price: Free for chatbot, fees for API
    • LLM: DeepSeek MoE
    • Desktop browser interface: Yes
    • Dedicated Mac app: No
    • Dedicated Windows app: No
    • Multi-factor authentication: No
    • Tests passed: 3 of 4

    While DeepSeek R1 is the new reasoning hotness from China that has all the pundits punditing, the real power right now (at least according to our tests) is DeepSeek V3. This chatbot passed almost all of our coding tests, doing as well as the (now mostly discontinued) ChatGPT 3.5.

    Also: I tested DeepSeek’s R1 and V3 coding skills – and we’re not all doomed (yet)

    Where DeepSeek V3 fell short was in its knowledge of somewhat more obscure programming environments. Still, it beat Google’s Gemini, Microsoft’s Copilot, and Meta’s Meta AI, which is quite an accomplishment. We’ll be keeping a close watch on each DeepSeek model, so stay tuned.

    Chatbots to avoid for programming help

    I tested 14 LLMs, and nine passed most of my tests this time around. The other chatbots, including a few pitched as great for programming, passed half of my tests at most.

    I’m mentioning them here because people will ask, and I did test them thoroughly. Some of these bots are fine for other work, so I’ll point you to their general reviews if you’re curious about their functionality.

    DeepSeek R1

    David Gewirtz/ZDNET

    Unlike DeepSeek V3, the advanced reasoning version, DeepSeek R1, did not showcase its reasoning capabilities in our programming tests. Unusually, the new failure area was one that’s not all that hard, even for a basic AI — the regular expression code for our string function test.  

    But that’s why we are running these real-world tests. It’s never clear where an AI will hallucinate or just plain fail, and before you go believing all the hype about DeepSeek R1 taking the crown away from ChatGPT, run some programming tests. So far, while I’m impressed with the much-reduced resource utilization and the open-source nature of the product, the quality of its code output is inconsistent.

    GitHub Copilot

    David Gewirtz/ZDNET

    GitHub’s Copilot integrates quite seamlessly with VS Code. The AI makes asking for coding help quick and productive, especially when working in context. That’s why it’s so disappointing that the code the AI outputs is often very wrong.

    Also: I put GitHub Copilot’s AI to the test – and it just might be terrible at writing code

    I can’t, in good conscience, recommend you use the GitHub Copilot extensions for VS Code. I’m concerned that the temptation to insert blocks of code without sufficient testing will be too great, and that the code GitHub Copilot produces is not ready for production use. Try again next year.

    Claude 4 Opus

    David Gewirtz/ZDNET

    In a completely baffling turn of events, the paid-for version of the Claude 4 model, Opus, failed half of my tests. What makes this result baffling is that the free version, Claude 4 Sonnet, passed them all. I don’t know what to say apart from AI can be weird.

    Meta AI

    David Gewirtz/ZDNET

    Meta AI is Facebook’s general-purpose AI. As you can see above, it failed three of our four tests. 

    The AI generated a nice user interface, but with zero functionality. It also found my annoying bug, which is a fairly serious challenge. Given the specific knowledge required to find the bug, I was surprised that the AI choked on a simple regular expression challenge. But it did.

    Meta Code Llama

    David Gewirtz/ZDNET

    Meta Code Llama is Facebook’s AI explicitly designed for coding help. It’s something you can download and install on your server. I tested the AI running on a Hugging Face AI instance.

    Also: Can Meta AI code? I tested it against Llama, Gemini, and ChatGPT – it wasn’t even close

    Weirdly, even though both Meta AI and Meta Code Llama choked on three of four of my tests, they choked on different problems. AIs can’t be counted on to give the same answer twice, but this result was a surprise. We’ll see if that changes over time.

    But I like [insert name here]. Does this mean I have to use a different chatbot?

    Probably not. I’ve limited my tests to day-to-day programming tasks. None of the bots has been asked to talk like a pirate, write prose, or draw a picture. In the same way we use different productivity tools to accomplish specific tasks, feel free to choose the AI that helps you complete the task at hand.

    The only issue is if you’re on a budget and are paying for a pro version. Then, find the AI that does most of what you want, so you don’t have to pay for too many AI add-ons.

    It’s only a matter of time

    The results of my tests were pretty surprising, especially given the significant improvements by Microsoft and Google. However, this area of innovation is improving at warp speed, so we’ll be back with updated tests and results over time. Stay tuned.

    Have you used any of these AI chatbots for programming? What has your experience been? Let us know in the comments below.

    You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.
