Has Gemini surpassed ChatGPT? We put the AI models to the test.

Did Apple make the right choice in partnering with Google for Siri’s AI features?

The last time we did comparative tests of AI models from OpenAI and Google at Ars was in late 2023, when Google’s offering was still called Bard. In the roughly two years since, a lot has happened in the world of artificial intelligence. And now that Apple has made the consequential decision to partner with Google Gemini to power the next generation of its Siri voice assistant, we thought it was high time to do some new tests to see where the models from these AI giants stand today.

For this test, we’re comparing the default models that both OpenAI and Google present to users who don’t pay for a regular subscription—ChatGPT 5.2 for OpenAI and Gemini 3.2 Fast for Google. While other models might be more powerful, we felt this test best recreates the AI experience as it would work for the vast majority of Siri users, who don’t pay to subscribe to either company’s services.

As in the past, we’ll feed the same prompts to both models and evaluate the results using a combination of objective evaluation and subjective feel. Rather than re-using the relatively simple prompts we ran back in 2023, though, we’ll be running these models on an updated set of more complex prompts that we first used when pitting GPT-5 against GPT-4o last summer.

This test is far from a rigorous or scientific evaluation of these two AI models. Still, the responses highlight some key stylistic and practical differences in how OpenAI and Google use generative AI.
Dad jokes
A mathematical word problem
Creative writing
Public figures
Difficult emails
Medical advice
Video game guidance
Land a plane
Final verdict
...read more at arstechnica.com

45 sats \ 0 replies \ @optimism 21 Jan

I agree with their verdict, based on my interactions with LMArena.

Gemini has beaten both ChatGPT and Claude on every text prompt I've tried since Gemini 3. The regressions on ChatGPT are stacking up, while Claude, from my perspective, is now specialized in tool calling and coding.

I don't ask a bot about jokes though, so I'm probably not an average user. Haha