The Methodology
We designed 50 benchmark tasks across five categories: coding, mathematical reasoning, creative writing, multimodal understanding, and conversational ability. Each task was run three times on both models; each output was scored by a panel of domain experts, and the scores were averaged across runs.
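The scoring pipeline described above can be sketched in a few lines. This is my own illustration of "averaged and scored by a panel", not the article's actual grading code; the function name, panel size, and example numbers are all hypothetical:

```python
from statistics import mean

def task_score(runs):
    """Average the expert panel's scores within each run, then across runs.

    runs: one list of per-expert scores (0-100) for each of the three runs.
    """
    per_run = [mean(panel) for panel in runs]
    return mean(per_run)

# Hypothetical panel scores for a single task, three runs, three experts each:
runs = [
    [92, 95, 94],  # run 1
    [90, 93, 92],  # run 2
    [94, 96, 95],  # run 3
]
print(round(task_score(runs), 2))
```

Averaging within runs first means a panel with more experts on one run wouldn't skew the task's final score.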
Coding: GPT-5 Takes the Lead
In our coding benchmarks — spanning Python, TypeScript, Rust, and SQL — GPT-5 scored 94.2% vs Gemini Ultra 2.0's 91.8%. GPT-5 particularly excelled at multi-file refactoring tasks and understanding complex codebases.
Reasoning: A Virtual Tie
Both models scored within 1% of each other on mathematical and logical reasoning tasks: GPT-5 at 89.5%, Gemini Ultra at 90.1%. The difference is not statistically significant at our sample size.
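You can sanity-check the "not statistically significant" claim with a quick two-proportion z-test. This is my own back-of-the-envelope sketch, assuming the percentages were tallied over 150 graded attempts per model (50 tasks times 3 runs), which the article doesn't state explicitly:

```python
from math import sqrt

def two_proportion_z(p1, p2, n):
    """Two-proportion z-statistic with pooled variance, equal sample sizes n."""
    p = (p1 + p2) / 2
    se = sqrt(p * (1 - p) * (2 / n))
    return (p1 - p2) / se

# 89.5% vs 90.1% over an assumed n = 150 attempts per model.
z = two_proportion_z(0.901, 0.895, 150)
print(abs(z) < 1.96)  # True: well below the 5% significance threshold
```

The z-statistic comes out around 0.17, nowhere near the ±1.96 cutoff, so even under this generous sample-size assumption the 0.6-point gap is noise.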
Creative Writing: Gemini Ultra Surprises
In a blind evaluation by three professional writers, Gemini Ultra's creative outputs were preferred 58% of the time. Its prose had more variety in sentence structure and more natural dialogue.
Final Scores
- Coding: GPT-5 (94.2) vs Gemini (91.8)
- Reasoning: GPT-5 (89.5) vs Gemini (90.1)
- Creative: GPT-5 (82.3) vs Gemini (86.7)
- Multimodal: GPT-5 (91.0) vs Gemini (93.4)
- Conversation: GPT-5 (88.1) vs Gemini (87.5)
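If you average the five category scores above with equal weights (my own quick check, not a composite metric the benchmark defines), both models land around 89, which underlines how close this race is:

```python
from statistics import mean

# Category scores in order: coding, reasoning, creative, multimodal, conversation.
scores = {
    "GPT-5":  [94.2, 89.5, 82.3, 91.0, 88.1],
    "Gemini": [91.8, 90.1, 86.7, 93.4, 87.5],
}
for model, s in scores.items():
    print(model, round(mean(s), 2))
```

Equal weighting is arbitrary, of course; a developer who mostly codes would weight the categories very differently than a novelist.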
The Verdict
There's no clear winner. GPT-5 is better for coding and general conversation, while Gemini Ultra leads in creative tasks and multimodal understanding. The real winner? Users who have access to both.
Abhi
Tech writer and developer. I cover gadgets, AI tools, and open-source projects that make a difference. Follow me on Twitter for hot takes.