ChatGPT 4.1 early benchmarks in contrast in opposition to Google Gemini

April 16, 2025

74

ChatGPT 4.1 is now rolling out, and it is a important leap from GPT 4o, however it fails to beat the benchmark set by Google Gemini.

Yesterday, OpenAI confirmed that builders with API entry can strive as many as three new fashions: GPT‑4.1, GPT‑4.1 mini, and GPT‑4.1 nano.

Based on the benchmarks, these fashions are much better than the present GPT‑4o and GPT‑4o mini, significantly in coding.

For instance, GPT‑4.1 scores 54.6% on SWE-bench Verified, which is healthier than GPT-4o by 21.4% and 26.6% over GPT‑4.5. We’ve related outcomes on different benchmarking instruments shared by OpenAI, however how does it compete in opposition to Gemini fashions.

ChatGPT 4.1 early benchmarks

Based on benchmarks shared by Stagehand, which is a production-ready browser automation framework, Gemini 2.0 Flash has the bottom error price (6.67%) together with the best precise‑match rating (90%), and it’s additionally low-cost and quick.

However, GPT‑4.1 has the next error price (16.67%) and prices over 10 instances greater than Gemini 2.0 Flash.

Different GPT variants (like “nano” or “mini”) are cheaper or quicker however not as correct as GPT-4.1

In one other knowledge shared by Pierre Bongrand, who’s a scientist engaged on RNA at Harward, GPT‑4.1 presents poorer cost-effectiveness than competing fashions.

This is a vital issue as a result of GPT4.1 is cheaper than ChatGPT 4o.

Fashions like Gemini 2.0 Flash, Gemini 2.5 Professional, and even DeepSeek or o3 mini lie nearer to or on the frontier, which suggests they ship increased efficiency at a decrease or comparable price.

Finally, whereas GPT‑4.1 nonetheless works as an choice, it is clearly overshadowed by cheaper or extra succesful alternate options.

Coding benchmarks present GPT-4.1 lags behind Gemini 2.5

GPT 4.1

We’re seeing related ends in coding benchmarks, with Aider Polyglot itemizing GPT-4.1 with a 52% rating, whereas Gemini 2.5 is miles forward at 73%.

Gemini 2.5

It is usually vital to notice that GPT-4.1 is a non-reasoning mannequin, and it is nonetheless top-of-the-line fashions for coding.

GPT-4.1 is obtainable by way of API, however you should utilize it without cost when you join Windsurf AI.

Previous articleSkySafe Welcomes Eileen Treanor as CFO to Strengthen Monetary Management and Scale Operations – sUAS Information

Next articleJRuby 10 brings sooner startup occasions

ChatGPT 4.1 early benchmarks in contrast in opposition to Google Gemini

ChatGPT 4.1 early benchmarks

Coding benchmarks present GPT-4.1 lags behind Gemini 2.5

Regulatory Gaps & Legacy Programs Gasoline AI

OpenAI is testing a brand new GPT-5-based AI agent “GPT-Alpha”

Tech Overtakes Gaming as High DDoS Assault Goal, New Gcore Radar Report Finds

LEAVE A REPLY Cancel reply

Most Popular

AI Agent Tutorial Half 2

How machine imaginative and prescient is enhancing automation security and effectivity

Google Advertisements now flags low-quality photos

Validation Technician At UST In Bengaluru

Recent Comments

ABOUT US

POPULAR POSTS

AI Agent Tutorial Half 2

How machine imaginative and prescient is enhancing automation security and effectivity

Google Advertisements now flags low-quality photos

POPULAR CATEGORY