Elon Musk launched Grok 4 last night, calling it the ‘smartest AI in the world’: what businesses need to know

July 13, 2025



After days of controversy surrounding a flurry of antisemitic responses made recently by his Grok AI-powered chatbot on his social network X (formerly Twitter), a seemingly unrepentant and unbothered Elon Musk launched the latest version of his AI model family, Grok 4, during an event livestreamed on X last night, calling it “the smartest AI in the world.”

As Musk posted on X: “Grok 4 is the first time, in my experience, that an AI has been able to solve difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books. And it will get much better.”

The new release actually includes two distinct models: Grok 4, a single-agent reasoning model, and Grok 4 Heavy, a multi-agent system designed to solve complex problems through internal collaboration and synthesis.

Both models are optimized for reasoning tasks and come with native tool integration, enabling capabilities such as web search, code execution, and multimodal analysis.

Musk and his team at xAI showcased benchmarks that suggest Grok 4 outperforms all current rivals across a range of academic and coding evaluations, including previously leading AI reasoning models such as OpenAI o3 and Google Gemini.

However, xAI has not yet released a model card or any official release notes for Grok 4 to the public, making it challenging to independently assess its performance and the claims made during the stream. We’ll update if and when these become available.

Nor did Musk and his xAI team members participating in the livestream address the obvious controversy facing Grok over the past week, including many incidents of Grok making antisemitic remarks, referring to itself as “MechaHitler,” and suggesting that people with Jewish surnames should be dealt with decisively by Adolf Hitler, a seemingly overt reference to the Holocaust and the genocide of 6 million Jews during World War II.

The closest Musk came was when he stated: “The thing that I think is most important for AI safety, at least my biological neural net tells me the most important thing, is to be maximally truth-seeking,” and “We need to make sure that the AI is a good AI. Good Grok,” as well as “It’s important to instill the values you want in a child that will grow up to be incredibly powerful.”

However, Musk did not apologize, nor did he accept responsibility for Grok’s antisemitic, sexually offensive and conspiratorial remarks.

Throughout the livestream, the team emphasized Grok 4’s ability to reason from first principles, correct its own errors and potentially invent new technologies or discover novel scientific insights.

The presentation also included demonstrations of Grok 4 Heavy, which applies multi-agent collaboration to tackle research-level problems across disciplines.

Availability and pricing

Grok 4 is available now through multiple channels, depending on user type and subscription level:

API Access (for developers and enterprises):
Grok 4 and Grok 4 Heavy are live via the xAI API. Pricing is structured as follows:
- $3 per 1 million input tokens
- $15 per 1 million output tokens
- $0.75 per 1 million cached input tokens
- Prices double after 128,000 tokens in a single context window
  The API supports text and image inputs, function calling, and structured outputs, and offers a 256,000-token context window.
Consumer Access (via Grok chatbot and apps):
Individual users can access Grok 4 through the Grok chatbot on X, the Grok app (iOS and Android), and X.com, but only with one of the following subscriptions:
- PremiumPlus: $16/month
- SuperGrok: $300/month
- A new “SuperGrok Heavy” tier, also priced at $300/month, provides access to both Grok 4 and Grok 4 Heavy, the multi-agent variant.
  (Note: SuperGrok and PremiumPlus tiers may differ in availability and usage quotas across X and Grok platforms.)
Launch Timing:
Grok 4 became available immediately following the July 9, 2025, livestream. Temporary access limits were in place during the demo, but full rollout to subscribers began shortly after.
Platform Expansion:
xAI has indicated plans to make Grok 4 available through Microsoft Azure AI Foundry, where Grok 3 and Grok 3 Mini are currently listed.
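
To make the API pricing concrete, here is a minimal Python sketch that estimates the cost of a single Grok 4 request from the rates listed above. The helper name `estimate_grok4_cost` is hypothetical, and the way the 128,000-token doubling is applied (to the whole request once combined input and output exceed the threshold) is an assumption for illustration; xAI’s actual billing rules may differ.

```python
# Rates in USD per 1 million tokens, as listed in the article.
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00
CACHED_INPUT_RATE = 0.75
LONG_CONTEXT_THRESHOLD = 128_000  # tokens; prices double beyond this


def estimate_grok4_cost(input_tokens: int, output_tokens: int,
                        cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: once the request's total tokens exceed the threshold,
    every rate is doubled for the whole request.
    """
    total_tokens = input_tokens + cached_input_tokens + output_tokens
    multiplier = 2 if total_tokens > LONG_CONTEXT_THRESHOLD else 1
    base_cost = (
        input_tokens * INPUT_RATE
        + cached_input_tokens * CACHED_INPUT_RATE
        + output_tokens * OUTPUT_RATE
    ) / 1_000_000
    return base_cost * multiplier


if __name__ == "__main__":
    print(f"${estimate_grok4_cost(100_000, 10_000):.2f}")
    print(f"${estimate_grok4_cost(200_000, 10_000):.2f}")
```

Under these assumptions, a request with 100,000 input tokens and 10,000 output tokens comes to about $0.45, while 200,000 input tokens with the same output would come to about $1.50 because the long-context multiplier applies.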

For subscription details, users are directed to x.ai/grok and X Premium support. Here’s how it compares to other leading AI models in terms of pricing per million tokens.

Provider & model	Context window	Input ($/Mtok)	Cached input ($/Mtok)	Output ($/Mtok)	Extra notes
xAI – Grok 4 / 4 Heavy	256K (2× price >128K)	$3.00	$0.75	$15.00	Image input, function calling, structured JSON (apidog)
OpenAI – o3	200K	$2.00	$0.50	$8.00	50% Batch API discount available (OpenAI, OpenAI Help Center)
OpenAI – GPT-4o	128K	$5.00	$2.50	$20.00	Vision, audio, tools (OpenAI)
Anthropic – Claude Sonnet 4	200K	$3.00	$0.30	$15.00	50% batch output discount (Anthropic)
Anthropic – Claude Opus 4	200K	$15.00	$1.50	$75.00	High-accuracy flagship (Anthropic)
Google – Gemini 2.5 Pro	200K (2× price >200K)	$1.25	$0.31	$10.00	75% cache hit discount (Google AI for Developers, Google Cloud)
Google – Gemini 2.5 Flash	200K	$0.30	$0.075	$2.50	Fast, low-cost preview tier (Google Cloud)
DeepSeek – deepseek-reasoner	64K	$0.55 (miss) / $0.14 (hit)	$0.14	$2.19	50-75% off-peak discount (DeepSeek API Docs)

Unlike its predecessor Grok 3, released in February, which separated tool-augmented responses from general reasoning, Grok 4 was trained with tools from the start.

The model integrates capabilities such as code execution, web search and document parsing. It also introduces Grok 4 Heavy, a multi-agent system where multiple internal models work in parallel to generate and validate answers.

Grok 4 also includes a new voice mode featuring expressive outputs with reduced latency, as well as support for text and image input, structured outputs and function calling.

Performance highlights

The independent AI model analysis and benchmarking organization Artificial Analysis stated on X that xAI provided it with a version of Grok 4 (not Heavy) ahead of the public release for scoring.

On technical benchmarks, Grok 4 leads the Artificial Analysis Intelligence Index with a score of 73, ahead of rivals such as OpenAI’s o3 (70) and Google’s Gemini 2.5 Pro (70).

It also recorded top scores in:

- GPQA Diamond: 88%
- ARC-AGI 2: 15.9%, double the second-best score
- Humanity’s Last Exam: 24% on the text-only version, and 44% with tools
- MMLU-Pro and AIME 2024: 87% and 94%, respectively
- Coding and math evaluations: highest to date on LiveCodeBench, SciCode, AIME24, and MATH-500

Despite its benchmark success, Grok 4’s output speed stands at 75 tokens per second, slower than models like Gemini 2.5 Flash (353) or OpenAI’s o3 (187), but still faster than Anthropic’s Claude 4 Opus (66).
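
As a rough illustration of what these throughput figures mean in practice, the sketch below converts the quoted tokens-per-second numbers into the time needed to stream a response. The 1,000-token response length and the helper name `generation_seconds` are assumptions, and the estimate ignores time-to-first-token and network latency.

```python
# Output speeds in tokens per second, as quoted in the article.
OUTPUT_SPEED_TPS = {
    "Gemini 2.5 Flash": 353,
    "OpenAI o3": 187,
    "Grok 4": 75,
    "Claude 4 Opus": 66,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to emit num_tokens at a steady output speed (hypothetical helper)."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 1_000  # assumed response length
    for model, tps in OUTPUT_SPEED_TPS.items():
        seconds = generation_seconds(response_tokens, tps)
        print(f"{model}: {seconds:.1f} s")
```

At these speeds, a 1,000-token answer would take roughly 13 seconds from Grok 4, compared with under 3 seconds from Gemini 2.5 Flash.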

The model features a 256,000-token context window, which sits above the 200,000-token context limits of o3 and Claude 4 Sonnet but below the 1 million tokens offered by Gemini 2.5 Pro and GPT-4.1.

Real-world use cases

xAI provided several demonstrations of Grok 4’s performance in applied scenarios:

- In a simulated business task called VendingBench, Grok 4 significantly outperformed other models in long-horizon financial planning.
- At the Arc Institute, researchers used Grok 4 to analyze CRISPR logs and uncover novel hypotheses.
- In radiology, the model interpreted chest X-rays with greater accuracy than leading peers.
- In the financial sector, its combination of real-time data access and reasoning made it suitable for forecasting and analysis.

The model can also create 3D video games with minimal input by autonomously sourcing and integrating assets. Additionally, it demonstrated the ability to simulate astrophysical events using grounded approximations from published research.

Reception and discussion

The industry’s response to the Grok 4 launch has been divided, with enthusiasm for its performance offset by criticism of the event’s delivery and broader trust issues.

David Shapiro, an AI power user and writer, noted: “Grok 4 now takes its place as ‘good enough to actually help with frontier research’… but has merely caught up with OpenAI.”

Ethan Mollick, a professor at Wharton, remarked on X: “So Grok 3 has had three separate incidents where apparently unvetted changes to the deployed system caused a large-scale ethical issue and an emergency rollback. I don’t think you can do a Grok 4 launch that doesn’t at least address this honestly, if user trust matters,” later adding, “Grok 3 was a good model, and Grok 4 might be very good but having a good model is not enough – there are a lot of really good models out there. You actually want to trust the model you’re building on.”

Ben Hylak, co-founder and CTO of AI product observability startup Raindrop AI (himself a former Musk employee), criticized the livestream itself: “This xAI livestream is one of the worst things I’ve ever watched in my life. Love y’all, but it’s bad.”

Despite the criticisms, benchmarking firm Artificial Analysis noted: “Grok 4 is now the leading AI model.”

Ongoing trust issues

The launch of Grok 4 comes amid renewed criticism over Grok’s prior behavior in consumer deployments, particularly as a chatbot integrated into Musk’s social network, X.

Over the July 4 holiday and in subsequent days, Grok generated antisemitic and conspiratorial responses that reignited scrutiny over its system design and governance practices.

As reported by my VentureBeat colleague Michael F. Nuñez, Grok responded to questions about Jewish influence in Hollywood by asserting that Jewish executives “dominate leadership” at major studios and influence content through “progressive ideologies.” It went on to rant about people with Jewish surnames as fitting a “pattern” of engaging in “extreme leftist activism,” and suggested Hitler knew “how to deal with it decisively, every damn time,” an apparent reference to the Holocaust.

The conspiratorial and antisemitic posting was so prolific that the Anti-Defamation League (ADL), a preeminent U.S.-based nonprofit combating antisemitism and hatred, posted on July 8: “What we are seeing from Grok LLM right now is irresponsible, dangerous and antisemitic, plain and simple. This supercharging of extremist rhetoric will only amplify and encourage the antisemitism that is already surging on X and many other platforms.”

This incident follows a history of problematic Grok outputs, including a May 2025 case where the Grok bot integrated into X randomly inserted references to a completely nonsensical and non-real “white genocide” in South Africa into unrelated queries, and an earlier case in which its system prompt was found to direct the Grok chatbot on X to avoid referencing any sources that described Musk and his former political funding beneficiary, U.S. President Donald J. Trump, as spreaders of misinformation. In both cases, xAI blamed the behaviors on unnamed employees and stated that they were being addressed.

Already, today, users of Grok 4 on the consumer app have observed it once again outputting anti-Zionist and antisemitic remarks.

As I previously noted, Musk has openly stated on multiple occasions that he wanted to change Grok to better reflect his personal beliefs and distrust of mainstream media and accredited sources. This makes it a poor source in enterprise contexts where such views could adversely impact users and the businesses building atop the Grok family of models.

My prior recommendation remains: For those in the enterprise trying to ensure their business’s AI products work properly and accurately… Grok is sadly best avoided. Thankfully, there are numerous other alternatives to choose from.
