βItβs smarter than nearly all graduate college students in all disciplines β Elon Musk.β
Elon Musk and his Grok group are again with their newest and finest mannequin to this point: Grok 4. It was solely 3 months in the past that this group of specialists launched Grok 3, a mannequin that also competes with the giants from OpenAI, Gemini, and Anthropic. However with Grok 4, Elon Musk is giving these corporations a run for his or her cash. Grok 4 comes with superhuman-level considering and reasoning capabilities. With instruments and brokers in its arsenal, it brings a greater understanding of the world, each private {and professional}. On this weblog, weβll discover every part about Grok 4: its options, capabilities, benchmarks, and eventually, weβll check it.
Letβs Grok it!
What’s Grok 4?
Grok 4 is the newest multi-modal massive language mannequin (LLM) from Elon Muskβs firm, x.ai. It has 100 occasions extra coaching knowledge than Grok 2 (the primary public mannequin by x.ai) and 10 occasions extra reinforcement studying compute than another mannequin out there. Grok 4 encompasses a 256K context window, real-time knowledge search, superior voice capabilities, agentic talents, and intelligence that carefully mimics human conduct.
Grok 4 has two variations:
- Regular Model: That is the single-agent model of the Grok 4 LLM. It options agentic conduct, the place one agent works to resolve your issues. This mannequin is helpful for every day duties involving language, search, coding, and extra. Itβs out there within the Tremendous Grok plan supplied by x.ai and likewise through API for builders.
- Grok 4 Heavy: That is the multi-agent model of Grok 4. When prompted, a number of brokers collaborate, evaluate outcomes, and generate the perfect outcome. Itβs best for advanced reasoning, deep evaluation, and analysis. It’s out there solely underneath the Tremendous Grok Heavy plan by x.ai.
Key Options
- Itβs an Educational Whiz:Β Grok 4 shines on the Humanityβs Final Examination (HLE) benchmark. Out of two,500 questions spanning math, physics, chemistry, humanities, and laptop science, it scored double digits on half! Most present fashions handle solely low single digits, suggesting Grok 4 can sort out PhD-level issues throughout disciplines.
- Instrument Use:Β Grok 4 has been skilled natively on software use, outperforming Grok 3βs analysis instruments. With intensive scaling and compute, it could actually deal with even the hardest text-based issues.
- Its design is Agentic: The Grok 4 fashions are agentic. With single and a number of brokers working behind the scenes, these fashions can swiftly carry out a number of duties.Β
- Its enhanced voice capabilities: The Grok 4 fashions include a sophisticated voice mode that sounds extra private and calm in comparison with the opposite fashions from Open AI and Gemini. It comes with a brand new voice, βEveβ β a British speaker that may rapidly swap from singing to whispering, mimicking human-like feelings.Β Together with this, the latency of their newest voice mode has been lowered by half, in comparison with its earlier model.
- It may possibly run a enterprise: The Grok 4 fashions can motive out like people and take decisive selections, strategise, and plan in a approach that makes them able to working a enterprise. Infact, they may simply enable you to make some revenue too.Β
Relating to multimodal capabilities, particularly picture evaluation and era, Grok 4 fashions presently carry out poorer than the highest fashions like o3, Gemini 2.4 Professional, Claude 4, and many others. Though this may increasingly enhance considerably within the coming few days (or weeks).
Availability

- Tremendous Grok:Β Consists of Grok 4 and Grok 3. Comes with a 128K token window, voice and imaginative and prescient capabilities. Priced at $30/month or $300/12 months.
- Tremendous Grok Heavy:Β Consists of Grok 4 Heavy and Grok 4. Provides an enhanced context window and early entry to new options. This premium plan prices $300/month or $3,000/12 months, corresponding to OpenAIβs and Googleβs premium tiers.
The way to Entry Grok 4?
To entry Grok 4 on chat:
- Head to Grookay.Β
- Log in to your Tremendous Grok account.
- Within the chatbox in the midst of the display screen and click on on the small mannequin dropdown on the nook of the chatbox.Β
- Choose the βGrok 4β mannequin

- As soon as accomplished, you will get began.
Β To entry Grok 4 on the API:
- Go to https://x.ai/api and click on on API Console Login.
- Click on on API Keys.
- Click on on Create API key and after that give a reputation to your api key and click on on Save to generate your grok api key.
- Now to entry the Grok 4 utilizing api endpoints, go to https://docs.x.ai/docs/fashions/grok-4-0709 and use the under code snippet to entry it.
from xai_sdk import Shopper
from xai_sdk.chat import consumer, system
consumer = Shopper(
Β Β Β Β api_host="api.x.ai",
Β Β Β Β api_key=""
)
chat = consumer.chat.create(mannequin="grok-4-0709", temperature=0)
chat.append(system("You're a PhD-level mathematician."))
chat.append(consumer("What's 2 + 2?"))
response = chat.pattern()
print(response.content material)
Grok 4 in Motion
Now that weβve learn all about Grok 4, itβs time to see if it brings within the punch because it claims. To do that, we’ll check Grok 4 on the next duties:
- PhD-level Query to check their reasoning capabilities
- Multi-step analysis to test its agentic capabilities
- Coding with context to check its real-world use capabilities
Letβs begin.Β
Job 1: Fixing a PhD-level Query
Outcome:
Evaluation:
Grok 4 approached the issue step-by-step, addressing every query so as. It appropriately interpreted the immediate, reasoned by the answer, and even generated code for the graphs when requested. The visualizations had been correct and aligned with the reason.
Job 2: Performing a Multistep Analysis
Immediate: βInform me about Analytics Vidhyaβs newest put up on X and discover the newest weblog on their web site β summarise info on them in 5 strains every.β
Result:
Evaluation:
This job it carried out higher than I had imagined. The duty itself isn’t tough, however I see so many fashions battling the dates to precisely fetch the newest info. Grok 4 took just a few seconds. It went by the web site and the Twitter web page, discovered the newest info, after which reasoned it out to present me 5 concrete strains on every.Β
You possibly can test it your self on our weblog web page or X web page.Β
Job 3: Doing Coding with Context
Immediate: βMerge all these PDFs and create a single JSON file.β
Outcome:

Evaluation:
It began effectively, by itemizing down the content material from a couple of information, after which started the hallucinations. All that I obtained within the outcome was a stream of #. So this was disappointing.Β
Immediate 2: βConvert the next code into Python and Reactβ
Outcome:
Evaluation:
Grok 4 was fast and fairly environment friendly, it rapidly generated the code in Python and truly understood that with the βreactβ phrase in my immediate. I used to be wanting ahead to seeing the code for my appβs frontend. It then additionally offered the code for every part, making it easy for me to repeat the required half as and when it’s wanted.Β
Grok 4 Benchmarks
Grok 4 nearly aced the entire benchmarks that we often take a look at. Here’s a abstract:

- GPQA (Graduate-Degree Physics Questions Archive): This benchmark check professional expert-level science data. On this benchmark, Grok 4 achieves 87-88%, main rivals like GPT-4o and Claude 3.5 Sonnet.
- AIME (American Invitational Arithmetic Examination) 2025: This benchmark compares the mathematical prowess. Grok 4 scores 95%, with some studies claiming as much as 100% dominance. This surpasses earlier SOTA fashions.
- SWE-Bench (Software program Engineering Benchmark): It evaluates coding and real-world software program problem-solving (Grok 4 Code variant). Scores vary from 72-75%, considerably forward of o3-mini (excessive) and Claude 3.5 Sonnet.
- Different Math and Reasoning Benchmarks: Grok 4 dominates U.S. Mathematical Olympiad and Harvard-MIT Arithmetic Event, and comparable checks with large beneficial properties over prior SOTA. It additionally excels typically reasoning and Ph.D.-level duties throughout fields.
These are the same old benchmarks for testing any newest LLM. Grok 4 additionally got here with its scorecard on two new benchmarks: ARC-AGI and Merchandising Bench.
ARC-AGI
This benchmark checks how shut fashions are to reaching AGI, or synthetic basic intelligence. That is accomplished by scoring their efficiency on completely different ARC-style duties, that are a set of difficult puzzles.

Grok 4 takes up the highest spot, breaking the ten% barrier, that means the mannequin has taken its first steps into basic reasoning. Claude Opus 4 fashions comply with subsequent after which come o3 (excessive), o4-mini(excessive), and others! This appears that Grok 4 is basically nearer to AGI than the remainder of its friends.Β
Merchandising Bench
This benchmark checks the agentic AI methods to measure how effectively these brokers can work together with an actual e-commerce web site to finish advanced duties.Β Itβs designed to emphasize check real-world determination making, planning, and UI interplay.Β
Grok 4 excels on this too, beating some human, Claude 4, Opus, and Gemini 2.5 Professional and o3.Β

Infact, the Grok 4 was examined to run an precise merchandising machine to check this, and it incurred large earnings whereas doing so. Anthropic had launched one thing comparable about Claude working a merchandising machine a couple of days again, and in that, that they had talked about that the machine ran right into a loss!
Functions of Grok 4
Grok 4 comes with a terrific set of options and efficiency benchmarks, based mostly on which it may be fairly helpful for:
- Actual-Time Social Media Interplay: It’s built-in straight into X (previously Twitter) as a chatbot. It may be used to generate memes, posts, polls, summaries, or sentiment evaluation.
- Superior Analysis: It may possibly remedy PhD-level questions, thus indicating that it could actually actually contribute to superior analysis in arithmetic, physics, and engineering.
- Enterprise Planning: It may possibly assist to map out methods and carry out superior enterprise evaluation that can assist you get actionable insights.Β
- Coding and Writing: Grok 4 comes with sensible SWE benchmarks and agentic capabilities, thus it could actually take up many coding duties and carry out them effectively too.Β
Grok 3 vs Grok 4
Though Grok 3 has been within the highlight for its racist feedback, with Grok 4, the group is seeking to do extra than simply injury management. Grok 4 comes with software use built-in from the beginning, and the Grok group plans to improve this to βindustrial gradeβ capabilities, serving to you remedy precise, real-world issues. Together with this, we will anticipate Grok 4 to grasp video and picture evaluation and era very quickly, bringing us nearer to experiencing playable AI-generated video video games and absolutely AI-generated exhibits.
Conclusion
Is Grok 4 a giant deal? Undoubtedly. In a market that feels more and more saturated, it stands out as a breath of recent air, providing actual enhancements over its predecessors. With precise use instances rising, it appears poised to assist remedy many on a regular basis issues. Each normal and Heavy variants are agentic, quick, and considerably higher at reasoning. Whereas some recommend itβs constructed for AGI, I imagine thereβs nonetheless time and room for progress. Grok 3 additionally launched with nice promise however later went off monitor. With this new launch, itβs just the start, a lot testing continues to be wanted to know its true potential.
Podcast: Play in new window | Obtain
Login to proceed studying and luxuriate in expert-curated content material.

