Grok 4: A Complete Overview

What is Grok 4?

Grok 4 is the latest ChatGPT alternative from xAI, and a successor to Grok 3.

‍

To create Grok 4, xAI used Colossus, a cluster of 200,000 GPUs working together. The team figured out how to make their training six times more efficient. They also expanded their training data beyond math and coding problems to include everything and did a training run that used over ten times more computing power than anything they had attempted before.

‍

Right now, there are two versions of Grok 4. The standard version is what most people will use. Then there's Grok 4 Heavy, which spawns multiple agents that consider different angles simultaneously. The Heavy version can run for up to 10 minutes on a single problem. This helps it solve complex mathematical proofs or analyze intricate scenarios.

‍

Grok 4 Features

‍

Tools

Grok 4 was trained using reinforcement learning to naturally use tools.

‍

This means that, for tasks that usually trip up large language models, Grok can enhance its thinking with things like code interpreters and web browsing. When Grok 4 needs current information or must answer tough research questions, it picks its own search terms, finds information across the web, and digs as deep as needed to provide a solid answer.

‍

xAI also trained Grok to access X's data in powerful ways. It can use advanced keyword and semantic search tools, as well as analyze media, to improve its responses.

‍

Grok 4 can actively search the web, and choose its own search queries. It decides when it needs more information. If the first search doesn't give good results, it reformulates and tries again. One example from xAI showed Grok searching through multiple X posts, news articles, and web sources to find a particular tweet.

‍

Voice mode

You can now talk to Grok using the upgraded Voice Mode, which is more realistic, responsive, and intelligent than before. They've introduced a new "serene" voice that sounds natural, and they've redesigned conversations to feel more human.
The really cool part is that Grok can see what you see.

‍

Point your camera at something while talking, and Grok will analyze your scene in real time and provide insights about what it's looking at during your voice conversation. They developed this model in-house using their reinforcement learning framework and custom speech compression techniques.

‍

API access

The Grok 4 API gives developers access to frontier-level understanding across text and images, with a 256,000 token context window and advanced reasoning for complex tasks. It includes real-time data search across X, the web, and news sources through their new live search API, delivering current, accurate responses powered by native tool use. The API meets enterprise security standards with SOC 2 Type 2, GDPR, and CCPA certifications for sensitive applications. Grok 4 is coming to hyperscaler partners soon, making enterprise deployment easier.

‍

How well does Grok 4 perform?

Grok 4 beats every other model in some popular benchmarks. For example, on the ARC-AGI benchmark it scores 15.9%, which is almost 100% higher than the next best AI model, Claude 4.

‍

Read our review of Claude 4.

‍

Also, Grok 4 Heavy just became the first AI model to crack 50% on Humanity's Last Exam — a benchmark specifically designed to be the final closed-ended academic benchmark of its kind. Previous models barely scratched 25%.

‍

On the AIME 2025 math competition, Grok 4 Heavy scored a perfect 100%, while the regular Grok 4 hit 91.7%, which is still better than most human mathematicians.

‍

Is Grok 4 the best AI model, then? It's not that simple. Despite all those benchmark scores, there are many accounts on Reddit of users who were underwhelmed by Grok 4 performance. Users report that Claude 4 is still better for my actual coding work. One Reddit user wrote: "I've heard nothing but massive praise and hype for grok 4, people calling it the smartest AI in the world, but then why does it seem that it still does a subpar job for me for many things?"

‍

Let's be clear about Grok 4's strengths: mathematics and theoretical reasoning. It's frighteningly intelligent when solving complex math problems, but may struggle in practical coding tasks.

‍

That's why xAI is working on a separate coding-specific model. Right now, if you need help with programming, you're probably better off with Claude or even ChatGPT.

‍

The issue with these benchmarks is that they're measuring very specific things that don't always translate to real-world usefulness. Grok 4 was trained with massive computational resources specifically aimed at performing well on these tests.

‍

As one Reddit commenter noted, "the moment you introduce tracking/benchmarks, people will start optimizing behavior for the benchmark — even if it negatively impacts the original behavior."

‍

FAQs

‍

How does Grok 4 compare to other AI models?

Grok 4 sets new records on several benchmarks. It scored 15.9% on ARC-AGI V2, nearly double Claude Opus 4's performance. On math competitions, Grok 4 Heavy got a perfect 100% on AIME'25, while regular Grok 4 hit 91.7%. It's the first model to score over 50% on Humanity's Last Exam. However, user reports suggest it's not necessarily better for everyday tasks like coding, where Claude or ChatGPT might still be preferable.

‍

How much does Grok 4 cost?

On Overchat AI, you can access Grok 4 for just $4.99/week. But the most advanced model, SuperGrok Heavy access costs $300/month. API pricing hasn't been detailed yet, but given the subscription costs, enterprise API access will likely be expensive. Regular Grok 4 is available through Overchat AI, SuperGrok and Premium+ subscriptions.

‍

How to Access Grok 4?

You can access Grok 4 on Overchat AI for 4.99$ per week. It's available with SuperGrok and Premium+ subscriptions. If you want the most powerful version, you'll need SuperGrok Heavy, which costs $300/month and gives you access to Grok 4 Heavy.

‍

Does Grok Really Consult Elon Musk's Opinions?

It seems so. When TechCrunch started digging into how Grok 4 answers controversial questions, they found that the AI appears to be checking what Elon Musk thinks before giving its final answer. For example, when asked about immigration, Grok 4's chain of thought showed it was "Searching for Elon Musk views on US immigration." Same thing happened with questions about Israel-Palestine, abortion, and the First Amendment. The AI would search through X for Musk's posts, scan news articles about his positions, then craft its response.

‍

Is Grok 4 good for coding?

Despite its benchmark performance, many users report that Grok 4 isn't great for practical coding tasks. On the other hand, It excels at math, so it can be used as a very accurate homework solver.