Why is DeepSeek R2 a big deal?
Leaked reports suggest that DeepSeek R2 is a massive 1.2 trillion parameter model that somehow only uses 78 billion parameters at a time, making it incredibly efficient.
If these rumors are true, R2 could cost just $0.07 per million input tokens and $0.27 per million output tokens. Compare that to OpenAI's $15 and $60 respectively for its o1 model, which is so expensive to run that it is largely reserved for API users and higher-tier subscribers.
What this means is that we’re potentially talking about AI that's just as smart as the best holy-grail model from OpenAI, but costs less than GPT 4.1-nano, one of the cheapest OpenAI models right now.
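To put those numbers in perspective, here is a quick back-of-the-envelope calculation using the rumored R2 rates and o1's published rates quoted above. The request sizes (2,000 input tokens, 1,000 output tokens) are just an illustrative assumption for a typical chat turn:

```python
def request_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Cost in dollars for one request, given per-million-token rates."""
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# A hypothetical chat turn: 2,000 input tokens, 1,000 output tokens
r2_cost = request_cost(2_000, 1_000, 0.07, 0.27)   # rumored R2 pricing
o1_cost = request_cost(2_000, 1_000, 15.0, 60.0)   # o1 pricing

print(f"R2: ${r2_cost:.5f}  o1: ${o1_cost:.2f}  ratio: {o1_cost / r2_cost:.0f}x")
```

On these assumptions a single turn would cost well under a tenth of a cent on R2 versus about nine cents on o1, a difference of more than two orders of magnitude.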
What is DeepSeek R2?
DeepSeek R2 is the next-generation large language model from DeepSeek, a Chinese AI startup.
| Aspect | Details |
| --- | --- |
| Core capabilities | • Complex reasoning across multiple languages • Advanced coding and debugging • Image and vision understanding • Multimodal support (text, images, audio, basic video) |
| Why is it a big deal? | R2 should match GPT-o1 performance while being 97% cheaper to run |
| How was it trained? | Trained over 45 days on 5.2 petabytes of data, including specialized datasets from finance, law, and patents |
| What architecture does it use? | MoE architecture with 1.2 trillion total parameters, 78 billion active at a time |
| How does it perform? | • 89.7% on C-Eval 2.0 (China's toughest AI benchmark) • 92.4% accuracy on COCO vision tasks • Matches or exceeds GPT-o1/Claude Opus performance |
| How much is it to run? | • $0.07 per million input tokens • $0.27 per million output tokens |
| When will it be released? | Sometime in 2025 |
It builds on the success of R1, which proved that you don't need billions in compute resources to create frontier AI. R2 aims to be even more powerful while staying just as cost-efficient.
The model is designed to handle everything from complex reasoning and coding to understanding images and multiple languages at native-speaker levels.
Interestingly, while R1 was trained on thousands of Nvidia H100 and H200 GPUs, R2 is allegedly being developed using Huawei's Ascend 910B chips, and the training apparently took only 45 days, which is remarkably fast in the world of AI training.
According to rumours, the model was trained on 5.2 petabytes of data, including specialized datasets from finance, law, and patents. It reportedly scores 89.7% on C-Eval 2.0, China's toughest AI benchmark, and achieves 92.4% accuracy on vision tasks using the COCO dataset.
What will DeepSeek R2 be good at?
According to rumours, DeepSeek R2 will be particularly strong at 3 things:
- Coding
- Understanding multiple languages
- Processing different inputs
Coding
The model reportedly understands not just syntax but software architecture: it can debug complex codebases and perform at the level of a senior developer.
Essentially, R2 might surpass the best AI models for coding, while being a general-purpose model.
What does this mean in practice? You can ask it to write a Python script, explain a Rust error, refactor JavaScript code, or help with your homework, all in the same conversation.
Understanding multiple languages
Have you ever noticed that asking AI in languages other than English sometimes leads to less thoughtful responses? That’s not the case with DeepSeek R2, which reportedly maintains its full reasoning power across multiple languages.
For the billions of people who don't primarily speak English, this could mean finally having access to AI that works as well for them as Claude Sonnet 4 does for English speakers.
Processing different inputs
The model reportedly handles images, audio, and basic video understanding all within a single unified system. This means you can show it a photo and ask questions about it, have it analyze charts and graphs, or even describe what's happening in a video clip.
For example, you could upload a screenshot of broken code, ask R2 to explain the error, have it generate a diagram explaining the fix, and then create documentation — all in one conversation.
Early examples show the model can understand complex visual relationships, read text in images, and even generate basic visualizations based on data.
It’s not a complete all-in-one AI package, but still, it’s very versatile.
DeepSeek R2 training innovations
The impressive performance described above was achieved thanks to several innovations in the training method. Here are the techniques we know they used to train this model:
Generative Reward Modeling (GRM)
DeepSeek developed something they call Generative Reward Modeling, and it's a bit like teaching the AI to grade its own homework. Instead of needing thousands of humans to tell the model what's good or bad, GRM lets the model generate its own feedback during training.
GRM supposedly leads to more nuanced understanding because the model develops its own sense of what works rather than just memorizing human preferences.
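The broad idea can be sketched in a few lines: the model samples several candidate answers, grades them itself, and the best-graded answer becomes the training signal. Everything below is a toy illustration of that loop, not DeepSeek's actual method; the `fake_model` and the length-based scoring rule are stand-ins invented for this sketch:

```python
def generate_candidates(model, prompt, n=4):
    """Sample n candidate answers from the model (stubbed via a seed)."""
    return [model(prompt, seed=i) for i in range(n)]

def self_score(model, prompt, answer):
    """The model grades its own answer. A real GRM would parse a score
    out of a generated critique; here we stub it as 'longer = better'."""
    critique = model(f"Grade this answer to '{prompt}': {answer}")
    return len(answer)  # placeholder scoring rule

def grm_step(model, prompt):
    """One training signal: the self-scored best candidate."""
    candidates = generate_candidates(model, prompt)
    best = max(candidates, key=lambda a: self_score(model, prompt, a))
    return best  # in real training, this feedback would update the weights

# A trivial fake "model" so the sketch runs end to end
fake_model = lambda text, seed=0: text[: 10 + seed]
print(grm_step(fake_model, "Why is the sky blue?"))
```

The key point is that no human labeler appears anywhere in the loop: the same model plays both author and grader.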
Self-Principled Critique Tuning
Before giving you an answer, R2 essentially asks itself: "Is this accurate? Is it helpful? Could it be clearer?" This self-reflection happens in milliseconds, and it reportedly makes the model's answers markedly more accurate.
Hybrid Mixture-of-Experts (MoE) Architecture
This is a similar architecture to the one used in Gemini 2.5 Pro. And if you’ve ever used this model, you know how good it is.
How does this architecture work? This architecture is why R2 can be so massive yet so efficient. When you ask it to code, it activates the coding experts. When you switch to discussing history, different experts wake up while the coding ones go dormant. In all, instead of using all 1.2 trillion parameters for every single task, it activates only about 78 billion.
This is why it can cost only $0.07 per 1M input tokens (if the rumours on this are indeed correct).
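The routing idea described above can be shown in a toy sketch. This is the general top-k MoE pattern, not DeepSeek's actual implementation, and the sizes here (16 experts, 8-dimensional vectors) are made up for illustration:

```python
import numpy as np

# Toy Mixture-of-Experts routing: a learned router scores every expert
# for each token, and only the top-k experts actually run.

rng = np.random.default_rng(0)

N_EXPERTS = 16   # a real model like R2 reportedly has far more
TOP_K = 2        # experts activated per token
DIM = 8          # toy hidden size

router_weights = rng.normal(size=(DIM, N_EXPERTS))
experts = [rng.normal(size=(DIM, DIM)) for _ in range(N_EXPERTS)]

def moe_forward(x):
    """Route a single token vector through its top-k experts."""
    scores = x @ router_weights            # one score per expert
    top = np.argsort(scores)[-TOP_K:]      # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only TOP_K of N_EXPERTS weight matrices are multiplied; the rest
    # stay dormant. That skipped work is where the savings come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=DIM)
out = moe_forward(token)
print(out.shape)  # (8,) — full-size output, but only 2 of 16 experts ran
```

Scaled up to the rumored numbers, 78 billion active out of 1.2 trillion total means only about 6.5% of the weights do work on any given token.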
From Nvidia to Huawei Chips
DeepSeek allegedly trained R2 using Huawei's Ascend 910B chips instead of Nvidia GPUs, which is absolutely massive news for the AI world.
After the US restricted China's access to advanced chips, many thought Chinese AI development would stall. Instead, DeepSeek may have just proved that frontier AI can be built without Nvidia hardware.
Reports suggest these Huawei-powered systems achieved 91% efficiency compared to similar Nvidia A100 clusters. For Nvidia, which has seen its stock price soar on AI demand, this could be the first serious threat to its near-monopoly on AI training hardware.
What’s more, DeepSeek has allegedly built an entire domestic supply chain for AI hardware, creating everything from memory modules to cooling systems in house, right within China, to dodge any potential future restrictions. If DeepSeek succeeds, other Chinese tech giants might try to copy their approach.
For consumers this likely means that we will get a lot more interesting AI models coming from China, even after R2 comes out (we’re looking at you, Alibaba — forget the Qwen 3, it’s time for Qwen 4).
How will R2 be used in the real world?
Major Chinese manufacturers like Haier, Hisense, and TCL have integrated DeepSeek AI into everything from refrigerators to robot vacuums, but the most interesting applications are in robotics.
Home robots powered by DeepSeek might understand their environment, adapt to changes, and be able to handle complex multi-step requests.
Imagine telling your robot vacuum "clean everywhere except near the sleeping cat, and do the kitchen last because I'm cooking."
Should I be concerned about privacy when using R2?
DeepSeek is a Chinese company, and all data from its apps goes to servers in China. This means that when you use DeepSeek, your data is subject to Chinese data laws. That is something worth understanding.
If you don’t want that, you can always run DeepSeek on your own hardware. After all, DeepSeek open-sources its models — you can download and run them completely offline.
You’d think that running an advanced model requires a very beefy PC, and usually you’d be right. But thanks to how efficient R2 is, you could realistically run it on a consumer-level device.
The trade-off is convenience: running locally takes more setup and technical know-how than simply using the app.
When will DeepSeek R2 release?
DeepSeek has been unusually quiet about R2's official launch date, but the rumor mill suggests it may come out in the coming weeks or months.
The company has a history of surprise launches, having dropped R1 with little warning and maximum impact.
Here are some signs worth keeping an eye on that may suggest R2 is about to drop:
- Increased hiring at DeepSeek's Hangzhou headquarters
- New API documentation appearing on GitHub
- Suspicious downtime on their existing services that could indicate infrastructure upgrades
Some developers even claim to have spotted R2 model identifiers in leaked API logs.
One thing is for sure — when R2 drops, it won't be a shadow launch, but a market shakeup.
FAQ
What is DeepSeek R2?
DeepSeek R2 is the upcoming next-generation AI model from Chinese startup DeepSeek, designed to compete with GPT o1 and Claude Opus while being 97% cheaper to run.
When will DeepSeek R2 be released?
DeepSeek R2 is expected sometime in 2025. DeepSeek hasn't announced an official date, but increased activity and leaks point to an imminent release, possibly within weeks or months.
What makes DeepSeek R2 different from other AI models like GPT-4?
R2's main differentiators are its extreme cost efficiency ($0.07 vs $15 per million input tokens compared to the similarly capable GPT-o1), its open-source release, and innovative training techniques like Generative Reward Modeling.
Which companies are using DeepSeek's AI technology?
Overchat AI uses DeepSeek, along with other models, to offer smart AI chats online. Chinese manufacturers like Haier, Hisense, and TCL Electronics have integrated DeepSeek AI into consumer products: smart TVs, home appliances, and robot vacuums.
Is DeepSeek aiming for Artificial General Intelligence (AGI)?
Yes, DeepSeek openly states that achieving AGI is their long-term goal. They've prioritized research over revenue and rejected major investment offers to maintain independence in pursuing this vision.