The flagship Qwen3-235B-A22B performs at the level of OpenAI's o1, DeepSeek-R1, and Gemini 2.5 Pro in tests on mathematics, coding, and general capabilities. But the lightweight models look even more interesting.
Should you care about another new model? Let’s unpack it.
What Makes Qwen3 Interesting
Meta hasn’t even finished unveiling its next-generation Llama 4 models, and Qwen3 has already shifted the conversation.

The performance leap here is impressive:
- Qwen3-30B-A3B outperforms the earlier QwQ-32B while activating roughly 10 times fewer parameters
- The compact Qwen3-4B comes close in performance to Qwen2.5-72B-Instruct
The other Qwen3 models, ranging from 0.6 to 235 billion parameters, are available or will soon appear on Hugging Face and GitHub under an open license.
As with other AI systems, the more parameters a model has, the better it tends to perform, though larger models are also more expensive to run. Qwen3 was trained on a combination of textbooks, code snippets, question-answer pairs, AI-generated data, and other sources.
At the same time, Qwen3 is completely free — available both via the web and through an app.
Advantages of Qwen3
Mixture of Experts (MoE)
Some Qwen3 models are built on the Mixture of Experts (MoE) architecture, which improves computational efficiency by splitting tasks and distributing them among specialized sub-models.
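To make the routing idea concrete, here is a minimal NumPy sketch of top-k expert gating. The dimensions, gating weights, and "experts" below are invented for illustration; a real MoE layer like Qwen3's uses trained networks, not random matrices:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route input x to the top-k experts by gate score and mix their outputs.

    x: (d,) input vector; gate_w: (d, n_experts) gating weights;
    experts: list of callables, one per expert sub-network.
    """
    scores = x @ gate_w                       # one gate score per expert
    top = np.argsort(scores)[-k:]             # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over the selected k only
    # Only the chosen k experts run, so compute scales with k, not n_experts.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n = 8, 4
experts = [lambda v, W=rng.normal(size=(d, d)): v @ W for _ in range(n)]
gate_w = rng.normal(size=(d, n))
y = moe_forward(rng.normal(size=d), gate_w, experts)
print(y.shape)  # prints (8,)
```

The key property is in the last comment: with 4 experts but k=2, only half the expert weights are touched per token, which is exactly how a 30B-parameter MoE model can run with the compute cost of a much smaller dense one.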
Efficiency
Alibaba’s developers noted that integrating the "thinking" and "non-thinking" modes into Qwen3 was achieved quite inexpensively, and the architecture itself simplifies the customization of agents for specific tasks.
Multilingual Support
The model handles 119 languages, including rare ones, and offers a new approach to "thinking": for quick sequential queries, it uses an instant mode, while for complex tasks, it switches to a deliberate, “thinking” mode.
Qwen3 Benchmark Results
In benchmarks, the flagship Qwen3-235B-A22B outperforms competitors on the Codeforces platform, including OpenAI’s o3-mini and Google’s Gemini 2.5 Pro.
It also shows strong results on the AIME (math) and BFCL (function-calling) benchmarks. However, this flagship version is not yet publicly available.
The largest open-access model, Qwen3-32B, already competes with the best proprietary and open AI models, including DeepSeek R1, and surpasses OpenAI's o1 in LiveCodeBench.
Qwen3 is another example of an open model that keeps pace with closed solutions.

What This Means in Practice
Having a model with outstanding performance packed into a 4GB file would have seemed like science fiction back in the 2000s. Now, it’s a reality — and an open-source one at that.
Even compact models like Qwen3-4B now deliver results comparable to much larger systems: a download of a few gigabytes can already handle serious programming tasks.
Running Qwen3-30B-A3B is possible with as little as roughly 11GB of VRAM (with aggressive quantization).
This is made possible by the MoE (Mixture of Experts) architecture: the model has 30 billion parameters in total, but only 3 billion are active during inference. As a result, its generation speed is comparable to that of a 3B model, while its quality is much closer to that of a significantly larger one.
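The VRAM figure lines up with a back-of-the-envelope calculation of weight memory under quantization. This sketch counts weights only; the KV cache and runtime overhead add several more gigabytes in practice:

```python
# Approximate weight footprint of a 30B-parameter model at common
# quantization bit-widths (weights only; KV cache and overhead ignored).
total_params = 30e9
for bits in (16, 8, 4, 3):
    size_gb = total_params * bits / 8 / 1e9   # bits -> bytes -> GB
    print(f"{bits:>2}-bit: ~{size_gb:.1f} GB")
```

At 16-bit the weights alone need about 60GB, at 4-bit about 15GB, and around 3-bit quantization they drop to roughly 11GB, which is where a figure like the one above comes from. Note that MoE reduces compute per token, not weight memory: all 30 billion parameters must still be stored.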
Programming isn't where it ends. Qwen3 demonstrates a high level of reasoning: in benchmarks, it outperforms GPT-4o. Conveniently, the "thinking" mode can be activated not only through system prompts but also through regular messages.
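Qwen3 documents lightweight soft-switch tags, `/think` and `/no_think`, that can be appended to an ordinary user message to toggle the mode per turn. A minimal sketch of composing such messages (the helper function is ours, invented for illustration):

```python
def tag_message(text: str, thinking: bool) -> str:
    """Append Qwen3's documented soft-switch tag to a user message.

    /think requests step-by-step reasoning; /no_think skips it for
    quick, low-latency answers.
    """
    return f"{text} /think" if thinking else f"{text} /no_think"

# Complex task: ask for deliberate, step-by-step reasoning.
print(tag_message("Prove that the square root of 2 is irrational.", thinking=True))
# Quick lookup: skip the reasoning trace.
print(tag_message("What is the capital of France?", thinking=False))
```

In chat interfaces the tag simply goes at the end of your message; when calling the model through a library, the same effect is exposed as a template option (e.g., an `enable_thinking` flag in the model's chat template).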
Where to Use Qwen3
The model is already available for download on Hugging Face and can be run via LM Studio, Ollama, MLX, or llama.cpp. It supports both the Instruct mode and the base model format.
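For example, with Ollama a single command pulls the weights and starts a local chat session. The model tags below are the ones listed in the Ollama library at the time of writing; check the library page for current names:

```shell
# Pull and chat with the compact 4B model locally
ollama run qwen3:4b

# The MoE variant (30B total, 3B active parameters)
ollama run qwen3:30b
```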
Qwen3 will also be integrated into Overchat AI — to use it directly in a conversational format, with access to all new features.
Stay tuned — Qwen3 is just beginning to rewrite the rules of the game.
The Bottom Line
Qwen3 appears to be a turning point in the development of open models. Modern language models no longer require a trio of RTX 4090s to run locally, and AI agents, currently bleeding-edge tools for enthusiasts and enterprises, are on their way to becoming default system apps, like calculators.