MiniMax M3 vs Qwen 3.7 Max: A Head-to-Head Comparison

TL;DR

If you'd prefer not to read the full comparison, here's a table summarising the key details:

‍

Attribute	Llama	Claude
Best for	Open-weight deployment, long-context and multimodal work, natural writing	Math and reasoning-heavy text work
License	Open-weight	Proprietary, API-only
Modality	Text, image, video	Text only
Context	1M tokens	1M tokens
Released	June 1, 2026	May 20, 2026
Price (input/output per 1M)	~$0.60 / $2.40	~$2.50 / $7.50

‍

Quick recommendations by use case:

‍

For coding: It's close. Qwen 3.7 Max scores higher on the benchmarks, but in our own test both models reached the same optimal solution and MiniMax explained it better.
For math: Qwen 3.7 Max. It posts the strongest competition-math figures of any Chinese model, and it edged MiniMax in our own math test too.
For writing: MiniMax M3. This was the clearest gap we saw — its prose felt far more natural than Qwen's.
For multimodal or long-context tasks: MiniMax M3, the only one of the two that accepts image and video input.
For self-hosting or cost-sensitive scale: MiniMax M3, which ships open weights and the lower per-token price.

‍

Across the three hands-on tests, we preferred using MiniMax M3 overall. It won two of the three, and its explanations were consistently clearer.

‍

MiniMax M3 vs Qwen 3.7 Max at a Glance

Before we look at the specific benchmarks, let's first establish what these models are and compare their specifications. Here's a table containing the most important information about each model:

‍

Spec	MiniMax M3	Qwen 3.7 Max
Developer	MiniMax (Shanghai Hixi Technology)	Alibaba (Qwen team)
Release date	June 1, 2026	May 20, 2026
License	Open-weight	Proprietary
Architecture	MiniMax Sparse Attention (MSA)	Dense reasoning model
Modality	Text, image, video	Text
Context window	1M tokens (512K guaranteed)	1M tokens
Thinking mode	Toggleable	Always on
SWE-Bench Pro	59.0% (vendor-reported)	Vendor-reported, see benchmarks
Input price (per 1M)	~$0.60	~$2.50
Output price (per 1M)	~$2.40	~$7.50

‍

Key Differences

You've probably noticed from the table above that there are 4 areas where these models differ the most in terms of their specs:

‍

License: M3 is open-weight, with weights published for self-hosting. Qwen 3.7 Max is closed and reachable only through Alibaba's API.
Modality: M3 accepts image and video input natively. Qwen 3.7 Max is text-only.
Architecture: M3 uses a sparse-attention design built for long-context efficiency. Qwen 3.7 Max is a reasoning model that runs its thinking process on every request.
Price: M3 is roughly four times cheaper on input tokens.

‍

Who Each Model Is Best For

MiniMax M3 is best for users who need to self-host the model, or anyone who wants to send images or video inputs. It's also the cheaper model of the two, making it a good choice for high volume tasks where you expect to burn through a lot of tokens. In our testing it was also the stronger writer, so it's a good pick for long-form and creative work.

‍

The Qwen 3.7 Max is best suited to text-only tasks that require the model to demonstrate significant reasoning ability: complex backend coding problems, advanced mathematical calculations and analysing large statistical databases.

‍

What Is MiniMax M3?

MiniMax M3 is the flagship language model released by the Shanghai lab MiniMax on June 1, 2026.

‍

It's the first open-weight model to simultaneously offer:

‍

Frontier-level coding performance
A one-million-token context window
Ability to process images and videos natively

‍

Architecture and Training Approach

M3 is built on a new attention mechanism that the lab calls MiniMax Sparse Attention (MSA). Most models use dense attention, which gets more expensive very quickly as the input gets longer. That cost is the main reason long context windows are hard to offer. MSA is designed to handle very long inputs while keeping that cost low.

‍

MiniMax says this makes M3 much faster than its previous M2 model when working with very long inputs. It also helps with a specific problem in long tasks. As a conversation grows, a model normally has to re-process more and more text on every step, and MSA is built to reduce that overhead.

‍

Key Capabilities

Let's take a look at the main features of MiniMax M3:

‍

Coding. This is the capability MiniMax leads with. On SWE-Bench Pro — a benchmark that hands the model real bug reports from open-source projects and checks whether its fix passes the project's tests — M3 scores 59 percent. That is higher than the scores MiniMax reports for GPT-5.5 and Gemini 3.1 Pro, and just below Claude Opus 4.7, which is strong company for an open-weight model.
Multimodal input. M3 can read images and video, and it can operate a desktop computer.
Long context. It has a 1M-token context window, with a guaranteed minimum of 512K tokens. That is large enough to hold an entire codebase or a long document in a single prompt.
Toggleable thinking mode. You can turn its step-by-step reasoning on or off for each request, depending on whether the task needs it.

‍

Pricing and Availability

Here's where things stand on getting access to M3 and what it costs:

‍

License: open-weight. MiniMax planned to publish the model weights and a technical report on Hugging Face and GitHub within about ten days of launch, so you can run it on your own hardware.
API: already live, at around $0.60 per million input tokens at standard context length. Calls above 512K tokens cost more.
Easiest way to try it: directly on Overchat AI, with no setup.

‍

What Is Qwen 3.7 Max?

Qwen 3.7 Max is the flagship model in Alibaba's Qwen family. Alibaba announced it at the Alibaba Cloud Summit in Hangzhou on May 20, 2026. It is a text-only model built for reasoning, coding, and long tasks.

‍

Architecture and Training Approach

Qwen 3.7 Max is a reasoning model. It works through a problem step by step before giving an answer, rather than replying straight away. The model also marks a strategic shift for Alibaba. Earlier Qwen models were open-weight, but Qwen 3.7 Max is proprietary and API-only. It is Alibaba's first move toward a closed-frontier approach.

‍

Key Capabilities

Let's take a look at where Qwen 3.7 Max is strongest. It stands out most on mathematics and reasoning, where it scores higher than any other Chinese model on the main benchmarks:

‍

HMMT competition mathematics — 97.1. HMMT is one of the hardest high-school maths competitions in the United States. A score this high means the model solves nearly all of the problems, and it is the top score in Alibaba's comparison table.
GPQA Diamond — 92.4. GPQA Diamond is a set of graduate-level science questions written to be "Google-proof," so they reward reasoning rather than recall. PhD-level experts answer around 65 percent correctly, which makes 92.4 a very strong result.
Humanity's Last Exam — 41.4. This is a deliberately brutal benchmark of expert questions across many fields, where even the best models score low. A higher number is better, and Qwen leads its Chinese peers here.
Long context — 1M tokens. Its context window is up from 256K on the previous Qwen 3.6 Max, which puts it in line with other current flagship models.

‍

Pricing and Availability

Here's how to access Qwen 3.7 Max and what it costs:

‍

License: proprietary. There are no open weights, so you cannot self-host it.
Where to reach it: through Alibaba Cloud Model Studio and compatible API endpoints.
Price: roughly $2.50 per million input tokens and $7.50 per million output tokens, about half the price of comparable Western flagships.
Easiest way to try it: directly on Overchat AI, the same place you can test M3.

‍

Release Timeline

MiniMax M3 Release History

Early 2026 — M2.5 and M2.7. MiniMax released several M-series models in quick succession. M2.7 was notable because the lab said the model helped develop itself, running optimization loops on its own.
Weeks before launch — architecture teasers. MiniMax began previewing its new sparse-attention design ahead of the model's release.
June 1, 2026 — M3 launch. The model went live through the MiniMax API and the MiniMax Code app, with thinking mode that can be switched on or off.
Within ~10 days of launch — open weights. MiniMax planned to publish the model weights and a full technical report on Hugging Face and GitHub.

‍

Qwen 3.7 Max Release History

May 14, 2026 — preview on LM Arena. A preview version showed up on the public LM Arena leaderboard, with no press release. Its text Elo score reached about 1,475, near the top ten for math, coding, and software tasks.
May 18–19, 2026 — API goes live. Access opened on third-party platforms and on Alibaba Cloud Model Studio, a day before the public announcement.
May 20, 2026 — official launch. Alibaba formally announced Qwen 3.7 Max at the Alibaba Cloud Summit in Hangzhou.
A strategic shift. Unlike earlier open-weight Qwen models, Qwen 3.7 Max shipped as a closed, proprietary model — Alibaba's first move toward a closed-frontier approach.

‍

Qwen 3.7 Max vs Qwen 3.7 Plus

Alibaba released two models under the Qwen 3.7 name within weeks of each other. They are built for different jobs:

‍

Qwen 3.7 Max is text-only and was released on May 20, 2026. It is built for coding, reasoning, and long text tasks. This is the model used in this comparison.
Qwen 3.7 Plus is multimodal and was released on June 1, 2026. It is built on the Max model, with added vision and screen-control features for computer-use tasks.

‍

Benchmark Performance

Here's how both models performed in the benchmarks. Just a quick note, the numbers come directly from the developers and there are some tests that don't cover both models. We'll mark the performance of a model that wasn't tested with a "~" sign.

‍

Benchmark	What it measures	MiniMax M3	Qwen 3.7 Max
SWE-Bench Pro	Fixing real bugs in open-source codebases	59.0%	~
Coding (aggregate)	Average across head-to-head coding tests	67.0	73.6
Terminal-Bench 2.0	Running terminal commands safely and correctly	66.0%	69.7%
Agentic (aggregate)	Average across head-to-head agentic tests	71.9	69.7
HMMT	Hard high-school competition mathematics	~	97.1
PolyMATH	Multilingual competition mathematics	~	86.5
GPQA Diamond	Graduate-level, "Google-proof" science questions	~	92.4
Humanity's Last Exam	Very hard expert questions across many fields	~	41.4

‍

In short, the published numbers say Qwen 3.7 Max is stronger on coding, math, and reasoning, while MiniMax M3 leads on agentic tasks and wins clearly on price and openness. The next section checks whether Qwen's reasoning lead shows up on real prompts.

‍

Real-World Testing

Now we'll put both models through three tests, each covering a different skill: hard algorithmic coding, mathematical reasoning, and long-form writing.

‍

Testing Methodology

We ran both models on Overchat AI, which offers MiniMax M3 and Qwen 3.7 Max in one place. This let us give each prompt to both models under the same conditions. Every task used the same prompt for both, with no follow-up help. Each answer below is shown as the model returned it. The prompts come from established benchmarks.

‍

Evaluation Criteria

We judge each task against a fixed set of criteria that suits its type:

Coding: Does the code handle edge cases, run correctly, use a sensible approach, and come with a clear explanation?
Math and reasoning: Is the final answer correct, and is the working sound and complete?
Writing: Does it follow the brief, have a clear voice, hold together, and avoid padding?

‍

Test 1: Hard Algorithmic Coding

This is "Trapping Rain Water," a classic problem that LeetCode lists as Hard.

‍

Prompt

Given n non-negative integers representing an elevation map where the width of

each bar is 1, compute how much water it can trap after raining.

‍

For example, given the heights [0,1,0,2,1,0,1,3,2,1,2,1], the answer is 6.

‍

Constraints:

- n == height.length

- 1 <= n <= 20,000

- 0 <= height[i] <= 100,000

‍

Provide a solution that runs in linear time and uses constant extra space, and

explain the idea behind your approach.

‍

The naive approach checks every bar against the tallest bars to its left and right, which works but runs in O(n²) time. We're looking for the better solution that runs in a single linear pass: either precomputing the running maximums from each side, or using two pointers that move inward from the ends. A strong answer should use one of these, explain the idea behind it clearly, and provide runnable code that handles the edge cases, such as an empty or flat elevation map where no water is trapped.

‍

MiniMax M3 Results

‍

‍

Qwen 3.7 Max Results

‍

‍

Winner

Both models produced the same solution: the optimal two-pointer approach that runs in linear time and constant space.

‍

We ran both versions against the example and a range of edge cases — an empty map, a flat one, and strictly increasing and decreasing maps — and they returned the correct answer every time. The code is functionally identical, so the explanation is what decides this one.

‍

Each model explained the core idea well, and each included a proof of why the approach is correct, which is more than the prompt asked for.

‍

However, MiniMax added two things Qwen did not.

‍

A step-by-step walk-through of the example, tracing the pointers and the running water count at each step, which makes the logic easy to follow.
A short table comparing all four common approaches — brute force, prefix/suffix arrays, monotonic stack, and two pointers — that shows at a glance why two pointers is the only one meeting both the time and space limits.

‍

Qwen's answer is excellent, and its correctness proof, laid out as a numbered argument, is arguably the cleaner of the two. Even so, the walk-through and the comparison table make MiniMax's response more complete and more useful to anyone learning the problem.

‍

Winner: MiniMax M3.

‍

Test 2: Complex Math and Reasoning

This is a problem taken word-for-word from HMMT November 2025 (Problem 9). HMMT is one of the hardest amateur mathematics competitions in the United States. The problem has a single verifiable answer.

‍

Prompt

Let a, b, and c be pairwise distinct nonzero complex numbers such that

‍

(10a + b)(10a + c) = a + 1/a,

(10b + a)(10b + c) = b + 1/b,

(10c + a)(10c + b) = c + 1/c.

‍

Compute abc.

‍

The correct answer here is 1/91. A strong response should reach that value through a clear derivation, without skipping over the hard steps, and should state the final answer plainly instead of leaving it buried in the working.

‍

MiniMax M3 Results

‍

‍

Qwen 3.7 Max Results

‍

‍

Winner

Both models reached the correct answer of 1/91, so this test comes down to how they got there, and they took noticeably different routes.

‍

MiniMax M3 multiplied each equation through by its variable and rewrote everything in terms of the symmetric functions — the sum, the pairwise products, and the product abc. That let it show that a, b, and c are all roots of the same cubic, and two applications of Vieta's formulas then gave the sum and the product directly. The derivation is compact and reaches the answer without a wasted step.

‍

Qwen 3.7 Max took the equations in pairs and subtracted them from one another. The matching terms cancelled, the differences factored out, and the whole system collapsed to a single relationship: 91 = 1/abc. Qwen then kept going. It derived the same cubic M3 had used, checked that the roots match the required sum and product, and confirmed that the three numbers really are distinct and nonzero.

‍

Both derivations are sound, and neither skips over the hard steps. The verification is what separates them. The problem specifies that a, b, and c are pairwise distinct nonzero complex numbers, and only Qwen confirmed that such numbers actually exist and satisfy the conditions. M3 stopped the moment it had the answer. That extra check makes Qwen's solution the more complete and trustworthy of the two.

‍

Winner: Qwen 3.7 Max.

‍

Test 3: Long-Form Writing

This test looks at open-ended writing under firm constraints. The constraints let us judge the result on how well it follows the brief and whether it has a real voice, rather than on a vague impression.

‍

Prompt

Write a short story of approximately 600 words about a lighthouse keeper who

receives a letter with no return address. The story must be written entirely in

the second person ("you"), must never state the contents of the letter directly,

and must end on an image rather than a line of dialogue. Maintain a single

consistent tense throughout.

‍

The story has to hit all four constraints: second person throughout, the contents of the letter never stated, an ending on an image rather than dialogue, and one consistent tense. On top of that, we're looking for a real voice rather than generic prose, a structure that holds together from start to finish, and enough restraint to avoid padding and clichés.

‍

MiniMax M3 Results

‍

MiniMax M3 creative writing test result on Overchat AI

‍

Qwen 3.7 Max Results

‍

‍

Winner

It’s interesting to see that both stories overlap a fair amount: the same beats of carrying the envelope up the stairs, reading it in the beam, folding it away, and ending on the view.

‍

However, in our opinion, MiniMax uses sensory detail in a way that feels far less forced, reading like something you might find in a real book. Lines such as "the Atlantic gnawing at the rocks below," "you slid your thumb beneath the flap," and "it pulled at a memory older than the keeper's cottage, older than the lantern itself" carry some weight. This is united AI writing, mind you, but it doesn’t immediately register as slop, especially if you’re not trying to harshly judge it.

‍

Meanwhile, Qwen piles its details on more mechanically, as if working too hard to show rather than tell. The images it settles on are plainer for that effort, and the strain shows through.

‍

However, in fairness, MiniMax did slip into AI-slop territory with "Your hands did not tremble. Your breathing did not change. But something behind your sternum shifted." Still, we have to give this one to MiniMax M3.

‍

Winner: MiniMax M3.

‍

Strengths and Weaknesses

Where MiniMax M3 Performs Best

M3 is the only one of the two that can read images and video. It is also open-weight, so you can run it on your own hardware. Its sparse-attention design makes long-context work cheaper, and it leads Qwen on agentic and tool-use benchmarks.

‍

Our own testing added two more strengths to that list. M3 was the clearly better writer, producing prose that felt natural where Qwen's felt forced.

‍

It also explained its answers better across the board: on both the coding and math tests, even where the two models reached the same result, M3's write-up was easier to follow. For high-volume or multimodal work, and for anyone who values a clear, readable response, these strengths often matter more than a point or two on text-reasoning benchmarks.

‍

Where Qwen 3.7 Max Performs Best

Qwen 3.7 Max leads on the published figures for coding, mathematics, and graduate-level reasoning.

‍

Its biggest lead is in competition math, and that held up in our own testing: it won the math test, reaching the right answer and being the only one of the two to verify its work.

‍

Which Model Should You Choose?

Let’s quickie summarize and talk about which model you should pick based on the task before you.

‍

Best Model for Coding

This one is close. Qwen 3.7 Max scores higher on the published coding benchmarks, so on paper it's better.

‍

But in our own coding test the two were hard to separate: both reached the same optimal solution, and MiniMax actually came out ahead on the strength of a clearer explanation. We’d say — go with MiniMax.

‍

Best Model for Math

Qwen 3.7 Max is the better choice for math. It posts the highest competition-math scores of any Chinese model, and math is the area where it pulls furthest ahead of M3.

‍

Best Model for Writing

MiniMax M3. This was the biggest gap we saw in any of the three tests. M3's prose felt like something from a real book. For long-form or creative writing, M3 is the clear pick.

‍

Best Model for Business Use

It depends on what matters most to your business.

‍

If you need data control, self-hosting, or image and video input, MiniMax M3.
If you mainly do text-heavy analysis and want the strongest reasoning, Qwen 3.7 Max.

‍

Bottom Line

According to the published benchmarks, Qwen 3.7 Max is the stronger model for coding, mathematics, and reasoning. However, in our own testing, MiniMax M3 came out ahead in the three hands-on tests, winning the coding and writing tasks. This was mainly a matter of subjective preference — it felt better to work with this model. However, when working with AI, all you do is read its replies, so the ability to deliver clear explanations and communicate clearly should not be underestimated. For this reason alone, and given that both models are genuinely very strong, we would recommend using MiniMax M3 over Qwen 3.7 Max for most people — at least if you have to choose just one.

Which, you might not need to at all, since on Overchat AI you can try both side by side in one place, and decide which one you like more, or simply keep using them both at the same time.

‍

FAQ

Is MiniMax M3 better than Qwen 3.7 Max?

Neither is better across the board. Qwen 3.7 Max leads on published coding, math, and reasoning benchmarks. MiniMax M3 leads on agentic tasks, multimodality, openness, and price. In our own hands-on tests, M3 came out ahead overall, winning the coding and writing tasks while Qwen took math, and we preferred using it. The right choice still depends on whether your priority is reasoning benchmarks or things like cost, flexibility, and writing quality.

‍

Which model is better for coding: MiniMax M3 or Qwen 3.7 Max?

It's close. Qwen 3.7 Max scores higher across the coding benchmarks the two were compared on, so on paper it's the stronger coder. In our own test, though, both reached the same optimal solution and MiniMax edged it on a clearer explanation. M3 also leads on agentic coding tasks specifically, where the model works step by step through a larger job.

‍

Which model performs better on math and reasoning tasks?

Qwen 3.7 Max. It posts the strongest competition-math figures of any Chinese model, including a reported 97.1 on HMMT. It also leads on graduate-level reasoning benchmarks such as GPQA Diamond.

‍

How does MiniMax M3 compare to Qwen 3.7 Plus?

They target different jobs. M3 is an open-weight, multimodal model for general reasoning and long-context work. Qwen 3.7 Plus is a multimodal agent model built for screen perception and computer-use tasks. For GUI and agentic vision work, Plus is the closer comparison. For general reasoning, M3 stands on its own.

‍

Is Qwen 3.7 Max worth upgrading from Qwen 3.7 Plus?

They are not really an upgrade path for one another. Max is text-only and reasoning-focused, while Plus adds vision and GUI control. Choose Max for text reasoning and coding, and choose Plus for tasks that require the model to see and operate an interface.

‍

Which model has the larger context window?

They match. Both MiniMax M3 and Qwen 3.7 Max ship a 1M-token context window. M3 guarantees a minimum of 512K tokens at full fidelity.

‍

Which model is faster for real-world tasks?

This depends on the task and the thinking mode. M3's sparse-attention architecture is built for speed at long contexts, and it lets you turn thinking off for simple tasks. Qwen 3.7 Max runs its reasoning process on every request, which can add latency on tasks that would not need it.

‍

Which model produces better long-form writing?

MiniMax M3, by a clear margin in our testing. Given the same creative-writing brief, its prose read naturally, while Qwen 3.7 Max piled on sensory detail in a way that felt forced. This was the widest quality gap we saw across the three tests.

‍

Can MiniMax M3 and Qwen 3.7 Max use tools and agents?

Yes. Both are built for agentic and tool-use work, and both labs market long-horizon autonomous capabilities. On published head-to-head benchmarks, M3 holds a slight edge on agentic and tool-use measures.

‍

Which model is more cost-effective?

MiniMax M3. It costs roughly $0.60 per million input tokens against Qwen 3.7 Max's ~$2.50, which makes it about four times cheaper on input. Its open weights also allow self-hosting, which can cut costs further at scale.

‍

What are the biggest differences between MiniMax M3 and Qwen 3.7 Max?

There are four. M3 is open-weight while Qwen is closed. M3 is multimodal while Qwen is text-only. M3 uses sparse attention while Qwen is a dense reasoning model. And M3 is substantially cheaper. On capability, Qwen leads on text reasoning while M3 leads on agentic tasks.

‍

Is MiniMax M3 open source?

M3 is open-weight. MiniMax publishes the model weights for download and self-hosting, with a technical report released alongside. Open-weight means the trained model itself is available, which is the part that matters for self-hosting.

‍

Is Qwen 3.7 Max open source?

No. Qwen 3.7 Max is proprietary and API-only. This is a notable break from earlier Qwen models, which were released under permissive open-weight licenses.

‍

What is the best alternative to MiniMax M3 and Qwen 3.7 Max?

It depends on the need. DeepSeek's V4 family is another strong open-weight option. Western flagships such as Claude Opus, GPT-5.5, and Gemini 3.1 Pro offer frontier capability at a higher cost. Qwen 3.7 Plus is the option for multimodal agent work. Several of these can be compared directly on Overchat AI.

MiniMax M3 vs Qwen 3.7 Max: A Head-to-Head Comparison

Table of Contents

TL;DR

MiniMax M3 vs Qwen 3.7 Max at a Glance

Key Differences

Who Each Model Is Best For

What Is MiniMax M3?

Architecture and Training Approach

Key Capabilities

Pricing and Availability

What Is Qwen 3.7 Max?

Architecture and Training Approach

Key Capabilities

Pricing and Availability

Release Timeline

MiniMax M3 Release History

Qwen 3.7 Max Release History

Qwen 3.7 Max vs Qwen 3.7 Plus

Benchmark Performance

Real-World Testing

Testing Methodology

Evaluation Criteria

Test 1: Hard Algorithmic Coding

MiniMax M3 Results

Qwen 3.7 Max Results

Winner

Test 2: Complex Math and Reasoning

Prompt

MiniMax M3 Results

Qwen 3.7 Max Results

Winner

Test 3: Long-Form Writing

Prompt

MiniMax M3 Results

Qwen 3.7 Max Results

Winner

Strengths and Weaknesses

Where MiniMax M3 Performs Best

Where Qwen 3.7 Max Performs Best

Which Model Should You Choose?

Best Model for Coding

Best Model for Math

Best Model for Writing

Best Model for Business Use

Bottom Line

FAQ

Is MiniMax M3 better than Qwen 3.7 Max?

Which model is better for coding: MiniMax M3 or Qwen 3.7 Max?

Which model performs better on math and reasoning tasks?

How does MiniMax M3 compare to Qwen 3.7 Plus?

Is Qwen 3.7 Max worth upgrading from Qwen 3.7 Plus?

Which model has the larger context window?

Which model is faster for real-world tasks?

Which model produces better long-form writing?

Can MiniMax M3 and Qwen 3.7 Max use tools and agents?

Which model is more cost-effective?

What are the biggest differences between MiniMax M3 and Qwen 3.7 Max?

Is MiniMax M3 open source?

Is Qwen 3.7 Max open source?

What is the best alternative to MiniMax M3 and Qwen 3.7 Max?