/
Gemini 3.5 Flash: Frontier AI At The Cost And Speed of Everyday Models
Last Updated:
May 25, 2026

Gemini 3.5 Flash: Frontier AI At The Cost And Speed of Everyday Models

This is the first time that a Flash-tier model, which is usually designed to be faster and cheaper, has beaten the previous generation's Pro on most coding and agentic benchmarks, with the release of Gemini 3.5 Flash.

The model offers a 1M token context window and native support for text, image, audio, video and PDF inputs. It also runs roughly four times faster than other AI models performing at a similar level. So, is now the time to drop Claude Opus 4.7? Let's take a closer look. 

If you'd prefer to test the model yourself, Gemini 3.5 Flash is already available on Overchat AI.

TLDR

  • What is the Gemini 3.5 Flash? Launched on May 19, 2026 at Google I/O 2026, this is the  first model in the Gemini 3.5 family.
  • What’s significant about this release? This Flash beats Gemini 3.1 Pro on most coding tasks.
  • What are the outstanding features? Output speed runs at roughly 278–289 tokens per second — about 4x faster than Claude Opus 4.7 and GPT-5.5. What’s more, the model is natively multimodal and understands text, image, audio, video, and PDF inputs, with a 1M token context window.
  • How much does Gemini 3.5 Flash cost to use? API pricing is $1.50 per million input tokens and $9 per million output tokens, with cached input at $0.15 — roughly one-third the cost of Claude Opus 4.7 and GPT-5.5. Note that pricing has tripled compared to Gemini 3 Flash Preview.
  • Where can I access Gemini 3.5 Flash? One of the easiest ways to access it is through Overchat AI, where you can chat with the model online.

What is Gemini 3.5 Flash?

Gemini 3.5 Flash is Google's mid-tier model. It was released on May 19, 2026 at Google I/O in Mountain View, and became the first model in the Gemini 3.5 family.

Google positions this model as frontier intelligence at Flash latency — an AI that is as smart as the previous Gemini Pro models and as fast as the previous Gemini Flash models.

This holds true in benchmarks: 3.5 Flash really beats Gemini 3.1 Pro in coding, while running roughly 4x faster at less than half the cost, which is revolutionary and opens up a million real-world use cases that were previously simply impossible, which we’ll cover below.

What's New in Gemini 3.5 Flash

Frontier performance at Flash tier

On Google's own benchmark comparisons, Gemini 3.5 Flash outperforms Gemini 3.1 Pro on Terminal-Bench 2.1, MCP Atlas, and Finance Agent v2. It's the first time a Flash-tier model has cleanly beaten the previous Pro on this many benchmarks at once. The full comparison is coming up below.

This has enabled Google to use the model to power incredibly complex AI features that are coming to web search in the coming months, such as the model building interactive visualisations to answer questions when you type a query — which was impossible at the cost and speed of the Pro models.

What this likely means for you is a much more instant user experience, where you don’t have to wait for answers for as long. More on that below.

4x faster output speed

Gemini 3.5 Flash types at roughly 278–289 tokens per second which is about 4x faster than Claude Opus 4.7 (~50 tokens/sec) and GPT-5.5 (~65 tokens/sec). On Artificial Analysis, it ranks #2 of 147 models in its price class for output speed.‍ 

If you remember, when OpenAI released GPT-4o Turbo, that model also typed very fast, but users quickly noticed that it came at a price: the quality of answers. And OpenAI eventually quietly rolled it back. Well, in this case, Google delivers all the same benefits without the downsides — you get fast replies with no impact to quality, even an improvement.

Native multimodal

Gemini 3.5 Flash understands text, image, audio, video, and PDF inputs natively (though output is still text-only).

1M token context window

The context window is 1,048,576 input tokens with a maximum output of 65,536 tokens — on par with Claude Opus 4.7. The knowledge cutoff is January 2026.

3 thinking levels: minimal, low, medium, high

For developers, Gemini 3.5 Flash introduces a new thinking level parameter. Values are minimal, low, medium, and high. This allows you to control how much the model will think before answering and balance accuracy and speed.

Gemini 3.5 Flash Benchmarks

Gemini 3.5 Flash ranks 7th out of 147 models on the Artificial Analysis Intelligence Index v4.0. This is an impressive score, especially considering that we're talking about a speed model rather than a frontier model, where all stats are prioritised for reasoning at the expense of everything else.

Category Benchmark Score
Coding SWE-Bench Pro 55.1%
Coding Terminal-Bench 2.1 76.2%
Agent & tool use MCP Atlas 83.6%
Agent & tool use Toolathlon 56.5%
Agent & tool use Finance Agent v2 57.9%
Agent & tool use GDPval-AA (Elo) 1,656
Reasoning AA Intelligence Index v4.0 55.3
Reasoning Humanity's Last Exam (no tools) 40.2%
Multimodal CharXiv Reasoning 84.2%
Multimodal MMMU-Pro 83.6%
Long context MRCR v2 @ 128k 77.3%
Speed Output tokens / sec ~278–289

How Does It Compare?

Let's see how the model performs against other options in the same weight class.

Gemini 3.5 Flash vs Gemini 3.1 Pro

We’ve already established that the new Flash is better than the previous model, but here's exactly how it outperforms it:

Benchmark Gemini 3.5 Flash Gemini 3.1 Pro
Terminal-Bench 2.1 76.2% 70.3%
MCP Atlas 83.6% 78.2%
Finance Agent v2 57.9% 43.0%
MRCR v2 @ 128k 77.3% 84.9%
Input / Output per 1M $1.50 / $9.00 $2.00 / $12.00

Terminal-Bench 2.1 measures how well a model handles real command-line work and we can see that for this type of work Gemini 3.5 Flash is better. MCP Atlas is the standard test for tool use through the Model Context Protocol. And the biggest gap of all comes on Finance Agent v2, which simulates a financial analyst pulling data, running calculations, and producing a report — Flash wins by nearly 15 points. The exception is MRCR v2 at 128k, where Gemini 3.1 Pro still leads. This tests how accurately a model can pull specific facts from a 128,000-token document.‍

Gemini 3.5 Flash vs Claude Opus 4.7

Claude Opus 4.7 is Anthropic's current flagship and the strongest model on real-world coding benchmarks like SWE-Bench Pro.

Benchmark Gemini 3.5 Flash Claude Opus 4.7
SWE-Bench Pro 55.1% 64.3%
MCP Atlas 83.6% 79.1%
AA Intelligence Index 55.3 57.3
MRCR v2 @ 128k 77.3% 46.9%
Output speed (tokens/sec) ~278–289 ~50
Input / Output per 1M $1.50 / $9.00 $5.00 / $25.00

Judging by these benchmarks, Opus 4.7 is likely to be a better coding model, but it is very likely that there will be an upset when Gemini 3.5 Pro is released. Note that there are also categories in which Gemini 3.5 Flash outperforms Opus 4.7, and when you consider the differences in cost and speed, this is remarkable.

Gemini 3.5 Flash vs GPT-5.5

GPT-5.5 currently sits at the top of the Artificial Analysis Intelligence Index and leads on the hardest reasoning benchmarks, so it outperforms the Flash on most of the tasks, except MCP tool usage (measured by Atlas):

Benchmark Gemini 3.5 Flash GPT-5.5
SWE-Bench Pro 55.1% 58.6%
Terminal-Bench 2.1 76.2% 78.2%
MCP Atlas 83.6% 75.3%
AA Intelligence Index 55.3 60.2
Output speed (tokens/sec) ~278–289 ~65
Input / Output per 1M $1.50 / $9.00 $5.00 / $30.00

How to Access Gemini 3.5 Flash

You can access Gemini 3.5 Flash in several ways, here they are in no particular order:

  • Overchat AI — you can chat with Gemini 3.5 Flash alongside Claude Opus 4.7, GPT-5.5, Qwen 3.7 Max, Kimi K2.6, and other frontier models under a single subscription.
  • Gemini app — Gemini 3.5 Flash is now the default model globally for free and paid users.
  • AI Mode in Google Search — powers the agentic search experience worldwide.
  • Google AI Studio — for developers testing the model in a browser-based playground.
  • Gemini API — for production integrations, with the model ID gemini-3.5-flash.
  • Google Antigravity — Google's agent-first development platform.
  • Vertex AI — for enterprise deployments on Google Cloud.
  • Gemini Enterprise Agent Platform — for businesses building agent-based workflows.

Gemini 3.5 Flash Pricing

Gemini 3.5 Flash costs roughly one-third of Claude Opus 4.7 ($5 / $25) and GPT-5.5 ($5 / $30) on both input and output. Here’s the API pricing:

  • Standard input: $1.50 per million tokens
  • Output: $9.00 per million tokens
  • Cached input: $0.15 per million tokens (90% discount)
  • Non-global regions: $1.65 input / $9.90 output per million tokens
  • Batch processing: 50% off standard rates

‍‍

There is one important caveat  — both input and output now cost 3x higher than the previous Flash, and Artificial Analysis also noted a 5.5x increase in total benchmark run cost when accounting for the new tokenizer and the heavier default thinking level.

Google says that this was necessary for the model to perform so well at such a speed. Nevertheless, the Flash isn’t going to compete with hyper-optimised Chinese models like Qwen 3.5 Plus when it comes to pricing.

If you want to skip the API math entirely, you can chat with Gemini 3.5 Flash on Overchat AI as part of a single subscription that also includes Claude Opus 4.7, GPT-5.5, Qwen 3.7 Max, Kimi K2.6, and more.

Gemini 3.5 Flash Limitations

There are a few limitations worth knowing about before you build on Gemini 3.5 Flash.

It’s not the best-of-the best for coding. Both Claude Opus 4.7 and GPT-5.5 are better AI coders.

Google didn’t publish a SWE-Bench Verified benchmark. This is the most accurate benchmark for measuring real-life coding performance. Perhaps the model doesn’t perform as well as the other benchmarks suggest, given the absence of its data from Google’s marketing.

When Should You Use Gemini 3.5 Flash?

If you’re trying to decide whether this AI model is right for you, here are a few scenarios in which it is preferable to other options:

  • You want very high-quality, accurate answers delivered quickly, and you’re OK with paying more for API or platform tokens.
  • You're building high-volume agentic workflows where speed and cost-per-task matter — 4x faster output and one-third the price of flagship models add up quickly at scale.
  • You want to chat using text, images, audio, video or PDFs.
  • You're using MCP tools a lot.
  • You need long-context work that doesn't require the absolute deepest reasoning.

Reach for something else when:

  • To achieve the absolute pinnacle of AI performance, you need Claude Opus 4.7 or GPT-5.5.
  • You want to chat at higher volumes and are happy to sacrifice some "smarts" for cheaper prices — Kimi K2.6 or DeepSeek V4 Pro both are cheaper.

FAQ

When was Gemini 3.5 Flash released?

Gemini 3.5 Flash was released on May 19, 2026 at Google I/O 2026, available immediately as a Generally Available release — no preview suffix, no waitlist. It's accessible through the Gemini app, AI Mode in Search, Google AI Studio, the Gemini API, Antigravity, and Vertex AI.

How much does Gemini 3.5 Flash cost?

API pricing is $1.50 per million input tokens and $9 per million output tokens, with cached input at $0.15 per million — a 90% discount. That's roughly one-third the cost of Claude Opus 4.7 and GPT-5.5. Batch processing gets an additional 50% off standard rates.

Is Gemini 3.5 Flash better than Gemini 3.1 Pro?

Gemini 3.5 Flash beats Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2% vs 70.3%), MCP Atlas (83.6% vs 78.2%), and Finance Agent v2 (57.9% vs 43.0%). The exception is long-context retrieval, where Gemini 3.1 Pro still leads on MRCR v2 at 128k (84.9% vs 77.3%).

How does Gemini 3.5 Flash compare to Claude Opus 4.7?

Claude Opus 4.7 leads on SWE-Bench Pro (64.3% vs 55.1%) and AA Intelligence Index (57.3 vs 55.3). Gemini 3.5 Flash leads on MCP Atlas (83.6% vs 79.1%), long-context retrieval (77.3% vs 46.9% on MRCR v2 at 128k), output speed (about 4x faster), and price (roughly one-third the cost). The split is clean: Opus 4.7 for the hardest coding and reasoning, Gemini 3.5 Flash for high-volume agentic and multimodal work.

Bottom Line

Gemini 3.5 Flash is the first time a Flash-tier model has clearly beaten the previous Pro tier across most coding and agentic benchmarks. Here are the key takeaways:

  • Gemini 3.5 Flash runs four times faster than Claude Opus 4.7 and GPT-5.5.
  • It is roughly one-third of the price of Opus 4.7 or GPT-5.5 and understands images, videos, audio and PDF files natively.
  • Its coding performance is below that of the most advanced models, but that is to be expected.
  • At the time of writing, it was the world’s most advanced model for MCP usage, which makes sense given that Google plans to power AI search with it.
  • All of these achievements came at a cost — literally, as the price has tripled compared to the previous Flash model.

To test Gemini 3.5 Flash on your own workflows today, head to Overchat AI and start chatting with Gemini 3.5 Flash.