Meet GPT-5.4 — OpenAI’s New Best AI Model

OpenAI’s GPT-5.4 is the company’s latest iteration of its flagship AI model. But you might be wondering:

What improved compared to GPT-5.2?
How does it compare with models from Google, Anthropic, and others?
And is the new model worth switching to, especially if you’re already using a capable AI like Claude Opus 4.6?

GPT-5.4 is already on Overchat AI, and you can start chatting with the new model here.

Read on to understand what changed.

TLDR

GPT-5.4 was released by OpenAI on March 5, 2026, positioned as the company's most capable frontier model for professional work, reasoning, coding, and agentic tasks.
The biggest new feature is native computer use — the model can operate desktops, click buttons, interact with running apps, and understand on-screen content, scoring 75% on the OSWorld-Verified benchmark (above the 72.4% human baseline).
Context window expands to 2 million tokens — double the previous GPT-4.1 ceiling — enabling processing of entire codebases, large legal documents, and full research archives.
Against GPT-5.2, factual errors dropped 33% and overall response errors dropped 18%, with the new Tool Search system reducing token usage by 47% in complex workflows.
Benchmark highlights: 83% on GDPval (knowledge work), 87.5% on investment banking modeling, and expert-level performance across real-world deliverables like spreadsheets and presentations.
The model comes in two configurations: GPT-5.4 Thinking for most users and GPT-5.4 Pro for demanding enterprise workloads.
Against competitors, GPT-5.4 slightly outperforms Gemini 3.1 Pro across most benchmarks and appears to challenge Claude Opus 4.6's long-held dominance in coding.
Available through ChatGPT paid tiers, the OpenAI API, and Overchat AI, with API pricing offset in practice by Tool Search's token-reduction features.

What is GPT-5.4?

GPT-5.4 is a large language model released by OpenAI on March 5, 2026, designed primarily for professional work.

OpenAI describes it as its most capable and efficient frontier model for professional tasks, combining reasoning, coding, and computer-use abilities in one system.

The model introduces several major improvements:

Up to 2 million tokens of context
Improved reasoning benchmarks
Large reductions in hallucinations
Better integration with external tools and software

For those working with AI agents, GPT-5.4 also expands OpenAI’s push toward AI systems that can work autonomously for long periods of time. This is enabled through native computer use.

The model comes in two main configurations:

GPT-5.4 Thinking: for difficult tasks.
GPT-5.4 Pro: an even higher-performance version designed for the most demanding workloads.

Most people will use GPT-5.4 Thinking; the Pro version is aimed at power users and enterprises.

Which is Better: GPT-5.2 vs GPT-5.4?

GPT-5.4 is an iterative upgrade over GPT-5.2, and it is the stronger model of the two.

According to OpenAI benchmarks, GPT-5.4 performs at or above expert level across a wide range of real-world tasks.

For example:

GDPval (knowledge-work benchmark): 83%
Investment banking modeling benchmark: 87.5%
OSWorld computer-use benchmark: 75% (above human baseline)

These benchmarks measure the ability of AI systems to produce real deliverables such as spreadsheets, presentations, or scheduling workflows.

Compared with GPT-5.2:

Factual errors decreased by 33%
Overall response errors decreased by 18%

Another key improvement is efficiency.

The new Tool Search system reduces token usage by 47% in complex workflows where many tools are available.
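
To make the idea of selective tool loading concrete, here is a toy sketch. It is not OpenAI's implementation of Tool Search: the tool registry, the keyword-overlap scoring, and the word-count token estimate are all assumptions chosen purely for illustration.

```python
# Toy illustration of selective tool loading. NOT OpenAI's Tool Search:
# the registry, scoring rule, and token estimate are all hypothetical.

TOOLS = {
    "calendar_create_event": "Create a calendar event with a title, start time, and attendees.",
    "spreadsheet_update_cell": "Write a value into a spreadsheet cell by sheet name and address.",
    "email_send": "Send an email message to one or more recipients.",
    "crm_lookup_contact": "Look up a customer contact record in the CRM by name.",
}


def rough_tokens(text: str) -> int:
    """Very rough token estimate: one token per whitespace-separated word."""
    return len(text.split())


def select_tools(request: str, tools: dict[str, str], top_k: int = 1) -> dict[str, str]:
    """Keep only the top_k tools whose descriptions share the most words with the request."""
    request_words = set(request.lower().split())

    def overlap(item: tuple[str, str]) -> int:
        cleaned = item[1].lower().replace(".", "").replace(",", "")
        return len(request_words & set(cleaned.split()))

    ranked = sorted(tools.items(), key=overlap, reverse=True)
    return dict(ranked[:top_k])


request = "send an email to the project recipients"
selected = select_tools(request, TOOLS)

# Compare the prompt cost of loading every description vs. only the selected one.
all_cost = sum(rough_tokens(d) for d in TOOLS.values())
selected_cost = sum(rough_tokens(d) for d in selected.values())
print(list(selected), all_cost, selected_cost)
```

With only four tools the saving is small in absolute terms; the effect grows as the number of available tools increases, which matches the "complex workflows where many tools are available" framing.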

GPT-5.4 Benchmarks

Below are some of the most notable benchmark results reported for GPT-5.4:

Benchmark	Score	Description
GDPval (knowledge work)	83%	Matches or exceeds professionals
GPQA Diamond	92.8%	Advanced scientific reasoning
ARC-AGI-2	73.3%	Strong abstract reasoning
OSWorld Verified	75%	Surpasses human performance
SWE-Bench Pro	57.7%	Real-world software engineering

GPT-5.4 Features

So what makes this model the best OpenAI has ever released? Let’s talk about its main features.

The Biggest New Feature: Computer Use

The most important addition in GPT-5.4 is native computer use. This means the model can:

Operate the desktop
Interact with running apps
Click buttons and type commands
Understand what is currently on the screen

On the OSWorld-Verified benchmark, which measures the ability to control a computer interface, GPT-5.4 achieved 75% accuracy, outperforming the human baseline of 72.4%.

Long Context: 2 Million Tokens

GPT-5.4 supports up to two million tokens of context through the API and Codex platform. This is double the one million tokens supported by GPT-4.1, the previous model with the largest context window. GPT-4.1 was primarily a coding model and quickly became obsolete.

Earlier GPT-5 models could handle only hundreds of thousands of tokens, not even a million.

A two-million-token context enables AI systems to process:

Entire codebases
Large legal documents
Full research archives

The result is better long-horizon reasoning and planning.
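
For a rough sense of what fits in a window of that size, the sketch below estimates token counts from character counts. It assumes about 4 characters per token, a common rule of thumb for English text rather than an exact tokenizer, and the output reserve and the example codebase are hypothetical.

```python
# Rough check of whether a set of documents fits in a context window.
# Assumptions: ~4 characters per token (a heuristic, not a real tokenizer)
# and the 2-million-token limit quoted in this article.

CONTEXT_LIMIT_TOKENS = 2_000_000
CHARS_PER_TOKEN = 4  # heuristic; real tokenizers vary by language and content


def estimate_tokens(text: str) -> int:
    """Estimate token count from character count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserve_for_output: int = 50_000) -> bool:
    """Return True if all documents plus an output reserve fit in the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserve_for_output <= CONTEXT_LIMIT_TOKENS


# Hypothetical example: 500 source files of about 12,000 characters each.
codebase = ["x" * 12_000 for _ in range(500)]
print(sum(estimate_tokens(f) for f in codebase))  # 1500000
print(fits_in_context(codebase))                  # True
```

For real workloads, a proper tokenizer gives a more reliable count, since code and non-English text often tokenize less efficiently than this heuristic suggests.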

GPT-5.4 vs. Competitors

Here’s how the new model compares with other frontier AI models:

GPT-5.4 vs Gemini 3.1 Pro

Google’s Gemini 3.1 Pro is one of the strongest competitors to OpenAI's GPT-5.4, particularly in reasoning and long-context tasks. However, across most benchmarks, GPT-5.4 slightly outperforms its Google competitor, and on some, such as ARC-AGI-2, it leads by a large margin. Based on benchmark performance alone, GPT-5.4 is the superior model.

Benchmark	GPT-5.4	Gemini 3.1 Pro	Winner
ARC-AGI-2 (abstract reasoning)	73.3%	65.2%	GPT-5.4
GPQA Diamond (PhD science)	92.8%	91.9%	GPT-5.4
OSWorld-Verified (computer use)	75.0%	72.7%	GPT-5.4
SWE-Bench Pro (software engineering)	57.7%	54.1%	GPT-5.4
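
To see which leads are narrow and which are wide, the margins can be computed directly from the table. This is simple arithmetic on the scores listed above; nothing here goes beyond those figures.

```python
# Scores copied from the table above, in percent: (GPT-5.4, Gemini 3.1 Pro).
scores = {
    "ARC-AGI-2": (73.3, 65.2),
    "GPQA Diamond": (92.8, 91.9),
    "OSWorld-Verified": (75.0, 72.7),
    "SWE-Bench Pro": (57.7, 54.1),
}

# Margin in percentage points, rounded to one decimal to avoid float noise.
margins = {name: round(gpt - gemini, 1) for name, (gpt, gemini) in scores.items()}

for name, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{margin} points")
```

The widest gap is on ARC-AGI-2 (8.1 points) and the narrowest on GPQA Diamond (0.9 points).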

GPT-5.4 vs Claude Opus 4.6

Anthropic’s Claude Opus 4.6 is believed by many people to be the world’s best AI model for coding. In particular, many developers strongly prefer it to OpenAI offerings. Are the tables about to be turned? It certainly seems that way, at least according to these benchmarks:

Benchmark	GPT-5.4	Claude Opus 4.6	Winner
SWE-Bench Pro (coding)	57.7%	56.9%	GPT-5.4
ARC-AGI-2 (abstract reasoning)	73.3%	62.4%	GPT-5.4
WebArena-Verified (browser agents)	67.3%	66.4%	GPT-5.4
OSWorld-Verified (computer use)	75.0%	69.8%	GPT-5.4

GPT-5.4 Pricing

OpenAI also introduced new API pricing for GPT-5.4.

Model	Input tokens	Output tokens
GPT-5.4	$2.50 / million	$15 / million
GPT-5.4 Pro	$30 / million	$180 / million

When weighing the cost of GPT-5.4, keep in mind that some features reduce total spend. Tool Search lets the model selectively load information about available tools, which reduces total token usage. So while the per-token price may be higher, the difference may not be noticeable in actual usage.
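
As a back-of-the-envelope illustration, the sketch below estimates API cost from the prices in the table above. The workload sizes are hypothetical, and applying the 47% Tool Search reduction quoted earlier in this article to input tokens only is an assumption; real savings depend on the workflow.

```python
# Back-of-the-envelope API cost estimate using the prices in the table above.
# Assumption: the 47% token reduction attributed to Tool Search is applied
# to input tokens only; real savings depend on the workflow.

PRICES_PER_MILLION = {
    "GPT-5.4": {"input": 2.50, "output": 15.00},
    "GPT-5.4 Pro": {"input": 30.00, "output": 180.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  input_reduction: float = 0.0) -> float:
    """Return the estimated cost in USD for one workload."""
    price = PRICES_PER_MILLION[model]
    effective_input = input_tokens * (1.0 - input_reduction)
    cost = (effective_input * price["input"] + output_tokens * price["output"]) / 1_000_000
    return round(cost, 2)


# Hypothetical workload: 10M input tokens and 1M output tokens.
print(estimate_cost("GPT-5.4", 10_000_000, 1_000_000))        # 40.0
print(estimate_cost("GPT-5.4", 10_000_000, 1_000_000, 0.47))  # 28.25
```

In this hypothetical case, output tokens account for $15 of the bill either way, so the overall saving is smaller than 47%.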

Frequently Asked Questions (FAQ)

When was GPT-5.4 released?

GPT-5.4 was released on March 5, 2026 by OpenAI.

What is GPT-5.4?

GPT-5.4 is OpenAI’s best AI model. It’s designed for professional work, automation, and agentic tasks. The biggest additions are native computer use, improved reasoning benchmarks, and a 2-million-token context window.

Where can I access GPT-5.4?

GPT-5.4 is currently available through Overchat AI, ChatGPT paid tiers and the OpenAI API.

Bottom Line

GPT-5.4 is the best AI model from OpenAI and a big step toward agentic AI systems that can perform real-world work by themselves. Compared to GPT-5.2, the new model offers:
