
OpenAI announces GPT-4.1, its "smartest model for complex tasks"


OpenAI has announced three models: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. These models come with massive context windows of up to 1 million tokens and a knowledge cutoff of June 2024.

The company says these models outperform the recently updated GPT-4o and GPT-4o mini, which launched last July. GPT-4.1 is API-only for now, so you won’t be using it inside ChatGPT just yet.

As OpenAI puts it: "Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version of GPT‑4o, and we will continue to incorporate more with future releases."
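For developers, that means trying GPT-4.1 today is a matter of calling the API rather than opening ChatGPT. Below is a minimal sketch using the official OpenAI Python SDK; the model name follows the announcement, and the prompt is purely illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Model identifier per OpenAI's announcement; availability in a given
# account or region may vary.
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Summarize what a 1M-token context window enables."},
    ],
)

print(response.choices[0].message.content)
```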

Benchmark numbers show the improvements GPT-4.1 brings. It scores 54.6% on SWE-bench Verified, up 21.4 points from GPT-4o. It hits 38.3% on MultiChallenge, an instruction-following benchmark, and sets a new record for long-video understanding with a 72.0% score on Video-MME, where models analyze videos up to an hour long with no subtitles.

OpenAI also worked with alpha partners to test how GPT-4.1 performs in real-world use cases.

Thomson Reuters tested GPT‑4.1 with CoCounsel, its legal AI assistant, and saw a 17% boost in multi-document review accuracy compared to GPT‑4o. This kind of work relies heavily on tracking context across multiple sources and identifying complex relationships like conflicting clauses or hidden dependencies, and GPT-4.1 delivered consistently strong performance.

Carlyle put GPT‑4.1 to work extracting financial data from long, dense documents, including Excel files and PDFs. According to its internal benchmarks, it performed 50% better than previous models in document retrieval. It was the first to reliably handle issues like needle-in-a-haystack searches, loss of information in the middle of documents, and reasoning that required connecting insights across files.

Performance is one thing, but speed matters, too. OpenAI says GPT‑4.1 returns its first token in about 15 seconds when processing 128,000 tokens, and in up to 30 seconds at the full one-million-token context. The mini and nano models are even faster.

GPT‑4.1 nano typically responds in less than 5 seconds for prompts with 128,000 input tokens. Prompt caching can help cut down latency even more while saving costs.
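Prompt caching in the OpenAI API kicks in on repeated prompt prefixes, so the usual way to benefit is to keep the static part of a prompt (system instructions, reference material) at the front and put only the variable user input at the end. A hedged sketch, again assuming the standard Python SDK; the model name and prompt contents are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# Static instructions and reference material go first so repeated calls
# share a common prefix that the API can cache.
STATIC_PREFIX = (
    "You extract key figures from quarterly reports.\n"
    "Reference glossary: ...\n"  # long, unchanging context
)

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1-nano",  # model name per the announcement
        messages=[
            {"role": "system", "content": STATIC_PREFIX},
            {"role": "user", "content": question},  # only this part changes per call
        ],
    )
    return response.choices[0].message.content

print(ask("What was Q2 revenue?"))
```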

Image understanding also saw a noticeable jump. GPT‑4.1 mini, in particular, outperformed GPT‑4o in a variety of vision benchmarks.

  • On MMMU (which includes diagrams, charts, and maps), GPT‑4.1 mini scores 73%. That’s higher than GPT‑4.5 and far better than GPT‑4o mini’s 56%.
  • On MathVista (which tests models on visual math problems), GPT‑4.1 and GPT‑4.1 mini both reach 57%, leaving GPT‑4o mini’s 37% in the dust.
  • On CharXiv-Reasoning, where models answer questions based on scientific charts, GPT‑4.1 continues to lead.
  • On Video-MME (long videos without subtitles), GPT‑4.1 scores 72%, improving significantly over GPT‑4o’s 65%.

As for pricing:

  • GPT‑4.1 costs $2.00 per million input tokens and $8.00 for output.
  • GPT‑4.1 mini is priced at $0.40 for input and $1.60 for output.
  • GPT‑4.1 nano comes in at $0.10 input and $0.40 output.

Using prompt caching or the Batch API can bring those costs down even more, which is great for apps at scale. OpenAI is also preparing to retire GPT-4.5 Preview by July 14, 2025, citing better performance, lower latency, and lower cost from GPT-4.1.
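As a rough illustration of what those rates mean in practice, here is a back-of-the-envelope calculator based on the per-million-token prices listed above. The token counts are made up, and Batch API or cached-prompt discounts are not included.

```python
# Per-million-token prices from the article (USD).
PRICES = {
    "gpt-4.1":      {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request, ignoring caching/batch discounts."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"]

# Example: a 128K-token prompt with a 2K-token reply on each model.
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 128_000, 2_000):.4f}")
```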
