OpenAI’s new GPT-4.1 models can process a million tokens and solve coding problems better than ever

OpenAI launched a new family of models this morning that significantly improves coding capabilities while cutting costs, a direct response to growing competition in the enterprise AI market.

The San Francisco-based AI company introduced three models, GPT-4.1, GPT-4.1 mini and GPT-4.1 nano, all available now through its API. The new lineup performs better on software engineering tasks, follows instructions more precisely, and can handle contexts of up to one million tokens, equivalent to about 750,000 words.

“GPT-4.1 delivers excellent performance at a lower cost,” Kevin Weil, chief product officer at OpenAI, said during Monday’s announcement. “These models are better than GPT-4o on almost every dimension.”

Perhaps most significant for enterprise customers is the pricing: GPT-4.1 will cost 26% less than its predecessor, and the lightweight nano version becomes OpenAI’s most affordable offering at just 12 cents per million tokens.
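
The pricing claims reduce to simple per-token arithmetic. The sketch below assumes the 12-cent figure is a single blended rate per million tokens and treats the 26% reduction as a flat discount on a baseline bill. The 500-million-token monthly volume and the $1,000 baseline bill are hypothetical numbers chosen for illustration, not OpenAI pricing data.

```python
# Rough cost arithmetic for the pricing figures quoted in the article.
# Assumptions (hypothetical): the nano price is one blended rate, and the
# 26% reduction applies uniformly to whatever the predecessor would cost.

NANO_USD_PER_MILLION_TOKENS = 0.12  # "12 cents per million tokens"
GPT41_PRICE_REDUCTION = 0.26        # "26% less than its predecessor"


def cost_at_blended_rate(tokens: int, usd_per_million: float) -> float:
    """USD cost of `tokens` tokens at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


def discounted(baseline_usd: float, reduction: float) -> float:
    """Apply a fractional price reduction (0.26 means 26% cheaper)."""
    return baseline_usd * (1 - reduction)


if __name__ == "__main__":
    # Hypothetical workload: 500 million tokens per month on nano.
    nano_bill = cost_at_blended_rate(500_000_000, NANO_USD_PER_MILLION_TOKENS)
    print(f"${nano_bill:.2f}")  # $60.00
    # Hypothetical $1,000 monthly bill on the predecessor model.
    print(f"${discounted(1_000.0, GPT41_PRICE_REDUCTION):.2f}")  # $740.00
```

Real bills depend on the separate input, output and cached-input rates on OpenAI’s pricing page, so treat this only as an order-of-magnitude check.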

https://www.youtube.com/watch?v=ka-p9ood-ce

How GPT-4.1’s improvements target enterprise developers’ biggest pain points

In a candid interview, Michelle Pokrass, post-training research lead at OpenAI, emphasized that practical business applications drove the development process.

“GPT-4.1 was trained with one goal: being useful for developers,” Pokrass told VentureBeat. “We’ve found that GPT-4.1 is much better at following the kinds of instructions that enterprises use in practice, which makes it easier to deploy production-ready applications.”

This focus on real-world utility is reflected in the benchmark results. On SWE-bench Verified, which measures software engineering capabilities, GPT-4.1 scored 54.6%, a 21.4 percentage point improvement over GPT-4o.

For businesses developing AI agents that work independently on complex tasks, the improvements in instruction following are particularly valuable. On Scale’s MultiChallenge benchmark, GPT-4.1 scored 38.3%, outperforming GPT-4o by 10 percentage points.
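
Both benchmark gains are stated in percentage points, meaning absolute differences between scores rather than relative increases. The sketch below works backward from those gains to the GPT-4o baselines they imply and converts each gain into a relative improvement. It assumes the reported gains are exact differences; the MultiChallenge figure uses the article’s rounded 10 points, so its implied baseline is approximate.

```python
def implied_baseline(new_score: float, point_gain: float) -> float:
    """Older model's score implied by a percentage-point gain."""
    return new_score - point_gain


def relative_gain(new_score: float, point_gain: float) -> float:
    """Relative improvement (as a fraction) over the implied baseline."""
    return point_gain / implied_baseline(new_score, point_gain)


benchmarks = {
    # name: (GPT-4.1 score in %, gain over GPT-4o in percentage points)
    "SWE-bench Verified": (54.6, 21.4),
    "MultiChallenge": (38.3, 10.0),
}

for name, (score, gain) in benchmarks.items():
    base = implied_baseline(score, gain)
    print(f"{name}: GPT-4o ~{base:.1f}%, relative gain ~{relative_gain(score, gain):.0%}")
```

On SWE-bench Verified, a 21.4-point gain on top of roughly 33.2% works out to about a 64% relative improvement, which is why the two ways of stating the result should not be mixed up.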

Why OpenAI’s three-tier model strategy challenges competitors like Google and Anthropic

The introduction of three distinct models at different price points reflects an increasingly diverse AI market. The flagship GPT-4.1 targets complex enterprise applications, while the mini and nano versions address use cases where speed and cost efficiency are the priorities.

“Not all tasks need the most intelligence or the top capabilities,” Pokrass told VentureBeat. “Nano is going to be a workhorse model for use cases like autocomplete, classification, data extraction, or anything else where speed is the top concern.”
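
Pokrass’s description suggests sending each request to a model tier based on the task. The sketch below shows one way a developer might do that. It is a hypothetical routing table written for illustration, not an OpenAI feature. The task categories and the rule that unclassified work goes to mini are assumptions; only the model names `gpt-4.1`, `gpt-4.1-mini` and `gpt-4.1-nano` correspond to the three announced models.

```python
from enum import Enum


class Tier(str, Enum):
    """The three GPT-4.1 models; the values are their API model names."""
    FLAGSHIP = "gpt-4.1"
    MINI = "gpt-4.1-mini"
    NANO = "gpt-4.1-nano"


# Hypothetical policy: the narrow, speed-sensitive tasks Pokrass lists go to
# nano, complex work goes to the flagship, everything else goes to mini.
NANO_TASKS = {"autocomplete", "classification", "data_extraction"}
FLAGSHIP_TASKS = {"code_review", "agentic_coding", "multi_document_analysis"}


def choose_model(task: str, latency_critical: bool = False) -> str:
    """Return a model name for `task` under the hypothetical policy above."""
    if task in NANO_TASKS or latency_critical:
        return Tier.NANO.value
    if task in FLAGSHIP_TASKS:
        return Tier.FLAGSHIP.value
    # Unclassified tasks fall back to the middle tier.
    return Tier.MINI.value


print(choose_model("classification"))  # gpt-4.1-nano
print(choose_model("agentic_coding"))  # gpt-4.1
print(choose_model("summarization"))   # gpt-4.1-mini
```

The `latency_critical` flag mirrors the quote’s “anything else where speed is the top concern.” It overrides the task category, so even a flagship-class task goes to nano when speed is the top concern.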

At the same time, OpenAI announced plans to deprecate GPT-4.5 Preview, its largest and most expensive model, released just two months ago, and remove it from the API by July 14. The company positioned GPT-4.1 as a more cost-effective replacement that delivers “improved or similar performance on many key capabilities at much lower cost and latency.”

This move allows OpenAI to reclaim computing resources while giving developers a more efficient alternative to its costliest offering, which had been priced at $75 per million input tokens and $150 per million output tokens.
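
Because GPT-4.5 Preview charged different rates for input and output tokens, the cost of a single request depends on both its prompt size and its answer size. The sketch below applies the two quoted rates. The 10,000-token prompt and 1,000-token answer are hypothetical sizes chosen for illustration.

```python
GPT45_INPUT_USD_PER_M = 75.0    # USD per million input tokens (quoted above)
GPT45_OUTPUT_USD_PER_M = 150.0  # USD per million output tokens (quoted above)


def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """USD cost of one request with separate per-million input/output rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical request: a 10,000-token prompt producing a 1,000-token answer.
cost = request_cost(10_000, 1_000, GPT45_INPUT_USD_PER_M, GPT45_OUTPUT_USD_PER_M)
print(f"${cost:.2f}")  # $0.90
```

At these rates, a workload of many such requests adds up quickly, which helps explain why OpenAI pitches GPT-4.1 as the cheaper replacement.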

Real-world results: How Thomson Reuters, Carlyle and Windsurf are leveraging GPT-4.1

Several enterprise customers who tested the models before launch reported substantial improvements in their specific domains.

Thomson Reuters saw a 17% improvement in multi-document review accuracy when using GPT-4.1 with its legal AI assistant, CoCounsel. This enhancement is particularly valuable for complex legal workflows involving lengthy documents with nuanced relationships between clauses.

Financial firm Carlyle reported 50% better performance at extracting granular financial data from dense documents, a critical capability for investment analysis and decision-making.

Varun Mohan, CEO of coding tool provider Windsurf (formerly Codeium), shared detailed performance metrics during the announcement.

“We found that GPT-4.1 reduces the number of times it needs to read unnecessary files by 40% compared to other leading models, and also modifies unnecessary files 70% less,” Mohan said. “The model is also less verbose… GPT-4.1 is 50% less verbose than other leading models.”

Million-token context: What enterprises can do with 8x more processing capacity

All three models have a context window of one million tokens, eight times larger than GPT-4o’s 128,000-token limit. This expanded capacity lets the models process multiple lengthy documents or entire codebases at once.
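
The “eight times” figure is rounded: 1,000,000 / 128,000 is exactly 7.8125. The sketch below computes that ratio and does a rough check of whether a set of documents could fit in each window. It assumes the rule of thumb implied by the article’s own conversion (one million tokens is about 750,000 words, or roughly 0.75 words per token). That heuristic is only approximate for English prose, so a real budget check should count tokens with the model’s tokenizer. The document word counts are hypothetical.

```python
GPT41_CONTEXT_TOKENS = 1_000_000
GPT4O_CONTEXT_TOKENS = 128_000
WORDS_PER_TOKEN = 0.75  # heuristic implied by "1M tokens ~ 750,000 words"

ratio = GPT41_CONTEXT_TOKENS / GPT4O_CONTEXT_TOKENS
print(f"{ratio:.4f}x")  # 7.8125x, rounded to "eight times" in the article


def estimated_tokens(word_count: int) -> int:
    """Very rough token estimate from a word count (a heuristic, not a tokenizer)."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_counts: list[int], context_tokens: int,
                    reserve_for_output: int = 0) -> bool:
    """Check whether several documents plausibly fit in one context window."""
    total = sum(estimated_tokens(w) for w in word_counts)
    return total + reserve_for_output <= context_tokens


docs = [200_000, 150_000, 100_000]  # hypothetical word counts
print(fits_in_context(docs, GPT4O_CONTEXT_TOKENS))  # False
print(fits_in_context(docs, GPT41_CONTEXT_TOKENS))  # True
```

The `reserve_for_output` parameter leaves room for the model’s answer, since the prompt alone should not use up the whole window.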

In a demonstration, OpenAI showed GPT-4.1 analyzing a 450,000-token NASA server log file from 1995 and identifying an anomalous entry hidden deep within the data. This capability is particularly valuable for tasks involving large datasets, such as code repositories or corporate document collections.

However, OpenAI acknowledges that performance degrades with extremely large inputs. On its internal OpenAI-MRCR test, accuracy dropped from around 84% at 8,000 tokens to 50% at one million tokens.

How the enterprise AI landscape is shifting as Google, Anthropic and OpenAI compete

The release comes as competition in the enterprise AI space intensifies. Google recently launched Gemini 2.5 Pro with a comparable one-million-token context window, while Anthropic’s Claude 3.7 Sonnet has gained traction among businesses looking for alternatives to OpenAI’s products.

Chinese AI startup DeepSeek has also recently upgraded its model, putting additional pressure on OpenAI to maintain its leadership.

“It’s been really cool to see how improvements in long-context understanding have translated into better performance in specific verticals such as legal analysis and financial data extraction,” Pokrass said. “We’ve found it crucial to test our models beyond academic benchmarks and make sure they perform well for enterprises and developers.”

By releasing these models exclusively through its API rather than ChatGPT, OpenAI signals its commitment to developers and enterprise customers. The company plans to gradually incorporate GPT-4.1’s capabilities into ChatGPT over time, but the main focus remains on giving businesses powerful tools for building specialized applications.

To encourage further research in long-context processing, OpenAI is releasing two evaluation datasets: OpenAI-MRCR, for testing multi-round coreference abilities, and Graphwalks, for evaluating complex reasoning across lengthy documents.
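
The article does not describe how Graphwalks tasks are built. As we understand OpenAI’s description, which should be checked against the dataset’s own documentation, a task fills the prompt with the edges of a directed graph and asks the model to return every node at a given breadth-first-search depth from a start node. Under that assumption, the reference answer for grading a model’s reply can be computed with an ordinary BFS. The edge-list format, the node names and the `nodes_at_depth` function below are hypothetical.

```python
from collections import deque


def nodes_at_depth(edges: list[tuple[str, str]], start: str, depth: int) -> set[str]:
    """Return the nodes whose shortest-path distance from `start` is exactly `depth`."""
    adjacency: dict[str, list[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    # Standard BFS: the first time a node is reached gives its shortest distance.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # nodes at the target depth need no further expansion
        for neighbour in adjacency.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {node for node, d in distance.items() if d == depth}


# Hypothetical directed graph given as (source, target) edges.
edges = [("a1", "b2"), ("a1", "c3"), ("b2", "d4"), ("c3", "d4"), ("d4", "e5")]
print(nodes_at_depth(edges, "a1", 2))  # {'d4'}
```

Because `d4` is reachable from both `b2` and `c3`, the sketch counts it once, at its shortest distance. A long-context task of this kind checks whether the model can track edges scattered across the whole prompt.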

For enterprise decision-makers, the GPT-4.1 family offers a more practical and cost-effective approach to AI implementation. As organizations continue integrating AI into their operations, these improvements in reliability, specificity and efficiency could accelerate adoption across industries that are still weighing implementation costs against potential benefits.

While competitors chase larger, costlier models, OpenAI’s strategic pivot with GPT-4.1 suggests that the future of AI may belong not to the biggest models but to the most efficient ones. The real breakthrough may lie not in the benchmarks but in bringing enterprise-grade AI within reach of more businesses than ever before.

