OpenAI’s new GPT-4.1 AI models focus on coding | TechCrunch

OpenAI launched a new family of models called GPT-4.1 on Monday. Yes, “4.1”, as if the company’s naming were not confusing enough already.

There are three models: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all of which OpenAI says “excel” at coding and instruction following. Available through OpenAI’s API but not ChatGPT, the multimodal models have a 1-million-token context window, meaning they can take in roughly 750,000 words in one go (longer than “War and Peace”).
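
The two figures above (a 1-million-token window and roughly 750,000 words) imply a rule of thumb of about 0.75 words per token. The sketch below applies only that implied ratio to guess whether a text of a given word count would fit. The function names are hypothetical, and real token counts depend on the tokenizer and the text, so treat the result as a rough estimate rather than a guarantee.

```python
# Rough estimate only: the 3-words-per-4-tokens ratio is inferred from the
# article's "1 million tokens is about 750,000 words" figure, not from a tokenizer.
CONTEXT_WINDOW_TOKENS = 1_000_000
WORDS_PER_4_TOKENS = 3  # i.e. about 0.75 words per token


def estimated_tokens(word_count: int) -> int:
    """Estimate tokens for a word count, rounding up (integer arithmetic)."""
    return -(-word_count * 4 // WORDS_PER_4_TOKENS)


def fits_in_context(word_count: int) -> bool:
    """Return True if the estimated token count fits in the context window."""
    return estimated_tokens(word_count) <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    for words in (100_000, 750_000, 900_000):
        print(words, estimated_tokens(words), fits_in_context(words))
```

Integer arithmetic is used here to avoid floating-point rounding at the boundary, where exactly 750,000 words should map to exactly 1,000,000 estimated tokens.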

GPT-4.1 arrives as OpenAI rivals such as Google and Anthropic ratchet up their efforts to build sophisticated programming models. Google’s recently released Gemini 2.5 Pro, which also has a 1-million-token context window, ranks highly on popular coding benchmarks. So do Anthropic’s Claude 3.7 Sonnet and Chinese AI startup DeepSeek’s upgraded V3.

It is the goal of many tech giants, including OpenAI, to train AI coding models that can perform complex software engineering tasks. OpenAI’s grand ambition is to create an “agentic software engineer,” as CFO Sarah Friar put it at a tech summit in London last month. The company asserts that its future models will be able to program entire applications end-to-end, handling aspects such as quality assurance, bug testing, and documentation writing.

GPT-4.1 is a step in this direction.

“We’ve optimized GPT-4.1 for real-world use based on direct feedback to improve in areas that developers care most about: frontend coding, making fewer extraneous edits, following formats reliably, adhering to response structure and ordering, consistent tool usage, and more,” an OpenAI spokesperson told TechCrunch via email. “These improvements enable developers to build agents that are better at real-world software engineering tasks.”

OpenAI claims that the full GPT-4.1 model outperforms its GPT-4o and GPT-4o mini models on coding benchmarks, including SWE-bench. GPT-4.1 mini and nano are said to be more efficient and faster, at the expense of some accuracy, and OpenAI says GPT-4.1 nano is its fastest and cheapest model ever.

GPT-4.1 costs $2 per million input tokens and $8 per million output tokens. GPT-4.1 mini is $0.40 per million input tokens and $1.60 per million output tokens, and GPT-4.1 nano is $0.10 per million input tokens and $0.40 per million output tokens.
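
To make the per-million pricing concrete, here is a minimal sketch that turns the quoted rates into a cost estimate for a single request. The dictionary keys and the `estimate_cost` helper are illustrative names chosen for this example, and the rates are only those quoted above, so check current pricing before relying on the numbers.

```python
from decimal import Decimal

# US dollars per one million tokens, as quoted in the article.
# The keys are illustrative labels for this sketch.
PRICES_PER_MILLION = {
    "gpt-4.1": {"input": Decimal("2.00"), "output": Decimal("8.00")},
    "gpt-4.1-mini": {"input": Decimal("0.40"), "output": Decimal("1.60")},
    "gpt-4.1-nano": {"input": Decimal("0.10"), "output": Decimal("0.40")},
}

MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in dollars for one request."""
    prices = PRICES_PER_MILLION[model]
    total = prices["input"] * input_tokens + prices["output"] * output_tokens
    return total / MILLION


if __name__ == "__main__":
    # Example: 100,000 input tokens and 10,000 output tokens per request.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${estimate_cost(model, 100_000, 10_000):.4f}")
```

For a request with 100,000 input tokens and 10,000 output tokens, this works out to $0.28 on GPT-4.1, $0.056 on GPT-4.1 mini, and $0.014 on GPT-4.1 nano. `Decimal` is used instead of floats so that small per-token amounts do not pick up rounding noise.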

According to OpenAI’s internal testing, GPT-4.1, which can generate more tokens at once than GPT-4o (32,768 vs. 16,384), scored between 52% and 54.6% on SWE-bench Verified, a human-validated subset of SWE-bench. (OpenAI noted in a blog post that some solutions to SWE-bench Verified problems could not run on its infrastructure, hence the range of scores.) Those figures are slightly below the scores reported by Google and Anthropic for Gemini 2.5 Pro (63.8%) and Claude 3.7 Sonnet (62.3%), respectively.

In a separate evaluation, OpenAI probed GPT-4.1 using Video-MME, a benchmark designed to measure a model’s ability to “understand” content in videos. GPT-4.1 reached a chart-topping 72% accuracy in the “long, no subtitles” video category, OpenAI claims.

While GPT-4.1 scores reasonably well on benchmarks and has a more recent “knowledge cutoff,” giving it a better frame of reference for current events (up to June 2024), it is important to remember that even some of the best models today struggle with tasks that would not trip up experts. For example, many studies have shown that code-generating models often fail to fix, and even introduce, security vulnerabilities and bugs.

OpenAI also acknowledges that GPT-4.1 becomes less reliable (i.e., more likely to make mistakes) the more input tokens it has to process. On one of the company’s own tests, OpenAI-MRCR, the model’s accuracy decreased from around 84% with 8,000 tokens to 50% with 1 million tokens. The company said GPT-4.1 also tends to be more “literal” than GPT-4o, and sometimes requires more specific, explicit prompts.
