Google released its latest flagship language model, Gemini 2.5 Pro, at an unfortunate moment: the launch was buried under the Studio Ghibli AI image storm that sucked the air out of the AI space. Perhaps wary after previous stumbled releases, Google positioned the model carefully as "our smartest AI model" rather than, in the fashion of other AI labs, introducing it as the best in the world.
However, hands-on experiments show that Gemini 2.5 Pro is indeed impressive and is currently arguably the best reasoning model available. This opens the way for many new applications and may put Google at the forefront of the generative AI race.

A long context window and strong coding abilities
The standout features of Gemini 2.5 Pro are its long context window and output length. The model can process up to 1 million tokens (with 2 million coming soon), which makes it possible to put multiple long documents, or entire code repositories, into a prompt when necessary. The model also has an output limit of 64,000 tokens, compared with roughly 8,000 for other Gemini models.
A long context window also allows for extended conversations, since every interaction with a reasoning model generates tens of thousands of tokens, especially when code, images, and video are involved. (I repeatedly ran up against the limits of Claude 3.7 Sonnet's 200,000-token context window.)
For example, software engineer Simon Willison used Gemini 2.5 Pro to build a new feature for his website. As Willison described on his blog, the model crunched through his entire codebase and figured out all of the places that needed changing, 18 files in total, as can be seen in the generated PR. The whole project took about 45 minutes from start to finish, averaging under three minutes per file he had to modify, and he has since thrown a range of other coding challenges at the model to evaluate its ability to write and critique its own code.
Impressive multimodal reasoning
Gemini 2.5 Pro also shows impressive reasoning over unstructured text, images, and videos. For example, I provided the text of my recent post on sampling-based search and prompted the model to create an SVG graphic describing the algorithm discussed in the text. Gemini 2.5 Pro correctly extracted the key information from the article and created a flowchart of the sampling-and-search process, even getting the conditional steps right. (For reference, I went back and forth with Claude 3.7 Sonnet on the same task many times and ultimately ran into its token limit.)

The rendered image had some visual errors (misplaced arrows) that a revision could fix, so I next tested Gemini 2.5 Pro with a multimodal prompt: I gave it a screenshot of the rendered SVG file along with the code and prompted it to make improvements. The results were impressive. It corrected the arrows and improved the visual quality of the chart.

Other users have had similar experiences with multimodal prompts. For example, in their tests, DataCamp reproduced the runner-game example presented in Google's blog, then provided the code and a video recording of the game to Gemini 2.5 Pro and prompted it to make some changes to the game's code. The model was able to reason through the visuals, find the parts of the code that needed to change, and make the correct modifications.
However, it is worth noting that, like other generative models, Gemini 2.5 Pro is prone to mistakes, such as modifying irrelevant files and code snippets. The more precise your instructions, the lower the risk of the model going astray.
Data analysis with useful reasoning traces
Finally, I ran my classic messy data analysis test for reasoning models. I provided a file containing a mix of plain text and raw HTML data that I had copied and pasted from different stock history pages on Yahoo Finance. I then prompted the model to calculate the value of a portfolio that invests $140 at the beginning of each month, distributed evenly across the Magnificent 7 stocks, from January 2024 through the latest date in the document.
The model correctly identified which stocks it had to pick from the file (Amazon, Apple, Nvidia, Microsoft, Tesla, Alphabet, and Meta), extracted the financial information from the HTML data, and calculated the value of each investment based on the stock price at the beginning of each month. It responded with a clean table showing the per-stock and total portfolio values for each month, along with a breakdown of the value of the entire investment at the end of the period.
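The dollar-cost-averaging arithmetic the model had to reproduce can be sketched as follows. Note that the prices in the example are made-up placeholders, not the actual Yahoo Finance figures from my test file:

```python
# Sketch of the dollar-cost-averaging math behind the portfolio test:
# each month a fixed budget is split evenly across the tickers, shares
# are bought at that month's opening price, and the portfolio is valued
# at the latest price. All prices below are HYPOTHETICAL placeholders.

MONTHLY_BUDGET = 140.0


def portfolio_value(monthly_prices, latest_prices):
    """monthly_prices: {ticker: [price at start of month 1, month 2, ...]}
    latest_prices: {ticker: most recent price}."""
    per_stock = MONTHLY_BUDGET / len(monthly_prices)  # even split
    value = 0.0
    for ticker, prices in monthly_prices.items():
        shares = sum(per_stock / p for p in prices)  # shares accumulated
        value += shares * latest_prices[ticker]
    return value


# Two-stock, two-month toy example with made-up prices:
prices = {"AAA": [10.0, 20.0], "BBB": [50.0, 70.0]}
latest = {"AAA": 20.0, "BBB": 70.0}
print(round(portfolio_value(prices, latest), 2))  # prints 378.0
```

The same loop, scaled to seven tickers and fourteen-plus months of prices buried in raw HTML, is essentially what the model had to reconstruct from the messy input.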

More importantly, I found the reasoning traces very useful. It is not clear whether Google exposes the raw chain-of-thought (CoT) tokens for Gemini 2.5 Pro, but the reasoning traces are very detailed. You can clearly see how the model reasons over the data, extracts different bits of information, and calculates the result before generating the answer. This helps in troubleshooting the model's behavior and steering it in the right direction when it makes mistakes.

Enterprise-level reasoning?
One concern with Gemini 2.5 Pro is that it is only available in reasoning mode, meaning the model goes through its "thinking" process even for very simple prompts that could be answered directly.
Gemini 2.5 Pro is currently available in preview. Once the full model is released and pricing information is published, we will have a better idea of how much it will cost to build enterprise applications on top of it. However, as the cost of inference continues to decline, we can expect it to become practical at scale.
Gemini 2.5 Pro may not have had the biggest debut, but its capabilities demand attention. Its massive context window, impressive multimodal reasoning, and detailed reasoning traces offer tangible advantages for complex enterprise workloads, from codebase refactoring to nuanced data analysis.