A team of researchers at Zoom Communications has developed a breakthrough technique that could dramatically reduce the cost and computing resources needed for AI systems to tackle complex reasoning problems, potentially transforming how enterprises deploy AI at scale.
The method, called chain of draft (CoD), enables large language models (LLMs) to solve problems with minimal words, using as little as 7.6% of the text required by current methods while maintaining or even improving accuracy. The findings were published last week in a paper on the research repository arXiv.
“By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT (chain of thought) in accuracy while using as little as 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks,” wrote the authors, led by Silei Xu, a researcher at Zoom.

How “less is more” transforms AI reasoning without sacrificing accuracy
CoD draws inspiration from how humans solve complex problems. Rather than articulating every detail when working through math problems or logical puzzles, people typically jot down only the essential information in abbreviated form.
“When solving complex tasks, whether mathematical problems, drafting essays or coding, we often jot down only the critical pieces of information that help us progress,” the researchers explain. “By emulating this behavior, LLMs can focus on advancing toward solutions without the overhead of verbose reasoning.”
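To make the contrast concrete, below is a minimal Python sketch of the two prompting styles. The instruction wording is a paraphrase of the behavior the paper describes (step-by-step reasoning, with each draft step capped at a few words); the exact prompt text and the `build_messages` helper are illustrative assumptions, not the authors' released code.

```python
# Sketch of the prompting difference between chain of thought (CoT)
# and chain of draft (CoD). The instruction wording is an illustrative
# paraphrase, not the paper's exact prompts.

COT_PROMPT = (
    "Think step by step to answer the following question. "
    "Return the answer at the end of the response after a separator ####."
)

# CoD keeps the step-by-step structure but caps each step at a few
# words, cutting output tokens dramatically.
COD_PROMPT = (
    "Think step by step, but only keep a minimum draft for each "
    "thinking step, with five words at most. "
    "Return the answer at the end of the response after a separator ####."
)

def build_messages(question: str, use_cod: bool = True) -> list[dict]:
    """Assemble a chat-style message list for either prompting strategy."""
    system = COD_PROMPT if use_cod else COT_PROMPT
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

if __name__ == "__main__":
    # Example: a GSM8K-style arithmetic question.
    q = ("Jason had 20 lollipops. He gave Denny some. "
         "Now he has 12. How many did he give Denny?")
    for msg in build_messages(q, use_cod=True):
        print(msg["role"], ":", msg["content"])
```

Because the change lives entirely in the prompt, no retraining or architectural modification is involved; the model is simply instructed to draft rather than narrate.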
The team tested their approach on numerous benchmarks, including arithmetic reasoning (GSM8K), commonsense reasoning (date understanding and sports understanding) and symbolic reasoning (coin-flip tasks).
In one striking example involving Claude 3.5 Sonnet processing sports-related questions, the CoD approach reduced the average output from 189.4 tokens to just 14.3 tokens (a 92.4% reduction) while improving accuracy from 93.2% to 97.3%.
Slashing enterprise AI costs: the business case for concise machine reasoning
“For an enterprise processing 1 million reasoning queries monthly, CoD could cut costs from $3,800 (CoT) to $760, saving over $3,000 per month,” AI researcher Ajith Vallath Prabhakar wrote in an analysis of the paper.
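Prabhakar's figures presumably fold in provider pricing and input tokens, so they cannot be reproduced exactly here, but a back-of-the-envelope sketch using the paper's reported token counts and an assumed output-token price shows the same order of savings:

```python
# Back-of-the-envelope estimate of the savings Prabhakar describes.
# Token counts come from the paper's Claude 3.5 Sonnet sports-reasoning
# results (189.4 output tokens with CoT vs. 14.3 with CoD); the price
# per million output tokens is an assumed figure for illustration only.

QUERIES_PER_MONTH = 1_000_000
PRICE_PER_M_OUTPUT_TOKENS = 15.00  # assumed USD; varies by model/provider

def monthly_output_cost(avg_output_tokens: float) -> float:
    """Estimated monthly spend on output tokens alone."""
    total_tokens = QUERIES_PER_MONTH * avg_output_tokens
    return total_tokens / 1_000_000 * PRICE_PER_M_OUTPUT_TOKENS

cot_cost = monthly_output_cost(189.4)  # ~ $2,841
cod_cost = monthly_output_cost(14.3)   # ~ $215
print(f"CoT: ${cot_cost:,.0f}/mo  CoD: ${cod_cost:,.0f}/mo  "
      f"saved: ${cot_cost - cod_cost:,.0f}/mo")
```

Under these assumptions, output-token spend drops from roughly $2,800 to about $215 per month; different pricing and input-token overhead shift the absolute numbers, but the roughly 90% reduction tracks the token savings directly.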
The research comes at a critical moment for enterprise AI deployment. As companies increasingly integrate sophisticated AI systems into their operations, computational costs and response times have become major obstacles to widespread adoption.
State-of-the-art reasoning techniques such as chain of thought (CoT), introduced in 2022, dramatically improved AI's ability to solve complex problems by breaking them down into step-by-step reasoning. But the approach generates lengthy explanations that consume substantial computing resources and increase response latency.
“The verbose nature of CoT leads to significant computational overhead, increased latency and higher operating expenses,” Prabhakar wrote.
What makes CoD particularly noteworthy for enterprises is its simplicity of implementation. Unlike many AI advancements that require costly model retraining or architectural changes, CoD can be deployed immediately with existing models through a simple prompt modification.
“Organizations already using CoT can switch to CoD with a simple prompt modification,” Prabhakar explained.
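In practice, the switch can be as small as replacing one system-prompt string in an existing pipeline. The hypothetical sketch below uses the OpenAI Python SDK purely for illustration; the model name and prompt wording are placeholder assumptions, and any chat-style API would work the same way.

```python
# Hypothetical illustration of swapping CoT for CoD in an existing
# pipeline. Only the system prompt changes; the client, model and
# calling code stay the same. Requires the `openai` package and an
# API key in the environment.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    # Before (CoT): "Think step by step to answer the following question..."
    # After (CoD-style, paraphrased):
    "Think step by step, but keep only a minimum draft of each step, "
    "five words at most. Give the final answer after ####."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model you already run
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "A coin starts heads up. It is "
                                    "flipped 3 times. Is it heads up now?"},
    ],
)
print(response.choices[0].message.content)
```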
The technique could prove especially valuable for latency-sensitive applications such as real-time customer support, mobile AI, educational tools and financial services, where even small delays can significantly impact the user experience.
Industry experts suggest, however, that the implications extend beyond cost savings. By making advanced AI reasoning more accessible and affordable, CoD could open sophisticated AI capabilities to smaller organizations and resource-constrained environments.
As AI systems continue to evolve, techniques like CoD highlight a growing emphasis on efficiency alongside raw capability. For enterprises navigating the rapidly changing AI landscape, such optimizations may prove as valuable as improvements in the underlying models themselves.
“As AI models continue to evolve, optimizing reasoning efficiency will be as critical as improving their raw capabilities,” Prabhakar concluded.
The research code and data have been made publicly available on GitHub, allowing organizations to implement and test the approach with their own AI systems.