Pruna AI open sources its AI model optimization framework | TechCrunch
Pruna AI, a European startup that has been working on compression algorithms for AI models, open sourced its optimization framework on Thursday.

Pruna AI has been building a framework that applies several efficiency methods, such as caching, pruning, quantization, and distillation, to a given AI model.
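To make one of these methods concrete, here is a minimal sketch of quantization, reducing float32 weights to int8 plus a scale factor. This is an illustration of the general technique, not Pruna's implementation; all names are made up.

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 weights to int8 plus a scale factor (symmetric quantization)."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.27, 0.03, 1.0], dtype=np.float32)
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# The reconstruction stays within one quantization step of the original,
# at a quarter of the storage cost (1 byte per weight instead of 4).
assert np.allclose(approx, weights, atol=scale + 1e-9)
```

The same trade-off, a small, bounded loss in precision in exchange for less memory and faster arithmetic, is what the other methods (pruning, caching, distillation) make in their own ways.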

“We also standardize saving and loading compressed models, applying combinations of these compression methods, and evaluating compressed models after compression,” John Rachwan, co-founder and CTO of Pruna AI, told TechCrunch.

In particular, Pruna AI’s framework can evaluate whether there is significant quality loss after compressing a model, as well as the performance gains you get.
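A hypothetical sketch of what such an evaluation measures, comparing a compressed model against the original on output deviation and wall-clock speed. This is not Pruna's API; the model here is a plain matrix multiply and the "compression" is just a half-precision stand-in.

```python
import time
import numpy as np

def evaluate(model, compressed, inputs):
    """Compare a compressed model against the original on quality and speed."""
    t0 = time.perf_counter(); base = model(inputs); t_base = time.perf_counter() - t0
    t0 = time.perf_counter(); fast = compressed(inputs); t_fast = time.perf_counter() - t0
    quality_drop = float(np.mean(np.abs(base - fast)))  # deviation from original outputs
    speedup = t_base / max(t_fast, 1e-12)
    return quality_drop, speedup

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal((32, 256)).astype(np.float32)

original = lambda v: v @ W
# Stand-in "compression": store the weights in half precision.
compressed = lambda v: v @ W.astype(np.float16).astype(np.float32)

drop, speedup = evaluate(original, compressed, x)
assert drop < 0.1  # outputs barely move despite halving the weight storage
```

Reporting both numbers together is the point: a speedup is only meaningful alongside the quality it costs.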

He added: “If I were to use a metaphor, we’re similar to how Hugging Face standardized transformers and diffusers: how to call them, how to save them, how to load them, etc. We did the same, but for efficiency methods.”

Large AI labs are already using various compression methods. OpenAI, for example, has been relying on distillation to create faster versions of its flagship model.

This is probably how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. Similarly, the Flux.1-schnell image generation model is a distilled version of the Flux.1 model from Black Forest Labs.

Distillation is a technique used to extract knowledge from large AI models with a “teacher-student” setup. Developers send requests to a teacher model and record its outputs. The answers are sometimes compared against a dataset to check their accuracy. These outputs are then used to train a student model, which learns to approximate the teacher’s behavior.
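The teacher-student idea can be sketched in a few lines. In this toy version (not any lab's actual pipeline), the teacher is a two-layer linear map, the student is a single smaller matrix, and the student is fit to the teacher's recorded outputs; real distillation trains a neural student with gradient descent on soft targets.

```python
import numpy as np

rng = np.random.default_rng(42)
inputs = rng.standard_normal((500, 16))

# "Teacher": a two-layer linear model, 16*32 + 32*4 = 640 parameters.
W1 = rng.standard_normal((16, 32))
W2 = rng.standard_normal((32, 4))
teacher_outputs = inputs @ W1 @ W2   # record the teacher's answers

# "Student": a single 16x4 matrix, only 64 parameters, fit to mimic
# the teacher's recorded outputs (least squares stands in for training).
student_weights, *_ = np.linalg.lstsq(inputs, teacher_outputs, rcond=None)

student_outputs = inputs @ student_weights
# The student reproduces the teacher's behavior with a tenth of the parameters.
assert np.allclose(student_outputs, teacher_outputs, atol=1e-6)
```

The linear case is exactly recoverable; with real networks the student only approximates the teacher, which is where the quality-versus-size trade-off comes from.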

“For big companies, what they usually do is build these things internally. What you can find in the open source world is usually based on a single method — say, one quantization method for LLMs, or one caching method for diffusion models,” Rachwan said. “But you can’t find a tool that aggregates all of them, makes them all easy to use, and lets you combine them. That’s the big value Pruna is bringing right now.”

From left to right: Rayan Nait Mazi, Bertrand Charpentier, John Rachwan, Stephan Günnemann. Image source: Pruna AI

While Pruna AI supports any type of model, from large language models to diffusion models, speech-to-text models, and computer vision models, the company is currently focusing more specifically on image and video generation models.

Some of Pruna AI’s existing users include Imagine and Lightroom. In addition to the open source version, Pruna AI also offers an enterprise product with advanced optimization features, including an optimization agent.

“The most exciting feature we’re about to release will be a compression agent,” Rachwan said. “Basically, you give it your model and say, ‘I want more speed, but don’t drop my accuracy by more than 2%.’ Then the agent will find the best combination for you and return it.”
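The constraint Rachwan describes is essentially a search over compression configurations: keep the fastest combination whose accuracy loss stays within the user's budget. A hypothetical sketch of that selection step, with made-up candidate numbers:

```python
# Candidate compression configurations with measured results
# (all names and figures here are invented for illustration).
candidates = [
    {"methods": ("quantization",),                "speedup": 1.8, "accuracy_drop": 0.5},
    {"methods": ("quantization", "pruning"),      "speedup": 3.1, "accuracy_drop": 1.7},
    {"methods": ("quantization", "distillation"), "speedup": 4.0, "accuracy_drop": 3.2},
]

def pick_best(candidates, max_accuracy_drop=2.0):
    """Return the fastest configuration within the accuracy budget, or None."""
    allowed = [c for c in candidates if c["accuracy_drop"] <= max_accuracy_drop]
    return max(allowed, key=lambda c: c["speedup"]) if allowed else None

best = pick_best(candidates)
assert best["methods"] == ("quantization", "pruning")  # fastest under the 2% budget
```

The hard part in practice is producing and evaluating the candidates, not picking among them; the agent automates that loop so the user only states the constraint.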

Pruna AI charges for its pro version by the hour. “It’s similar to how you would think of a GPU when you rent it on AWS or any cloud service,” Rachwan said.

And if your model is a critical part of your AI infrastructure, optimizing it will end up saving you a lot of money at inference time. For example, Pruna AI has made a Llama model eight times smaller without significant loss using its compression framework. Pruna AI hopes its customers will view its compression framework as an investment that pays for itself.

A few months ago, Pruna AI raised $6.5 million in seed funding. Investors in the startup include EQT Ventures, Daphni, Motier Ventures and Kima Ventures.