OpenInfer has raised $8 million in funding to redefine AI inference for edge applications.
It is the brainchild of Behnam Bastani and Reza Nourai, who spent nearly a decade building and scaling AI systems together at Meta and Roblox.
Through their work at the forefront of AI and system design, Bastani and Nourai saw firsthand how deep system architecture enables continuous, large-scale AI inference. Today, however, AI inference remains locked behind cloud APIs and hosted systems, a barrier to low-latency, private, and cost-effective edge applications. OpenInfer aims to change that. Bastani said in an interview with GamesBeat that the company hopes to be agnostic to the type of device at the edge.
By seamlessly running large AI models directly on devices, from SoCs to the cloud, those barriers can be removed, enabling inference of AI models without compromising performance.
What does that mean? Imagine a world where your phone anticipates your needs in real time: translating languages instantly, enhancing photos with studio-quality precision, or powering a voice assistant that truly understands you. By running AI inference directly on the device, users can expect faster performance, greater privacy, and uninterrupted functionality. This shift eliminates lag and brings intelligent, high-speed computing to the palm of your hand.
Building the OpenInfer Engine: an AI agent inference engine

Since founding the company six months ago, Bastani and Nourai have assembled a team of seven, including former colleagues from their time at Meta. While at Meta, they built Oculus Link together, demonstrating their expertise in low-latency, high-performance system design.
Bastani previously served as a director of architecture at Meta’s Reality Labs and led teams at Google focused on mobile rendering, VR, and display systems. Most recently, he was senior engineering director of engine AI at Roblox. Nourai has held senior engineering roles in graphics and gaming at industry leaders including Roblox, Meta, Magic Leap, and Microsoft.
OpenInfer is building the OpenInfer Engine, which it calls an “AI agent inference engine” designed for unmatched performance and seamless integration.
To achieve the first goal of unmatched performance, the first release of the OpenInfer Engine delivers faster inference than llama.cpp and Ollama on distilled DeepSeek models. The improvement comes from targeted optimizations, including streamlined handling of quantized values, improved memory access through enhanced caching, and model-specific tuning, with no modifications to the models required.
To achieve the second goal of seamless integration with effortless deployment, the OpenInfer Engine is designed as a drop-in replacement, allowing users to switch endpoints simply by updating a URL. Existing agents and frameworks continue to work without modification.
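To illustrate what that kind of drop-in swap typically looks like, here is a minimal sketch assuming an OpenAI-compatible chat completions API, the same convention that llama.cpp’s server and Ollama expose. The local URL, API key handling, and model name below are hypothetical placeholders, not confirmed OpenInfer details.

```python
# Minimal sketch of a drop-in endpoint swap against an
# OpenAI-compatible API. The base_url and model name are
# hypothetical placeholders, not confirmed OpenInfer values.
from openai import OpenAI

# Before: inference served by a hosted cloud API.
# client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")

# After: the same client pointed at a local inference server.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical local endpoint
    api_key="not-needed",  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Summarize this in one line."}],
)
print(response.choices[0].message.content)
```

Because only the base URL changes, an agent or framework built against the same API contract keeps working unchanged, which is what a drop-in replacement implies here.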
“OpenInfer’s advancements mark a major leap for AI developers. By significantly boosting inference speeds, Behnam and his team are making real-time AI applications more responsive, accelerating development cycles, and enabling powerful models to run efficiently on edge devices. This opens up new possibilities for on-device intelligence and expands what is possible in AI-driven innovation,” said Ernestine Fu Mak, managing partner at Brave Capital and an investor in OpenInfer.
OpenInfer is pioneering hardware-specific optimizations to drive high-performance AI inference on large models, outperforming industry leaders on edge devices. By designing inference from the ground up, the company is unlocking higher throughput, lower memory usage, and seamless execution on local hardware.
The future roadmap: seamless AI inference across all devices
OpenInfer’s launch is timely, particularly in light of the recent DeepSeek news. As AI adoption accelerates, inference has overtaken training as the primary driver of compute demand. While innovations such as DeepSeek reduce the computational requirements of training and inference, edge-based applications still struggle with performance and efficiency because of limited processing power. Running large AI models on consumer devices requires new inference methods that enable low-latency, high-throughput performance without relying on cloud infrastructure, creating a significant opportunity for companies that optimize AI for local hardware.
“Without OpenInfer, AI inference on edge devices is inefficient due to the lack of a clear hardware abstraction layer. This challenge makes deploying large models on compute-constrained platforms incredibly difficult, pushing AI workloads back to the cloud, where they become costly, slow, and dependent on network conditions. OpenInfer is reinventing inference at the edge,” said OpenInfer investor Gokul Rajaram, an angel investor who currently sits on the boards of Coinbase and Pinterest.
In particular, OpenInfer is uniquely positioned to help silicon and hardware vendors improve inference performance on their devices. Enterprises that need on-device AI for privacy, cost, or reliability can leverage OpenInfer, with key applications in robotics, defense, agentic AI, and model development.
In mobile gaming, OpenInfer’s technology can enable hyper-responsive gameplay with real-time adaptive AI. Running inference on the device reduces latency and allows for smarter in-game dynamics. Players will enjoy smoother graphics, AI-powered personalized challenges, and a more immersive experience that evolves with every move.
“At OpenInfer, our vision is to seamlessly integrate AI into every surface,” Bastani said. “We aim to establish OpenInfer as the default inference engine across all devices, powering AI in self-driving cars, laptops, mobile devices, robots, and more.”
OpenInfer raised the $8 million seed as its first round of funding. Investors include Brave Capital, Cota Capital, Essence VC, Operator Stack, StemAI, Oculus VR co-founder and former CEO Brendan Iribe, Google DeepMind chief scientist Jeff Dean, Microsoft chief product officer of experiences and devices Aparna Chennapragada, angel investor Gokul Rajaram, and others.
“Today’s AI ecosystem is dominated by a few centralized players who control access to inference through cloud APIs and hosted services. At OpenInfer, we are changing that,” said Bastani. “Our name reflects our mission: we are ‘opening’ access to AI inference so that anyone can run powerful AI models locally, without being locked into costly cloud services. We believe in a future where AI is accessible, decentralized, and truly in the hands of its users.”