Silicon Valley bets big on ‘environments’ to train AI agents | TechCrunch

Big Tech CEOs have been hyping AI agents, software that can independently use applications to complete tasks on a person’s behalf. But spend a few minutes with OpenAI’s ChatGPT Agent or Perplexity’s Comet and you’ll quickly realize how limited the technology still is. Making AI agents more robust may require a new set of techniques that the industry is still discovering.

One of those techniques is carefully simulated workspaces where agents can be trained on multi-step tasks, known as reinforcement learning (RL) environments. Much as labeled datasets powered the last wave of AI, RL environments are starting to look like a critical ingredient in the development of agents.

AI researchers, founders, and investors told TechCrunch that leading AI labs are now demanding more RL environments, and there’s no shortage of startups hoping to supply them.

“All the big AI labs are building RL environments in-house,” Jennifer Li, general partner at Andreessen Horowitz, said in an interview with TechCrunch. “But as you can imagine, creating these datasets is very complex, so AI labs are also looking at third-party vendors that can create high-quality environments and evaluations. Everyone is looking at this space.”

The push into RL environments has minted a new class of well-funded startups, such as Mechanize and Prime Intellect, that aim to lead the space. Meanwhile, large data-labeling companies like Mercor and Surge say they’re investing more in RL environments to keep up with the industry’s shift from static datasets to interactive simulations. The major labs are considering big investments as well: according to The Information, Anthropic leaders have discussed spending more than $1 billion on RL environments over the next year.

The hope for investors and founders is that one of these startups emerges as the “Scale AI for environments,” referring to the $29 billion data-labeling powerhouse that powered the chatbot era.

The question is whether RL environments will truly push the frontier of AI progress.


What is an RL environment?

At its core, an RL environment is a training ground that simulates what an AI agent would do in a real software application. One founder, in a recent interview, described building them as “like creating a very boring video game.”

For example, an environment could simulate a Chrome browser and task an AI agent with purchasing a pair of socks on Amazon. The agent is graded on its performance and sent a reward signal when it succeeds (in this case, buying a suitable pair of socks).

While such a task sounds relatively simple, there are a lot of places where an AI agent could get tripped up. It might get lost navigating a web page’s drop-down menus, or it might buy too many socks. And because developers can’t predict exactly which wrong turn an agent will take, the environment itself has to be robust enough to capture any unexpected behavior and still deliver useful feedback. That makes building environments far more complex than building a static dataset.
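The loop this describes can be sketched in miniature. The toy class below (a hypothetical `SockShopEnv`; the name, actions, and reward values are all illustrative assumptions, not any lab’s actual tooling) shows the basic contract of an RL environment: the agent acts, the environment updates its simulated state, and a reward arrives only when the intended outcome is achieved.

```python
# Toy sketch of an RL environment for the sock-buying task described above.
# All names, actions, and reward values are illustrative assumptions.

class SockShopEnv:
    """Simulates a tiny shop where an agent must buy exactly one pair of socks."""

    ACTIONS = ("open_product_page", "add_to_cart", "checkout")

    def reset(self):
        self.cart = 0
        self.on_product_page = False
        self.done = False
        return {"cart": self.cart, "on_product_page": self.on_product_page}

    def step(self, action):
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        reward = 0.0
        if action == "open_product_page":
            self.on_product_page = True
        elif action == "add_to_cart" and self.on_product_page:
            self.cart += 1
        elif action == "checkout":
            self.done = True
            # Reward only the intended outcome: exactly one pair purchased.
            reward = 1.0 if self.cart == 1 else -1.0
        # Any other action, or acting out of order, simply yields no reward,
        # so unexpected behavior still produces useful feedback.
        obs = {"cart": self.cart, "on_product_page": self.on_product_page}
        return obs, reward, self.done


env = SockShopEnv()
obs = env.reset()
for action in ("open_product_page", "add_to_cart", "checkout"):
    obs, reward, done = env.step(action)
# After checkout: reward == 1.0 and done is True
```

A production environment would simulate an entire browser and score far subtler failure modes, but the interface (reset, step, reward) is the same shape.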

Some environments are quite robust, allowing AI agents to use tools, access the internet, or run various software applications to complete a given task. Others are narrower, designed to help agents learn specific tasks in enterprise software applications.

While RL environments are the hot thing in Silicon Valley right now, there’s plenty of precedent for the technique. One of OpenAI’s first projects back in 2016 was building “RL Gyms,” which were quite similar to the modern conception of environments. The same year, Google DeepMind trained AlphaGo, an AI system that could beat a world champion at the board game Go, using RL techniques in a simulated environment.

What’s unique about today’s environments is that researchers are trying to build computer-using AI agents on top of large transformer models. Unlike AlphaGo, a specialized AI system working in a closed environment, today’s AI agents are trained to have more general capabilities. AI researchers today have a stronger starting point, but also a more complicated goal where more can go wrong.

A crowded field

AI data-labeling companies such as Scale AI, Surge, and Mercor are trying to meet the moment and build RL environments. These companies have more resources than many startups in the space, as well as deep relationships with AI labs.

Surge CEO Edwin Chen told TechCrunch he’s recently seen a “significant increase” in demand for RL environments within AI labs. Surge, which reportedly generated $1.2 billion in revenue last year working with AI labs such as OpenAI, Google, Anthropic, and Meta, recently spun up a new internal organization dedicated to building RL environments, he said.

Close behind is Mercor, a startup valued at $10 billion that has also worked with OpenAI, Meta, and Anthropic. Mercor is pitching investors on its business building RL environments for domain-specific tasks such as coding, healthcare, and law, according to marketing materials seen by TechCrunch.

“Few people understand how large the opportunity around RL environments really is,” Mercor CEO Brendan Foody told TechCrunch in an interview.

Scale AI used to dominate the data-labeling space, but it has lost ground since Meta invested $14 billion in the company and hired away its CEO. Since then, Google and OpenAI have dropped Scale AI as a data provider, and the startup even faces competition for data-labeling work inside Meta itself. Still, Scale is trying to meet the moment and build environments.

“It’s just the nature of the business [Scale AI] is in,” said Scale AI’s head of product for agents and RL environments. “We did this early on with our first business unit, autonomous vehicles. When ChatGPT came out, Scale AI adapted to that. Now, once again, we’re adapting to new frontier spaces like agents and environments.”

Some newer players are focused on environments from the outset. Among them is Mechanize, a startup founded roughly six months ago with the audacious goal of “automating all jobs.” But co-founder Matthew Barnett told TechCrunch that his company is starting with RL environments for AI coding agents.

Mechanize aims to supply AI labs with a small number of robust RL environments, Barnett said, rather than following the larger data companies in creating a wide range of simpler ones. To that end, the startup is offering software engineers $500,000 salaries to build RL environments, far more than an hourly contractor could earn working at Scale AI or Surge.

Two sources familiar with the matter told TechCrunch that Mechanize has been working with Anthropic on RL environments. Mechanize and Anthropic declined to comment on the partnership.

Other startups are betting that RL environments will be influential outside of AI labs. Prime Intellect, a startup backed by AI researcher Andrej Karpathy, Founders Fund, and Menlo Ventures, is targeting smaller developers with its RL environments.

Last month, Prime Intellect launched an RL environments hub, which aims to be a “Hugging Face for RL environments.” The idea is to give open-source developers access to the same resources that large AI labs have, and to sell those developers access to compute resources in the process.

Training generally capable agents in RL environments can be more computationally expensive than previous AI training techniques, according to Prime Intellect researcher Will Brown. Alongside the startups building RL environments, GPU providers have another opportunity to power the process.

“RL environments are going to be too big for any one company to dominate,” Brown said in an interview. “Part of what we’re doing is just trying to build good open-source infrastructure around it. The service we sell is compute, so it’s a convenient on-ramp to using GPUs, but we’re thinking about this more in the long term.”

Will it scale?

The open question around RL environments is whether the technique will scale like previous AI training methods.

Reinforcement learning has powered some of the biggest leaps in AI over the past year, including models like OpenAI’s o1 and Anthropic’s Claude Opus 4. Those are particularly important breakthroughs because the methods previously used to improve AI models are now showing diminishing returns.

Environments are part of AI labs’ larger bet on RL, which many believe will continue to drive progress as labs add more data and computational resources to the process. Some of the OpenAI researchers behind o1 previously told TechCrunch that the company originally invested in AI reasoning models (created through investments in RL and test-time compute) because they believed the approach would scale well.

The best way to scale RL remains unclear, but environments seem like a promising contender. Instead of simply rewarding chatbots for good text responses, they let agents operate in simulations, with tools and computers at their disposal. That’s far more resource-intensive, but potentially more rewarding.

Some are skeptical that all of these RL environments will pan out. Ross Taylor, a former AI research lead at Meta who co-founded the startup General Reasoning, told TechCrunch that RL environments are prone to reward hacking, a process in which AI models cheat in order to get a reward without actually completing the task.
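Reward hacking is easy to reproduce in miniature: if the reward function scores a proxy for success rather than success itself, a policy that games the proxy outscores one that does the task. The sketch below is a contrived illustration in the spirit of the sock-buying example (the function names, action strings, and numbers are all invented), not drawn from any real environment.

```python
# Toy illustration of reward hacking: a flawed reward function lets an
# "agent" score highly without ever completing the task. All names and
# numbers here are made-up assumptions for illustration.

def flawed_reward(history):
    # Naive proxy: reward every add-to-cart click.
    return 0.1 * history.count("add_to_cart")

def fixed_reward(history):
    # Reward only the intended outcome: a checkout with exactly one item.
    bought = history.count("add_to_cart")
    finished = bool(history) and history[-1] == "checkout"
    return 1.0 if finished and bought == 1 else 0.0

honest_agent = ["add_to_cart", "checkout"]
hacking_agent = ["add_to_cart"] * 50   # spams the proxy, never checks out

assert flawed_reward(hacking_agent) > flawed_reward(honest_agent)  # hack wins
assert fixed_reward(honest_agent) > fixed_reward(hacking_agent)    # hack loses
```

The fix is never this simple in a real environment, where the model can find proxies the developer didn't anticipate; that is why Taylor argues robust environments are hard to scale.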

“I think people are underestimating how difficult it is to scale environments,” said Taylor. “Even the best publicly available [RL environments] typically don’t work without serious modification.”

In a recent podcast appearance, Sherwin Wu, OpenAI’s head of engineering for its API business, said he was “short” on RL environment startups. Wu noted that the space is highly competitive, and that AI research is evolving so quickly that it’s hard to serve AI labs well.

Karpathy, a Prime Intellect investor who has called RL environments a potential breakthrough, has also voiced caution about the RL space more broadly. In a post on X, he raised concerns about how much more AI progress can be squeezed out of RL.

“I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically,” Karpathy said.
