Silicon Valley bets big on ‘environments’ to train AI agents

Silicon Valley’s Big Bet on AI Training ‘Environments’

For years, executives in Big Tech have been promoting the idea of AI agents capable of using software applications autonomously to carry out tasks for users. However, if you try out current consumer-facing AI agents, like OpenAI’s ChatGPT Agent or Perplexity’s Comet, it becomes clear just how limited the technology is at this stage. Enhancing these agents into more reliable tools may require the development of new techniques that the industry is still exploring.

One such method is the creation of carefully simulated workspaces, known as reinforcement learning (RL) environments, where agents can learn how to tackle multi-step tasks. Similar to how labeled datasets powered the previous era of AI, RL environments are quickly becoming a crucial element for building robust AI agents.

TechCrunch spoke to AI researchers, founders, and investors, who said that leading AI labs are increasingly requesting RL environments. This has sparked a wave of startups eager to meet the demand and supply these environments.

The Growing Demand for RL Environments

AI experts note that the demand for RL environments is rising sharply. Jennifer Li, a general partner at Andreessen Horowitz, explained in an interview with TechCrunch that while major AI labs are building these environments internally, the complexity of the task is pushing them to also rely on third-party vendors to create high-quality environments and assessments.

This demand has led to the rise of well-funded startups like Mechanize and Prime Intellect, which are aiming to become leaders in the field. Meanwhile, data-labeling giants like Mercor and Surge are shifting their focus to RL environments to keep pace with the industry’s transition from static datasets to dynamic, interactive simulations. According to The Information, some AI labs are considering investing over $1 billion in RL environments in the coming year.

Investors and founders hope that one of these emerging startups will become the “Scale AI for environments,” drawing a comparison to the $29 billion data-labeling company that played a central role in the chatbot boom.

However, the question remains: will RL environments be the breakthrough that propels AI to the next level?

What Exactly is an RL Environment?

In simple terms, RL environments are simulated spaces where AI agents can be trained to perform tasks they might encounter in real-world software applications. One founder described building them as “creating a very boring video game.”

For example, an RL environment might simulate a Chrome browser and assign an AI agent the task of purchasing a pair of socks from Amazon. The agent’s success is evaluated based on its performance, and if it completes the task correctly, it is rewarded.

However, even simple tasks can be tricky for AI agents. For instance, the agent could get lost navigating a website’s dropdown menus or mistakenly order too many socks. Since developers can’t anticipate all possible mistakes, the environment itself must be complex enough to handle these errors and still provide useful feedback. This makes building RL environments much more challenging than creating static datasets.

Some RL environments are very complex, allowing AI agents to use various tools, access the internet, or interact with multiple software applications to accomplish a task. Others are more focused, designed to help agents master specific functions within enterprise software.

A Crowded Field of Competitors

AI data-labeling companies, such as Scale AI, Surge, and Mercor, are working hard to keep up with the increasing demand for RL environments. These companies have significant resources and strong ties to AI labs, giving them a competitive edge. Surge, for example, reported $1.2 billion in revenue last year from contracts with AI labs like OpenAI, Google, and Meta, and is now setting up a new internal team dedicated to building RL environments.

Mercor, which is valued at $10 billion, has also teamed up with major labs like OpenAI and Meta and is focusing on creating RL environments for specialized tasks such as coding, healthcare, and law. CEO Brendan Foody told TechCrunch that the opportunity in RL environments is much larger than many realize.

Scale AI, which once dominated the data-labeling market, has faced challenges after Meta invested $14 billion in AI and poached its CEO. Since then, Scale has been losing clients like Google and OpenAI, but it is still working to adapt to the growing demand for RL environments. Chetan Rane, Scale’s head of product for agents and RL environments, explained that the company is used to pivoting quickly to meet new industry needs.

There are also newcomers like Mechanize, a startup founded just six months ago with an ambitious goal: to “automate all jobs.” Co-founder Matthew Barnett said his company is starting with RL environments designed for AI coding agents, providing AI labs with a small number of high-quality training environments, rather than the more generic solutions that larger data firms are offering. To attract top talent, Mechanize is offering software engineers salaries of up to $500,000 to build RL environments—significantly higher than what contractors at Scale or Surge would earn.

Mechanize has already begun working with Anthropic on RL environments, though neither party has commented on the collaboration.

New Players Target Smaller Developers and Open-Source Communities

Prime Intellect, a startup backed by prominent AI researcher Andrej Karpathy and others, is targeting smaller developers with its RL environments. Last month, Prime Intellect launched an RL environments hub, hoping to become the “Hugging Face for RL environments.” This platform is designed to provide open-source developers with the same resources that larger AI labs have, while also offering computational resources for a fee.

Building RL environments is computationally expensive, as Will Brown, a researcher at Prime Intellect, explained. Alongside startups developing these environments, there’s also a growing opportunity for GPU providers to support the process.

“RL environments will be too large for any single company to dominate,” said Brown. “What we’re focused on is building open-source infrastructure to support this field. The service we provide is compute, which gives developers easy access to GPUs, and we’re taking a long-term view of the space.”

Will RL Environments Scale?

A big question remains: can RL environments scale the way earlier AI training methods have? Reinforcement learning has already contributed to some of the most significant breakthroughs in AI, such as OpenAI’s o1 and Anthropic’s Claude Opus 4. These advancements are especially important as the effectiveness of traditional AI improvement methods begins to plateau.

Many believe that RL environments are key to pushing AI development forward. By giving agents more complex, real-world-like training experiences, researchers hope to achieve greater progress. However, this comes at a high computational cost, which raises concerns about scalability.

Some experts remain skeptical. Ross Taylor, former AI research lead at Meta and co-founder of General Reasoning, warned that RL environments are prone to “reward hacking,” where AI agents exploit the system to earn rewards without truly completing the intended task.

“People are underestimating how difficult it is to scale RL environments,” Taylor said. “Even the best publicly available RL environments often require significant modifications to work properly.”

Sherwin Wu, OpenAI’s Head of Engineering for the API division, also expressed skepticism about RL environment startups, calling the space extremely competitive and noting how quickly AI research is evolving.

Even Karpathy, who is optimistic about RL environments’ potential, cautioned about the limits of reinforcement learning. In a post on X, he questioned whether RL could continue driving AI progress.

“I’m bullish on environments and agentic interactions but I’m bearish on reinforcement learning specifically,” Karpathy wrote.

Also Read:

Is Your AI Lying to You? OpenAI Says AI Models May Mislead and Hide Their True Intentions

Italy enacts AI law covering privacy, oversight and child access