Cursor AI Agents Solve Research-Level Math Challenge After Four Days of Autonomous Work

Written by: Mane Sachin

Cursor says its autonomous AI system may have found a new solution to a research-level mathematics problem after running independently for four days.

The company said the system worked on Problem Six from the First Proof challenge, a set of 10 previously unpublished problems created to test whether AI can generate original mathematical proofs. The problems are meant to reflect the level of work done by researchers at institutions such as Stanford, MIT and Berkeley.

According to Cursor co-founder Michael Truell, the AI agents explored different strategies on their own, without prompts or hints. After running continuously for four days, the system produced a proof that he says may improve on the official human-written solution.

The result has not yet been formally verified. Cursor said it is awaiting expert review from Nikhil Srivastava, Associate Professor of Mathematics at UC Berkeley, and Daniel Spielman, Sterling Professor of Computer Science at Yale University. In the meantime, spectral graph theory expert Yang Liu has reportedly indicated that the proof is likely correct, and Stanford mathematician Jan Vondrák, who also reviewed the work, said it appeared correct to the best of his knowledge.

Cursor stated that the proof uses the Marcus–Spielman–Srivastava interlacing polynomial method, a technique from spectral graph theory. Truell said this approach differs from the existing solution and produces stronger guarantees. Specifically, the constant in the guarantee improves from 0.03 to 0.13, and the solution partitions the entire vertex set into light components rather than covering just a subset.

The company added that the experiment suggests its system, originally built for large-scale software engineering, may be capable of tackling research problems beyond coding. Cursor used the same multi-agent framework it previously described in its research on autonomous coding.

In this setup, planner agents generate and refine tasks, while worker agents execute them independently. The architecture has previously been used in extended coding experiments where agents collaborated for weeks to write millions of lines of code, including building a web browser from scratch.
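The planner/worker split described above can be illustrated with a toy sketch. This is not Cursor's code: the names `Task`, `plan`, `work` and `run` are hypothetical, and the "agents" here are plain Python functions standing in for model calls. It only shows the control flow, assuming a planner that proposes tasks based on what has already been done and workers that each handle one task in isolation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    description: str


def plan(goal: str, completed: list[str]) -> list[Task]:
    """Hypothetical planner: proposes tasks that have not been completed yet."""
    candidates = [f"{goal}: strategy {i}" for i in range(1, 4)]
    return [Task(c) for c in candidates if c not in completed]


def work(task: Task) -> str:
    """Hypothetical worker: handles a single task without seeing the others."""
    return task.description


def run(goal: str, max_rounds: int = 3) -> list[str]:
    """Alternate planning and parallel execution until no tasks remain."""
    completed: list[str] = []
    with ThreadPoolExecutor(max_workers=3) as pool:
        for _ in range(max_rounds):
            tasks = plan(goal, completed)
            if not tasks:
                break
            # Workers run independently; map returns results in task order.
            completed.extend(pool.map(work, tasks))
    return completed


if __name__ == "__main__":
    for line in run("explore proof"):
        print(line)
```

In a real system the planner would presumably revise its task list in light of worker results rather than draw from a fixed set of candidates; the fixed list here just keeps the sketch deterministic.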

Mane Sachin

My name is Sachin Mane, and I’m the founder and writer of AI Hub Blog. I’m passionate about exploring the latest AI news, trends, and innovations in Artificial Intelligence, Machine Learning, Robotics, and digital technology. Through AI Hub Blog, I aim to provide readers with valuable insights on the most recent AI tools, advancements, and developments.