DeepSeekMath-V2 Achieves Breakthrough Results in Global Competitions
DeepSeek, a research lab based in China, has introduced a new open-weight model called DeepSeekMath-V2.
According to the team, this model shows exceptional skill in mathematical theorem-proving.
It earned gold-level scores at the 2025 International Mathematical Olympiad (IMO).
The model successfully solved five out of the six IMO 2025 problems.
Clement Delangue, co-founder and CEO of Hugging Face, remarked on X that using this model is like having free access to a mind at the level of the world's top mathematicians.
He added that, to his knowledge, no chatbot or service currently offers access to a model capable of achieving an IMO 2025 gold medal performance.
Earlier in July, advanced versions of Google DeepMind’s Gemini model and an experimental reasoning model from OpenAI also reached gold status at IMO 2025.
Similar to DeepSeek’s new release, both companies’ models solved five of the six problems, becoming the first machine-learning systems to attain this level.
The IMO is widely considered the most challenging mathematics competition for high-school students worldwide.
Out of the 630 participants in IMO 2025, only 72 earned gold medals.
Advanced Reasoning Approach and Growing Industry Impact
Beyond the IMO, DeepSeekMath-V2 also ranked among the top performers in China’s most demanding national event, the China Mathematical Olympiad (CMO).
It additionally delivered near-perfect results on the undergraduate-level Putnam exam.
DeepSeek reported that on the 2024 Putnam exam, the model fully solved 11 of the 12 problems and made only minor mistakes on the last, scoring 118 out of 120 and surpassing the top human score of 90.
The company argues that many modern machine-learning models excel at producing correct answers on benchmarks but often fall short in rigorous reasoning.
They emphasise that tasks such as theorem proving require structured, step-by-step logic, rather than simply delivering a correct final answer.
To close this gap, DeepSeek stresses the need for systems that can evaluate and refine their own reasoning process.
They argue that this form of self-verification becomes essential when scaling test-time compute, especially for open problems with no known solutions.
DeepSeek’s method trains a specialised evaluator that scores the quality of proofs rather than end results, guiding a separate proof-generation model that is rewarded only when it corrects its own errors.
To avoid the generator overfitting to its own evaluator, DeepSeek continually toughens the verification process by adding more compute and automatically labeling harder proofs, enabling both components to evolve together.
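The loop described above can be sketched in miniature. The following is a purely illustrative toy, not DeepSeek's actual implementation: the function names (`verify`, `refine`, `self_verify_round`) and the arithmetic "proofs" are invented stand-ins. The key idea it demonstrates is that the verifier scores the quality of every step rather than the final answer, and the generator earns a reward only when its revision raises that score.

```python
def verify(proof):
    """Toy verifier: score a 'proof' (a list of arithmetic step strings
    like '2+2=4') by the fraction of steps that actually hold, rather
    than by the final answer alone."""
    valid = 0
    for step in proof:
        lhs, rhs = step.split("=")
        valid += int(eval(lhs) == int(rhs))  # check each step, not just the last
    return valid / len(proof)

def refine(proof):
    """Toy stand-in for generator self-correction: recompute every step
    so that flagged errors are repaired."""
    fixed = []
    for step in proof:
        lhs, _ = step.split("=")
        fixed.append(f"{lhs}={eval(lhs)}")
    return fixed

def self_verify_round(proof):
    """Reward the generator only if its revision improves the verifier's
    score, mirroring the 'rewarded only when it corrects its own errors'
    signal described in the article."""
    before = verify(proof)
    refined = refine(proof)
    after = verify(refined)
    reward = 1.0 if after > before else 0.0
    return refined, reward

# Usage: a draft 'proof' containing one incorrect step (3*3=8).
draft = ["2+2=4", "3*3=8", "10-1=9"]
refined, reward = self_verify_round(draft)
```

In the real system both sides are learned models, and the verifier is continually hardened with more compute and automatically labeled harder proofs so the generator cannot simply overfit to a static judge; in this sketch the verifier is fixed for clarity.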
The model’s weights are available for download on Hugging Face, which Delangue described as a strong example of true openness and knowledge-sharing.
DeepSeek first gained widespread attention after launching a low-cost, open-source model that rivaled leading U.S. systems. Its DeepSeek-R1 reasoning release briefly raised concerns that open models could erode the commercial advantage of closed-source technologies, weighing on investor sentiment around major firms such as NVIDIA.