Sarvam Releases Open-Weight Models Debuted At AI Summit, Compared With DeepSeek And Gemini

Sarvam on Friday, March 6, said it has released two of its core multilingual AI models under an open-source licence. The models were first introduced during the India-AI Impact Summit 2026 held recently, and the company has now made them publicly available.

The Indian AI startup said the models — built with 30 billion and 105 billion parameters — are designed as reasoning-focused large language models. According to a company blog post, both systems were developed from the ground up using large, high-quality datasets curated internally.

Training the models required significant computing resources. Sarvam said the work was carried out using GPUs provided under the Indian government’s ₹10,372-crore IndiaAI Mission. Infrastructure support for the project came from data centre operator Yotta, while Nvidia provided technical assistance during the development process.

Although the models were first showcased at the AI Impact Summit hosted in New Delhi last month, Sarvam has now released them for broader commercial use under the Apache 2.0 open-source licence. The model weights can be downloaded from AIKosh and Hugging Face, and the systems are also available through Sarvam’s Indus AI chatbot and its API developer platform.

In recent weeks, Sarvam has increasingly been seen as part of India’s push to build what policymakers describe as “sovereign AI.” The broader goal is to reduce reliance on foreign AI companies such as OpenAI and Anthropic by supporting the development of efficient models that better serve Indian languages and local applications.

At the same time, some observers have questioned how the idea of sovereign AI fits with open-weight models. Since these systems can be freely modified and distributed by anyone globally, the concept of sovereignty in this context remains open to debate.

Within Sarvam’s ecosystem, the two models serve different purposes. The 30B model powers the company’s conversational agent platform called Samvaad, which focuses on real-time dialogue and interactive use. The larger 105B model forms the foundation of Sarvam’s Indus AI assistant, designed to handle more complex reasoning tasks and agent-based workflows. According to the company, both models have also been optimized to run across a variety of hardware setups, including personal devices such as laptops.

Sarvam said developing these models required building capabilities across the entire AI pipeline — from data collection and model training to inference systems and product deployment. With that groundwork in place, the company said it plans to scale up further and work on larger and more capable models, including those focused on coding, agent-based applications, and multimodal conversational systems.

Inside the Technology

Technically, the two models are built using a Mixture-of-Experts (MoE) transformer architecture. This design activates only a portion of the model’s total parameters during computation, which helps lower computing costs while maintaining performance.

The 30B model supports a 32,000-token context window, making it suitable for real-time conversational applications. The larger 105B model offers a 128,000-token context window, allowing it to handle longer inputs and more complex reasoning tasks.

To improve efficiency, the 30B model uses Grouped Query Attention (GQA), which reduces the amount of memory required for KV-cache while maintaining strong performance. The 105B model uses Multi-head Latent Attention (MLA), a technique similar to that used in DeepSeek models, which further reduces memory demands during long-context inference.

The training data for both models included a mix of programming code, general web content, specialized knowledge datasets, mathematical material, and multilingual text. Sarvam noted that a significant portion of the training effort went into building a multilingual dataset covering the ten most widely spoken Indian languages.

Benchmark Results and Model Performance

During early training evaluations, the 105B model showed stronger results than the 30B version, suggesting that the model scales efficiently as its size increases.

When compared with other large language models of similar scale, Sarvam said the 105B model delivered performance comparable to pt-oss 120B and Qwen3-Next (80B) on general capability benchmarks. It also showed strong results in agentic reasoning and task completion, outperforming models such as DeepSeek R1, Gemini 2.5 Flash, and o4-mini on the Tau 2 benchmark.

However, the company noted that the model is not necessarily the strongest performer in code generation. On the SWE-Bench Verified benchmark, its results were lower than some competing models.

For the smaller 30B model, Sarvam compared its performance with Nemotron 3 Nano 30B. According to the results, Sarvam’s model performs slightly better in coding tasks and agentic reasoning benchmarks such as SWE-Bench Verified and Tau2, though it lags slightly behind in benchmarks like Live Code Bench v6 and BrowseComp.

Sarvam also said the 30B model achieves 20–40 percent higher token throughput per second than Qwen3, largely due to code and kernel optimizations.

Another area where the company claims strong performance is Indian language processing. Sarvam developed its tokenizer from scratch, designed specifically to handle all 22 scheduled Indian languages, which span 12 different scripts.

Using a metric known as the fertility score — which measures the average number of tokens needed to represent a word — the company said its tokenizer performs more efficiently than several existing open-source tokenizers when encoding Indic languages.

Safety Measures and Model Security

During the supervised fine-tuning stage, Sarvam trained both models using datasets that covered general safety scenarios as well as risks specific to India.

The training data also included adversarial prompts and jailbreak attempts discovered through automated red-teaming systems. These prompts were paired with policy-aligned, safe responses during training so the models could learn how to handle potentially harmful instructions more responsibly.

Also Read: China’s ByteDance Launches Doubao 2.0 for the ‘Agent Era’