Microsoft Unveils Fara-7B, a Qwen-Based Agent Model Designed for Computer Use

Fara-7B: Microsoft’s New Computer-Use Model

Microsoft has introduced Fara-7B, its first compact language model designed to operate a computer much like a human user. The company says this 7-billion-parameter system rivals or outperforms larger agent-style models on real-time web tasks while running locally with lower delay and improved privacy.

Fara-7B interprets webpages visually and completes actions by clicking, typing, and scrolling at predicted screen positions. It avoids relying on accessibility trees or separate page-parsing mechanisms.

According to Microsoft, the model typically completes tasks in around 16 steps, substantially fewer than many comparable systems. It is trained on 145,000 synthetic interaction sequences generated with the Magentic-One framework and is based on Qwen2.5-VL-7B with supervised refinement.

The company promotes Fara-7B as a daily-use digital assistant capable of searching the web, summarizing content, filling out forms, handling accounts, booking travel, shopping online, comparing products, and locating job or property listings.

Microsoft is also launching WebTailBench, a benchmark featuring 609 real-world tasks across 11 categories. Fara-7B ranks highest among all computer-operation models in every area, including shopping, flights, hotels, dining, and more complex comparison workflows.

Deployment Options and Industry Context

Two deployment options are available. Azure Foundry allows users to run Fara-7B without downloading model files or using personal GPU hardware. More technical users may self-host the model with VLLM on their own graphics processors.

Its evaluation setup uses Playwright along with a flexible agent interface that can integrate with any compatible model. Microsoft cautions that Fara-7B is experimental and should be used in isolated environments without sensitive information.

Earlier this year, Microsoft released Phi-4-multimodal and Phi-4-mini, extending its Phi series of compact language models.

The previous month, Google DeepMind introduced the Gemini 2.5 Computer Use model, a specialized version of Gemini 2.5 Pro designed to interact directly with user interfaces. It is available in preview through the Gemini platform in Google’s development tools.

Also Read:

“HP to Lay Off Up to 6,000 Employees in AI push to Save $1 Billion”

Digital Connexion to Invest $11 Billion in Andhra Pradesh for AI-Focused Data Centres