DeepSeek AI has introduced DeepSeek-OCR, a new optical character recognition (OCR) system aimed at helping large language models handle long-context text by compressing it into 2D visual representations.
The system takes a vision-first approach to context compression, rendering lengthy text as a compact set of vision tokens. DeepSeek reports more than 96% OCR precision at 9x to 10x compression ratios, and around 60% accuracy even when compressing at 20x.
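To make those ratios concrete, the sketch below relates text-token counts to vision-token counts. The token figures are illustrative assumptions chosen to match the reported 10x and 20x ratios, not values taken from DeepSeek's implementation.

```python
# Illustrative arithmetic only: relates reported compression ratios to token counts.
# The 1000-token example is an assumption, not a figure from DeepSeek's paper.

def vision_tokens_needed(text_tokens: int, compression_ratio: float) -> int:
    """Vision tokens required to represent `text_tokens` at a given compression ratio."""
    return max(1, round(text_tokens / compression_ratio))

for ratio, reported_accuracy in [(10, "~96%+"), (20, "~60%")]:
    print(f"1000 text tokens at {ratio}x -> "
          f"{vision_tokens_needed(1000, ratio)} vision tokens "
          f"(reported decoding accuracy {reported_accuracy})")
```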
The system is built on two major components: DeepEncoder, a vision encoder, and DeepSeek3B-MoE-A570M, a 3-billion-parameter mixture-of-experts decoder with roughly 570 million active parameters. Together, they are designed to strike a balance between performance and resource usage. DeepEncoder, in particular, handles high-resolution inputs while sharply limiting the number of vision tokens passed to the decoder, avoiding GPU memory issues.
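The following is a minimal conceptual sketch of that two-stage dataflow, not DeepSeek's actual API. The class names mirror the announced components, but the token budget, shapes, and method signatures are assumptions for illustration.

```python
# Conceptual sketch of the encoder -> compact vision tokens -> MoE decoder pipeline.
# All shapes and method names are hypothetical; only the component roles come from the article.

from dataclasses import dataclass
from typing import List


@dataclass
class VisionTokens:
    """A small, fixed budget of vision tokens standing in for a full page image."""
    embeddings: List[List[float]]  # e.g. ~100 tokens x hidden_dim, far fewer than raw patches


class DeepEncoderSketch:
    def __init__(self, token_budget: int = 100):
        self.token_budget = token_budget  # assumed budget; keeping it small bounds GPU memory

    def encode(self, page_image) -> VisionTokens:
        # Real encoder: high-res page image -> many patches -> aggressively compressed tokens.
        # Here we simply emit the fixed budget of placeholder vectors to show the dataflow.
        return VisionTokens(embeddings=[[0.0] * 8 for _ in range(self.token_budget)])


class MoEDecoderSketch:
    """Stands in for the DeepSeek3B-MoE-A570M decoder: expands vision tokens back into text."""
    def decode(self, tokens: VisionTokens) -> str:
        return f"<decoded text reconstructed from {len(tokens.embeddings)} vision tokens>"


encoder, decoder = DeepEncoderSketch(), MoEDecoderSketch()
print(decoder.decode(encoder.encode(page_image=None)))
```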
On the OmniDocBench benchmark, DeepSeek-OCR surpassed leading OCR models like GOT-OCR2.0 and MinerU2.0. It achieved better performance using fewer visual tokens, which translates to increased efficiency during processing.
According to DeepSeek, the model can handle over 200,000 pages daily using a single NVIDIA A100 GPU. When deployed across 20 nodes, its processing power scales up to a massive 33 million pages per day, highlighting its capability for enterprise-level document processing.
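A quick back-of-the-envelope check shows how the single-GPU and cluster figures line up, assuming each node carries 8 A100 GPUs (an assumption about the cluster layout, not something stated in this article).

```python
# Sanity check of the reported throughput figures.
# gpus_per_node = 8 is an assumption; adjust for your own hardware.

pages_per_gpu_per_day = 200_000      # reported single-A100 figure (lower bound)
gpus_per_node = 8                    # assumed A100s per node
nodes = 20

total_pages_per_day = pages_per_gpu_per_day * gpus_per_node * nodes
print(f"{total_pages_per_day:,} pages/day")  # 32,000,000 -- consistent with the ~33M reported
```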
The company emphasized the system’s suitability for large-scale applications, including document digitization and training data preparation for AI models. It also supports a wide range of document types and formats—from multilingual text to complex visuals like charts and chemical structures.
DeepSeek believes this method marks a significant shift in how language models can manage memory and context length. Because context is compressed into visual tokens, even relatively small decoder models can interpret and reconstruct it efficiently, opening the door to more scalable and lightweight LLM applications.
To encourage broader research and development, DeepSeek has released both the code and model weights for DeepSeek-OCR under an open-source license on GitHub, allowing developers and researchers to explore and contribute to this new approach.
The company sees this as a step toward redefining how vision and language modalities can be merged. With DeepSeek-OCR and its recent V3.2-Exp model, the firm is continuing its momentum toward more cost-effective, efficient long-context processing in the language model ecosystem.