Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing

August 16, 2025


dots.ocr is an open-source vision-language transformer model developed for multilingual document layout parsing and optical character recognition (OCR). It performs both layout detection and content recognition within a single architecture, supporting over 100 languages and a wide variety of structured and unstructured document types.

Architecture

Unified Model: dots.ocr combines layout detection and content recognition in a single transformer-based neural network. This removes the need for separate detection and OCR pipelines, and users switch tasks by adjusting the input prompt.
Parameters: The model contains 1.7 billion parameters, balancing computational efficiency with performance for most practical scenarios.
Input Flexibility: Inputs can be image files or PDF documents. The model offers preprocessing options (such as fitz_preprocess) for improving quality on low-resolution or dense multi-page files.
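As a rough illustration of prompt-based task switching, the sketch below pairs one input image with a different prompt per task. The task names, the prompt wording, and the `ParseRequest` and `build_request` helpers are all hypothetical; they are not the prompts or API shipped with dots.ocr.

```python
from dataclasses import dataclass

# Hypothetical prompt templates: the task names and wording are illustrative
# assumptions, not the prompts distributed with dots.ocr.
PROMPTS = {
    "layout_and_text": "Detect every layout element and transcribe its content.",
    "layout_only": "Detect every layout element; return bounding boxes and categories only.",
    "text_only": "Transcribe the text in reading order; omit layout information.",
}


@dataclass(frozen=True)
class ParseRequest:
    """One unit of work: an input image plus the prompt that selects the task."""
    image_path: str
    task: str
    prompt: str


def build_request(image_path: str, task: str = "layout_and_text") -> ParseRequest:
    """Pair an input image with the prompt for the requested task."""
    if task not in PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PROMPTS)}")
    return ParseRequest(image_path=image_path, task=task, prompt=PROMPTS[task])


if __name__ == "__main__":
    # The same page is sent three times; only the prompt changes.
    for task in PROMPTS:
        request = build_request("page_001.png", task)
        print(f"{request.task}: {request.prompt}")
```

The point of the design is that the model and the input stay fixed while the prompt alone selects between layout detection, text recognition, or both.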

Capabilities

Multilingual: dots.ocr is trained on datasets spanning more than 100 languages, including major world languages and less common scripts.
Content Extraction: The model extracts plain text, tabular data, and mathematical formulas (in LaTeX), and preserves reading order within documents. Output formats include structured JSON, Markdown, and HTML, depending on the layout and content type.
Preserves Structure: dots.ocr maintains document structure, including table boundaries, formula regions, and image placements, so that extracted data remains faithful to the original document.
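To make the structured output more concrete, here is a minimal sketch that turns a list of layout elements into Markdown. The schema (a `bbox`, a `category`, and an optional `text` per element), the category names, and the sample data are assumptions made for illustration; the actual output of dots.ocr may differ.

```python
import json

# Hypothetical layout output: field names, category names, and values are
# assumptions for illustration, not the documented dots.ocr schema.
SAMPLE = json.loads("""
[
  {"bbox": [50, 40, 550, 80], "category": "Title", "text": "Quarterly Report"},
  {"bbox": [50, 100, 550, 160], "category": "Text", "text": "Revenue grew in Q2."},
  {"bbox": [50, 180, 550, 220], "category": "Formula", "text": "E = mc^2"},
  {"bbox": [50, 240, 550, 400], "category": "Table", "text": "<table><tr><td>Q2</td></tr></table>"},
  {"bbox": [50, 420, 550, 600], "category": "Picture"}
]
""")


def element_to_markdown(element: dict) -> str:
    """Render one layout element as a Markdown fragment."""
    category = element.get("category", "Text")
    text = element.get("text", "")
    if category == "Title":
        return f"# {text}"
    if category == "Formula":
        # Formulas are assumed to arrive as LaTeX and are wrapped in display math.
        return f"$$\n{text}\n$$"
    if category == "Picture":
        # Pictures carry no text, so keep a placeholder that records the region.
        x0, y0, x1, y1 = element["bbox"]
        return f"[picture at ({x0}, {y0})-({x1}, {y1})]"
    # Plain text and HTML tables pass through unchanged.
    return text


def layout_to_markdown(elements: list[dict]) -> str:
    """Join elements in list order, which is assumed to be reading order."""
    return "\n\n".join(element_to_markdown(e) for e in elements)


if __name__ == "__main__":
    print(layout_to_markdown(SAMPLE))
```

Keeping the bounding box on every element is what lets downstream code place tables, formulas, and pictures back where they were on the page.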

Benchmark Performance

dots.ocr has been evaluated against modern document AI systems, with results summarized below:

Benchmark	dots.ocr	Gemini2.5-Pro
Table TEDS accuracy (higher is better)	88.6%	85.8%
Text edit distance (lower is better)	0.032	0.055

Tables: Outperforms Gemini2.5-Pro in table parsing accuracy (88.6% vs. 85.8% TEDS).
Text: Achieves a lower text edit distance (0.032 vs. 0.055), indicating more accurate transcription.
Formulas and Layout: Matches or exceeds leading models in formula recognition and document structure reconstruction.
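For readers unfamiliar with the text metric, the sketch below computes a normalized edit distance between a predicted string and a reference string. Dividing the Levenshtein distance by the length of the longer string is one common convention and is an assumption here; the benchmark's exact normalization is not stated in this article. TEDS, the table metric, is a tree-edit-distance-based similarity over table structure and is not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted: str, reference: str) -> float:
    """Edit distance scaled to [0, 1]; 0.0 means the strings are identical."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 0.0
    return levenshtein(predicted, reference) / longest


if __name__ == "__main__":
    print(normalized_edit_distance("kitten", "sitting"))
```

Under this convention a score of 0.032 would mean roughly three character edits per hundred characters of text.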

https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md

Deployment and Integration

Open-Source: Released under the MIT license, with source, documentation, and pre-trained models available on GitHub. The repository provides installation instructions for pip, Conda, and Docker-based deployments.
API and Scripting: Supports flexible task configuration via prompt templates. The model can be used interactively or within automated pipelines for batch document processing.
Output Formats: Extracted results are supplied as structured JSON for programmatic use, with options for Markdown and HTML where appropriate. Visualization scripts enable inspection of detected layouts.
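A batch pipeline of the kind described above might look like the following sketch. The `run_batch` helper, the list of accepted file extensions, and the `fake_parse` stand-in are hypothetical; in a real pipeline the `parse` callable would wrap an actual model call, which is not shown here.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Assumed set of accepted extensions; the article only says images and PDFs.
SUPPORTED = {".png", ".jpg", ".jpeg", ".pdf"}


def fake_parse(path: Path) -> list[dict]:
    """Stand-in for a real model call; returns one dummy layout element."""
    return [{"category": "Text", "text": f"contents of {path.name}"}]


def run_batch(
    input_dir: Path,
    output_dir: Path,
    parse: Callable[[Path], list[dict]],
) -> list[Path]:
    """Parse every supported file in input_dir and write one JSON file per input."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(input_dir.iterdir()):
        if source.suffix.lower() not in SUPPORTED:
            continue  # skip files the parser is not expected to handle
        # Keep the full file name so a.png and a.pdf do not overwrite each other.
        target = output_dir / f"{source.name}.json"
        target.write_text(
            json.dumps(parse(source), ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inputs = Path(tmp) / "in"
        inputs.mkdir()
        for name in ("scan.png", "report.pdf", "notes.txt"):
            (inputs / name).write_bytes(b"")
        for result in run_batch(inputs, Path(tmp) / "out", fake_parse):
            print(result.name)
```

Passing the parser in as a callable keeps the file handling testable without loading the model, and `ensure_ascii=False` keeps non-Latin scripts readable in the saved JSON.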

Conclusion

dots.ocr provides a technical solution for high-accuracy, multilingual document parsing by unifying layout detection and content recognition in a single, open-source model. It’s particularly suited to scenarios requiring robust, language-agnostic document analysis and structured information extraction in resource-constrained or production environments.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
