We introduce a new document parsing framework, MinerU2.5. The key innovation is a decoupled architecture that separates global layout analysis from local content recognition via an efficient coarse-to-fine, two-stage inference mechanism.
In the first stage, the model conducts fast and holistic layout analysis on downsampled document images, capturing the global structural organization with minimal computational cost. In the second stage, guided by the detected layout, it crops key regions from the original high-resolution input and performs fine-grained recognition within local windows, thereby preserving native resolution and ensuring high accuracy.
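The two-stage flow described above can be sketched in a few lines. This is only an illustration of the coarse-to-fine idea, not MinerU2.5's actual code or API: the function names, the nested-list image representation, the stride-based downsampling, and the `detect_layout` and `recognize` callables are all hypothetical stand-ins for the real models.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels, row-major (assumed toy format)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def downsample(image: Image, factor: int) -> Image:
    """Keep every `factor`-th pixel in both directions (nearest-neighbour)."""
    return [row[::factor] for row in image[::factor]]


def to_original_coords(box: Box, factor: int, width: int, height: int) -> Box:
    """Map a box found on the thumbnail back to full-resolution coordinates."""
    left, top, right, bottom = box
    return (
        min(left * factor, width),
        min(top * factor, height),
        min(right * factor, width),
        min(bottom * factor, height),
    )


def crop(image: Image, box: Box) -> Image:
    """Cut the region out of the original image, so no resolution is lost."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def parse_document(
    image: Image,
    detect_layout: Callable[[Image], List[Box]],  # hypothetical stage-1 model
    recognize: Callable[[Image], object],         # hypothetical stage-2 model
    factor: int = 4,
) -> List[Tuple[Box, object]]:
    height, width = len(image), len(image[0])

    # Stage 1: cheap global layout analysis on the downsampled page.
    thumbnail = downsample(image, factor)
    coarse_boxes = detect_layout(thumbnail)

    # Stage 2: fine-grained recognition on native-resolution crops.
    results = []
    for box in coarse_boxes:
        full_box = to_original_coords(box, factor, width, height)
        results.append((full_box, recognize(crop(image, full_box))))
    return results
```

The point of the split is that the layout step only ever sees the small thumbnail, while the recognition step only sees the regions that matter, each at full resolution.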