We introduce a new document parsing framework, MinerU2.5. The key innovation is a decoupled architecture that separates global layout analysis from local content recognition via an efficient coarse-to-fine, two-stage inference mechanism. In the first stage, the model conducts fast and holistic layout analysis on downsampled document images, capturing the global structural organization with minimal computational cost. In the second stage, guided by the detected layout, it crops key regions from the original high-resolution input and performs fine-grained recognition within local windows, thereby preserving native resolution and ensuring high accuracy.
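To make the two-stage flow concrete, here is a minimal sketch of a coarse-to-fine pipeline in plain Python. It is not MinerU2.5's code or API: `Region`, `downsample`, `crop_native`, `parse_document`, and the `detect_layout` / `recognize` callables are all hypothetical names, the image is just a list of pixel rows, and the downsampling factor of 4 is an arbitrary assumption. The point it illustrates is that layout boxes found on the small image are stored in normalized coordinates, so they can be mapped back onto the original image and cropped at native resolution.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: a grayscale image represented as rows of pixel values.
Image = List[List[int]]


@dataclass
class Region:
    """A layout box in normalized [0, 1] coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float
    label: str


def downsample(image: Image, factor: int) -> Image:
    """Nearest-neighbor downsampling: keep every `factor`-th row and column."""
    return [row[::factor] for row in image[::factor]]


def crop_native(image: Image, region: Region) -> Image:
    """Crop a region from the full-resolution image.

    Normalized coordinates are scaled by the original width and height,
    so the crop keeps the native pixel density of the input.
    """
    height, width = len(image), len(image[0])
    top, bottom = int(region.y0 * height), int(region.y1 * height)
    left, right = int(region.x0 * width), int(region.x1 * width)
    return [row[left:right] for row in image[top:bottom]]


def parse_document(
    image: Image,
    detect_layout: Callable[[Image], List[Region]],
    recognize: Callable[[Image], str],
    factor: int = 4,  # assumed value, chosen only for illustration
) -> List[Tuple[str, str]]:
    """Stage 1: layout on a downsampled image. Stage 2: recognition on native crops."""
    thumbnail = downsample(image, factor)   # cheap global view
    regions = detect_layout(thumbnail)      # stand-in for the layout model
    return [
        (region.label, recognize(crop_native(image, region)))  # local, high-res
        for region in regions
    ]
```

In use, `detect_layout` and `recognize` would be whatever layout and recognition models are available; passing them in as callables keeps the sketch independent of any particular model and mirrors the decoupling the abstract describes.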