shoaibmohd's Collections
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper
•
2509.22186
•
Published
•
143
CommonForms: A Large, Diverse Dataset for Form Field Detection
Paper
•
2509.16506
•
Published
•
22
Automated Structured Radiology Report Generation with Rich Clinical Context
Paper
•
2510.00428
•
Published
•
8
Extract-0: A Specialized Language Model for Document Information Extraction
Paper
•
2509.22906
•
Published
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper
•
2510.14528
•
Published
•
112
RL makes MLLMs see better than SFT
Paper
•
2510.16333
•
Published
•
49
NVIDIA Nemotron Parse 1.1
Paper
•
2511.20478
•
Published
•
21
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
•
2511.16334
•
Published
•
93
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI
Paper
•
2502.17092
•
Published
•
3
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper
•
2503.11576
•
Published
•
138
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
Paper
•
2601.21639
•
Published
•
46
DeepSeek-OCR 2: Visual Causal Flow
Paper
•
2601.20552
•
Published
•
50