-
Learning A Universal Crime Predictor with Knowledge-guided Hypernetworks
Paper • 2511.02336 • Published -
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper • 2512.11253 • Published • 39 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 40
Collections
Discover the best community collections!
Collections including paper arxiv:2409.18839
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 146 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 118 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 40
-
SAM 3: Segment Anything with Concepts
Paper • 2511.16719 • Published • 131 -
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Paper • 2512.08765 • Published • 133 -
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Paper • 2512.04677 • Published • 172 -
LongCat-Image Technical Report
Paper • 2512.07584 • Published • 23
-
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper • 2412.20138 • Published • 19 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 40 -
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 146 -
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 137
-
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 40 -
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 29 -
Differential Transformer
Paper • 2410.05258 • Published • 180 -
UniMuMo: Unified Text, Music and Motion Generation
Paper • 2410.04534 • Published • 19
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 8 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 26 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 35 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30
-
Learning A Universal Crime Predictor with Knowledge-guided Hypernetworks
Paper • 2511.02336 • Published -
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper • 2512.11253 • Published • 39 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 40
-
SAM 3: Segment Anything with Concepts
Paper • 2511.16719 • Published • 131 -
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Paper • 2512.08765 • Published • 133 -
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Paper • 2512.04677 • Published • 172 -
LongCat-Image Technical Report
Paper • 2512.07584 • Published • 23
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 146 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 118 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 40
-
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper • 2412.20138 • Published • 19 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 40 -
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 146 -
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 137
-
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 40 -
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 29 -
Differential Transformer
Paper • 2410.05258 • Published • 180 -
UniMuMo: Unified Text, Music and Motion Generation
Paper • 2410.04534 • Published • 19
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 8 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 26 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 35 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30