Papers from the NICS-EFFALG Team
• R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing (arXiv:2505.21600)
• Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching (arXiv:2412.17153)
• Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding (arXiv:2307.15337)
• DiTFastAttn: Attention Compression for Diffusion Transformer Models (arXiv:2406.08552)
• Can LLMs Learn by Teaching? A Preliminary Study (arXiv:2406.14629)
• MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression (arXiv:2406.14909)
• A Survey on Efficient Inference for Large Language Models (arXiv:2404.14294)
• ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation (arXiv:2406.02540)
• MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization (arXiv:2405.17873)
• FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models (arXiv:2501.01986)
• Evaluating Quantized Large Language Models (arXiv:2402.18158)
• Cache-to-Cache: Direct Semantic Communication Between Large Language Models (arXiv:2510.03215)