Collections including paper arxiv:2401.06951

- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 116
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 25
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 19
- The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
  Paper • 2401.07872 • Published • 2

- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
  Paper • 2311.05556 • Published • 87
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 24
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 21
- Transfer Learning for Text Diffusion Models
  Paper • 2401.17181 • Published • 17

- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
  Paper • 2401.10774 • Published • 59
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
  Paper • 2401.06761 • Published • 1
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
  Paper • 2401.02669 • Published • 17
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 60

- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
  Paper • 2401.06951 • Published • 26
- Extending LLMs' Context Window with 100 Samples
  Paper • 2401.07004 • Published • 16
- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 28
- The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
  Paper • 2402.04347 • Published • 15

- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
  Paper • 2401.00448 • Published • 30
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 82
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
  Paper • 2401.06951 • Published • 26
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 82

- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
  Paper • 2401.06951 • Published • 26
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
  Paper • 2401.01325 • Published • 27
- Extending LLMs' Context Window with 100 Samples
  Paper • 2401.07004 • Published • 16
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 24

- DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
  Paper • 2408.08152 • Published • 61
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 20
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 20

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 150
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 79
- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 28
- Extending LLMs' Context Window with 100 Samples
  Paper • 2401.07004 • Published • 16

- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 28
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
  Paper • 2305.07185 • Published • 10
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 79
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
  Paper • 2401.02669 • Published • 17

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 150
- SparQ Attention: Bandwidth-Efficient LLM Inference
  Paper • 2312.04985 • Published • 40
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
  Paper • 2401.04658 • Published • 27
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
  Paper • 2401.06951 • Published • 26