Collections
Discover the best community collections!
Collections including paper arxiv:2309.14322

- Post-LayerNorm Is Back: Stable, Expressive, and Deep
  Paper • 2601.19895 • Published • 24
- Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
  Paper • 2601.17367 • Published • 34
- Small-scale proxies for large-scale Transformer training instabilities
  Paper • 2309.14322 • Published • 22
- Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
  Paper • 2602.00747 • Published • 9

- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 89
- When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
  Paper • 2309.04564 • Published • 17
- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 56
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
  Paper • 2309.11197 • Published • 5

- A Loss Curvature Perspective on Training Instability in Deep Learning
  Paper • 2110.04369 • Published
- Why Do We Need Weight Decay in Modern Deep Learning?
  Paper • 2310.04415 • Published
- Small-scale proxies for large-scale Transformer training instabilities
  Paper • 2309.14322 • Published • 22
- Transformers Can Navigate Mazes With Multi-Step Prediction
  Paper • 2412.05117 • Published • 5

- AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
  Paper • 2309.16414 • Published • 19
- Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
  Paper • 2309.13018 • Published • 9
- Robust Speech Recognition via Large-Scale Weak Supervision
  Paper • 2212.04356 • Published • 51
- Language models in molecular discovery
  Paper • 2309.16235 • Published • 10

- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 84
- Small-scale proxies for large-scale Transformer training instabilities
  Paper • 2309.14322 • Published • 22
- Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
  Paper • 2309.15129 • Published • 7
- Vision Transformers Need Registers
  Paper • 2309.16588 • Published • 86