Priority List of Papers
These are must-read papers for us, and it may be useful to cover them in the public reading group. They are in no particular order.
Training
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks
DeMo: Decoupled Momentum Optimization
Asynchronous Local-SGD Training for Language Modeling
Inference
Distributed Inference and Fine-tuning of Large Language Models Over The Internet
HexGen: Generative Inference of Large-Scale Foundation Model over Heterogeneous Decentralized Environment
- Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference
- Any inference time scaling papers?
Model Stealing
Stealing part of a production language model (ICML 2024 best paper nom.)
List of potential papers
This is a (largely unorganized) list of papers that we could read in our reading group.
Decentralized Training and Inference
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient