Priority List of Papers

These are must-read papers for us, and they may be useful to cover in the public reading group. They are listed in no particular order.

Training

  1. Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
  2. CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks
  3. DeMo: Decoupled Momentum Optimization
  4. Asynchronous Local-SGD Training for Language Modeling

Inference

  1. Distributed Inference and Fine-tuning of Large Language Models Over The Internet
  2. HexGen: Generative Inference of Large-Scale Foundation Model over Heterogeneous Decentralized Environment
  3. Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference
  4. Any inference-time scaling papers?

Model Stealing

  1. Stealing Part of a Production Language Model (ICML 2024 best paper nom.)

List of Potential Papers

This is a (largely unorganized) list of papers that we could read in our reading group.

Decentralized Training and Inference

  1. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient