<aside> <img src="/icons/info-alternate_gray.svg" alt="/icons/info-alternate_gray.svg" width="40px" /> Papers are ordered by importance within each section. Start with the first one.

</aside>

Decentralized Training and Inference

  1. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
  2. Decentralized Training of Foundation Models in Heterogeneous Environments
  3. Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
  4. HexGen: Generative Inference of Large-Scale Foundation Model over Heterogeneous Decentralized Environment
  5. Distributed Deep Learning in Open Collaborations
  6. Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
  7. DiLoCo: Distributed Low-Communication Training of Language Models
  8. FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

Model Stealing/Imitation

  1. Stealing Part of a Production Language Model (ICML 2024 best paper nomination)
  2. Polynomial Time Cryptanalytic Extraction of Neural Network Models
  3. Knockoff Nets: Stealing Functionality of Black-Box Models
  4. The False Promise of Imitating Proprietary LLMs
  5. Imitation Attacks and Defenses for Black-box Machine Translation Systems

Byzantine Gradient Robustness

  1. Secure Distributed Training at Scale
  2. Fast and Robust Distributed Learning in High Dimension
  3. Byzantine-Robust Decentralized Stochastic Optimization over Static and Time-Varying Networks