<aside>
<img src="/icons/info-alternate_gray.svg" alt="/icons/info-alternate_gray.svg" width="40px" /> Papers in each section are ordered by importance. Start with the first one.
</aside>
Decentralized Training and Inference
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
- Decentralized Training of Foundation Models in Heterogeneous Environments
- Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
- HexGen: Generative Inference of Large-Scale Foundation Model over Heterogeneous Decentralized Environment
- Distributed Deep Learning in Open Collaborations
- Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
- DiLoCo: Distributed Low-Communication Training of Language Models
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs
Model Stealing/Imitation
- Stealing Part of a Production Language Model (ICML 2024 best paper nomination)
- Polynomial Time Cryptanalytic Extraction of Neural Network Models
- Knockoff Nets: Stealing Functionality of Black-Box Models
- The False Promise of Imitating Proprietary LLMs
- Imitation Attacks and Defenses for Black-box Machine Translation Systems
Byzantine Gradient Robustness
- Secure Distributed Training at Scale
- Fast and Robust Distributed Learning in High Dimension
- Byzantine-Robust Decentralized Stochastic Optimization over Static and Time-Varying Networks