<aside> <img src="/icons/checkmark_gray.svg" alt="/icons/checkmark_gray.svg" width="40px" /> Decentralized training is fundamentally different from federated learning. It’s a genuinely novel setting.

</aside>

In federated learning, a global model is trained across multiple decentralized nodes (often edge devices like smartphones) that hold local data samples. A federated training loop proceeds as follows:

  1. A global model is initialized on a central server.
  2. The model is sent to participating nodes.
  3. Each node improves the model using its local data.
  4. The nodes send their updated models or gradients back to the central server.
  5. The server aggregates these updates to construct an improved global model.
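The five steps above can be sketched in a few lines of plain Python. This is only an illustrative toy, not a reference implementation: the model is a two-parameter linear fit, the aggregation rule is a sample-weighted average of node weights (one common choice, assumed here rather than taken from the text), and all function names (`local_update`, `aggregate`, `federated_training`, `make_node_data`) are hypothetical.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Step 3: a node improves the model using only its local data
    (plain gradient descent on mean squared error)."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b)

def aggregate(updates, sizes):
    """Step 5: the server combines node models.
    Assumption: a weighted average by local sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def federated_training(node_datasets, rounds=50):
    global_model = (0.0, 0.0)  # Step 1: initialize on the server
    sizes = [len(d) for d in node_datasets]
    for _ in range(rounds):
        # Steps 2-4: send the model out, train locally, collect the results.
        # Only the updated weights come back; raw data stays on each node.
        updates = [local_update(global_model, d) for d in node_datasets]
        global_model = aggregate(updates, sizes)
    return global_model

def make_node_data(rng, n):
    """Toy local dataset drawn from y = 3x + 1 (made up for illustration)."""
    return [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(n))]

rng = random.Random(0)
nodes = [make_node_data(rng, n) for n in (20, 35, 50)]
w, b = federated_training(nodes)
```

Note that every node receives and returns the complete `(w, b)` pair, so each one holds a full replica of the model.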

Data privacy is a central motivation: data never leaves the local device; only model updates do. A central server owns the global model, and a full model replica exists on each edge device.

In decentralized training, no node gets a full copy of the model. Instead, the model is sharded across all the nodes. We don’t really care about data privacy. We do, however, face the same problem of low-bandwidth node-to-node connections.
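To make the contrast concrete, here is a minimal sketch of one possible sharding scheme. The text does not say how the model is split, so this assumes a layer-wise split in which each node holds a contiguous slice of layers; the names `shard_layers` and `forward` and the toy "layers" are hypothetical.

```python
def shard_layers(layers, num_nodes):
    """Split a list of layers into contiguous, near-equal shards, one per node."""
    base, extra = divmod(len(layers), num_nodes)
    shards, start = [], 0
    for i in range(num_nodes):
        size = base + (1 if i < extra else 0)
        shards.append(layers[start:start + size])
        start += size
    return shards

def forward(shards, x):
    """Run an input through every node's shard in order.
    Each hand-off between shards stands in for a node-to-node transfer."""
    transfers = 0
    for i, shard in enumerate(shards):
        for layer in shard:
            x = layer(x)
        if i < len(shards) - 1:
            transfers += 1  # activations would cross a (slow) link here
    return x, transfers

# Seven toy "layers", each adding a constant to its input.
layers = [lambda x, k=k: x + k for k in range(1, 8)]
shards = shard_layers(layers, num_nodes=3)
out, transfers = forward(shards, 0)
```

Under this assumed scheme no node ever holds all seven layers, and a single forward pass already requires intermediate results to cross node boundaries, which is where low-bandwidth connections start to matter.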