<aside>
<img src="/icons/report_gray.svg" alt="/icons/report_gray.svg" width="40px" /> NOTE: THIS PAGE IS NOW OUT OF DATE
The goal is research-level advances on these problems. This is good because it doubles as marketing and attention harvesting. Each time we put out a paper, we will use it to drive more eyeballs to the project, which is the other core stream of work that has to happen. We must do extensive marketing on everything. Currently no other team in this space will be able to produce tier-1 papers, so this gives us some core differentiation. It's also good to battle-test our ideas in the review process, no matter how noisy it is.
Ideally, we would pursue milestones 1 and 3 in parallel.
📰 icons mean the milestone is a research-level problem and should result in one or more papers.
🚀 icons mean public demonstrations/reports of a real run.
🌐 means an open-source code release.
</aside>
- 📰 Demonstrate decentralized training of a large model in the volunteer setting. This is one of the core early technical pushes, and we must get it to work. Relax all assumptions except the low bandwidth between nodes. The starting point is to directly extend SWARM parallelism and the hivemind library by making the pipeline deeper and seeing how far this can be pushed. The other immediate direction is asynchronous sharded updates.
    - 🚀 🌐 Report results for a large-scale (ideally >1B parameter) GPT-style model trained in this setting with minimal overhead, on a swarm we completely control.
- 📰 Extend the above to a more general decentralized setting: allow for an elastic swarm and heterogeneous nodes. This would likely be a paper extending our milestone 1 result.
- 📰 Demonstrate trustless decentralized training. This is the problem of knowing that the gradients a participant has sent are correct. As soon as there are incentives, this becomes a problem, because there is a clear incentive to submit gibberish to earn ownership or get paid. Even when the incentive is fractional model ownership, where submitting gibberish is self-defeating because you end up owning part of a model that has diverged, verification is still needed because people will try to attack other working model runs.
    - The initial direction to try is Truebit- or Ora Optimistic ML-style game-theoretically optimal verification. I do not believe exact verification such as ZK-SNARKs will be performant enough anytime soon.
    - 🚀🌐 Report the results of a real run: a large-scale (ideally >7B parameter) GPT-style model trained using the above techniques, matching the convergence curves of a centralized run with an acceptable level of overhead. We perform this run ourselves, with no external participants.
- Demonstrate unextractability and fractional ownership. This is the problem of value accrual to trainers. The current approach is as follows:
    - 📰 Make the model unextractable from the swarm, so that participants can train but no single participant can extract all of the model weights. This is a very novel research-level problem, and currently no one else is discussing or thinking about it. One way to start may be to focus only on setting up inference in a way that is unextractable.
    - 🚀 Once you have unextractability, you create fractional ownership of the model by allocating a fixed pool of inference tokens to trainers, proportional to their contribution. Trainers can then sell these tokens to other users. There needs to be a mechanism for the swarm to perform inference only when a token is presented. We need to demonstrate that this works and is secure. When an inference token is spent, the swarm generates a new one and allocates it back to the training pool, whose members can sell it again if they want.
    - With these two properties, the model is fractionally owned by the swarm, in proportion to each trainer's contribution. This would be very significant.
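To make the inference-token mechanism above more concrete, here is a minimal Python sketch of the bookkeeping only, assuming a single trusted in-process ledger. It does not model unextractability, cryptography, or networking. The names `InferenceTokenPool` and `proportional_allocation`, the integer contribution scores, the largest-remainder split, and the smooth weighted round-robin rule for re-minting are all hypothetical choices for illustration; the plan itself only says that new tokens go "back to the training pool".

```python
import itertools
from fractions import Fraction
from typing import Callable


def proportional_allocation(contributions: dict[str, int], pool_size: int) -> dict[str, int]:
    """Split pool_size tokens in proportion to contributions (largest-remainder method)."""
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    quotas = {t: Fraction(pool_size * c, total) for t, c in contributions.items()}
    alloc = {t: int(q) for t, q in quotas.items()}  # floor of each exact quota
    leftover = pool_size - sum(alloc.values())
    # Hand the remaining tokens to the trainers with the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda t: quotas[t] - alloc[t], reverse=True)
    for t in by_remainder[:leftover]:
        alloc[t] += 1
    return alloc


class InferenceTokenPool:
    """Hypothetical fixed-supply ledger of inference tokens, initially owned by trainers."""

    def __init__(self, contributions: dict[str, int], pool_size: int) -> None:
        self.weights = dict(contributions)
        self.owner_of: dict[int, str] = {}  # token id -> current holder
        self._ids = itertools.count()
        self._rr_state = {t: 0 for t in self.weights}  # round-robin state for re-minting
        for trainer, n in proportional_allocation(self.weights, pool_size).items():
            for _ in range(n):
                self.owner_of[next(self._ids)] = trainer

    def balance(self, holder: str) -> int:
        return sum(1 for h in self.owner_of.values() if h == holder)

    def transfer(self, token_id: int, seller: str, buyer: str) -> None:
        """Any current holder may sell a token on to another user."""
        if self.owner_of.get(token_id) != seller:
            raise PermissionError("seller does not hold this token")
        self.owner_of[token_id] = buyer

    def run_inference(self, token_id: int, holder: str, prompt: str,
                      model_fn: Callable[[str], str]) -> str:
        """Serve a request only if a valid token is presented; spend it and re-mint one."""
        if self.owner_of.get(token_id) != holder:
            raise PermissionError("no valid inference token presented")
        del self.owner_of[token_id]                            # spend the presented token
        self.owner_of[next(self._ids)] = self._next_trainer()  # supply stays fixed
        return model_fn(prompt)

    def _next_trainer(self) -> str:
        """Smooth weighted round-robin: over many spends, re-minted tokens follow contribution shares."""
        total = sum(self.weights.values())
        for t, w in self.weights.items():
            self._rr_state[t] += w
        chosen = max(self._rr_state, key=self._rr_state.get)
        self._rr_state[chosen] -= total
        return chosen


# Usage: alice contributed 3x as much as bob.
pool = InferenceTokenPool({"alice": 3, "bob": 1}, pool_size=8)
print(pool.balance("alice"), pool.balance("bob"))  # 6 2
token = next(t for t, h in pool.owner_of.items() if h == "bob")
pool.transfer(token, "bob", "carol")  # bob sells a token to a user
print(pool.run_inference(token, "carol", "hello", str.upper))  # HELLO
print(len(pool.owner_of))  # 8: total supply is unchanged
```

The round-robin re-mint is used instead of topping up whoever currently holds the fewest tokens, because the latter would reward trainers for selling. In a real swarm, the "only serve when a token is spent" check would have to be enforced by the unextractable inference protocol itself rather than by a trusted ledger like this one.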
<aside>
<img src="/icons/info-alternate_gray.svg" alt="/icons/info-alternate_gray.svg" width="40px" /> At this point we can move towards implementing a working testnet. The Series A should be done, and we should also have significant eyeballs on this project by now and a clear path to a token launch. I expect many of the early model runs will be initiated by us, using VC money to pay for the compute needed to create useful models.
All the way up to this point, we will also have trained only on public data. Before then, we should begin thinking about how we are going to scale data aggregation, not just compute, which is what this plan covers. That said, data does not become a constraint until we start training very large models: FineWeb or a similar dataset will be enough all the way up to GPT-3.5 level. But we do need to have an answer.
</aside>