1. How are Swarm-Locked models actually going to work? We will probably need to define some cost to join, e.g. some computation that a node has to run for a long time before it gets a weight shard. Is there anything useful a node can actually do without a weight shard?
  2. How do we handle the phase change from pretraining to post-training?
  3. Should we allow private ‘research’ runs? That is, runs where the model itself is unlikely to be useful, but which try to answer a technical question so that a better model can be trained later.
  4. Data contribution and incentivization is currently a major unexplored part of this whole thing. The working plan today is to get incentivized compute working first and then partner with data providers, since quite a few people are focused on just this.
  5. How do we handle the transition from training to hosting/serving? Yes, trainers are incentivized to host, but how do we implement this in practice?
  6. How do we do authorization once inference keys have been generated? Decentralized auth is not simple.
  7. Where does the protocol (what we’re building) stop and where do model trainers/orchestrators start?
    1. For example: can the orchestrators change how gradients are communicated between nodes, or do we fix a single method that we’ve figured out works for communication-efficient training? Would fixing it constrain the types of models that can be created too much?
    2. Do we fix the optimizer if we require Byzantine gradient aggregation in order to get the compute verification to work? If so, is this too restrictive? Should we offer a selection of compatible optimizers?
  8. Should we support finetuning and lightweight alteration?
| Pros of not supporting | Cons of not supporting |
| --- | --- |
| Removes a lot of technical work | Finetuning and LoRA are pretty useful today |
| Can still do finetunes of public weights | No ability to do value flow back from derivative models |
| Removes all risk categories of lightweight alteration | |
| Slightly alters the meta-objective for big runs to be maximally useful off the shelf | |
  9. How do we reward early trainers more? They take more risk, because model performance is less certain at the start.
  10. The OpenAI guy told me the compression they're able to achieve via distillation is crazy. This means we need a way to compress the model after it's trained if we ever want inference to be even remotely competitive.
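
To make the distillation point concrete, here is a minimal sketch of the standard soft-target distillation loss (temperature-scaled KL divergence between teacher and student outputs). This is an assumption for illustration only: the notes do not say which compression method would be used, and the function names and the temperature value of 2.0 are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The temperature of 2.0 is an illustrative assumption, not a tuned value.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (trained model)
    log_q = log_softmax(student_logits, temperature)  # student (compressed model)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss(teacher, teacher))          # matching outputs
    print(distillation_loss(teacher, [3.0, 2.0, 1.0]))  # mismatched outputs
```

The loss is zero when the student reproduces the teacher's output distribution and grows as they diverge. In this setting the swarm-trained model would play the teacher and a smaller model meant for serving would play the student.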