Mohammad Goudarzi
- Wasn’t familiar with large-scale training; didn’t know pipeline parallelism
- background in distributed computing
- resource management and scheduling
- It might make a lot of sense for him to do a simple scheduling and optimization paper for a heterogeneous setup, noisy as well
- Resource
- Has 2 PhD students now, but they are at the end of their PhDs
- 1 has one year left
- 1 in early PhD, resource management in transport applications.
- can recruit one PhD.
- But not hired.
- Will have PhDs next year
- In the meantime, it would potentially be him
- Industry PhD (sponsor whole thing)
- 10k from industry partner per year (peanuts)
- Data61 would cover the remainder
Meeting between Gil and Mohammad on the 14/11:
- Explained the problem domain to him as a dual graph problem. The level 1 graph is a mesh of connected nodes, where each node represents a computational resource and each edge represents a network connection. The level 2 graph is directed and acyclic and is derived from the level 1 graph; it is effectively the ML computational graph.
- The problem could be made very complex by characterizing the nodes by GPU, CPU, RAM etc., and by the cyclical nature that arises where big nodes hold multiple layers. Graph edges can be described by latency and bandwidth, and those too can be throttled.
- Mohammad understood the problem domain well from that perspective. He is keen to start working on it himself initially and to have a student participate later, when one is available.
- He asked whether Pluralis has an ABN, and mentioned that CSIRO can be involved and provide compute resources.
- Follow-up would be an in-person meeting with Mohammad where we do some whiteboarding to make the problem more concrete.
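To make the two-level framing a bit more concrete ahead of the whiteboarding, here is a minimal Python sketch. It is only an illustration under assumptions, not anything agreed in the meeting: the device names, link numbers, the four-layer chain, and the `communication_time` cost (latency plus activation size over bandwidth, summed over level 2 edges that cross devices) are all made up for the example.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Device:
    kind: str          # e.g. "gpu" or "cpu" (hypothetical labels)
    memory_gb: float


@dataclass(frozen=True)
class Link:
    latency_s: float
    bandwidth_gbps: float  # gigabits per second; could be throttled


# Level 1: undirected mesh of compute resources (all values are made up).
devices = {
    "a": Device("gpu", 24.0),
    "b": Device("gpu", 8.0),
    "c": Device("cpu", 64.0),
}
links = {
    frozenset({"a", "b"}): Link(0.010, 1.0),
    frozenset({"b", "c"}): Link(0.050, 0.1),
    frozenset({"a", "c"}): Link(0.020, 0.5),
}

# Level 2: directed acyclic ML computation graph, stored as layer -> predecessors.
layer_preds = {"l0": set(), "l1": {"l0"}, "l2": {"l1"}, "l3": {"l2"}}

# Placement of level 2 onto level 1; a big node may hold several layers.
placement = {"l0": "a", "l1": "a", "l2": "b", "l3": "c"}


def communication_time(layer_preds, placement, links, activation_gbit):
    """Sum latency + transfer time over level 2 edges that cross devices."""
    # static_order() raises graphlib.CycleError if level 2 is not acyclic.
    order = list(TopologicalSorter(layer_preds).static_order())
    total = 0.0
    for layer in order:
        for pred in layer_preds[layer]:
            src, dst = placement[pred], placement[layer]
            if src == dst:
                continue  # same device: no network cost in this toy model
            link = links[frozenset({src, dst})]
            total += link.latency_s + activation_gbit / link.bandwidth_gbps
    return total


cost = communication_time(layer_preds, placement, links, activation_gbit=0.1)
```

A scheduling paper along the lines discussed would presumably search over `placement` to minimize a cost like this, with a richer model of compute time, memory limits, and noisy links.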
Follow-up meeting Gil and Mohammad 20/11:
- We discussed in depth, on a whiteboard, "Decentralized Training of Foundation Models in Heterogeneous Environments" and Swarm. We talked about how both relate to what Pluralis is doing, and what the Pluralis end system might look like if Swarm were extended with some paradigms from the other paper.
- Mohammad offered several ways in which the collaboration with him can continue, with different levels of PhD funding (we co-supervise in all cases):
- Partnership with him and CSIRO, where Pluralis funds 12k a year for a PhD student and CSIRO funds the rest. Under these terms, the IP of the work done is shared between Pluralis and CSIRO.
- Mohammad mentioned we can negotiate directly with CSIRO to try to relax this condition.
- Pluralis fully funds a PhD to work under Mohammad.
- Mohammad is very enthusiastic about what Pluralis is doing. Happy to keep discussing with me without any formal agreement. A few points to check with Alex:
- Do we need an NDA with Mohammad (on our side)? We are going into very technical details in our conversations. He said he’ll be happy to sign.
- From his end and Monash's, Mohammad said he is OK to keep engaging with us without anything formal, because he is very interested in the topic.
- Mohammad did mention he plans to write a grant proposal for decentralized inference, and there will be overlap with what we are doing.