Invited Talk
in
Workshop: 2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)
Architecting and deploying compute clusters for large language models
Adam DeConinck
Abstract:
As large language models and their processing needs continue to grow, compute infrastructure must adapt to handle them reliably. Beyond providing a large number of processing units, the platform needs to offer guarantees on the network fabric and I/O, along with software strategies to schedule jobs and cache data reliably. In this talk, we will show how strategic choices in reference design definitions, combined with versatile scheduling and checkpointing strategies, can help get the best performance out of the infrastructure. We will also review how scaling up to extreme scale affects the hardware and software implementation choices for LLMs.
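One of the strategies the abstract mentions, checkpointing, can be illustrated with a minimal sketch. The talk itself does not specify an implementation; the loop, interval, and file layout below are illustrative assumptions. The sketch shows two ideas that matter at scale: atomic writes (a crash mid-save never corrupts the last good checkpoint) and resuming from the most recent saved step rather than restarting training.

```python
import os
import pickle
import tempfile

def save_checkpoint(state, path):
    # Write to a temporary file, then rename: os.replace is atomic on POSIX,
    # so an interrupted save leaves the previous checkpoint intact.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    # Return the last saved state, or None if no checkpoint exists yet.
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)

def train(total_steps, interval, ckpt_path):
    # Hypothetical training loop: resume from the last checkpoint if present,
    # and save every `interval` steps. The "loss" update is a stand-in for a
    # real optimizer step.
    state = load_checkpoint(ckpt_path) or {"step": 0, "loss": None}
    for step in range(state["step"], total_steps):
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        if state["step"] % interval == 0:
            save_checkpoint(state, ckpt_path)
    return state

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
final = train(total_steps=10, interval=2, ckpt_path=ckpt)
```

If the job is killed and restarted with the same `ckpt_path`, `train` picks up from the last saved step instead of step 0; at cluster scale, the same pattern applies with sharded model/optimizer state written to a parallel filesystem instead of a single pickle file.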