
Scaling LLMs on Supercomputers
Lessons from Europe's OpenGPT-X Project
Drawing on real-world experience, this research presents practical engineering solutions for training large language models (LLMs) efficiently on High-Performance Computing (HPC) systems.
- Achieved scalable training of a 7B-parameter model (Teuken-7B) on the JUWELS Booster supercomputer
- Developed optimized workflows to maximize computational efficiency and resource utilization
- Created specialized software stacks that overcome distributed training challenges (see the sketch after this list)
- Established best practices for multilingual model training with a European language focus
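
A recurring challenge behind the software-stack point above is wiring every GPU process on every node into a single training job. The sketch below is a minimal, hypothetical illustration of that step, assuming a SLURM-managed cluster (such as JUWELS Booster, with several GPUs per node) and PyTorch's NCCL backend; it is not the OpenGPT-X software stack, and the toy model and environment handling are placeholders only.

```python
# Minimal sketch (illustrative, not the OpenGPT-X code): initializing
# PyTorch distributed training from SLURM environment variables, as is
# common on HPC systems where one task is launched per GPU.
import os
import torch
import torch.distributed as dist


def init_distributed() -> int:
    """Derive rank information from SLURM and set up the NCCL process group."""
    rank = int(os.environ["SLURM_PROCID"])        # global rank of this task
    world_size = int(os.environ["SLURM_NTASKS"])  # total number of tasks
    local_rank = int(os.environ["SLURM_LOCALID"]) # rank within this node

    # MASTER_ADDR / MASTER_PORT are assumed to be exported by the job script,
    # e.g. derived from the first hostname in SLURM_NODELIST.
    dist.init_process_group(
        backend="nccl",
        init_method="env://",
        rank=rank,
        world_size=world_size,
    )
    torch.cuda.set_device(local_rank)
    return local_rank


if __name__ == "__main__":
    local_rank = init_distributed()
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # stand-in for an LLM
    ddp_model = torch.nn.parallel.DistributedDataParallel(
        model, device_ids=[local_rank]
    )
    # ... training loop with a distributed data sampler would go here ...
    dist.destroy_process_group()
```

In practice, the script would be launched with one task per GPU (for example via `srun`), and a full LLM training stack would layer tensor, pipeline, and data parallelism on top of this basic process-group setup.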
These findings matter to engineering teams building AI infrastructure: they provide tested solutions to common scaling bottlenecks, potentially reducing costs and accelerating the development of specialized language models.
Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project