Kareus: Joint Reduction of Dynamic and Static Energy in Large Model Training
About
The computing demand of AI is growing at an unprecedented rate, but energy supply is not keeping pace. As a result, energy has become an expensive, contended resource that requires explicit management and optimization. Although recent works have made significant progress in large model training optimization, they focus only on a single aspect of energy consumption: dynamic or static energy. We find that fine-grained kernel scheduling and frequency scaling jointly and interdependently impact both dynamic and static energy consumption. Based on this finding, we design Kareus, a training system that pushes the time–energy tradeoff frontier by optimizing both aspects. Kareus decomposes the intractable joint optimization problem into local, partition-based subproblems. It then uses a multi-pass multi-objective optimization algorithm to find execution schedules that push the time–energy tradeoff frontier. Compared to the state of the art, Kareus reduces training energy by up to 28.3% at the same training time, or reduces training time by up to 27.5% at the same energy consumption.
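The abstract describes finding execution schedules on a time–energy tradeoff frontier, i.e., schedules that are not dominated in both objectives at once. A minimal sketch of that idea is a Pareto-frontier filter over candidate (time, energy) schedule points. This is not Kareus's actual algorithm; the `Schedule` fields, labels, and numbers below are hypothetical placeholders for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    """A candidate execution schedule for one partition (hypothetical model)."""
    kernel_order: str   # illustrative label for a kernel-scheduling choice
    freq_mhz: int       # illustrative GPU frequency setting
    time_s: float       # predicted execution time (seconds)
    energy_j: float     # predicted dynamic + static energy (joules)


def pareto_frontier(candidates):
    """Keep only schedules not dominated in both time and energy.

    One schedule dominates another if it is no worse on both objectives
    and strictly better on at least one. Sorting by time and sweeping
    with a running energy minimum filters dominated points in one pass.
    """
    frontier = []
    for s in sorted(candidates, key=lambda s: (s.time_s, s.energy_j)):
        if not frontier or s.energy_j < frontier[-1].energy_j:
            frontier.append(s)
    return frontier


# Hypothetical candidates: high frequency is fast but burns dynamic
# energy; low frequency saves dynamic energy but stretches static energy
# over a longer runtime.
cands = [
    Schedule("overlap", 1980, 1.00, 300.0),
    Schedule("overlap", 1410, 1.20, 240.0),
    Schedule("serial",  1980, 1.10, 310.0),  # dominated by the first point
    Schedule("serial",  1005, 1.50, 235.0),
]
front = pareto_frontier(cands)
```

On this toy input the dominated `serial`/1980 MHz point is dropped, and the surviving frontier exposes the tradeoff a scheduler (or an operator with a time or energy budget) can pick from.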
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| LLM Training Optimization | Qwen 3 1.7B | Time Reduction | 0.149 | 18 |
| LLM Training Optimization | Llama 3.2 3B | Training Time Reduction (%) | 12.3 | 12 |
| Large Language Model Training Efficiency | Llama 3.2 1.7B | Energy Reduction (Iso-Time) | 28.3 | 11 |
| LLM Training | Llama 3.3 70B Emulation | Time Reduction | 9.3 | 8 |
| Large-scale model training | Llama 3.3 70B Emulation (train) | Energy Reduction (Iso-Time) | 15.3 | 4 |