Modeling Energy Consumption in Deep Learning Architectures Using Power Laws
In one sentence
This paper proposes a practical way to estimate the training energy consumption of deep learning architectures from their computational operations, using hardware efficiency factors modeled by power-law-like saturation curves.
Why this problem matters
Modern deep learning models are increasingly expensive to train. Architectures such as Transformers, LSTMs, and GRUs can require large amounts of computation, which translates into energy consumption, financial cost, and environmental impact.
A central difficulty is that FLOPs alone do not fully explain energy use. Two operations with similar FLOP counts may consume different amounts of energy because the GPU may execute them with different efficiency.
Method overview
The proposed approach decomposes a neural architecture into elementary operations. Each operation is associated with a computational cost and a hardware efficiency factor. These quantities are then combined to estimate the operation duration and the total energy consumption.
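As an illustration of the decomposition step, the sketch below breaks the forward pass of a two-layer feed-forward block into matrix multiplications and counts their FLOPs. The `Operation` container, the function names, and the choice of block are hypothetical and are not taken from the paper; the only fixed convention is that one multiply-add counts as 2 FLOPs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """One elementary operation of an architecture (hypothetical container)."""
    name: str
    flops: int  # computational cost c, in FLOPs


def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs of an (m x k) @ (k x n) product, counting a multiply-add as 2 FLOPs."""
    return 2 * m * k * n


def feed_forward_ops(tokens: int, d_model: int, d_ff: int) -> list[Operation]:
    """Decompose the forward pass of a two-layer feed-forward block into matmuls."""
    return [
        Operation("ffn_in", matmul_flops(tokens, d_model, d_ff)),
        Operation("ffn_out", matmul_flops(tokens, d_ff, d_model)),
    ]


# Illustrative sizes only; they are not configurations from the paper.
ops = feed_forward_ops(tokens=128, d_model=512, d_ff=2048)
total_flops = sum(op.flops for op in ops)
```

Each entry in `ops` is one operation whose cost can later be paired with its own efficiency factor.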
The core equations
1. Hardware efficiency factor
The hardware efficiency factor compares the actual throughput of an operation with the maximum theoretical throughput of the GPU:
\[ \eta = \frac{c / t}{v_{\max}} = \frac{c}{t \, v_{\max}} \]
Here, \(\eta\) is the hardware efficiency factor, \(c\) is the computational cost in FLOPs, \(t\) is the operation duration, and \(v_{\max}\) is the maximum theoretical throughput of the GPU.
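The ratio described above can be computed directly from one measured duration. This is a minimal sketch; the function name and the numbers are hypothetical and are not measurements from the paper.

```python
def hardware_efficiency(flops: float, duration_s: float, peak_flops_per_s: float) -> float:
    """Return (c / t) / v_max: achieved throughput over peak throughput."""
    if duration_s <= 0 or peak_flops_per_s <= 0:
        raise ValueError("duration and peak throughput must be positive")
    return (flops / duration_s) / peak_flops_per_s


# Hypothetical values: 2e9 FLOPs measured at 1 ms on a GPU with a 10 TFLOP/s peak.
eta = hardware_efficiency(flops=2e9, duration_s=1e-3, peak_flops_per_s=10e12)
# eta is about 0.2, i.e. the operation reaches roughly 20% of peak throughput.
```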
2. Duration of an operation
Once the efficiency \(\eta\) is known, the duration of an operation can be estimated by inverting the definition of the efficiency factor:
\[ t = \frac{c}{\eta \, v_{\max}} \]
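A small sketch of this step, assuming the efficiency is given as a fraction in (0, 1]. The function name and the example values are hypothetical.

```python
def operation_duration(flops: float, efficiency: float, peak_flops_per_s: float) -> float:
    """Estimate duration in seconds as cost / (efficiency * peak throughput)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency is expected to lie in (0, 1]")
    if peak_flops_per_s <= 0:
        raise ValueError("peak throughput must be positive")
    return flops / (efficiency * peak_flops_per_s)


# Hypothetical values: 2e9 FLOPs at 20% efficiency on a 10 TFLOP/s GPU.
t = operation_duration(flops=2e9, efficiency=0.2, peak_flops_per_s=10e12)
# t is about 1e-3 seconds.
```

At a fixed FLOP count, halving the efficiency doubles the estimated duration, which is why two operations with similar FLOP counts can differ in cost.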
3. Power-law-like efficiency model
The paper models hardware efficiency as a saturation curve depending on the operation FLOP count:
The parameter \(\eta_{\max}\) represents the maximum efficiency level, while \(k\) and \(\alpha\) control how fast the efficiency approaches this saturation level.
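To make the role of the three parameters concrete, the sketch below uses one possible saturating form, \(\eta(c) = \eta_{\max} \, c^{\alpha} / (k + c^{\alpha})\). This functional form is an assumption chosen for illustration; it is not claimed to be the paper's exact equation, and the parameter values are hypothetical. It behaves like a power law in \(c\) for small workloads and flattens toward \(\eta_{\max}\) for large ones.

```python
def saturating_efficiency(flops: float, eta_max: float, k: float, alpha: float) -> float:
    """Illustrative saturation curve (assumed form, not the paper's equation).

    eta(c) = eta_max * c**alpha / (k + c**alpha)
    """
    if flops < 0 or k <= 0 or alpha <= 0:
        raise ValueError("expected flops >= 0, k > 0 and alpha > 0")
    x = flops ** alpha
    return eta_max * x / (k + x)


# Hypothetical parameters; in practice they would be fitted to measurements.
params = dict(eta_max=0.6, k=1e6, alpha=0.8)
curve = [saturating_efficiency(c, **params) for c in (1e6, 1e8, 1e10, 1e12)]
# Efficiency rises with the FLOP count and stays below eta_max.
```

Under this assumed form, a larger \(k\) delays saturation and a larger \(\alpha\) makes the rise steeper.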
4. Energy estimation
Total energy is modeled as a linear combination of estimated operation durations:
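A linear combination of this kind can be written as \(E = \sum_i \beta_i \, t_i\), where the symbols \(E\) and \(\beta_i\) are notation introduced here for illustration. The sketch below assumes durations are grouped by operation type and that each coefficient is expressed in watts, so the result is in joules. The grouping, names, and values are hypothetical; in practice the coefficients would be fitted by regression against measured energy.

```python
def estimate_energy(durations_s: dict[str, float], coefficients_w: dict[str, float]) -> float:
    """Return sum_i beta_i * t_i in joules (durations in seconds, coefficients in watts)."""
    missing = durations_s.keys() - coefficients_w.keys()
    if missing:
        raise KeyError(f"no coefficient for: {sorted(missing)}")
    return sum(coefficients_w[name] * t for name, t in durations_s.items())


# Hypothetical per-type total durations and coefficients, not values from the paper.
durations = {"matmul": 2.0, "softmax": 0.5}
coefficients = {"matmul": 250.0, "softmax": 120.0}
energy_j = estimate_energy(durations, coefficients)  # 250*2.0 + 120*0.5 = 560.0 J
```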
What was evaluated?
The experiments vary architectural and input hyperparameters, including the number of layers, hidden dimension, sequence length, embedding dimension, batch size, and number of attention heads.
Main findings
- Hardware efficiency is essential. FLOPs alone are not sufficient because different operations use the GPU with different efficiency.
- Efficiency saturates. Larger workloads often use the GPU more efficiently, but the improvement eventually reaches a plateau.
- The proposed model is accurate. For Transformer models, the reported regression achieves a high coefficient of determination (\(R^2 \approx 0.96\)). A test evaluation across heterogeneous GPU platforms reports \(R^2 \approx 0.98\).
- Model depth and dimensionality matter strongly. For Transformers, the number of layers and the model dimension have a much larger influence on energy than the number of attention heads.
- Energy-aware selection becomes possible. In the example discussed in the paper, a larger Transformer configuration is estimated to consume more than twice the energy per epoch of a smaller configuration under the same input settings.
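For readers less familiar with the accuracy metric quoted above, the coefficient of determination is \(R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}\). The sketch below computes it for measured versus estimated energy; the numbers are made up for illustration and are not results from the paper.

```python
def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1 - ss_res / ss_tot


# Made-up measured and estimated energies in joules.
measured = [100.0, 200.0, 300.0, 400.0]
estimated = [110.0, 190.0, 310.0, 390.0]
score = r_squared(measured, estimated)  # 1 - 400 / 50000, about 0.992
```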
Example: energy-aware model selection
The paper compares two Transformer configurations under the same input conditions. Model A has fewer layers and a smaller model dimension, while Model B is deeper and wider.
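One way such a comparison could feed into model selection is to pick the candidate with the lowest estimated energy among those that meet an accuracy requirement. This is a hypothetical selection rule, not a procedure described in the paper, and the energy and accuracy figures are placeholders rather than reported results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    energy_per_epoch_j: float  # estimated energy per epoch, in joules
    accuracy: float            # validation accuracy in [0, 1]


def select_energy_aware(candidates: list[Candidate], min_accuracy: float) -> Optional[Candidate]:
    """Return the lowest-energy candidate meeting min_accuracy, or None if none does."""
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    return min(eligible, key=lambda c: c.energy_per_epoch_j, default=None)


# Placeholder values: Model B is deeper and wider, so it is given a higher estimate.
candidates = [
    Candidate("Model A", energy_per_epoch_j=1.0e6, accuracy=0.91),
    Candidate("Model B", energy_per_epoch_j=2.3e6, accuracy=0.92),
]
choice = select_energy_aware(candidates, min_accuracy=0.90)
```

With these placeholder numbers the rule returns Model A, because both candidates clear the accuracy threshold and A has the lower estimate. Raising the threshold above A's accuracy would select Model B instead.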
Why this contribution is useful
The method is useful for Green AI because it moves energy estimation earlier in the model design process. Researchers and engineers can compare architectures using estimated energy consumption, rather than relying only on accuracy, parameter count, or FLOPs.
This is particularly relevant when training resources are limited, when deployment is constrained by energy, or when sustainability objectives are part of model selection.
Limitations and future directions
- The experiments are conducted on single-GPU setups, while very large models often use distributed training.
- Multi-GPU communication, load balancing, memory bandwidth, and data sharding can affect energy consumption.
- Future extensions may include mixed precision training, more hardware platforms, and emerging architectures such as Mixture-of-Experts models.
How to cite
Mansour Zoubeirou a Mayaki and Victor Charpenay. Modeling Energy Consumption in Deep Learning Architectures Using Power Laws. ECAI 2025. DOI: 10.3233/FAIA250900.