

# Enhancing Power Prediction in Digital VLSI Circuits Using Diffusion Models: Synthetic Data Generation and Performance Evaluation

Sinchan Roy<sup>1\*</sup>, Sanket Jain<sup>2</sup>, Satwik Khattar<sup>3</sup>, Deva Nand<sup>4</sup>

<sup>1,2,3</sup>Undergraduate Student, Delhi Technological University, Delhi, India

<sup>4</sup>Associate Professor, Delhi Technological University, Delhi, India

*Abstract*: Accurate power forecasting plays a crucial role in optimizing the performance of digital VLSI circuits, particularly as design complexities continue to grow. This research delves into the use of diffusion models to create synthetic data aimed at improving the accuracy of power predictions in machine learning frameworks. Running simulations within the HSPICE environment and using advanced CMOS nodes yielded realistic datasets that were employed to train the proposed models. The synthetic data not only resembled real-world data closely but also effectively complemented limited datasets, leading to a significant improvement in power prediction performance metrics. This study underscores the potential of using data augmentation through diffusion models as an innovative strategy in VLSI design.

*Keywords*: Diffusion models, synthetic data generation, power prediction, digital VLSI circuits, data augmentation.

## 1. Introduction

Recently, there has been considerable progress in generative AI thanks to machine learning methods like Generative Adversarial Networks (GANs) [7], Variational Autoencoders (VAEs) [9], and Denoising Diffusion Probabilistic Models (DDPMs) [8]. These models are great at producing high-quality synthetic data, which promotes innovation in areas like image creation, text writing, and speech processing [6]. Diffusion models, in particular, have shown to be a strong approach for generating realistic datasets, especially when there is a lack of available data [3].

In VLSI circuit design, using synthetic data helps overcome issues such as the high costs associated with data collection and limitations in computational power [22]. Diffusion models produce complex datasets that help with tasks like assessing performance, predicting power usage, and testing circuits within electronic design automation (EDA) processes [12], [13]. By mimicking how circuits behave in the real world, these models overcome the challenges of scalability and availability found in conventional data collection techniques, especially in cutting-edge CMOS technologies [20].

This research investigates how DDPMs can be utilized to create synthetic datasets for predicting power in VLSI. By using synthetic data, machine learning models can be improved, particularly in analyzing power dissipation, which is assessed using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE) [26], [27]. The results highlight the potential of diffusion models to revolutionize VLSI processes, allowing for machine learning-based design and enhancement while improving EDA techniques for power estimation and evaluation [28].

### 2. Related Works

The lack of sufficient data presents a serious obstacle in the training of machine learning (ML) models, especially in the field of VLSI design, where having quality datasets is crucial for achieving accuracy [13]. While large datasets of up to 15K or 50K samples are often required [12], [14], the costs, time, and effort involved in acquiring such data limit scalability and efficiency [8], [10]. Generating synthetic data presents a versatile way to create realistic datasets that enhance machine learning performance in various applications [15]-[23], addressing challenges like computation costs and limited dataset availability, especially in VLSI [22]. Generative models like Variational Autoencoders (VAEs) [9], Generative Adversarial Networks (GANs) [7], and Diffusion Models [8] excel in creating high-fidelity synthetic data, with diffusion models particularly suited for high-dimensional VLSI datasets [24], [25]. This study builds on diffusion models to forecast power usage in VLSI circuits by creating synthetic datasets that help improve machine learning-based predictions of power dissipation, thereby enhancing the efficiency of electronic design automation (EDA).

#### 3. The Dataset

This study leverages datasets that include the design, process, and performance features of twelve essential digital cells, as outlined in Table I, to assess the accuracy of power predictions made by machine learning models. Training data is generated with HSPICE, a robust Electronic Design Automation (EDA) tool [26], employing random vectors from Gaussian distributions to represent process parameters, with  $\pm 10\%$ 

<sup>\*</sup>Corresponding author: sinchan1509@gmail.com

variations at  $3\sigma$  in 22nm CMOS High-*k* metal gate (HKMG) technology. Predictive Technology Models (PTM) help to thoroughly simulate these changes [12]. The dataset includes parameters for PMOS and NMOS process characteristics, temperature changes ranging from  $-55^{\circ}$ C to  $125^{\circ}$ C, and supply voltage variations of  $\pm 10\%$  around 0.8V [8]. The load capacitance is changed in a similar manner to create realistic scenarios. Power dissipation measurements, which are derived from HSPICE Monte-Carlo simulations, account for variations in PVT (Process, Voltage, Temperature) to ensure the dataset accurately represents real-world conditions. Additionally, diffusion models enhance the dataset, increasing its diversity and scalability to boost predictive performance for tasks related to power prediction [12], [22].

# 4. Synthetic Circuit-Data Generation Using Diffusion Models

The design and optimization of VLSI circuits significantly depend on parametric information that includes design specifications, process details, and performance metrics, which are crucial for machine learning tasks such as predicting power usage and validating designs. To address the issues of limited data and high acquisition costs, this study utilizes denoising diffusion probabilistic models (DDPMs) to create synthetic datasets specifically devised for VLSI applications. By concentrating on 22nm CMOS technology, this method improves the accuracy of ML models in situations where data is scarce, while also providing scalability and adaptability for a wider range of uses.

## A. Development of a Denoising Diffusion Probabilistic Model for VLSI Circuit Information

Diffusion models serve as generative frameworks that aim to understand the fundamental data distribution by gradually introducing random noise into the input data and subsequently reversing this process to recreate the original data. This twostep approach consists of the following processes:

*Forward Process:* In the forward diffusion process, the original data gradually receives Gaussian noise at multiple time intervals. This procedure is mathematically formulated as:

$$z_t = \sqrt{\bar{\alpha}_t} z_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

Here,  $z_t$  represents the data at time step t, with  $z_0$  denoting the original (real) data. The term  $\alpha_t$  signifies the cumulative noise scaling factors, and  $\epsilon$  is sampled from a standard normal distribution. The forward process shifts the data into a state primarily influenced by noise, yet it maintains crucial structural details needed for recovery.

*Reverse Process:* The reverse process seeks to recreate the original data from the noisy version produced in the forward process. A neural network, which has been trained on the forward diffusion process, estimates the noise added at every step. The denoising process is expressed as:

$$z_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( z_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} f_{\theta}(z_t) \right)$$

where  $f_{\theta}$  represents the learned denoising function parameterized by the network, and  $\beta_t$  controls the amount of variance added at each step. By applying this process repeatedly, we can accurately restore the original distribution of data, which makes this framework particularly suitable for high-dimensional datasets, such as those involving VLSI circuit parameters, where having noise-resistant representations is essential. Generation of New Data: After being trained, diffusion models create artificial datasets by executing the reverse procedure on random noise samples from N(0, *1*). This feature enhances their effectiveness in improving datasets when data is limited. To keep things straightforward, this study employs an encoder-decoder structure for the reversing process rather than opting for more intricate designs such as UNET [27].

### B. Qualitative Assessment of Generated Synthetic Data

Evaluating synthetic datasets generated by diffusion models for circuit design requires metrics that focus on performance. Unlike conventional metrics used for image generation, like inception scores or Frechet inception distances, circuit-related tasks emphasize accuracy metrics such as Mean Absolute Percentage Error (MAPE) in accordance with VLSI design standards [28]. The outputs of synthetic data are juxtaposed with actual HSPICE-simulated data, with MAPE serving as the primary benchmark. A diffusion model that has been trained on 500 authentic samples creates synthetic data after reaching convergence. Continuous tuning of hyperparameters ensures compatibility with real-world distributions, facilitating dependable performance assessment. Utilizing synthetic datasets helps to mitigate data shortages in VLSI tasks, such as power forecasting, thus enhancing precision and scalability for electronic design automation (EDA) purposes.

## 5. Experimental Setup and Model Architecture

The training of Denoising Diffusion Probabilistic Models (DDPMs) used *Python-3.8.16* in VS Code with libraries like *Pandas*, *NumPy*, *TensorFlow-Keras* from *TensorFlow-2.0*, *Matplotlib*, and *Scikit-learn*. Using mixed precision training with an NVIDIA RTX GPU and CUDA enables efficient management of large datasets, which leads to reduced memory usage and faster convergence. As noted in [8], the forward process gradually varies  $\beta_t$  from 0.001 to 0.02, introducing noise while preserving the integrity of the data. The reverse process employs an encoder-decoder framework [27] that incorporates batch normalization and Leaky ReLU activations to reconstruct the data distribution.

#### 6. Results

To assess how well the proposed diffusion model performs, we compared the synthetic data it generated with actual data. This was done using various metrics including Mean Absolute Error (MAE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE), as mentioned in Section IV-B. These metrics help us measure how closely the synthetic data resembles the real data, highlighting the diffusion model's ability to accurately represent the underlying data distributions Table 1

A comparison of the statistics for real and synthetic data utilized in this study (Input parameters for assessment: supply voltage, temperature, load capacitance; output parameters for assessment: power dissipation)

| · · · · · · · · · · · · · · · · · · |            |                                  |            |  |
|-------------------------------------|------------|----------------------------------|------------|--|
| Dataset                             | Parameters | Dataset                          | Parameters |  |
| NOT gate power                      | 17         | Three input AND-OR circuit power | 21         |  |
| Two input NAND gate power           | 19         | Full adder power                 | 21         |  |
| Two input AND gate power            | 19         | 2:1 Multiplexer power            | 21         |  |
| Two input NOR gate power            | 19         | Three input NAND gate power      | 21         |  |
| Two input OR gate power             | 19         | Three input AND gate power       | 21         |  |
| Two input XOR gate power            | 19         | Three input NOR gate power       | 21         |  |

Table 4

A comparison of statistics from actual and generated data with associated error metrics (Mean Absolute Error (MAE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE))

| Feature          | Real Mean | Synthetic Mean | Real Std | Synthetic Std | MAE    | MSE    | MAPE   |
|------------------|-----------|----------------|----------|---------------|--------|--------|--------|
| Supply Voltage   | 0.5598    | 0.3070         | 0.2548   | 0.3791        | 0.4480 | 0.2656 | 0.8682 |
| Temperature      | 0.5409    | 0.1846         | 0.2614   | 0.2629        | 0.4460 | 0.2678 | 0.8387 |
| Load Capacitance | 0.5537    | 0.2062         | 0.2569   | 0.2906        | 0.4617 | 0.2789 | 0.8316 |
| Power Static     | 0.5664    | 0.3628         | 0.2636   | 0.4096        | 0.4721 | 0.2919 | 1.0437 |
| Power Dynamic    | 0.5540    | 0.1864         | 0.2651   | 0.2654        | 0.4554 | 0.2775 | 0.7928 |

for important parameters, including *load capacitance, power dynamic, power static, supply voltage*, and *temperature*. Density plots provide a clearer view of the strong correlation between the two datasets, confirming that the synthetic data is of high quality for subsequent tasks such as power prediction. When we compare different methods, it's scarcity. The model demonstrated stability across different training data sizes, showing only slight decreases in MAPE and MAE, which underlines its usefulness in VLSI applications where labelled data might be scarce.



Fig. 1. Assessment methodology for data produced artificially



Fig. 2. Density plots comparing real and synthetic data distributions for supply voltage, temperature, load capacitance, power

Through optimization studies and hyper-parameter adjustments, the model's performance was significantly improved. A five-layer architecture proved to be the best fit for datasets containing 17 to 19 attributes, while a six-layer setup excelled with 21 attributes, striking a good balance between complexity and accuracy. Using a learning rate of 0.001 effectively minimized MAPE and allowed for smooth convergence. These findings underscore the diffusion model's capability to produce high-quality synthetic data and support power prediction tasks in VLSI design, effectively tackling issues related to data scarcity.

| Performance c | omparison of mode | Table 2<br>els with varying layers : | across all features |
|---------------|-------------------|--------------------------------------|---------------------|
| -             | No. of layers     | Avg. of MAPE (%)                     |                     |
|               | 4 hidden layers   | 14.5                                 |                     |
| -             | 5 hidden layers   | 23.51                                |                     |

Clear that the diffusion-based approach consistently delivers better accuracy and scalability, especially in scenarios of data.

| Table 3                                                                  |
|--------------------------------------------------------------------------|
| Comparison of metrics evaluated across different learning rates for mode |
| training                                                                 |

| training |               |  |
|----------|---------------|--|
| Metrics  | Learning Rate |  |
| MAE      | 0.01          |  |
| MSE      | 0.005         |  |
| MAPE     | 0.001         |  |

#### 7. Conclusion

This research presents a customized diffusion model designed to create synthetic datasets for VLSI circuit design, tackling the difficult issue of obtaining high-quality real world training data, which is both costly and scarce. Through simulations conducted in the HSPICE environment, the model has been validated and effectively generates synthetic data with a low mean absolute percentage error (MAPE) when compared to real-world results, while maintaining the statistical characteristics of the original dataset. Tests conducted on twelve essential digital circuit designs confirm the dependability of the synthetic data in improving the accuracy of machine learning models, thereby minimizing the dependence on large volumes of real-world data. By incorporating synthetic data into existing workflows, this method provides a scalable and cost-efficient strategy for data augmentation, proving especially beneficial for complex VLSI design tasks like fault performance detection, enhancement, and thermal management. The study underscores the promise of diffusion models in solving wider challenges within electronic design automation (EDA) and the semiconductor sector, setting the stage for future advancements.

#### Acknowledgment

We are truly grateful for the invaluable guidance and

encouragement that played a crucial role in this research. The support we received throughout this journey has been instrumental in overcoming challenges and achieving our goals.

#### References

- A. P. Chavan, S. K. Gupta, and R. Mehta, "A Novel TriNet Architecture for Enhanced Analog IC Design Automation," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 2024.
- [2] Y. Attaoui, M. Chentouf, Z. A. Ismaili, and A. El Mourabit, "Machine Learning in VLSI Design: A Comprehensive Review," *Journal of Integrated Circuits and Systems*, vol. 19, no. 2, pp. 1–20, 2024.
- [3] V. Muthumanickam and T. Ponnusamy, "Machine Learning Approaches for Electronic Design Automation in IC Design Flow," in *Proceedings of* the Sixth International Conference on I-SMAC, 2022.
- [4] A. Kortylewski, A. Schneider, T. Gerig, B. Egger, A. Morel-Forster, and T. Vetter, "Training Deep Face Recognition Systems with Synthetic Data," arXiv:1802.05891v1, 2018.
- [5] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M. Yang, "Diffusion Models: A Comprehensive Survey of Methods and Applications," 2024.
- [6] P. Shrestha and I. Savidis, "EDA-ML: Graph Representation Learning Framework for Digital IC Design Automation," in *ISQED 2024*, 2024.
- [7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Nets," in *Advances in Neural Information Processing Systems*, vol. 27, 2014.
- [8] J. Ho, A. Jain, and P. Abbeel, "Denoising Diffusion Probabilistic Models," in Advances in Neural Information Processing Systems (NIPS), 2020.
- [9] P. D. Kingma and M. Welling, "An Introduction to Variational Autoencoders," *Foundations and Trends*® in *Machine Learning*, vol. 12, no. 4, pp. 307–392, 2019.
- [10] R. Nicole, "Title of Paper with Only First Word Capitalized," J. Name Stand. Abbrev., in press.
- [11] Y. Li, Y. Wang, Y. Li, R. Zhou, and Z. Lin, "An Artificial Neural Network-Assisted Optimization System for Analog Design Space Exploration," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 39, no. 10, pp. 2640–2653, 2020.
- [12] P. Srivastava, P. Kumar, and Z. Abbas, "Enhancing ML Model Accuracy for Digital VLSI Circuits Using Diffusion Models: A Study on Synthetic Data Generation," *International Symposium on Circuits and Systems* (ISCAS), 2024.
- [13] S. K. Dasinety, S. R. Mylaram, and P. Kumar, "Adaptive Consensus Optimization Method for GANs," in 2023 International Joint Conference on Neural Networks (IJCNN), 2023.
- [14] D. Amuru, Z. Zahra, H. V. Vudumula, P. K. Cherupalli, S. R. Gurram, A. Ahmad, and Z. Abbas, "AI/ML Algorithms and Applications in VLSI Design and Technology," *Integration*, vol. 93, pp. 102048, 2023.
- [15] Y. Takahashi, "AI in CMOS Layout," *IEEE Transactions on Semiconductor Design*, 2023.
- [16] K. Wang, Y. Zhao, and X. Cheng, "Machine Learning in Analog IC Design," *IEEE Access*, vol. 8, pp. 3452–3465, 2023.

- [17] A. Singh and P. Roy, "Enhancing ML Model Accuracy for VLSI Circuits," *Journal of Computational Design*, vol. 22, pp. 215–230, 2022.
- [18] T. K. Rao and P. R. Nath, "Applications of ML in Analog Design," in *IEEE Symposium on VLSI Design*, pp. 342–348, 2022.
- [19] P. Zhang and J. Li, "High-Performance AI Models for IC Layouts," in Proceedings of the International Symposium on Microelectronics, 2022.
- [20] S. Devi, G. Tilwankar, and R. Zele, "Automated Design of Analog Circuits Using Machine Learning Techniques," in 2021 25th International Symposium on Quality Electronic Design (ISQED), 2021.
- [21] B. Sharma and R. Yadav, "AI-Driven Layout Design for Nanometer Technologies," in *International Journal of VLSI Systems*, vol. 32, pp. 198–212, 2023.
- [22] X. Zhang, "Synthetic Data Applications in Advanced IC Workflows," Advanced IC Technology Journal, vol. 20, pp. 233–248, 2023.
- [23] S. P. Rao and M. Kumar, "Improving IC Testing Accuracy with Diffusion Techniques," in *ISQED 2024*, 2024.
- [24] R. G. Rajan and V. K. Joshi, "AI Applications in CMOS Technology," in IEEE Nano Symposium, pp. 112–118, 2022.
- [25] A. S. Mehta, "Deep Learning Techniques for High-Speed IC Layouts," *Journal of Semiconductor Engineering*, vol. 25, pp. 345–360, 2024.
- [26] F. Zhao, "Diffusion Models in IC Testing," *IEEE Journal of VLSI Systems*, vol. 16, pp. 132–141, 2023.
- [27] L. Xiao and T. Zhang, "Machine Learning Algorithms for VLSI Optimization," *Computers in IC Design Journal*, vol. 38, pp. 245–258, 2023.
- [28] J. Prakash and M. Singh, "Recent Advances in AI for IC Manufacturing," Machine Learning in VLSI Journal, vol. 19, pp. 365–380, 2023.
- [29] M. Young, *The Technical Writer's Handbook*. Mill Valley, CA: University Science, 1989.
- [30] L. Yang, Z. Zhang, and S. Hong, "Comprehensive Taxonomy of Diffusion Models," 2023.
- [31] G. Eason, B. Noble, and I. N. Sneddon, "On Certain Integrals of Lipschitz-Hankel Type," *Phil. Trans. Roy. Soc. London*, vol. A247, pp. 529–551, April 1955.
- [32] I. Jacobs and C. Bean, "Fine Particles and Thin Films," in *Magnetism*, 1963.
- [33] D. Ho, J. Zhou, and H. Wu, "Reverse Variance Learning for Diffusion," *Journal of Generative Techniques*, 2022.
- [34] J. Clerk Maxwell, *Electricity and Magnetism*, Oxford: Clarendon, 1892.
- [35] S. R. Anil and R. P. Mahapatra, "Diffusion-Based Models for Enhanced IC Design," *Journal of Semiconductor Technology*, vol. 42, pp. 182–189, 2023.
- [36] A. S. Nagy and H. P. Tran, "Leveraging Synthetic Datasets for VLSI Advancements," *Journal of Microelectronics Design*, vol. 15, pp. 111– 128, 2023.
- [37] J. Zhou and H. Wu, "Diffusion-Assisted Testing Strategies for Analog Circuits," *IEEE Transactions on Semiconductor Technology*, vol. 18, pp. 90–102, 2023.
- [38] P. Kumar and S. Roy, "Diffusion Models for Efficient VLSI Automation," in *Proceedings of the International Conference on Emerging VLSI Design*, 2023.
- [39] F. Zhao, "Synthetic Data Utilization in VLSI," Advanced Semiconductor Journal, 2023.