Faculty
Jun Yang
PhD Students
Wei Wu, Lingling Jin
External Collaborators
Sheldon X.-D. Tan (UCR), Jie Chen (UCR)
Description
The evolution of microprocessors has been hindered by their increasing power consumption and the rate at which heat is generated on-die. High temperature impairs processor reliability and shortens the processor's lifetime. While heat can be removed by the cooling package, designing such a package for the worst-case temperature is not cost-effective: worst cases are rare, yet the cost increases super-linearly. A more reasonable solution is therefore to use a less expensive package with dynamic thermal management (DTM), which throttles processor performance when the cooling is inadequate.
To control chip temperature effectively, it is imperative to monitor temperature variations across the die accurately and in a timely manner. Most current techniques rely on on-chip thermal sensors, typically one or two, to report the temperature of the processor. Unfortunately, chip temperature varies significantly both spatially and temporally, and hot spots migrate with workloads, which exposes the limitation of relying on so few sensors. We present an alternative approach that tracks chip temperature through an OS-resident software module, which generates live power and thermal spectral distributions of the processor. We developed such a software thermal sensor (STS) with low overhead on a Linux system with a Pentium 4 Northwood core. The STS offers detailed power and temperature breakdowns of each functional unit at runtime.
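As a rough illustration of the idea of a software thermal sensor, the sketch below estimates per-unit power from event counts and advances a per-unit temperature estimate. It is a minimal sketch under stated assumptions, not the STS implementation: the linear energy-per-event power model, the single-node RC thermal model (which ignores lateral heat flow between units), and every name and constant (`Unit`, `estimate_power`, `step_temperature`, `sample`, the resistance and capacitance values) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One functional unit with hypothetical calibration constants."""
    name: str
    r_thermal: float         # K/W, assumed thermal resistance to ambient
    c_thermal: float         # J/K, assumed thermal capacitance
    energy_per_event: float  # J per counted event, assumed calibration
    idle_power: float        # W, assumed power with no activity
    temp: float              # current temperature estimate, deg C


def estimate_power(unit, event_count, dt):
    """Assumed linear model: idle power plus event energy over the interval."""
    return unit.idle_power + unit.energy_per_event * event_count / dt


def step_temperature(unit, power, ambient, dt):
    """One explicit Euler step of C * dT/dt = P - (T - T_ambient) / R.

    The step is stable only when dt is smaller than R * C.
    """
    d_temp = (power - (unit.temp - ambient) / unit.r_thermal) / unit.c_thermal
    unit.temp += d_temp * dt
    return unit.temp


def sample(units, counts, ambient=45.0, dt=0.01):
    """Return {unit name: (power in W, temperature in deg C)} for one interval."""
    report = {}
    for unit in units:
        power = estimate_power(unit, counts.get(unit.name, 0), dt)
        report[unit.name] = (power, step_temperature(unit, power, ambient, dt))
    return report
```

In this sketch, a module running inside the OS would call `sample` once per interval with freshly read event counts. With constant power the estimate settles at the ambient temperature plus power times thermal resistance.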
We also developed a thermal-aware job scheduling mechanism that reduces the performance loss caused by thermal pressure. Our methods leverage the natural differences in thermal behavior among workloads and schedule them to keep the chip temperature within the cooling limit, minimizing the amount of throttling. Our Linux kernel implementation of the entire framework shows noticeable performance improvements over a traditional thermal-oblivious job scheduling method while still meeting its requirements for real-time and interactive jobs.
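The scheduling idea can be illustrated with a small selection policy. This is a sketch under assumptions, not the kernel implementation: the `Job` fields, the `pick_next` function, the soft temperature limit, and the rule of running hot jobs while there is thermal headroom are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    avg_power: float                 # W, assumed recent power estimate for the job
    latency_sensitive: bool = False  # real-time or interactive job


def pick_next(ready, chip_temp, soft_limit):
    """Pick the next job from the ready list, or None if the list is empty.

    Assumed policy:
      1. Latency-sensitive jobs run first, regardless of temperature.
      2. At or above the soft limit, run the coolest job so the chip can
         cool down without throttling.
      3. Below the soft limit, run the hottest job while headroom exists,
         keeping cooler jobs in reserve.
    """
    if not ready:
        return None
    urgent = [job for job in ready if job.latency_sensitive]
    if urgent:
        return urgent[0]
    if chip_temp >= soft_limit:
        return min(ready, key=lambda job: job.avg_power)
    return max(ready, key=lambda job: job.avg_power)
```

Interleaving hot and cool jobs in this way aims to keep the temperature under the limit so that throttling is triggered less often, while the first rule leaves latency-sensitive jobs unaffected by thermal decisions.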
Publications
- Pu Liu, Hang Li, Lingling Jin, Wei Wu, Sheldon Tan, Jun Yang, "Thermal Simulation for Run-Time Temperature Tracking and Management," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 12, pp. 2882-2894, December 2006.
- Lingling Jin, Wei Wu, Jun Yang, Chuanjun Zhang, Youtao Zhang, "Reduce Register File Leakage through Cell Discharging," International Conference on Computer Design, 2006.
- Wei Wu, Lingling Jin, Jun Yang, Pu Liu, Sheldon Tan, "Efficient Method for Functional Unit Power Estimation in Modern Microprocessors," IEEE/ACM Design Automation Conference, pp. 554-557, 2006.
- Lingling Jin, Wei Wu, Jun Yang, Chuanjun Zhang, Youtao Zhang, "Dynamic Co-allocation of Resources for Level One Caches," The 2nd International Conference on Embedded Software and Systems, pp. 373-385, LNCS 3820, Springer Verlag, 2005.
- Pu Liu, Zhenyu Qi, Hang Li, Lingling Jin, Wei Wu, Sheldon Tan, Jun Yang, "Fast Thermal Simulation for Architecture Level Dynamic Thermal Management," International Conference on Computer-Aided Design, pp. 639-644, 2005.
- Hang Li, Pu Liu, Zhenyu Qi, Lingling Jin, Wei Wu, Sheldon Tan, Jun Yang, "Efficient Thermal Simulation for Run-Time Temperature Tracking and Management," International Conference on Computer Design, pp. 130-133, 2005.