[1] Bu Aiguo, Yu Pianpian, Wu Jianbing, et al. Power optimization and VLSI design of CPU based on adaptive clock-gating[J]. Journal of Southeast University (Natural Science Edition), 2015, 45(2): 219-223. [doi:10.3969/j.issn.1001-0505.2015.02.004]

Power optimization and VLSI design of CPU based on adaptive clock-gating

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
45
Issue:
No. 2, 2015
Pages:
219-223
Section:
Circuits and Systems
Publication date:
2015-03-20

Article Info

Title:
Power optimization and VLSI design of CPU based on adaptive clock-gating
Author(s):
Bu Aiguo, Yu Pianpian, Wu Jianbing, Shan Weiwei
National ASIC System Engineering Research Center, Southeast University, Nanjing 210096, China
Keywords:
low power; adaptive clock-gating; pipeline stall
Classification number:
TN47
DOI:
10.3969/j.issn.1001-0505.2015.02.004
Abstract:
A power optimization method for CPUs based on adaptive clock gating is proposed. It reduces the dynamic power wasted when the pipeline stalls and when the floating-point unit (FPU) or the multimedia coprocessor is idle. First, a module-level adaptive clock-gating unit is designed, in which on-chip hardware automatically monitors whether each of these modules is idle. When a module is idle, its clock is turned off, which eliminates the dynamic power that unneeded clock toggling would otherwise consume inside the module. Then, the adaptive clock-gating unit is applied to the domestic processor Unicore-2 to optimize the power consumed during pipeline stalls and during FPU and multimedia coprocessor idle periods. Finally, based on the netlist and parasitic parameter files of a taped-out chip in the TSMC 65 nm process, the chip's waveforms are back-annotated to obtain the circuit toggle rates, and power simulation is performed with the PrimeTime PX tool. The simulation results show that the method achieves a power reduction of 18% to 28% when running three typical benchmarks, Dhrystone, Whetstone and Stream, with negligible area overhead and no impact on CPU performance.
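The gating idea in the abstract can be illustrated with a small cycle-level model. This is a minimal sketch under stated assumptions, not the authors' hardware design: the function names `count_clock_edges` and `clock_edge_savings` and the idle trace are hypothetical, and the model only counts clock edges delivered to one module. It does not model total chip power, so it is not expected to reproduce the reported 18% to 28% figure.

```python
def count_clock_edges(idle_trace, gating_enabled):
    """Count clock edges delivered to a module over a per-cycle idle trace.

    idle_trace: iterable of booleans, True when the module is idle in that
    cycle (hypothetical input; in the paper this is detected by on-chip
    hardware).
    """
    edges = 0
    for idle in idle_trace:
        if gating_enabled and idle:
            # Clock is held off: no edge reaches the module's registers.
            continue
        edges += 1
    return edges


def clock_edge_savings(idle_trace):
    """Fraction of clock edges removed by gating (0.0 for an empty trace)."""
    trace = list(idle_trace)
    ungated = count_clock_edges(trace, gating_enabled=False)
    gated = count_clock_edges(trace, gating_enabled=True)
    if ungated == 0:
        return 0.0
    return 1.0 - gated / ungated


if __name__ == "__main__":
    # Hypothetical trace: a module that is idle in 5 of 10 cycles.
    fpu_idle = [True] * 5 + [False] * 5
    print(count_clock_edges(fpu_idle, gating_enabled=False))
    print(count_clock_edges(fpu_idle, gating_enabled=True))
    print(clock_edge_savings(fpu_idle))
```

In this simplified model the saving equals the fraction of idle cycles. The power actually saved on a chip also depends on how much of the total power the gated module's clock network and registers account for, which is why the paper relies on back-annotated toggle rates and PrimeTime PX instead of a cycle count.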

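Memo

Received: 2014-09-16.
Biographies: Bu Aiguo (born 1978), male, Ph.D., associate research fellow; Shan Weiwei (corresponding author), female, Ph.D., associate professor, wwshan@seu.edu.cn.
Foundation item: Supported by the "Qing Lan Project" of Jiangsu Province.
Cite this article: Bu Aiguo, Yu Pianpian, Wu Jianbing, et al. Power optimization and VLSI design of CPU based on adaptive clock-gating[J]. Journal of Southeast University (Natural Science Edition), 2015, 45(2): 219-223. [doi:10.3969/j.issn.1001-0505.2015.02.004]
Last Update: 2015-03-20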