[1] Cao Peng, Yang Jinjiang, Mei Chen. Parallel FFT algorithm implementation based on coarse-grained reconfigurable architecture[J]. Journal of Southeast University (Natural Science Edition), 2013, 43(6): 1174-1179. [doi:10.3969/j.issn.1001-0505.2013.06.008]

Parallel FFT algorithm implementation based on coarse-grained reconfigurable architecture

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505/CN: 32-1178/N]

Volume:
43
Issue:
No. 6, 2013
Pages:
1174-1179
Section:
Electronic Science and Engineering
Publication date:
2013-11-20

Article Info

Title:
Parallel FFT algorithm implementation based on coarse-grained reconfigurable architecture
Author(s):
Cao Peng, Yang Jinjiang, Mei Chen
National ASIC System Engineering Research Center, Southeast University, Nanjing 210096, China
Keywords:
coarse-grained reconfigurable architecture (CGRA); parallel fast Fourier transform (FFT) algorithm; REMUS_LPP (reconfigurable embedded multimedia system, low performance processor)
CLC number:
TN302
DOI:
10.3969/j.issn.1001-0505.2013.06.008
Abstract:
To improve the computational performance of the parallel fast Fourier transform (FFT) algorithm, a new implementation of the complex FFT is proposed on REMUS_LPP (reconfigurable embedded multimedia system, low performance processor), a coarse-grained reconfigurable architecture (CGRA). The lower stages of the FFT are first computed in local serial mode; the intermediate results of the lower stages are then exchanged, and the higher stages are executed in parallel. To optimize the data flow within and between reconfigurable computing arrays (RCAs), pipeline bubble elimination and data block rearrangement techniques are presented, which improve performance and reduce the on-chip memory requirement. Measurements on a real chip show that the execution speed of the proposed FFT implementation is 2.15 to 13.60 times that of other comparable parallel computing architectures, while its on-chip memory is reduced to 7.0% to 28.1% of that required by other methods.
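The two-phase structure described in the abstract (local lower stages, then cross-unit higher stages) can be illustrated with a minimal sketch of a radix-2 decimation-in-time FFT. This is not the authors' REMUS_LPP mapping: the function names and the `num_units` parameter are hypothetical, each "unit" is simply a contiguous block of a Python list standing in for an RCA, and the pipeline bubble elimination and data block rearrangement techniques are not modeled. The sketch assumes the input length and `num_units` are powers of two.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def stage(a, start, stop, m):
    """Apply one radix-2 DIT butterfly stage with group size m to a[start:stop], in place."""
    half = m // 2
    for k in range(start, stop, m):
        for j in range(half):
            w = cmath.exp(-2j * cmath.pi * j / m)  # twiddle factor for this butterfly
            u, t = a[k + j], w * a[k + j + half]
            a[k + j], a[k + j + half] = u + t, u - t


def two_phase_fft(x, num_units):
    """FFT of x, split into local lower stages and cross-block higher stages.

    num_units is a hypothetical count of processing units (assumption:
    a power of two that divides len(x), which is itself a power of two).
    """
    n = len(x)
    bits = n.bit_length() - 1
    assert n == 1 << bits and n % num_units == 0

    # Bit-reversed input ordering required by the in-place DIT schedule.
    a = [0j] * n
    for i, v in enumerate(x):
        a[bit_reverse(i, bits)] = complex(v)

    block = n // num_units

    # Phase 1: lower stages. Every butterfly stays inside one block,
    # so each unit can run all of its stages serially on local data.
    for u in range(num_units):
        m = 2
        while m <= block:
            stage(a, u * block, (u + 1) * block, m)
            m *= 2

    # Phase 2: higher stages. Butterflies now pair elements from different
    # blocks, which on real hardware requires exchanging intermediate results.
    m = 2 * block
    while m <= n:
        stage(a, 0, n, m)
        m *= 2
    return a
```

With `num_units = 1` the sketch degenerates to an ordinary serial FFT, and with `num_units = len(x)` every stage crosses block boundaries; the split point only changes which stages need data exchange, not the result.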


Memo

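Biography: Cao Peng (born 1980), male, Ph.D., lecturer, caopeng@seu.edu.cn.
Foundation items: National Natural Science Foundation of China (No. 61204023, 61203251, 61272183); National High Technology Research and Development Program of China (863 Program) (No. 2012AA012703).
Citation: Cao Peng, Yang Jinjiang, Mei Chen. Parallel FFT algorithm implementation based on coarse-grained reconfigurable architecture[J]. Journal of Southeast University (Natural Science Edition), 2013, 43(6): 1174-1179. [doi:10.3969/j.issn.1001-0505.2013.06.008]
Last Update: 2013-11-20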