[1] Li Yajun, Xu Baowen, Zhou Xiaoyu. Detection of clone sequences and classes using AST [J]. Journal of Southeast University (Natural Science Edition), 2008, 38(2): 228-232. [doi:10.3969/j.issn.1001-0505.2008.02.009]

Detection of clone sequences and classes using AST

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
38
Issue:
No. 2, 2008
Pages:
228-232
Column:
Computer Science and Engineering
Publication date:
2008-03-20

Article Info

Title:
Detection of clone sequences and classes using AST
Author(s):
Li Yajun (李亚军), Xu Baowen (徐宝文), Zhou Xiaoyu (周晓宇)
School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
Jiangsu Institution of Software Quality, Nanjing 210096, China
Keywords:
code clone; clone detection; clone class; software maintenance
Classification number (CLC):
TP311
DOI:
10.3969/j.issn.1001-0505.2008.02.009
Abstract:
To reduce code redundancy and improve program structure, a new code clone detection approach based on abstract syntax is presented, the common forms of code clones are summarized, and corresponding refactoring techniques are given. The abstract syntax of the source program is represented as a binary tree (BAST). The BAST subtrees of the statements are checked for isomorphism statement by statement, and similar statement sequences are reported as clone sequences. 1-tuple clone classes are identified by subtree isomorphism; 2-tuple clone classes and clone classes of any higher arity are then identified step by step through a join operation on clone classes. Experiments on several open-source projects detected their clone sequences and clone classes, from which four common kinds of code clones were identified. Their basic characteristics are, respectively: accessing different properties of objects of the same class at the same program point; differing in some variable names; applying the same operation to different data types; and modifying variables defined outside the clone region. All four kinds of clones were refactored effectively.
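
The statement-by-statement isomorphism check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the `isomorphic` and `clone_sequences` functions, the `min_len` threshold, and the choice to collapse every identifier into a single `id` node kind are all assumptions made for illustration. The join operation for clone classes is not sketched, because the abstract does not define it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical binary AST node: a syntactic kind plus up to two children."""
    kind: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def isomorphic(a: Optional[Node], b: Optional[Node]) -> bool:
    """Assumed notion of isomorphism: same tree shape and same node kinds."""
    if a is None or b is None:
        return a is b  # True only when both subtrees are absent
    return (a.kind == b.kind
            and isomorphic(a.left, b.left)
            and isomorphic(a.right, b.right))


def clone_sequences(stmts: List[Node], min_len: int = 2) -> List[Tuple[int, int, int]]:
    """Return (i, j, length) for non-overlapping runs starting at i < j
    in which each statement at i + k is isomorphic to the one at j + k."""
    n = len(stmts)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip pairs that only continue a run that started one step earlier.
            if i > 0 and isomorphic(stmts[i - 1], stmts[j - 1]):
                continue
            k = 0
            # Extend while statements match and the two runs do not overlap.
            while j + k < n and i + k < j and isomorphic(stmts[i + k], stmts[j + k]):
                k += 1
            if k >= min_len:
                found.append((i, j, k))
    return found


def ident() -> Node:
    # Assumption: identifier names are abstracted away, so renamed variables match.
    return Node("id")


def assign() -> Node:
    # Shape of a statement such as `x = a + b`.
    return Node("assign", ident(), Node("add", ident(), ident()))


def call() -> Node:
    # Shape of a statement such as `print(x)`.
    return Node("call", ident(), ident())


# x = a + b; print(x); y = c + d; print(y); return
stmts = [assign(), call(), assign(), call(), Node("return")]
result = clone_sequences(stmts)
```

Here statements 0-1 and 2-3 have the same BAST shape, so `result` holds one clone sequence of length 2 starting at indices 0 and 2. Because the sketch compares node kinds only, it treats code that differs in some variable names as a clone, which corresponds to one of the four clone kinds listed above.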

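Memo:
Author biography: Li Yajun (born 1982), male, master's student; Xu Baowen (corresponding author), male, Ph.D., professor, doctoral supervisor, bwxu@seu.edu.cn.
Funding: National Science Fund for Distinguished Young Scholars (60425206), National Natural Science Foundation of China (60503020), Natural Science Foundation of Jiangsu Province (BK2006094), Jiangsu Province High Technology Research Program (BG2005032).
Citation format: Li Yajun, Xu Baowen, Zhou Xiaoyu. Detection of clone sequences and classes using AST [J]. Journal of Southeast University (Natural Science Edition), 2008, 38(2): 228-232.
Last Update: 2008-03-20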