Hu Kongfa, Tang Xiaoli, Da Qingli, et al. Efficient algorithm for frequent closed patterns mining from high dimensional data[J]. Journal of Southeast University (Natural Science Edition), 2007, 37(4): 569-573. [doi:10.3969/j.issn.1001-0505.2007.04.005]

Efficient algorithm for frequent closed patterns mining from high dimensional data

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume: 37
Issue: 2007, No. 4
Pages: 569-573
Section: Economics and Management
Publication date: 2007-07-20

Article information

Title:
Efficient algorithm for frequent closed patterns mining from high dimensional data
Author(s):
Hu Kongfa (1, 2), Tang Xiaoli (2), Da Qingli (1), Chen Ling (2)
1 School of Economics and Management, Southeast University, Nanjing 210096, China
2 Department of Computer Science and Engineering, Yangzhou University, Yangzhou 225009, China
Keywords:
data mining; frequent closed patterns; row enumeration; compound tree
CLC number:
N945; TP311
DOI:
10.3969/j.issn.1001-0505.2007.04.005
Abstract:
Traditional algorithms for mining frequent closed patterns from high-dimensional data iteratively generate conditional tables, which costs considerable runtime and memory. To address these problems, a new algorithm, EMHCP (efficient mining of frequent closed patterns from high dimensional data), is proposed. EMHCP adopts a novel structure, the bitmap table, to compress the stored data. After scanning the database only once, it builds a bitmap conversion table, from which a compound tree is constructed. By searching depth-first and applying effective pruning strategies, EMHCP mines all frequent closed patterns efficiently, which reduces the search space and speeds up mining. Experiments on real bioinformatics datasets show that EMHCP is more efficient than existing algorithms such as CARPENTER and TD-close.
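The abstract only outlines EMHCP: a bitmap table, a compound tree, and depth-first search with pruning. The following is therefore a minimal Python sketch of the general idea, not the authors' algorithm. It makes three assumptions:

- The "bitmap table" maps each item to a bitmask of the rows containing it, plus one item bitmask per row.
- The compound tree and EMHCP's own pruning strategies are not described here. In their place, the sketch swaps in a depth-first row enumeration that uses LCM-style prefix-preserving closure extension (from Uno et al.'s LCM algorithm).
- The function names `build_bitmap_table` and `mine_closed_patterns` are hypothetical.

Enumerating row sets rather than itemsets is the usual rationale for row enumeration on high-dimensional data, which typically has few rows and very many items.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple

# Sketch only: not EMHCP itself. An LCM-style closure extension over rows
# stands in for the paper's compound tree and pruning strategies.


def build_bitmap_table(rows: Sequence[Iterable[str]]) -> Tuple[List[str], List[int], List[int]]:
    """Build two bitmap views of the data in a single pass over the rows.

    items[k]     -- the item assigned index k (order of first appearance)
    row_bits[r]  -- bitmask of the item indices present in row r
    item_bits[k] -- bitmask of the row indices that contain item k
    """
    index: Dict[str, int] = {}
    row_bits: List[int] = []
    item_bits: List[int] = []
    for r, row in enumerate(rows):
        bits = 0
        for item in row:
            k = index.get(item)
            if k is None:
                k = index[item] = len(item_bits)
                item_bits.append(0)
            bits |= 1 << k
            item_bits[k] |= 1 << r
        row_bits.append(bits)
    return list(index), row_bits, item_bits


def mine_closed_patterns(rows: Sequence[Iterable[str]], minsup: int) -> Dict[FrozenSet[str], int]:
    """Return every non-empty closed pattern supported by at least minsup rows."""
    if minsup < 1:
        raise ValueError("minsup must be at least 1")
    items, row_bits, item_bits = build_bitmap_table(rows)
    n = len(row_bits)
    all_rows = (1 << n) - 1
    all_items = (1 << len(items)) - 1
    results: Dict[FrozenSet[str], int] = {}

    def common_items(row_mask: int) -> int:
        # Items shared by every row in row_mask (AND of the row bitmaps).
        mask = all_items
        while row_mask:
            low = row_mask & -row_mask
            mask &= row_bits[low.bit_length() - 1]
            row_mask ^= low
        return mask

    def supporting_rows(item_mask: int) -> int:
        # Rows containing every item in item_mask (AND of the item bitmaps).
        mask = all_rows
        while item_mask:
            low = item_mask & -item_mask
            mask &= item_bits[low.bit_length() - 1]
            item_mask ^= low
        return mask

    def closure(row_mask: int) -> Tuple[int, int]:
        # Closed row set generated by row_mask, and its (closed) pattern.
        pattern = common_items(row_mask)
        return supporting_rows(pattern), pattern

    def dfs(row_mask: int, pattern: int, core: int) -> None:
        support = bin(row_mask).count("1")
        if pattern and support >= minsup:
            results[frozenset(items[k] for k in range(len(items)) if (pattern >> k) & 1)] = support
        for j in range(core + 1, n):
            bit = 1 << j
            if row_mask & bit:
                continue
            new_rows, new_pattern = closure(row_mask | bit)
            if not new_pattern:
                continue  # empty pattern: supported by all rows, nothing to report or extend
            below = bit - 1
            if (new_rows & below) != (row_mask & below):
                continue  # closure added a row before j: this set is reached from another branch
            # Descendants of new_rows can only add rows after j, bounding their support.
            later = all_rows & ~(bit | below) & ~new_rows
            if bin(new_rows).count("1") + bin(later).count("1") < minsup:
                continue
            dfs(new_rows, new_pattern, j)

    root_rows, root_pattern = closure(0)
    dfs(root_rows, root_pattern, -1)
    return results
```

Take the rows {a, b, c}, {a, b}, {a, c}, {b, c, d} with minsup = 2. The sketch reports a, b and c with support 3, and ab, ac and bc with support 2. Because each closed row set is generated from exactly one parent, no pattern is produced twice, and no separate subsumption check against earlier results is needed.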

References:

[1] Pasquier N, Bastide Y. Discovering frequent closed itemsets for association rules[C]//Proceedings of the 7th International Conference on Database Theory. Jerusalem: Springer-Verlag, 1999: 398-416.
[2] Pei J, Han J, Mao R. CLOSET: an efficient algorithm for mining frequent closed itemsets[C]//Proceedings of the 2000 ACM SIGMOD International Workshop on Data Mining and Knowledge Discovery. New York: ACM Press, 2000: 11-20.
[3] Wang J, Han J, Pei J. CLOSET+: searching for the best strategies for mining frequent closed itemsets[C]//Proceedings of the 2003 ACM SIGKDD. New York: ACM Press, 2003: 236-245.
[4] Zaki M, Hsiao C. CHARM: an efficient algorithm for closed association rule mining[C]//Proceedings of the 2002 SIAM Data Mining Conference. Arlington, VA, 2002: 457-473.
[5] Pan F, Cong G, Zaki M. CARPENTER: finding closed patterns in long biological datasets[C]//Proceedings of ACM SIGKDD 2003. New York: ACM Press, 2003: 637-642.
[6] Cong G, Tung A, Xu X. FARMER: finding interesting rule groups in microarray datasets[C]//Proceedings of the 23rd ACM International Conference on Management of Data. New York: ACM Press, 2004: 143-154.
[7] Liu H, Han J, Xin D, et al. Mining interesting patterns from very high dimensional data: a top-down row enumeration approach[C/OL]//Proceedings of the 6th SIAM International Conference on Data Mining. Bethesda, MD, 2006. http://www.siam.org/meetings/sdm06/proceedings/026liuh.pdf.
[8] Liu H, Han J, Xin D, et al. Top-down mining of interesting patterns from very high dimensional data[C]//Proceedings of the 22nd International Conference on Data Engineering. Los Alamitos: IEEE Computer Society Press, 2006: 114-116.
[9] Creighton C, Hanash S. Mining gene expression databases for association rules[J]. Bioinformatics, 2003, 19(1): 79-86.

Notes:
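Foundation items: National Natural Science Foundation of China (No. 70472033, 60473012); National Science and Technology Infrastructure Platform Construction Program (No. 2004DKA20310); Natural Science Foundation of Jiangsu Province (No. BK2005047, BK2005046); Qing Lan Project of Jiangsu Province universities.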
Biography: Hu Kongfa (born 1970), male, Ph.D., associate professor, master's supervisor, kfhu05@126.com.
Last update: 2007-07-20