[1]龚振志,胡孔法,达庆利,等.DMGSP:一种快速分布式全局序列模式挖掘算法[J].东南大学学报(自然科学版),2007,37(4):574-579.[doi:10.3969/j.issn.1001-0505.2007.04.006]
 Gong Zhenzhi,Hu Kongfa,Da Qingli,et al.DMGSP: an algorithm of distributed mining global sequential pattern on distributed system[J].Journal of Southeast University (Natural Science Edition),2007,37(4):574-579.[doi:10.3969/j.issn.1001-0505.2007.04.006]
点击复制

DMGSP:一种快速分布式全局序列模式挖掘算法()
分享到:

《东南大学学报(自然科学版)》[ISSN:1001-0505/CN:32-1178/N]

卷:
37
期数:
2007年第4期
页码:
574-579
栏目:
经济与管理
出版日期:
2007-07-20

文章信息/Info

Title:
DMGSP: an algorithm of distributed mining global sequential pattern on distributed system
作者:
龚振志1 胡孔法12 达庆利1 张长海2
1 东南大学经济管理学院, 南京 210096; 2 扬州大学计算机科学与工程系, 扬州 225009
Author(s):
Gong Zhenzhi1 Hu Kongfa12 Da Qingli1 Zhang Changhai2
1 School of Economics and Management, Southeast University, Nanjing 210096,China
2 Department of Computer Science and Engineering, Yangzhou University, Yangzhou 225009,China
关键词:
数据挖掘 分布式系统 全局序列模式 语法序列树
Keywords:
data mining distributed system global sequential pattern lexicographic sequence tree
分类号:
N945;TP311
DOI:
10.3969/j.issn.1001-0505.2007.04.006
摘要:
为了解决分布式环境下挖掘全局序列模式常产生过多候选序列,加大网络通信代价问题,提出了一种基于分布式环境下的快速挖掘全局序列模式算法——DMGSP.该算法将分布式环境下的各站点得到的局部序列模式压缩到一种语法序列树上, 避免了重复的序列前缀传输. 采用合并树中结点序列规则和项序扩展策略,对非频繁序列进行剪枝,有效地约简了候选序列,减少了网络传输量,从而快速生成全局序列模式.算法分析和实验结果表明,在大数据集环境下的DMGSP算法性能优越,能够有效地挖掘全局序列模式.
Abstract:
The current distributed sequential pattern mining algorithms usually generate too many candidate sequences and therefore increase communication overhead. To solve this problem, an efficient algorithm—DMGSP(distributed mining of global sequential pattern)of mining global sequential pattern on distributed system is proposed. DMGSP algorithm compresses local frequent sequential patterns into a lexicographic sequence tree, and avoids translation of repeated prefixes. By using the sequences regular of merged trees and efficient item and sequence extension pruning, non-frequent subsequence is pruned and candidate sequences can be reduced effectively. Therefore, communication overhead is reduced and global sequential patterns is generated effectively. The theory and experiments show that the performance of DMGSP is superior, which is advantageous for mining global sequential patterns with huge amount of data.

参考文献/References:

[1] Srikant R,Agrawal R.Mining sequential patterns:generalizations and performance improvements[C] //Proc of the Fifth Int Conference on Extending Database Technology.Heidelberg:Springer,1996:3-17.
[2] Mannila H,Toivonen H,Verkamo A I.Discovery of frequent episodes in sequences[C] //Proc of the First Int Conference on Knowledge Discovery and Data Mining.New York:ACM Press,1995:210-215.
[3] Garofalakis M,Rastogi R,Shim K.Spirit:sequential pattern mining with regular expression constraints [C] //Proc of the 25th Int Conference on Very Large Databases.San Francisco:Morgan Kaufmann,1999:223-234.
[4] Zaki M.Spade:an efficient algorithm for mining frequent sequences [J].Machine Learning,2001,41(1/2):31-60.
[5] Han J,Pei J.FreeSpan:frequent pattern-projected sequential pattern mining[C] //Proc of the International Conference on Knowledge Discovery and Data Mining.New York:ACM Press,2000:355-359.
[6] Pei J,Han J,Mortazavi-Asl B,et al.PrefixSpan:mining sequential patterns efficiently by prefix-projected pattern growth[C] //Proc of the Int Conf Data Engineering.Heidelberg:Springer,2001:215-224.
[7] Yan X,Han J,Afshar R.CloSpan:mining closed sequential patterns in large databases[C] //Proc 2003 SIAM Int Conf Data Mining(SDM’03).San Francisco:Morgan Kaufmann,2003:166-177.
[8] Guralnik V,Garg N,Vipin K.Parallel tree projection algorithm for sequence mining[C] //Euro Par 2001.Manchester:Springer-Verlag,2001:310-320.
[9] Agrawal R,Shafer J.Parallel mining of association rules[J].IEEE Trans on Knowledge and Data Engineering,1996,8(6):962-969.
[10] 陆介平,杨明,孙志挥,等.快速挖掘全局最大频繁项目集[J].软件学报.2005,16(4):553-560.
  Lu Jieping,Yang Ming,Sun Zhihui,et al.Fast mining of global maximum frequent item sets [J]. Journal of Software,2005,16(4):553-560.(in Chinese)
[11] Cheung D W,Han J,Ng V T,et al.A fast distributed algorithm for mining association rules[C] //Proc of the 4th Int Conf on Parallel and Distributed Information Systems.Miami Beach,F1orida,1996:31-44.
[12] 邹翔,张巍,刘洋,等.分布式序列模式发现算法的研究[J].软件学报,2005,16(7):1262-1269.
  Zou Xiang,Zhang Wei,Liu Yang,et al.Study on distributed sequential pattern discovery algorithm [J].Journal of Software,2005,16(7):1262-1269.(in Chinese)

相似文献/References:

[1]赵传申,孙志挥.半结构化文档数据流的快速频繁模式挖掘[J].东南大学学报(自然科学版),2006,36(3):452.[doi:10.3969/j.issn.1001-0505.2006.03.025]
 Zhao Chuanshen,Sun Zhihui.Fast mining frequent patterns in semi-structured data stream[J].Journal of Southeast University (Natural Science Edition),2006,36(4):452.[doi:10.3969/j.issn.1001-0505.2006.03.025]
[2]陆建江,徐宝文,邹晓峰,等.模糊关联规则的并行挖掘算法[J].东南大学学报(自然科学版),2005,35(2):165.[doi:10.3969/j.issn.1001-0505.2005.02.001]
 Lu Jianjiang,Xu Baowen,Zou Xiaofeng,et al.Parallel mining algorithm for fuzzy association rules[J].Journal of Southeast University (Natural Science Edition),2005,35(4):165.[doi:10.3969/j.issn.1001-0505.2005.02.001]
[3]丁艺明,金远平.一种基于记录分区的多值关联规则挖掘算法[J].东南大学学报(自然科学版),2000,30(2):6.[doi:10.3969/j.issn.1001-0505.2000.02.002]
 Ding Yiming,Jin Yuanping.A Record Partition Based Algorithm for Mining Quantitative Association Rules[J].Journal of Southeast University (Natural Science Edition),2000,30(4):6.[doi:10.3969/j.issn.1001-0505.2000.02.002]
[4]朱慧云,陈森发,张丽杰.动态环境下多个时期的客户购物模式变化挖掘[J].东南大学学报(自然科学版),2012,42(5):1012.[doi:10.3969/j.issn.1001-0505.2012.05.038]
 Zhu Huiyun,Chen Senfa,Zhang Lijie.Change mining of customer shopping patterns from multi-period datasets under dynamic environment[J].Journal of Southeast University (Natural Science Edition),2012,42(4):1012.[doi:10.3969/j.issn.1001-0505.2012.05.038]
[5]陆介平,刘月波,倪巍伟,等.基于PrefixSpan的快速交互序列模式挖掘算法[J].东南大学学报(自然科学版),2005,35(5):692.[doi:10.3969/j.issn.1001-0505.2005.05.008]
 Lu Jieping,Liu Yuebo,Ni Weiwei,et al.Fast interactive sequential pattern mining algorithm based on PrefixSpan[J].Journal of Southeast University (Natural Science Edition),2005,35(4):692.[doi:10.3969/j.issn.1001-0505.2005.05.008]
[6]张净,孙志挥.GDLOF:基于网格和稠密单元的快速局部离群点探测算法[J].东南大学学报(自然科学版),2005,35(6):863.[doi:10.3969/j.issn.1001-0505.2005.06.007]
 Zhang Jing,Sun Zhihui.GDLOF: fast local outlier detection algorithm with grid-based and dense cell[J].Journal of Southeast University (Natural Science Edition),2005,35(4):863.[doi:10.3969/j.issn.1001-0505.2005.06.007]
[7]杨明,孙志挥,吉根林.一种基于分布式数据库的全局频繁项目集更新算法[J].东南大学学报(自然科学版),2002,32(6):879.[doi:10.3969/j.issn.1001-0505.2002.06.012]
 Yang Ming,Sun Zhihui,Ji Genlin.Algorithm based on distributed database for updating global frequent itemsets[J].Journal of Southeast University (Natural Science Edition),2002,32(4):879.[doi:10.3969/j.issn.1001-0505.2002.06.012]
[8]陈岭,陈元中,陈根才.基于操作序列挖掘的OLAP查询推荐方法[J].东南大学学报(自然科学版),2011,41(3):498.[doi:10.3969/j.issn.1001-0505.2011.03.013]
 Chen Ling,Chen Yuanzhong,Chen Gencai.Operation sequence mining based OLAP query recommendation method[J].Journal of Southeast University (Natural Science Edition),2011,41(4):498.[doi:10.3969/j.issn.1001-0505.2011.03.013]
[9]胡孔法,唐小丽,达庆利,等.一种高效挖掘高维数据的频繁闭合模式算法[J].东南大学学报(自然科学版),2007,37(4):569.[doi:10.3969/j.issn.1001-0505.2007.04.005]
 Hu Kongfa,Tang Xiaoli,Da Qingli,et al.Efficient algorithm for frequent closed patterns mining from high dimensional data[J].Journal of Southeast University (Natural Science Edition),2007,37(4):569.[doi:10.3969/j.issn.1001-0505.2007.04.005]
[10]肖利,金远平,徐宏炳,等.一个新的挖掘广义关联规则算法[J].东南大学学报(自然科学版),1997,27(6):76.[doi:10.3969/j.issn.1001-0505.1997.06.015]
 Xiao Li,Jin Yuanping,Xu Hongbing,et al.A New Algorithm for Mining Generalized Association Rules[J].Journal of Southeast University (Natural Science Edition),1997,27(4):76.[doi:10.3969/j.issn.1001-0505.1997.06.015]

备注/Memo

备注/Memo:
基金项目: 国家自然科学基金资助项目(70472033)、江苏省 “青蓝工程”基金资助项目.
作者简介: 龚振志(1970—),男,博士生; 达庆利(联系人),男, 教授, 博士生导师,dqlseunj@126.com.
更新日期/Last Update: 2007-07-20