[1] Lu Jieping, Liu Yuebo, Ni Weiwei, et al. Incremental updating algorithm for sequence patterns mining based on projected database[J]. Journal of Southeast University (Natural Science Edition), 2006, 36(3): 457-462. [doi:10.3969/j.issn.1001-0505.2006.03.026]

Incremental updating algorithm for sequence patterns mining based on projected database

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
36
Issue:
No. 3, 2006
Pages:
457-462
Section:
Computer Science and Engineering
Publication date:
2006-05-20

Article Info

Title:
Incremental updating algorithm for sequence patterns mining based on projected database
Author(s):
Lu Jieping1 Liu Yuebo2 Ni Weiwei1 Chen Geng3 Sun Zhihui1
1 School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
2 Scientific Research Office, Shanghai University of Engineering Science, Shanghai 200336, China
3 Key Laboratory of Audit Information Engineering, Nanjing Audit University, Nanjing 210029, China
Keywords:
sequence patterns; data mining; projection database; incremental updating
CLC number:
TP311
DOI:
10.3969/j.issn.1001-0505.2006.03.026
Abstract:
To address the problem of incremental sequential pattern mining, an incremental sequential pattern mining algorithm based on projected databases (ISPBP) is proposed. The algorithm introduces a sequential pattern base that stores all items and maximal frequent patterns mined from the original database, together with their support counts. Rather than re-mining the updated database from scratch, ISPBP updates the previously found frequent items and patterns by implicit merging, so that only the increment database has to be processed, and it discovers new patterns through projected databases. For frequent patterns that newly arise because of the increment database, ISPBP uses the frequent items occurring in the increment database to shrink the projected databases, which further improves efficiency. Theoretical analysis and experiments indicate that ISPBP is effective and feasible, that its efficiency advantage becomes more pronounced as the increment database grows, and that it outperforms conventional incremental updating algorithms.
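
The abstract describes two ideas that a small example may make more concrete: merging stored support counts with counts taken from the increment only, and restricting a projected database to items seen in the increment. The following is a minimal sketch under stated assumptions, not the authors' ISPBP implementation. It assumes each sequence is a flat list of single items, that the increment consists of whole new sequences, and that support is counted per sequence; the function names `item_supports`, `update_supports`, and `project` are hypothetical.

```python
from collections import Counter


def item_supports(db):
    """Count, for each item, how many sequences contain it."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))  # set(): count an item once per sequence
    return counts


def update_supports(base_counts, increment_db):
    """Add counts from the increment to stored counts (no rescan of the original)."""
    updated = Counter(base_counts)
    updated.update(item_supports(increment_db))
    return updated


def project(db, prefix_item, keep_items):
    """Suffixes after the first occurrence of prefix_item, filtered to keep_items."""
    projected = []
    for seq in db:
        if prefix_item in seq:
            suffix = seq[seq.index(prefix_item) + 1:]
            suffix = [x for x in suffix if x in keep_items]
            if suffix:
                projected.append(suffix)
    return projected


# Toy data (assumed for illustration only).
original_db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
increment_db = [["a", "d", "c"], ["d", "c"], ["a", "d"]]
min_support = 0.5  # relative support threshold (assumed)

# Stored once, when the original database is first mined.
base = item_supports(original_db)

# Incremental step: only increment_db is scanned.
updated = update_supports(base, increment_db)
total = len(original_db) + len(increment_db)
frequent = {item for item, n in updated.items() if n / total >= min_support}

# Shrink the projection to frequent items that occur in the increment.
inc_items = set(item_supports(increment_db))
projected_a = project(increment_db, "a", frequent & inc_items)
```

In this toy case the merged counts match what a full recount over both databases would give, while only the three new sequences are scanned. How ISPBP handles multi-item elements, appended transactions for existing customers, and maximal patterns is not specified in the abstract and is left out here.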



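Memo

Foundation items: Specialized Research Fund for the Doctoral Program of Higher Education (20040286009); Natural Science Foundation of Jiangsu Province (BK2004058); Special Project of the Audit Research Institute of the National Audit Office (SK2006007).
Biography: Lu Jieping (1959-), male, Ph.D., professor, zjjplu@yahoo.com.cn.
Last Update: 2006-05-20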