[1]陆介平,刘月波,倪巍伟,等.基于PrefixSpan的快速交互序列模式挖掘算法[J].东南大学学报(自然科学版),2005,35(5):692-696.[doi:10.3969/j.issn.1001-0505.2005.05.008]
 Lu Jieping,Liu Yuebo,Ni Weiwei,et al.Fast interactive sequential pattern mining algorithm based on PrefixSpan[J].Journal of Southeast University (Natural Science Edition),2005,35(5):692-696.[doi:10.3969/j.issn.1001-0505.2005.05.008]
点击复制

基于PrefixSpan的快速交互序列模式挖掘算法()
分享到:

《东南大学学报(自然科学版)》[ISSN:1001-0505/CN:32-1178/N]

卷:
35
期数:
2005年第5期
页码:
692-696
栏目:
计算机科学与工程
出版日期:
2005-09-20

文章信息/Info

Title:
Fast interactive sequential pattern mining algorithm based on PrefixSpan
作者:
陆介平12 刘月波3 倪巍伟1 刘同明2 孙志挥1
1 东南大学计算机科学与工程系, 南京 210096; 2 江苏科技大学电子与信息学院, 镇江 212003; 3 上海工程技术大学科研处, 上海 200336
Author(s):
Lu Jieping12 Liu Yuebo3 Ni Weiwei1 Liu Tongming2 Sun Zhihui1
1 Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China
2 School of Electronics and Information, Jiangsu University of Science and Technology, Zhenjiang 212003, China
3 Scientific Research Office, Shanghai University of Engineering Science,Shanghai 200336, China
关键词:
数据挖掘 序列模式 交互式挖掘 投影数据库
Keywords:
data mining sequential patterns interactive mining projection database
分类号:
TP311
DOI:
10.3969/j.issn.1001-0505.2005.05.008
摘要:
为了克服序列模式挖掘过程中重复运行挖掘算法而产生的时空消耗,提出了一个快速、简单而有效序列模式的交互式算法FISPM,利用前次挖掘得到的序列构造序列模式数据库用来存储挖掘出来的所有序列, 通过缩减本次挖掘所要构造投影数据库的频繁项的数量来减少构造投影数据库所需的时间以及投影数据库的大小,从而减少时间和空间消耗,提高挖掘效率.通过设置全局最小支持度来减少算法迭代次数. 实验结果证明在交互挖掘过程中FISPM效率优于PrefixSpan.
Abstract:
A novel sequential patterns mining method based on PrefixSpan, called FISPM(fast interactive sequential patterns mining algorithm), is proposed in this paper to reduce time-consuming in rerunning algorithm of sequential patterns query. In which, by building sequence patterns base(SPB)for saving all the mined patterns and minimum support, the number of frequent items of the projection databases constructed by the correct mining which based on the previously mined sequences can be reduced, and the interactive process and the efficiency can be spended up by using SPB to reduce the time and space consuming of projected database. Furthermore, using globalthreshold can reduce the number of iteration. The results of experiments on several different databases indicate that the performance of FISPM is better than that of PrefixSpan in the interactive mining.

参考文献/References:

[1] Agrawal C C,Yu P S.Online generation of association rules[A].In: Proceedings of the 14th International Conference on Data Engineering [C].Orlando,Florida,USA,1998.402-411.
[2] Hidber C.Online association rule mining[R].Barkeley:Technical Report UCB/CSD-98-1004,UC,1998.
[3] Nag B,Deshpande P M,DeWitt D J.Using a knowledge cache for interactive discovery of association rules[A].In:Proceedings of the 1999 SIGKDD Conference[C].San Diego,California,1999.244-253.
[4] Parthasarathy S,Dwarkadas S,Ogihara M.Active mining in a distributed setting[A].In:Proceedings of Workshop on Large-Scale Parallel KDD Systems[C].San Diego,CA,USA,1999.65-85.
[5] Parthasarathy S,Zaki M J,Ogihara M,et al.Incremental and interactive sequence sequence mining[C].In:Proceedings of the 8th International Conference on Information and Knowledge Management[C].Kansas,Missouri,USA,1999.65-85.
[6] Lin Mingyen,Lee Suhyin.Improving the efficiency of interactive sequential pattern mining by incremental pattern discovery[A].In:Proceedings of the 36th Hawaii International Conference on System Sciences[C].Hawil,USA,2002.68-75.
[7] Lin Mingyen,Lee Suhyin.Interactive sequence discovery by incremental mining[J].Information Sciences,2004,165(3,4):187-205.
[8] Agrawal R,Srikant R.Mining sequential patterns:generalizations and performance improvements[A].In: Proceeding of the International Conference on Extending Data Base Technology[C].New York:Springer Verlag,1996.3-17.
[9] Zaki M J.Efficient enumeration of frequent sequences [A].In: Proceedings of the 1998 ACM 7th International Conference on Information and Knowledge Management [C].Washington,USA,1998.68-75.
[10] Han J,Pei J,Mortazavi-Asl B,et al.Freespan:frequent pattern-projected sequential pattern mining[A].In: Proceedings of the International Conference on Knowledge Discovery and Data Mining ACM [C].Montreal,Canada,2000.355-359.
[11] Pei J,Han J,Mortazavi-Asl B,et al.PrefixSpan:mining sequence patterns efficiently by prefix-projected pattern growth[A].In:17th the International Conference on Data Engineering[C].H(·overe)idelberg,2001.215-224.

相似文献/References:

[1]赵传申,孙志挥.半结构化文档数据流的快速频繁模式挖掘[J].东南大学学报(自然科学版),2006,36(3):452.[doi:10.3969/j.issn.1001-0505.2006.03.025]
 Zhao Chuanshen,Sun Zhihui.Fast mining frequent patterns in semi-structured data stream[J].Journal of Southeast University (Natural Science Edition),2006,36(5):452.[doi:10.3969/j.issn.1001-0505.2006.03.025]
[2]陆建江,徐宝文,邹晓峰,等.模糊关联规则的并行挖掘算法[J].东南大学学报(自然科学版),2005,35(2):165.[doi:10.3969/j.issn.1001-0505.2005.02.001]
 Lu Jianjiang,Xu Baowen,Zou Xiaofeng,et al.Parallel mining algorithm for fuzzy association rules[J].Journal of Southeast University (Natural Science Edition),2005,35(5):165.[doi:10.3969/j.issn.1001-0505.2005.02.001]
[3]丁艺明,金远平.一种基于记录分区的多值关联规则挖掘算法[J].东南大学学报(自然科学版),2000,30(2):6.[doi:10.3969/j.issn.1001-0505.2000.02.002]
 Ding Yiming,Jin Yuanping.A Record Partition Based Algorithm for Mining Quantitative Association Rules[J].Journal of Southeast University (Natural Science Edition),2000,30(5):6.[doi:10.3969/j.issn.1001-0505.2000.02.002]
[4]朱慧云,陈森发,张丽杰.动态环境下多个时期的客户购物模式变化挖掘[J].东南大学学报(自然科学版),2012,42(5):1012.[doi:10.3969/j.issn.1001-0505.2012.05.038]
 Zhu Huiyun,Chen Senfa,Zhang Lijie.Change mining of customer shopping patterns from multi-period datasets under dynamic environment[J].Journal of Southeast University (Natural Science Edition),2012,42(5):1012.[doi:10.3969/j.issn.1001-0505.2012.05.038]
[5]张净,孙志挥.GDLOF:基于网格和稠密单元的快速局部离群点探测算法[J].东南大学学报(自然科学版),2005,35(6):863.[doi:10.3969/j.issn.1001-0505.2005.06.007]
 Zhang Jing,Sun Zhihui.GDLOF: fast local outlier detection algorithm with grid-based and dense cell[J].Journal of Southeast University (Natural Science Edition),2005,35(5):863.[doi:10.3969/j.issn.1001-0505.2005.06.007]
[6]杨明,孙志挥,吉根林.一种基于分布式数据库的全局频繁项目集更新算法[J].东南大学学报(自然科学版),2002,32(6):879.[doi:10.3969/j.issn.1001-0505.2002.06.012]
 Yang Ming,Sun Zhihui,Ji Genlin.Algorithm based on distributed database for updating global frequent itemsets[J].Journal of Southeast University (Natural Science Edition),2002,32(5):879.[doi:10.3969/j.issn.1001-0505.2002.06.012]
[7]陈岭,陈元中,陈根才.基于操作序列挖掘的OLAP查询推荐方法[J].东南大学学报(自然科学版),2011,41(3):498.[doi:10.3969/j.issn.1001-0505.2011.03.013]
 Chen Ling,Chen Yuanzhong,Chen Gencai.Operation sequence mining based OLAP query recommendation method[J].Journal of Southeast University (Natural Science Edition),2011,41(5):498.[doi:10.3969/j.issn.1001-0505.2011.03.013]
[8]胡孔法,唐小丽,达庆利,等.一种高效挖掘高维数据的频繁闭合模式算法[J].东南大学学报(自然科学版),2007,37(4):569.[doi:10.3969/j.issn.1001-0505.2007.04.005]
 Hu Kongfa,Tang Xiaoli,Da Qingli,et al.Efficient algorithm for frequent closed patterns mining from high dimensional data[J].Journal of Southeast University (Natural Science Edition),2007,37(5):569.[doi:10.3969/j.issn.1001-0505.2007.04.005]
[9]龚振志,胡孔法,达庆利,等.DMGSP:一种快速分布式全局序列模式挖掘算法[J].东南大学学报(自然科学版),2007,37(4):574.[doi:10.3969/j.issn.1001-0505.2007.04.006]
 Gong Zhenzhi,Hu Kongfa,Da Qingli,et al.DMGSP: an algorithm of distributed mining global sequential pattern on distributed system[J].Journal of Southeast University (Natural Science Edition),2007,37(5):574.[doi:10.3969/j.issn.1001-0505.2007.04.006]
[10]肖利,金远平,徐宏炳,等.一个新的挖掘广义关联规则算法[J].东南大学学报(自然科学版),1997,27(6):76.[doi:10.3969/j.issn.1001-0505.1997.06.015]
 Xiao Li,Jin Yuanping,Xu Hongbing,et al.A New Algorithm for Mining Generalized Association Rules[J].Journal of Southeast University (Natural Science Edition),1997,27(5):76.[doi:10.3969/j.issn.1001-0505.1997.06.015]

备注/Memo

备注/Memo:
基金项目: 国家自然科学基金资助项目(70371015)、江苏省自然科学基金资助项目(BK2004058).
作者简介: 陆介平(1959—),男,博士生,教授; 孙志挥(联系人),男,教授,博士生导师,sunzh@seu.edu.cn.
更新日期/Last Update: 2005-09-20