[1] Zhao Chuanshen, Sun Zhihui. Fast mining frequent patterns in semi-structured data stream[J]. Journal of Southeast University (Natural Science Edition), 2006, 36(3): 452-456. [doi:10.3969/j.issn.1001-0505.2006.03.025]

Fast mining frequent patterns in semi-structured data stream

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
36
Issue:
No. 3, 2006
Pages:
452-456
Section:
Computer Science and Engineering
Publication date:
2006-05-20

Article Info

Title:
Fast mining frequent patterns in semi-structured data stream
Author(s):
Zhao Chuanshen (赵传申), Sun Zhihui (孙志挥)
School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
Keywords:
data mining; frequent pattern; data stream; enumeration tree
CLC number:
TP311
DOI:
10.3969/j.issn.1001-0505.2006.03.025
Abstract:
To improve the efficiency of mining semi-structured document data streams, this paper improves the existing StreamT algorithm and proposes FStreamT, a fast algorithm for mining frequent patterns from semi-structured data streams. Storing candidate frequent patterns in a plain set makes lookups and updates inefficient, so FStreamT stores them in an enumeration tree, which speeds up searching and updating the candidate set. It also uses the monotonicity of frequent patterns, together with the structure of the enumeration tree, to shrink the search space for maintaining the negative border, which improves the efficiency of the algorithm as a whole. Theoretical analysis and experimental results show that FStreamT is feasible and more efficient than StreamT.
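The two ideas named in the abstract, storing candidates in an enumeration tree and using monotonicity to limit work on the negative border, can be illustrated with a small sketch. This is not the FStreamT algorithm itself: the abstract does not give its data structures, so the sketch simplifies patterns to tuples of labels (the paper mines tree-shaped patterns), and the names `Node`, `EnumerationTree`, `add_occurrence`, `classify` and `min_count` are all hypothetical. Each node's path from the root spells one pattern, so a lookup or count update costs time proportional to the pattern length rather than a scan of a set.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate pattern; the path from the root spells the pattern."""
    count: int = 0
    children: dict = field(default_factory=dict)


class EnumerationTree:
    """Illustrative prefix-style enumeration tree (hypothetical, not FStreamT)."""

    def __init__(self, min_count):
        self.root = Node()
        self.min_count = min_count  # assumed absolute support threshold

    def find(self, pattern):
        """Follow one child link per label; return None if the pattern is not stored."""
        node = self.root
        for label in pattern:
            node = node.children.get(label)
            if node is None:
                return None
        return node

    def add_occurrence(self, pattern):
        """Count one occurrence of a non-empty pattern.

        A pattern is only stored under an already stored parent (the pattern
        minus its last label), which keeps the tree closed under prefixes.
        """
        *prefix, last = pattern
        parent = self.find(prefix)
        if parent is None:
            return False
        child = parent.children.setdefault(last, Node())
        child.count += 1
        return True

    def classify(self):
        """Return (frequent patterns with counts, negative border).

        By monotonicity, no extension of an infrequent pattern can be
        frequent, so the walk never descends below an infrequent node.
        """
        frequent, border = [], []
        stack = [((), self.root)]
        while stack:
            pattern, node = stack.pop()
            for label, child in node.children.items():
                extended = pattern + (label,)
                if child.count >= self.min_count:
                    frequent.append((extended, child.count))
                    stack.append((extended, child))
                else:
                    border.append(extended)  # infrequent child of a frequent parent
        return sorted(frequent), sorted(border)
```

In this sketch the negative border is exactly the set of infrequent nodes whose parent is frequent (or the root), so maintaining it only requires inspecting the children of frequent nodes instead of every stored candidate.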

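Memo

Funding: Supported by the National Natural Science Foundation of China (No. 70371015).
Biographies: Zhao Chuanshen (born 1973), male, Ph.D. candidate; Sun Zhihui (corresponding author), male, professor, doctoral supervisor, szh@seu.edu.cn.
Last Update: 2006-05-20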