Zhang Jing, Sun Zhihui. GDLOF: fast local outlier detection algorithm with grid-based and dense cell[J]. Journal of Southeast University (Natural Science Edition), 2005, 35(6): 863-866. [doi:10.3969/j.issn.1001-0505.2005.06.007]

GDLOF: fast local outlier detection algorithm with grid-based and dense cell

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
35
Issue:
No. 6, 2005
Pages:
863-866
Section:
Computer Science and Engineering
Publication date:
2005-11-20

Article Info

Title:
GDLOF: fast local outlier detection algorithm with grid-based and dense cell
Author(s):
Zhang Jing1,2, Sun Zhihui1
1 Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China
2 College of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212001, China
Keywords:
data mining; outlier; dense cell; dense region; GDLOF (grid-based and dense-cell-based local outlier factor)
CLC number:
TP311.13
DOI:
10.3969/j.issn.1001-0505.2005.06.007
Abstract:
High-dimensional, large-scale datasets are sparse, and existing outlier detection algorithms are unsatisfactory in both computational cost and running time when applied to them. Building on LOF, the density-based local outlier detection algorithm, this paper introduces the concepts of dense cell and dense region and proposes GDLOF, a fast local outlier detection algorithm based on grids and dense cells. Unlike earlier work, which uses the number of points as the threshold for judging denseness, the method judges denseness from the state of the neighboring points around each point in the dataset. Because points in dense cells and dense regions are proved not to be outliers, the algorithm skips their LOF computation, which reduces the amount of computation and significantly improves efficiency while keeping the desired detection accuracy. Experiments show that the algorithm is applicable and effective for high-dimensional, large-scale datasets.
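
The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the paper's definitions of dense cell and dense region, so the `count_based_dense` predicate below is a hypothetical stand-in (a plain point count per cell, which is precisely the kind of threshold the paper says it moves away from). The names `grid_cells`, `pruned_lof`, `cell_size` and `min_pts` are likewise assumptions made for illustration. The LOF part follows the standard definition (k-distance, reachability distance, local reachability density), computed by brute force and assuming no duplicate points.

```python
import math
from collections import defaultdict


def knn(points, i, k):
    """Return (k-distance, neighbor indices) of point i, ties included."""
    dists = sorted((math.dist(points[i], q), j)
                   for j, q in enumerate(points) if j != i)
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def lof_scores(points, k, candidates=None):
    """Brute-force LOF for the given candidate indices (default: all points)."""
    nn_cache, lrd_cache = {}, {}

    def neighbors(i):
        if i not in nn_cache:
            nn_cache[i] = knn(points, i, k)
        return nn_cache[i]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability
        # distance from point i to its k nearest neighbors.
        if i not in lrd_cache:
            _, nbrs = neighbors(i)
            total = sum(max(neighbors(o)[0], math.dist(points[i], points[o]))
                        for o in nbrs)
            lrd_cache[i] = len(nbrs) / total
        return lrd_cache[i]

    scores = {}
    for i in (range(len(points)) if candidates is None else candidates):
        _, nbrs = neighbors(i)
        scores[i] = sum(lrd(o) for o in nbrs) / (len(nbrs) * lrd(i))
    return scores


def grid_cells(points, cell_size):
    """Map each grid cell (tuple of integer coordinates) to its point indices."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(x / cell_size) for x in p)].append(i)
    return cells


def count_based_dense(cell_key, cells, min_pts=5):
    """ASSUMPTION: stand-in density test, NOT the paper's criterion."""
    return len(cells[cell_key]) >= min_pts


def pruned_lof(points, k, cell_size, is_dense=count_based_dense):
    """Compute LOF only for points that lie outside cells judged dense."""
    cells = grid_cells(points, cell_size)
    candidates = [i for key, members in cells.items()
                  if not is_dense(key, cells) for i in members]
    return lof_scores(points, k, candidates)
```

With a tight cluster and one distant point, only the distant point's cell fails the stand-in density test, so LOF is evaluated for that single point while its neighbors are still looked up in the full dataset. Unlike the paper's criterion, which comes with a proof, this stand-in carries no guarantee that skipped points are never outliers; a different test can be supplied through the `is_dense` parameter.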



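Memo:
Funding: Supported by the National Natural Science Foundation of China (70371015) and the Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education.
About the authors: Zhang Jing (born 1975), female, Ph.D., lecturer, jszj08062000@yahoo.com.cn; Sun Zhihui (corresponding author), male, professor, doctoral supervisor, sunzh@seu.edu.cn.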
Last Update: 2005-11-20