[1] Ji Genlin, Ling Xiaohan, Yang Ming. Distributed clustering algorithm based on ensemble learning [J]. Journal of Southeast University (Natural Science Edition), 2007, 37(4): 585-588. [doi:10.3969/j.issn.1001-0505.2007.04.008]

Distributed clustering algorithm based on ensemble learning

Journal of Southeast University (Natural Science Edition) [ISSN:1001-0505/CN:32-1178/N]

Volume:
37
Issue:
No. 4, 2007
Pages:
585-588
Section:
Computer Science and Engineering
Publication date:
2007-07-20

Article Info

Title:
Distributed clustering algorithm based on ensemble learning
Author(s):
Ji Genlin (吉根林), Ling Xiaohan (凌霄汉), Yang Ming (杨明)
Department of Computer Science, Nanjing Normal University, Nanjing 210097, China
Keywords:
K-means; distributed clustering; data mining; ensemble learning
Classification number:
TP311
DOI:
10.3969/j.issn.1001-0505.2007.04.008
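Abstract (translated from the Chinese):
Based on the idea of ensemble learning, a distributed clustering model is proposed. Distributed processing in the model has two stages: local clustering is first performed at the local sites, and global clustering is then performed at the global site. Local clustering at a local site is regarded as a learning process on a data subset, and the local clustering results together form the individual learners of the clustering ensemble. Global clustering combines the local results by averaging, and a criterion function is defined to measure the accuracy of the ensemble. The K-means algorithm is extended to the distributed setting, yielding DK-means, a distributed K-means algorithm built on this model that scales well across different distributions of the local data. Experimental results show that, under the same conditions, DK-means reaches the accuracy of centralized clustering and is effective and feasible, which in turn verifies the effectiveness of the ensemble-learning-based distributed clustering model.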
Abstract:
A distributed clustering model based on ensemble learning is proposed. In a typical scenario, the model runs in two stages: clustering is first performed at each local site and then at the global site. The local clustering results transmitted to the server site form an ensemble, and ensemble-learning combination schemes use that ensemble to generate the global clustering result. The model thereby converts distributed clustering into a combinatorial optimization problem. As an implementation of the model, a novel distributed K-means algorithm called DK-means is presented. DK-means first clusters the data at each local site using K-means; the global site then receives the local clustering results and clusters them by applying K-means again. The algorithm works well even when the data distribution varies from one local site to another. Experimental results show that DK-means is effective and efficient, which also provides empirical verification of the model's validity.
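
The two-stage scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' DK-means implementation: the abstract does not say exactly what each local site transmits or how the global step weights it, so the sketch assumes that each site sends its K centroids together with the cluster sizes, and that the global site runs a size-weighted K-means over the pooled centroids. The function names (`farthest_first`, `assign`, `kmeans`, `two_stage_kmeans`) are hypothetical, and farthest-first initialization is chosen here only to keep the example deterministic.

```python
import numpy as np


def farthest_first(points, k):
    """Deterministic initialisation (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    chosen = [0]
    min_dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(min_dist.argmax())
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return points[chosen]


def assign(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, weights=None, n_iter=100):
    """Plain Lloyd-style K-means with optional per-point weights."""
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    weights = np.asarray(weights, dtype=float)
    centroids = farthest_first(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = centroids.copy()
        for j in range(k):
            members = labels == j
            if members.any():  # an empty cluster keeps its old centroid
                updated[j] = np.average(
                    points[members], axis=0, weights=weights[members]
                )
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, assign(points, centroids)


def two_stage_kmeans(site_datasets, k):
    """Stage 1: K-means at every local site.
    Stage 2: K-means at the global site over the pooled local centroids,
    weighted by local cluster size (an assumption of this sketch)."""
    pooled, sizes = [], []
    for data in site_datasets:
        centroids, labels = kmeans(data, k)
        pooled.append(centroids)
        sizes.append(np.bincount(labels, minlength=k))
    pooled = np.vstack(pooled)
    sizes = np.concatenate(sizes).astype(float)
    keep = sizes > 0  # drop centroids of clusters that ended up empty
    global_centroids, _ = kmeans(pooled[keep], k, weights=sizes[keep])
    return global_centroids
```

Calling `two_stage_kmeans(sites, k=3)` with a list of per-site NumPy arrays returns a `(3, d)` array of global centroids. Only the centroids and cluster sizes cross site boundaries in this sketch, so the raw data never leaves the local sites.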



Memo:
Funding: Supported by the Natural Science Foundation of Jiangsu Province (BK2005135).
Author biography: Ji Genlin (born 1964), male, Ph.D., professor, doctoral supervisor, glji@njnu.edu.cn.
Last Update: 2007-07-20