[1]於跃成,王建东,郑关胜,等.基于约束信息的并行k-means算法[J].东南大学学报(自然科学版),2011,41(3):505-508.[doi:10.3969/j.issn.1001-0505.2011.03.014] 　Yu Yuecheng,Wang Jiandong,Zheng Guansheng,et al.Parallel k-means algorithm based on constrained information[J].Journal of Southeast University (Natural Science Edition),2011,41(3):505-508.[doi:10.3969/j.issn.1001-0505.2011.03.014] 点击复制 基于约束信息的并行k-means算法() 分享到： var jiathis_config = { data_track_clickback: true };

41

2011年第3期

505-508

2011-05-20

文章信息/Info

Title:
Parallel k-means algorithm based on constrained information

(1南京航空航天大学信息科学与技术学院,南京 210016)(2江苏科技大学计算机科学与工程学院,镇江 212003)
Author(s):
(1College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
(2College of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China)

Keywords:

TP391.4
DOI:
10.3969/j.issn.1001-0505.2011.03.014

Abstract:
In order to obtain the desired clustering results on the distributed data set, a parallel k-means algorithm is presented based on constrained information. On the basis of the facts that the parallel k-means algorithm can be effectively used in clustering the horizontal distributed data set, the objective function of the parallel k-means algorithm is modified, and the constrained parallel k-means algorithm is designed, then the constrained information of site users is introduced into the distributed clustering process in the form of chunklets, which can guide the algorithm to a bias search. Theoretically the algorithm guarantees the inter-cluster distance among the unconstrained samples to be the closest, and guarantees the average distance between constrained samples in a chunklet and the corresponding cluster center to be the closest one. The results from the experiments show that the algorithm can effectively enhance the clustering precision of parallel k-means, meanwhile it can obtain the clustering results on the distributed data set, which are equivalent to the results of the constrained k-means algorithm running on a centralized data set.

参考文献/References:

[1] Inan A,Kaya S V,Saygin Y,et al.Privacy preserving clustering on horizontally partitioned data [J].Data and Knowledge Engineering,2007,63(3):646-666.
[2] Dhillon I S,Modha D S.A data-clustering algorithm on distributed memory multiprocessors [C]//Proceedings of the KDD’99 Workshop on High Performance Knowledge Discovery.San Digeo,USA,1999:245-260.
[3] MacQueen J.Some methods for classification and analysis of multivariate observations [C]//The 5th Berkeley Symposium on Mathematical Statistics and Probability.Berkeley,USA,1967:281-297.
[4] Bandyopadhyay S,Giannella C,Maulik U,et al.Clustering distributed data streams in peer to peer environments [J].Information Sciences,2006,176(14):1952-1985.
[5] Datta S,Bhaduri K,Giannella C,et al.Distributed data mining in peer-to-peer networks [J].Internet Computing,2006,10(4):18-26.
[6] Datta S,Giannella C,Kargupta H.Approximate distributed k-means clustering over a peer-to-peer network [J].IEEE Transactions on Knowledge and Data Engineering,2009,21(10):1372-1388.
[7] Jin R M,Goswami A,Agrawal G.Fast and exact out-of-core and distributed k-means clustering [J].Knowledge and Information Systems,2006,10(1):17-40.
[8] Wagstaff K,Cardie C,Rogers S,et al.Constrained k-means clustering with background knowledge [C]//Proceeding of 18th International Conference on Machine Learning.Williamstown,USA,2001:577-584.
[9] Basu S,Banerjee A,Mooney R J.Active semi-supervision for pairwise constrained clustering [C]//Proceedings of the 2004 SIAM International Conference on Data Mining.Lake Buena Vista,FL,USA,2004:333-344.
[10] Bar-Hillel A,Hertz T,Shental N,et al.Learning a Mahalanobis metric from equivalence constraints [J].Journal of Machine Learning Research,2005,6:937-965.
[11] Alipanahi B,Biggs M,Ghodsi A,et al.Distance metric learning vs.fisher discriminant analysis[C]//Proceedings of the 23th AAAI Conference on Artificial Intelligence.Chicago,USA,2008:598-603.

相似文献/References:

[1]吉根林,凌霄汉,杨明.一种基于集成学习的分布式聚类算法[J].东南大学学报(自然科学版),2007,37(4):585.[doi:10.3969/j.issn.1001-0505.2007.04.008]
Ji Genlin,Ling Xiaohan,Yang Ming.Distributed clustering algorithm based on ensemble learning[J].Journal of Southeast University (Natural Science Edition),2007,37(3):585.[doi:10.3969/j.issn.1001-0505.2007.04.008]