[1] Feng Jing, Jin Yuanping, Feng Xin. Semantic compression for data tables based on principal component and matching clustering analysis[J]. Journal of Southeast University (Natural Science Edition), 2006, 36(6): 927-930. [doi:10.3969/j.issn.1001-0505.2006.06.011]

Semantic compression for data tables based on principal component and matching clustering analysis

Journal of Southeast University (Natural Science Edition) [ISSN:1001-0505/CN:32-1178/N]

Volume:
36
Issue:
No. 6, 2006
Pages:
927-930
Section:
Computer Science and Engineering
Publication date:
2006-11-20

Article Info

Title:
Semantic compression for data tables based on principal component and matching clustering analysis
Author(s):
Feng Jing (冯静), Jin Yuanping (金远平), Feng Xin (冯欣)
School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
Keywords:
semantic compression; principal component analysis; matching degree
CLC number:
TP311.13
DOI:
10.3969/j.issn.1001-0505.2006.06.011
Abstract:
PCA-Clustering, a semantic compression method for data tables based on principal component analysis and matching clustering analysis, is proposed. Principal component analysis exploits the correlation between attributes and extracts principal components to achieve column-wise compression. Matching clustering assigns each row to a cluster according to a measure of its matching degree and replaces all rows with a much smaller set of cluster representative rows, achieving row-wise compression; it also makes full use of a small allowable error to obtain a better compression ratio. Simulation results show that, when the attributes are strongly linearly correlated, PCA-Clustering outperforms Fascicles and ItCompress in compression ratio by about 10% to 15% on average. Compared with SPARTAN, which uses the CaRT model, PCA-Clustering still achieves a better compression ratio, because CaRT is not very effective for numeric attributes with a strong linear correlation.
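The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the matching degree measure, the number of retained components, or the clustering procedure. The sketch assumes a single principal component (found by power iteration), uses the absolute difference between projected scores as a stand-in for the matching degree, and assigns rows greedily to the first representative within an allowed error `tol`. All function names are hypothetical.

```python
import math

def first_principal_component(rows, iters=100):
    """Return (column means, unit direction) of the top principal component.

    Uses power iteration on the sample covariance matrix. Assumes the table
    has non-zero variance and the start vector is not orthogonal to the
    dominant direction (true for the toy data below).
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(rows, mean, v):
    """Column-wise compression: replace each row by one score along v."""
    return [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]

def match_clusters(scores, tol):
    """Row-wise compression (assumed greedy rule, not the paper's):
    a score joins the first representative within tol, otherwise it
    becomes a new representative."""
    reps, labels = [], []
    for s in scores:
        for k, rep in enumerate(reps):
            if abs(s - rep) <= tol:
                labels.append(k)
                break
        else:
            reps.append(s)
            labels.append(len(reps) - 1)
    return reps, labels

def reconstruct(reps, labels, mean, v):
    """Rebuild an approximate table from representatives and labels."""
    return [[mean[j] + reps[k] * v[j] for j in range(len(v))] for k in labels]

# Toy table whose three attributes are exactly linearly related.
rows = [[x, 2 * x, 3 * x + 1] for x in (1.0, 1.1, 1.2, 5.0, 5.1, 9.0)]
tol = 1.0
mean, v = first_principal_component(rows)
scores = project(rows, mean, v)
reps, labels = match_clusters(scores, tol)
approx = reconstruct(reps, labels, mean, v)
max_err = max(abs(a - b) for ra, rb in zip(rows, approx) for a, b in zip(ra, rb))
print(len(reps), labels, max_err)
```

Because the toy attributes are exactly linearly related, one component captures all the variance, and the per-attribute reconstruction error stays within `tol`. On real tables, more components and a different matching rule would likely be needed.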

References:

[1] Mertz D.A data compression primer[EB/OL].(2000-04)[2005-03].http://www-128.ibm.com/developerworks/library/l-compr.html.
[2] Jagadish H V,Madar J,Ng R.Semantic compression and pattern extraction with fascicles[C] //Proc 1999 Int Conf Very Large Data Bases(VLDB’99).Edinburgh,UK,1999:186-197.
[3] Jagadish H V,Ng R T,Ooi B C,et al.ItCompress:an iterative semantic compression algorithm[C] //20th International Conference on Data Engineering(ICDE’04).Boston,MA,USA,2004:646-657.
[4] Babu S,Garofalakis M,Rastogi R.SPARTAN:a model-based semantic compression system for massive data tables[C] //Proc of the ACM SIGMOD’2001 International Conference on Management of Data.Santa Barbara,California,2001:22-49.
[5] Babu S,Garofalakis M,Rastogi R.SPARTAN:using constrained models for guaranteed-error semantic compression [J].SIGKDD Explorations,2002,4(2):11-20.
[6] Sun W-S,Fan Y-P,Chen Y-P,et al.Feature data enriching approach based on immune clustering [J].Information and Control,2005,34(2):181-187.
[7] Fan Y-P,Chen Y-P,Sun W-S,et al.Algorithm for bi-directional reduce feature data based on the principal component analysis and immune clustering [J]. Journal of System Simulation,2005,17(1):148-153.


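Memo

Foundation items: the Major Research Program of the National Natural Science Foundation of China (No. 90412014); the Science Foundation of Southeast University (No. XJ0409150).
Biographies: Feng Jing (born 1982), female, master's candidate; Jin Yuanping (corresponding author), male, professor, ypjin@seu.edu.cn.
Last Update: 2006-11-20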