Gao Ling, Shen Yuan, Gao Ni, et al. Clustering analysis of vulnerability information based on text mining[J]. Journal of Southeast University (Natural Science Edition), 2015, 45(5): 845-850. [doi:10.3969/j.issn.1001-0505.2015.05.006]

Clustering analysis of vulnerability information based on text mining

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
45
Issue:
No. 5, 2015
Pages:
845-850
Section:
Computer Science and Engineering
Publication date:
2015-09-20

Article Info

Title:
Clustering analysis of vulnerability information based on text mining
Author(s):
Gao Ling (高岭), Shen Yuan (申元), Gao Ni (高妮), Lei Yanting (雷艳婷), Sun Qian (孙骞)
School of Information Science and Technology, Northwest University, Xi'an 710069, China
Keywords:
vulnerability information; clustering; particle swarm optimization algorithm; text mining; cosine similarity
CLC number:
TP393
DOI:
10.3969/j.issn.1001-0505.2015.05.006
Abstract:
To uncover the intrinsic relationships among vulnerabilities and manage vulnerability information efficiently, text processing and a clustering algorithm are applied to vulnerability mining. Taking a macroscopic view of the whole vulnerability database, a vulnerability information clustering algorithm (PSO-K-means) based on text mining and particle swarm optimization (PSO) is proposed. First, a frequent-word space is obtained through text processing and used to encode the description field of each vulnerability record. Second, to reduce the influence of local optima and poorly chosen initial cluster centers on the clustering result, the PSO algorithm is used to obtain global cluster centers. Finally, the K-means algorithm clusters the vulnerability information, so that it can be managed by category and can serve as a reference for predicting the characteristics of unknown vulnerabilities. Experimental results show that the PSO-K-means algorithm reaches an accuracy of 90.16%. Compared with the K-means algorithm, its average accuracy is about 5% higher and its average number of iterations is lower by about 45. The proposed algorithm can also predict the main categories of three kinds of unknown vulnerabilities and is an effective vulnerability analysis method.
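The three steps named in the abstract (encode descriptions over a frequent-word space, search for global cluster centers with PSO, refine with K-means) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the encoding details, the fitness function, or the PSO parameters, so the term-count encoding, the cosine-distance fitness `cost`, the parameter values (`particles`, `iters`, `w`, `c1`, `c2`), all function names, and the toy descriptions are assumptions made for this example.

```python
import math
import random
from collections import Counter

def build_vocab(docs, min_docs=2):
    """Frequent-word space: words that occur in at least min_docs descriptions."""
    doc_freq = Counter(w for d in docs for w in set(d.lower().split()))
    return sorted(w for w, n in doc_freq.items() if n >= min_docs)

def encode(doc, vocab):
    """Encode one description as a term-count vector over the vocabulary."""
    counts = Counter(doc.lower().split())
    return [float(counts[w]) for w in vocab]

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def assign(vectors, centers):
    """Label each vector with the index of its most similar center."""
    return [max(range(len(centers)), key=lambda j: cosine(v, centers[j]))
            for v in vectors]

def cost(vectors, centers):
    """Assumed fitness: total cosine distance to the nearest center."""
    return sum(1.0 - max(cosine(v, c) for c in centers) for v in vectors)

def pso_centers(vectors, k, particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k cluster centers with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(vectors[0])

    def split(flat):
        return [flat[i * dim:(i + 1) * dim] for i in range(k)]

    # Each particle is k candidate centers flattened into one list,
    # initialised from k randomly chosen data points.
    pos = [[x for v in rng.sample(vectors, k) for x in v] for _ in range(particles)]
    vel = [[0.0] * (k * dim) for _ in range(particles)]
    best = [p[:] for p in pos]
    best_cost = [cost(vectors, split(p)) for p in pos]
    g = min(range(particles), key=lambda i: best_cost[i])
    gbest, gbest_cost = best[g][:], best_cost[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(vectors, split(pos[i]))
            if c < best_cost[i]:
                best[i], best_cost[i] = pos[i][:], c
            if c < gbest_cost:
                gbest, gbest_cost = pos[i][:], c
    return split(gbest)

def kmeans(vectors, centers, max_iter=100):
    """K-means refinement (cosine assignment, mean update) from given centers."""
    labels = assign(vectors, centers)
    for _ in range(max_iter):
        updated = []
        for j, old in enumerate(centers):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            # Keep the old center if a cluster lost all its members.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else old)
        centers = updated
        new_labels = assign(vectors, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers

# Toy vulnerability descriptions, invented for this example.
docs = [
    "buffer overflow in image parser allows remote code execution",
    "heap buffer overflow in font parser allows code execution",
    "stack buffer overflow allows remote code execution",
    "sql injection in login form allows database access",
    "sql injection in search form allows database access",
    "blind sql injection in admin form",
]
vocab = build_vocab(docs)
vectors = [encode(d, vocab) for d in docs]
labels, centers = kmeans(vectors, pso_centers(vectors, k=2))
print(labels)
```

On this toy input the buffer-overflow descriptions and the SQL-injection descriptions should land in separate clusters. Seeding K-means with the PSO result rather than with random points reflects the motivation given in the abstract, namely reducing the sensitivity of K-means to poorly chosen initial centers.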



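Memo:
Received: 2014-11-05.
Author biography: Gao Ling (b. 1964), male, PhD, professor, doctoral supervisor, gl@nwu.edu.cn.
Funding: National Natural Science Foundation of China (61373176); National Science and Technology Support Program of the 12th Five-Year Plan (2013BAK01B02); Xi'an Science and Technology Plan Project (CXY1440(8)).
Cite this article: Gao Ling, Shen Yuan, Gao Ni, et al. Clustering analysis of vulnerability information based on text mining[J]. Journal of Southeast University (Natural Science Edition), 2015, 45(5): 845-850. [doi:10.3969/j.issn.1001-0505.2015.05.006]
Last Update: 2015-09-20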