Wu Hanqian, Zhou Lifeng, Xie Jue. Application of secondary pruning algorithm in commentary feature extraction[J]. Journal of Southeast University (Natural Science Edition), 2016, 46(3): 513-517. [doi:10.3969/j.issn.1001-0505.2016.03.010]

Application of secondary pruning algorithm in commentary feature extraction

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
46
Issue:
No. 3, 2016
Pages:
513-517
Section:
Computer Science and Engineering
Publication date:
2016-05-20

Article Info

Title:
Application of secondary pruning algorithm in commentary feature extraction
Author(s):
Wu Hanqian¹, Zhou Lifeng¹, Xie Jue²
¹ School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
² Southeast University-Monash University Joint Graduate School, Suzhou 215123, China
Keywords:
feature extraction; secondary pruning; term pair co-occurrence weight; likelihood ratio test; cross language model
CLC number:
TP315.69
DOI:
10.3969/j.issn.1001-0505.2016.03.010
Abstract:
To address the insufficient accuracy of the generalized sequential pattern (GSP) algorithm in extracting product features from Chinese online reviews, a secondary pruning algorithm is proposed. The GSP algorithm first produces a candidate feature set, and the term pair co-occurrence weight (TPCW) is then used as a threshold to filter this set further, which improves accuracy. A customized crawler is used to collect Chinese reviews of camera products from the Jingdong website, 1 000 of which are selected as experimental data; the ICTCLAS tool is used for word segmentation and data preprocessing. The proposed algorithm is compared with the GSP algorithm, the cross language model (CLM), and the likelihood ratio test (LRT). The results show that the proposed algorithm achieves an accuracy of 76.37% for product feature extraction from Chinese online reviews, which is 2.94%, 5.77% and 7.57% higher than that of the GSP algorithm, CLM and LRT, respectively.
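The two-stage idea in the abstract (GSP proposes candidate features, then a term pair co-occurrence weight prunes them) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the TPCW formula, so a Dice coefficient over review-level co-occurrence is used as a hypothetical stand-in. GSP is restricted here to sequences of at most two words, with a hypothetical `max_gap` proximity constraint. Reviews are assumed to be already segmented (for example by ICTCLAS) and filtered to nouns, and single-word candidates are assumed to pass the second pruning unchanged. All function names, the toy data and the threshold value are illustrative.

```python
from collections import Counter

# Sketch only: names, the TPCW formula and all parameter values below are
# hypothetical stand-ins, not the paper's implementation.


def contains(seq, pattern, max_gap):
    """Return True if `pattern` (1 or 2 words) occurs in `seq` in order,
    with the second word at most `max_gap` positions after the first."""
    if len(pattern) == 1:
        return pattern[0] in seq
    a, b = pattern
    return any(tok == a and b in seq[i + 1 : i + 1 + max_gap]
               for i, tok in enumerate(seq))


def gsp_candidates(reviews, min_support, max_gap=1):
    """Stage 1: level-wise (GSP/Apriori-style) mining of frequent word
    sequences, restricted here to lengths 1 and 2."""
    # Support of a single word = number of reviews that contain it.
    df = Counter(tok for seq in reviews for tok in set(seq))
    level1 = sorted(w for w, c in df.items() if c >= min_support)
    # Candidate 2-sequences are joins of frequent words (Apriori property);
    # repeated-word patterns such as (a, a) are ignored in this sketch.
    level2 = [
        (a, b)
        for a in level1
        for b in level1
        if a != b
        and sum(contains(seq, (a, b), max_gap) for seq in reviews) >= min_support
    ]
    return [(w,) for w in level1] + level2, df


def tpcw(a, b, reviews, df):
    """Hypothetical term pair co-occurrence weight: the Dice coefficient of
    the sets of reviews that contain `a` and `b`."""
    both = sum(1 for seq in reviews if a in seq and b in seq)
    return 2 * both / (df[a] + df[b])


def secondary_prune(candidates, reviews, df, threshold):
    """Stage 2: keep single words as-is (assumption); keep a word pair only
    if its co-occurrence weight reaches `threshold`."""
    return [
        c for c in candidates
        if len(c) == 1 or tpcw(c[0], c[1], reviews, df) >= threshold
    ]


# Toy data: each review is assumed to be already segmented and noun-filtered.
reviews = [
    ["lens", "cover"], ["lens", "cover"],
    ["price", "quality"], ["price", "quality"],
    ["price", "delivery"], ["quality", "screen"],
    ["price", "screen"], ["quality", "delivery"],
]
candidates, df = gsp_candidates(reviews, min_support=2, max_gap=1)
features = secondary_prune(candidates, reviews, df, threshold=0.6)
print(features)
# [('cover',), ('delivery',), ('lens',), ('price',), ('quality',), ('screen',), ('lens', 'cover')]
```

In this toy run, "price quality" is adjacent often enough to pass the GSP support threshold, but each word also appears frequently without the other, so its assumed Dice weight (0.5) falls below the threshold and the pair is pruned, while "lens cover" (weight 1.0) survives. This is meant to illustrate the intended effect of a second, co-occurrence-based pass over frequency-based candidates, not to reproduce the paper's reported numbers.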



Memo:
Received: 2015-08-22.
Biography: Wu Hanqian (born 1972), male, PhD, associate professor, hanqian@seu.edu.cn.
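Foundation items: The Fundamental Research Funds for the Central Universities; the National High Technology Research and Development Program of China (863 Program) (No. 2015AA015904).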
Last update: 2016-05-20