[1] Wu Hanqian, Zhu Yunjie, Xie Jue. Detection model of effectiveness of Chinese online reviews based on logistic regression[J]. Journal of Southeast University (Natural Science Edition), 2015, 45(3): 433-437. [doi:10.3969/j.issn.1001-0505.2015.03.004]

Detection model of effectiveness of Chinese online reviews based on logistic regression

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
45
Issue:
No. 3, 2015
Pages:
433-437
Column:
Other
Publication date:
2015-05-20

Article Info

Title:
Detection model of effectiveness of Chinese online reviews based on logistic regression
Author(s):
Wu Hanqian (1), Zhu Yunjie (1), Xie Jue (2)
(1) School of Computer Science and Engineering, Southeast University, Nanjing 210018, China
(2) Southeast University-Monash University Joint Graduate School, Suzhou 215123, China
Keywords:
effectiveness of online reviews; logistic regression; association rule
CLC number:
P315.69
DOI:
10.3969/j.issn.1001-0505.2015.03.004
Abstract:
To automate the detection of the effectiveness of Chinese online reviews in e-commerce and social networks, a spam review detection model based on logistic regression is proposed for the single-topic setting. Detecting the effectiveness of Chinese online reviews can be treated as a classification problem, and nine features are extracted according to the characteristics of Chinese online reviews to build the classification model. To obtain the core feature, topic relevance, an association-rule-based review noun pattern is used to optimize topic identification in ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), and a cross language model is then used to compute the topic relevance of online reviews. In the experiment, 1 000 manually labeled reviews are used as samples, a support vector machine (SVM) classification model is used for comparison, and the computation is carried out with the data mining tool Weka. The results show that the logistic regression spam review detection model with the optimized review noun pattern achieves an accuracy of 83.54%, which is 2.10% higher than that of the SVM classification model.
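The classification setup described in the abstract can be sketched in a few lines. The block below is only an illustration under stated assumptions, not the authors' implementation: the paper ran its experiments in Weka, and the abstract does not list the nine features, so the data here is synthetic. The helper names (`make_toy_reviews`, `train_logistic`, `predict`) and the choice of feature 0 as a stand-in for "topic relevance" are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(spam | x) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (spam) if the estimated probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def make_toy_reviews(n, seed=0):
    """Synthetic stand-in for labeled reviews: 9 features in [0, 1], label 1 = spam.

    Assumption for illustration only: feature 0 plays the role of topic
    relevance and tends to be lower for spam; the other 8 features are noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.3 if label else 0.7
        relevance = min(1.0, max(0.0, rng.gauss(center, 0.15)))
        X.append([relevance] + [rng.random() for _ in range(8)])
        y.append(label)
    return X, y


X, y = make_toy_reviews(400)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w, b = train_logistic(X_train, y_train)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy on toy data: {accuracy:.2%}")
```

The accuracy printed here reflects only the synthetic data and has no bearing on the 83.54% reported in the abstract. Reproducing the comparison would require the labeled reviews, the nine features, and an SVM baseline trained on the same split.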



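Memo:
Received: 2014-12-05.
Author biography: Wu Hanqian (born 1972), male, Ph.D., associate professor, hanqian@seu.edu.cn.
Funding: National Natural Science Foundation of China (No. 60803057); National High Technology Research and Development Program of China (863 Program) (No. 2015AA015904).
Cite this article: Wu Hanqian, Zhu Yunjie, Xie Jue. Detection model of effectiveness of Chinese online reviews based on logistic regression[J]. Journal of Southeast University (Natural Science Edition), 2015, 45(3): 433-437. [doi:10.3969/j.issn.1001-0505.2015.03.004]
Last Update: 2015-05-20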