[1] Ma Yinglong, Li Pengpeng, Zhang Jingxu. Semantic search approach based on multi-classification semantic analysis and personalization [J]. Journal of Southeast University (Natural Science Edition), 2014, 44(2): 261-265. [doi:10.3969/j.issn.1001-0505.2014.02.007]

Semantic search approach based on multi-classification semantic analysis and personalization

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
44
Issue:
No. 2, 2014
Pages:
261-265
Section:
Computer Science and Engineering
Publication date:
2014-03-20

Article Info

Title:
Semantic search approach based on multi-classification semantic analysis and personalization
Author(s):
Ma Yinglong (1), Li Pengpeng (1), Zhang Jingxu (2)
(1) School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
(2) Gansu Electric Power Corporation, Lanzhou 730030, China
Keywords:
semantic search; multi-classification semantic analysis (MSA); term vector database (TVDB); personalization algorithm
CLC number:
TP391.3
DOI:
10.3969/j.issn.1001-0505.2014.02.007
Abstract:
To further improve the precision of semantic search and the user experience, a semantic search approach based on multi-classification semantic analysis (MSA) and personalization is presented. First, target documents are transformed into vectors with a modified MSA method, and a term vector database (TVDB) is built. Then, the documents are classified by a support vector machine (SVM), and a label index is generated from the resulting categories. At search time, guided by the TVDB, the user's search history and personal information are used to optimize the search results. Experimental results show that the retrieval precision, the average discounted cumulative gain (DCG), and the average normalized discounted cumulative gain (nDCG) of a system based on this approach reach 0.7, 7.267, and 0.890, respectively, which are 31%, 36%, and 19% higher than the mean of the results obtained with the Lucene method and the Yahoo Directory method. The average time per query is 0.669 s, only 0.326 s more than with the Lucene method. The approach therefore improves the precision and overall relevance of semantic search at a small additional time cost.
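The abstract reports precision, DCG, and nDCG without stating how they are computed. The sketch below shows one common formulation of these metrics, assuming graded relevance judgments per ranked result and the linear-gain form of DCG (gain divided by log2 of rank + 1). The function names `dcg`, `ndcg`, and `precision_at_k`, the relevance threshold, and the sample grades are illustrative assumptions, not taken from the paper, which may use a different variant.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumed variant: sum of rel_i / log2(i + 1), with ranks starting at 1.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG normalized by the DCG of the same grades sorted best-first."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def precision_at_k(relevances, k, threshold=1):
    """Fraction of the top-k results whose grade is at least `threshold`."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for rel in relevances[:k] if rel >= threshold)
    return hits / k


if __name__ == "__main__":
    # Hypothetical relevance grades (0 = irrelevant, 3 = highly relevant)
    # for the first six results of a single query.
    graded = [3, 2, 3, 0, 1, 2]
    print(f"DCG  = {dcg(graded):.3f}")
    print(f"nDCG = {ndcg(graded):.3f}")
    print(f"P@5  = {precision_at_k(graded, 5):.2f}")
```

Averaging these per-query values over a query set would give figures comparable in kind to the averages quoted in the abstract. Note that this sketch normalizes against the returned list only; an evaluation with a full set of judged documents would compute the ideal ranking from that set instead.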


Memo

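Received: 2013-10-21.
About the author: Ma Yinglong (born 1976), male, PhD, associate professor, yinglongma@ncepu.edu.cn.
Funding: National Natural Science Foundation of China (61001197, 61372182); Science and Technology Project of State Grid Corporation of China (522722130292).
Cite this article: Ma Yinglong, Li Pengpeng, Zhang Jingxu. Semantic search approach based on multi-classification semantic analysis and personalization [J]. Journal of Southeast University (Natural Science Edition), 2014, 44(2): 261-265. [doi:10.3969/j.issn.1001-0505.2014.02.007]
Last Update: 2014-03-20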