[1] Cai Qing. Chinese named entity recognition based on multi-criteria fusion[J]. Journal of Southeast University (Natural Science Edition), 2020, 50(5): 929-934. [doi:10.3969/j.issn.1001-0505.2020.05.019]

Chinese named entity recognition based on multi-criteria fusion

Journal of Southeast University (Natural Science Edition) [ISSN:1001-0505/CN:32-1178/N]

Volume:
50
Issue:
No. 5, 2020
Pages:
929-934
Section:
Computer Science and Engineering
Publication date:
2020-09-20

Article Info

Title:
Chinese named entity recognition based on multi-criteria fusion
Author(s):
Cai Qing
Jiangsu Institute of Automation, Lianyungang 222061, China
Keywords:
named entity recognition; bidirectional encoder representations from transformers (BERT); conditional random field (CRF); multi-criteria learning
CLC number:
TP391;TP183
DOI:
10.3969/j.issn.1001-0505.2020.05.019
Abstract:
To improve the recognition rate of Chinese named entity recognition, a multi-criteria fusion model is proposed. A character-based BERT (bidirectional encoder representations from transformers) language model serves as the language feature extraction layer and is connected to a multi-criteria shared connection layer and a conditional random field (CRF) layer to form the fusion model. A large-scale mixed Chinese corpus was built, the model parameters were optimized, and the BERT language model was pre-trained on a single GPU (graphics processing unit). The fusion model was then trained independently and jointly on the MSRA-NER and RMRB-98-1 entity annotation sets, yielding a single-criterion Chinese named entity recognition model for each corpus and a multi-criteria fusion Chinese named entity recognition model. The results show that the multi-criteria fusion model can mine the information shared between corpora and improve the recognition rate of Chinese named entities. Its F1 values on the MSRA-NER and RMRB-98-1 entity annotation sets are 94.46% and 94.32%, respectively, which are better than those of other existing models.
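The F1 values quoted in the abstract are entity-level scores. As a rough illustration only, the sketch below shows one common way such a score can be computed from tag sequences. It assumes BIO-style tags (`B-PER`, `I-PER`, `O`) and exact-match scoring; the abstract does not state the tagging scheme or the evaluation script, so the function names `extract_spans` and `entity_f1` and the tag labels are hypothetical.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); illustrative type alias
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that runs to the sentence end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((etype, start, i))
        start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # An "I-" tag with no matching open entity is ignored (strict reading).
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]):
    """Return (precision, recall, F1) over exactly matching entity spans."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index so spans stay distinct.
        gold_spans |= {(idx,) + s for s in extract_spans(g)}
        pred_spans |= {(idx,) + s for s in extract_spans(p)}
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data, not taken from MSRA-NER or RMRB-98-1.
gold = [["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
```

In this toy case the predicted location span is one character too short, so only one of the two gold entities counts as correct. Exact-match scoring of this kind penalizes boundary errors as heavily as missed entities.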


Memo

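Received: 2020-01-20.
Author biography: Cai Qing (born 1975), male, research fellow, caiqingjari@163.com.
Funding: Equipment Pre-Research Common Technology and Field Fund of the 13th Five-Year Plan (No. 41412030902).
Cite this article: Cai Qing. Chinese named entity recognition based on multi-criteria fusion[J]. Journal of Southeast University (Natural Science Edition), 2020, 50(5): 929-934. DOI:10.3969/j.issn.1001-0505.2020.05.019.
Last Update: 2020-09-20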