[1]黄程韦,金赟,王青云,等.基于语音信号与心电信号的多模态情感识别[J].东南大学学报(自然科学版),2010,40(5):895-900.[doi:10.3969/j.issn.1001-0505.2010.05.003]
 Huang Chengwei, Jin Yun, Wang Qingyun, et al. Multimodal emotion recognition based on speech and ECG signals[J]. Journal of Southeast University (Natural Science Edition), 2010, 40(5): 895-900. [doi:10.3969/j.issn.1001-0505.2010.05.003]

基于语音信号与心电信号的多模态情感识别

《东南大学学报(自然科学版)》(Journal of Southeast University (Natural Science Edition)) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume: 40
Issue: No. 5, 2010
Pages: 895-900
Section: Computer Science and Engineering
Publication date: 2010-09-20

文章信息/Info

Title:
Multimodal emotion recognition based on speech and ECG signals
作者:
黄程韦1 金赟1,2 王青云1 赵力1 邹采荣1
1 东南大学水声信号处理教育部重点实验室,南京 210096; 2 徐州师范大学物理与电子工程学院, 徐州 221116
Author(s):
Huang Chengwei1 Jin Yun1,2 Wang Qingyun1 Zhao Li1 Zou Cairong1
1 Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
2 School of Physics and Electronics Engineering, Xuzhou Normal University, Xuzhou 221116, China
关键词:
情感识别; 多模态; 判决层融合; 特征层融合
Keywords:
emotion recognition; multimodal; decision level fusion; feature level fusion
CLC number:
TP391.4
DOI:
10.3969/j.issn.1001-0505.2010.05.003
摘要:
通过采集与分析语音信号和心电信号,研究了相应的情感特征与融合算法.首先,通过噪声刺激和观看影视片段的方式分别诱发烦躁情感和喜悦情感,并采集了相应情感状态下的语音信号和心电信号.然后,提取韵律、音质特征和心率变异性特征分别作为语音信号和心电信号的情感特征.最后,利用加权融合和特征空间变换的方法分别对判决层和特征层进行融合,并比较了这2种融合算法在语音信号与心电信号融合情感识别中的性能.实验结果表明:在相同测试条件下,基于心电信号和基于语音信号的单模态情感分类器获得的平均识别率分别为71%和80%; 通过特征层融合,多模态分类器的识别率则达到90%以上; 特征层融合算法的平均识别率高于判决层融合算法.因此,依据语音信号、心电信号等不同来源的情感特征可以构建出可靠的情感识别系统.
Abstract:
Emotion features and fusion algorithms are studied by collecting and analyzing speech signals and electrocardiogram (ECG) signals. First, annoyance is induced by noise stimulation and happiness is induced by comedy movie clips, and the corresponding speech and ECG signals are recorded. Then, prosodic and voice quality features are adopted as the speech emotion features, and heart rate variability (HRV) features are used as the ECG emotion features. Finally, decision-level fusion and feature-level fusion are accomplished by the weighted fusion method and the feature transformation method, respectively, and the performances of the two fusion schemes in emotion recognition combining speech and ECG signals are compared. The experimental results show that, on the same testing set, the average recognition rates of the single-modal classifiers based on the ECG signals and on the speech signals reach 71% and 80%, respectively, while that of the multimodal classifier with feature-level fusion of the speech and ECG signals exceeds 90%. The average recognition rate of the feature-level fusion algorithm is higher than that of the decision-level fusion algorithm. Emotion features drawn from different signal channels, such as speech and ECG, can therefore be combined to build a reliable emotion recognition system.
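
The ECG-side emotion features are heart rate variability (HRV) statistics, but the abstract does not enumerate the exact feature set. As a hedged illustration only, the Python sketch below computes a few standard time-domain HRV descriptors (mean heart rate, SDNN, RMSSD, pNN50) from a series of RR intervals; the function name, the choice of descriptors, and the synthetic input are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def hrv_features(rr_ms: np.ndarray) -> dict:
    """Standard time-domain HRV statistics from RR intervals in milliseconds.
    An illustrative subset; the paper does not list its exact HRV features."""
    diff = np.diff(rr_ms)                                # successive RR differences
    return {
        "mean_hr_bpm": 60000.0 / rr_ms.mean(),           # average heart rate
        "sdnn_ms": rr_ms.std(ddof=1),                    # overall RR variability
        "rmssd_ms": float(np.sqrt(np.mean(diff ** 2))),  # short-term variability
        "pnn50": float(np.mean(np.abs(diff) > 50.0)),    # share of successive diffs > 50 ms
    }

# Example: a synthetic 60-beat recording around 750 ms per beat (~80 bpm).
rr = 750 + 30 * np.random.default_rng(1).standard_normal(60)
print(hrv_features(rr))
```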
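The abstract's central comparison is between decision-level fusion (weighted combination of per-modality classifier outputs) and feature-level fusion (classification in a joint feature space). The sketch below contrasts the two schemes on synthetic two-class data; the logistic-regression stand-in classifiers, the feature dimensions, and the weight w are assumptions, and the paper's feature-space transformation is replaced here by plain concatenation, its simplest variant.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for the paper's features: prosody/voice-quality vectors from
# speech and HRV vectors from ECG, for two classes (annoyance vs. happiness).
n = 200
y = rng.integers(0, 2, n)
speech = rng.normal(y[:, None] * 0.8, 1.0, (n, 10))  # hypothetical 10-D speech features
ecg = rng.normal(y[:, None] * 0.5, 1.0, (n, 4))      # hypothetical 4-D HRV features

train, test = np.arange(0, 150), np.arange(150, n)

# Decision-level fusion: train one classifier per modality, then combine
# their class posteriors with a weight w (the "weighted fusion" idea).
clf_s = LogisticRegression().fit(speech[train], y[train])
clf_e = LogisticRegression().fit(ecg[train], y[train])
w = 0.6  # assumed weight favoring the stronger (speech) modality
p = w * clf_s.predict_proba(speech[test]) + (1 - w) * clf_e.predict_proba(ecg[test])
decision_pred = p.argmax(axis=1)

# Feature-level fusion: concatenate the per-modality features and train a
# single classifier on the joint space.
joint_train = np.hstack([speech[train], ecg[train]])
joint_test = np.hstack([speech[test], ecg[test]])
feature_pred = LogisticRegression().fit(joint_train, y[train]).predict(joint_test)

print("decision-level accuracy:", (decision_pred == y[test]).mean())
print("feature-level accuracy:", (feature_pred == y[test]).mean())
```

Decision-level fusion keeps the modalities independent until their posteriors are mixed, whereas feature-level fusion lets a single classifier exploit cross-modal correlations; the latter is consistent with the paper's finding that feature-level fusion attains the higher average recognition rate.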

备注/Memo:
About the authors: Huang Chengwei (b. 1984), male, Ph.D. candidate; Zhao Li (corresponding author), male, Ph.D., professor, doctoral supervisor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (Nos. 60472058, 60975017) and the Natural Science Foundation of Jiangsu Province (No. BK2008291).
Citation: Huang Chengwei, Jin Yun, Wang Qingyun, et al. Multimodal emotion recognition based on speech and ECG signals[J]. Journal of Southeast University (Natural Science Edition), 2010, 40(5): 895-900. [doi:10.3969/j.issn.1001-0505.2010.05.003]
更新日期/Last Update: 2010-09-20