Tao Huawei, Zha Cheng, Liang Ruiyu, et al. Spectrogram feature extraction algorithm for speech emotion recognition[J]. Journal of Southeast University (Natural Science Edition), 2015, 45(5): 817-821. [doi:10.3969/j.issn.1001-0505.2015.05.001]

Spectrogram feature extraction algorithm for speech emotion recognition

Journal of Southeast University (Natural Science Edition) [ISSN:1001-0505/CN:32-1178/N]

Volume:
45
Issue:
2015, No. 5
Pages:
817-821
Section:
Computer Science and Engineering
Publication date:
2015-09-20

Article Info

Title:
Spectrogram feature extraction algorithm for speech emotion recognition
Author(s):
Tao Huawei¹, Zha Cheng¹, Liang Ruiyu¹,², Zhang Xinran¹, Zhao Li¹, Wang Qingyun¹,²
1Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
2School of Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, China
Keywords:
emotion recognition; spectrogram; image texture feature; local binary pattern
CLC number:
TP391.42
DOI:
10.3969/j.issn.1001-0505.2015.05.001
Abstract:
In order to study the role of signal correlation in speech emotion recognition, a spectrogram feature extraction algorithm for speech emotion recognition is proposed. First, the spectrogram of the speech signal is computed and normalized into a gray-scale image. Then, Gabor spectrogram images at different scales and orientations are calculated, and their texture features are extracted with the local binary pattern (LBP) operator. Finally, the LBP features extracted from the Gabor spectrogram images at all scales and orientations are concatenated to form a new feature for emotion recognition. Experimental results on the Berlin emotional speech database (EMO-DB) and the FAU Aibo corpus show that the recognition rate of the proposed feature is more than 3% higher than those of existing prosodic, frequency-domain, and voice-quality features, and that, after fusion with acoustic features, it is at least 5% higher than that of the acoustic features alone. Therefore, the proposed feature can effectively identify different kinds of emotional speech.
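The pipeline described in the abstract (normalized gray-scale spectrogram, multi-scale and multi-orientation Gabor filtering, LBP texture histograms, concatenation) can be sketched as follows. This is a minimal NumPy/SciPy illustration, not the authors' implementation: the Gabor parameterization, the dyadic frequency ladder, the basic 8-neighbor LBP, and the per-image histogram summary are all assumptions made for the sketch.

```python
import numpy as np
from scipy.signal import spectrogram, fftconvolve

def gabor_kernel_2d(freq, theta, sigma=4.0, half=10):
    """Real part of a 2-D Gabor filter (assumed parameterization)."""
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def lbp8(img):
    """Basic 8-neighbor local binary pattern codes (no interpolation)."""
    c = img[1:-1, 1:-1]
    code = np.zeros(c.shape, dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code += (nb >= c).astype(int) << bit  # each neighbor sets one bit
    return code  # values in 0..255

def spectrogram_lbp_feature(x, fs, n_scales=3, n_orient=4):
    # 1) spectrogram -> log magnitude -> gray image normalized to [0, 1]
    _, _, sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
    gray = np.log(sxx + 1e-10)
    gray = (gray - gray.min()) / (gray.max() - gray.min() + 1e-10)

    feats = []
    # 2) Gabor spectrogram images at several scales (frequencies) and orientations
    for s in range(n_scales):
        freq = 0.25 / (2 ** s)                   # assumed dyadic frequency ladder
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            g = fftconvolve(gray, gabor_kernel_2d(freq, theta), mode="same")
            # 3) LBP codes of each Gabor image, summarized as a 256-bin histogram
            hist = np.bincount(lbp8(g).ravel(), minlength=256).astype(float)
            feats.append(hist / hist.sum())
    # 4) concatenate histograms from all scales/orientations into one vector
    return np.concatenate(feats)

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                   # 1 s of noise at 16 kHz as a stand-in
feat = spectrogram_lbp_feature(x, fs=16000)
print(feat.shape)                                # (3 * 4 * 256,) = (3072,)
```

In practice the resulting vector would be fed to a classifier (the paper fuses it with conventional acoustic features); the histogram summary here is one common way to turn LBP code maps into a fixed-length feature.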

References:

[1] Attabi Y, Dumouchel P. Anchor models for emotion recognition from speech[J]. IEEE Transactions on Affective Computing, 2013, 4(3): 280-290.
[2] Ramakrishnan S, El Emary I M M. Speech emotion recognition approaches in human computer interaction[J]. Telecommunication Systems, 2013, 52(3): 1467-1478.
[3] Lee A K C, Larson E, Maddox R K, et al. Using neuroimaging to understand the cortical mechanisms of auditory selective attention[J]. Hearing Research, 2014, 307: 111-120.
[4] Minker W, Pittermann J, Pittermann A, et al. Challenges in speech-based human-computer interfaces[J]. International Journal of Speech Technology, 2007, 10(2/3):109-119.
[5] Zhao X M, Zhang S Q, Lei B C. Robust emotion recognition in noisy speech via sparse representation[J]. Neural Computing and Applications, 2014, 24(7/8): 1539-1553.
[6] Huang C W, Chen G M, Yu H, et al. Speech emotion recognition under white noise[J]. Archives of Acoustics, 2013, 38(4): 457-463.
[7] Yan J J, Wang X L, Gu W Y, et al. Speech emotion recognition based on sparse representation[J]. Archives of Acoustics, 2013, 38(4): 465-470.
[8] Wu C H, Liang W B. Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels [J]. IEEE Transactions on Affective Computing, 2011, 2(1):10-21.
[9] Bozkurt E, Erzin E, Erdem C E, et al. Formant position based weighted spectral features for emotion recognition[J]. Speech Communication, 2011, 53(9): 1186-1197.
[10] Altun H, Polat G. Boosting selection of speech related features to improve performance of multi-class SVMs in emotion detection [J]. Expert Systems with Applications, 2009, 36(4): 8197-8203.
[11] Mencattini A, Martinelli E, Costantini G, et al. Speech emotion recognition using amplitude modulation parameters and a combined feature selection procedure[J]. Knowledge-Based Systems, 2014, 63: 68-81.
[12] El Ayadi M, Kamel M S, Karray F. Survey on speech emotion recognition: features, classification schemes, and databases[J]. Pattern Recognition, 2011, 44(3):572-587.
[13] Han Wenjing, Li Haifeng, Ruan Huabin, et al. Review on speech emotion recognition[J]. Journal of Software, 2014, 25(1): 37-50. (in Chinese)
[14] Xu X Z, Huang C W, Wu C, et al. Graph learning based speaker independent speech emotion recognition[J]. Advances in Electrical and Computer Engineering, 2014, 14(2): 17-22.

Similar Articles:

[1] Huang Chengwei, Jin Yun, Wang Qingyun, et al. Multimodal emotion recognition based on speech and ECG signals[J]. Journal of Southeast University (Natural Science Edition), 2010, 40(5): 895. [doi:10.3969/j.issn.1001-0505.2010.05.003]

Memo:
Received: 2014-02-20.
Biographies: Tao Huawei (born 1987), male, Ph.D. candidate; Zhao Li (corresponding author), male, Ph.D., professor, doctoral supervisor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (Nos. 61231002, 61273266, 61301219), the Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education (No. 20110092130004), the Natural Science Foundation of Jiangsu Province (No. BK20130241).
Citation: Tao Huawei, Zha Cheng, Liang Ruiyu, et al. Spectrogram feature extraction algorithm for speech emotion recognition[J]. Journal of Southeast University (Natural Science Edition), 2015, 45(5): 817-821. [doi:10.3969/j.issn.1001-0505.2015.05.001]
Last Update: 2015-09-20