[1] Zhu Fangmei, Zhao Li, Liang Ruiyu, et al. Improved stacked autoencoder for Chinese speech emotion recognition[J]. Journal of Southeast University (Natural Science Edition), 2017, 47(4): 631-636. [doi:10.3969/j.issn.1001-0505.2017.04.001]

Improved stacked autoencoder for Chinese speech emotion recognition

Journal of Southeast University (Natural Science Edition) [ISSN:1001-0505/CN:32-1178/N]

Volume:
47
Issue:
No. 4, 2017
Pages:
631-636
Section:
Computer Science and Engineering
Publication date:
2017-07-20

Article Info

Title:
Improved stacked autoencoder for Chinese speech emotion recognition
Author(s):
Zhu Fangmei(1), Zhao Li(1), Liang Ruiyu(1,2), Wang Qingyun(2), Zou Cairong(1)
(1) Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
(2) School of Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, China
Keywords:
speech emotion recognition; enhanced stacked autoencoder; denoising autoencoder; sparse autoencoder
CLC number:
TP391.42
DOI:
10.3969/j.issn.1001-0505.2017.04.001
Abstract:
To further improve the recognition rate of Chinese speech emotion recognition, an improved stacked autoencoder is proposed that builds on the autoencoder, denoising autoencoder and sparse autoencoder network structures from deep learning. The first layer of the structure uses a denoising autoencoder to learn a hidden feature whose dimension is larger than that of the input features, and the second layer uses a sparse autoencoder to learn sparse features. Finally, a softmax classifier classifies the learned features. Training first applies layer-wise pre-training to initialize all parameters of the network, and then fine-tunes the whole network. Experiments on Chinese speech databases show that the improved stacked autoencoder achieves a better recognition rate than a stacked denoising autoencoder or a stacked sparse autoencoder used alone. In addition, comparative experiments on the CASIA database show that the recognition rate of the structure is 53.7%, 29.8%, 14.3% and 1.9% higher than that of the K-nearest neighbor algorithm, the sparse representation method, the traditional support vector machine and the artificial neural network, respectively. On the self-recorded database, the recognition rate of the structure is 1.64% higher than that of the artificial neural network.
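The layer arrangement described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the number of emotion classes, the masking-noise corruption rate, the sparsity target `rho`, the penalty weight `beta`, the sigmoid activations and the squared-error reconstruction loss are all hypothetical choices that the abstract does not specify. The sketch only evaluates the two pre-training objectives and the forward pass; the gradient-based optimization used for layer-wise pre-training and fine-tuning is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one fully connected layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def denoising_loss(x, enc, dec, corruption=0.3):
    """Layer 1 objective: reconstruct the clean input from a masked copy."""
    mask = rng.random(x.shape) > corruption       # assumed masking noise
    h = sigmoid((x * mask) @ enc[0] + enc[1])     # encode the corrupted input
    x_hat = sigmoid(h @ dec[0] + dec[1])          # decode
    return np.mean(np.sum((x_hat - x) ** 2, axis=1))

def sparse_loss(h1, enc, dec, rho=0.05, beta=3.0):
    """Layer 2 objective: reconstruction error plus a KL sparsity penalty."""
    h2 = sigmoid(h1 @ enc[0] + enc[1])
    h1_hat = sigmoid(h2 @ dec[0] + dec[1])
    rho_hat = np.clip(h2.mean(axis=0), 1e-8, 1 - 1e-8)  # mean unit activation
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    recon = np.mean(np.sum((h1_hat - h1) ** 2, axis=1))
    return recon + beta * kl

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, enc1, enc2, clf):
    """Stacked encoders followed by the softmax classifier."""
    h1 = sigmoid(x @ enc1[0] + enc1[1])
    h2 = sigmoid(h1 @ enc2[0] + enc2[1])
    return softmax(h2 @ clf[0] + clf[1])

# Hypothetical sizes; the first hidden layer is wider than the input,
# matching the abstract's description of the denoising layer.
n_in, n_h1, n_h2, n_classes = 40, 64, 32, 6
enc1, dec1 = init_layer(n_in, n_h1), init_layer(n_h1, n_in)
enc2, dec2 = init_layer(n_h1, n_h2), init_layer(n_h2, n_h1)
clf = init_layer(n_h2, n_classes)

x = rng.random((8, n_in))                         # stand-in feature batch in [0, 1)
h1 = sigmoid(x @ enc1[0] + enc1[1])
loss_layer1 = denoising_loss(x, enc1, dec1)
loss_layer2 = sparse_loss(h1, enc2, dec2)
probs = forward(x, enc1, enc2, clf)
print(loss_layer1, loss_layer2, probs.shape)
```

In a full training loop, each objective would be minimized in turn to initialize `enc1` and `enc2`, after which the decoders are discarded and the stacked network is fine-tuned end to end with a classification loss on `probs`.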



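Memo

Received: 2016-12-10.
Author biographies: Zhu Fangmei (b. 1992), female, master's student; Zhao Li (corresponding author), male, Ph.D., professor, doctoral supervisor, zhaoli@seu.edu.cn.
Funding: National Natural Science Foundation of China (61375028, 61571106, 61673108); Qing Lan Project of Jiangsu Province; Jiangsu Planned Projects for Postdoctoral Research Funds (1601011B); "Six Talent Peaks" Project of Jiangsu Province (2016-DZXX-023); China Postdoctoral Science Foundation (2016M601695).
Cite this article: Zhu Fangmei, Zhao Li, Liang Ruiyu, et al. Improved stacked autoencoder for Chinese speech emotion recognition[J]. Journal of Southeast University (Natural Science Edition), 2017, 47(4): 631-636. DOI:10.3969/j.issn.1001-0505.2017.04.001.
Last Update: 2017-07-20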