[1] Zeng Yumin, Wu Zhenyang. Robust speaker recognition based on harmonic spectrum reconstruction of voiced speech[J]. Journal of Southeast University (Natural Science Edition), 2008, 38(6): 935-941. [doi:10.3969/j.issn.1001-0505.2008.06.001]

Robust speaker recognition based on harmonic spectrum reconstruction of voiced speech

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
38
Issue:
No. 6, 2008
Pages:
935-941
Section:
Information and Communication Engineering
Publication date:
2008-11-20

Article Info

Title:
Robust speaker recognition based on harmonic spectrum reconstruction of voiced speech
Author(s):
Zeng Yumin (1, 2), Wu Zhenyang (1)
1 School of Information Science and Engineering, Southeast University, Nanjing 210096, China
2 School of Physics and Technology, Nanjing Normal University, Nanjing 210097, China
Keywords:
speaker recognition; spectrum reconstruction; perceptual linear predictive cepstrum coefficient; noise compensation; spectral flatness measure
CLC number:
TN912.3
DOI:
10.3969/j.issn.1001-0505.2008.06.001
Abstract:
A speaker recognition algorithm based on harmonic spectrum reconstruction of voiced speech is proposed. Using the structure of the short-time spectrum of voiced speech together with pitch information, the algorithm reconstructs the harmonic spectrum of voiced segments by sub-band weighting, which compensates for the noise-induced mismatch between training and testing conditions. Perceptual linear predictive cepstral coefficients are extracted from the reconstructed voiced spectrum and combined with pitch to form the feature vector of a given speaker, and each speaker is modeled by a Gaussian mixture model. Simulation results show that the proposed voiced-spectrum reconstruction compensates effectively for noise across many types of noisy speech. For text-independent speaker recognition, it markedly improves recognition accuracy in noisy environments, especially at low SNR, without noticeably degrading accuracy on clean speech or at high SNR.
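The final classification step named in the abstract, modeling each speaker with a Gaussian mixture model, can be illustrated with a minimal sketch. It is not the authors' implementation: the diagonal covariances, the two toy speaker models, and the 2-D feature frames are all assumptions made for illustration, standing in for the PLP-cepstrum-plus-pitch vectors the paper describes. The function names `diag_gmm_loglik` and `identify_speaker` are hypothetical.

```python
import math

def diag_gmm_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # log of N(frame; mu, diag(var)) for one mixture component
        log_gauss = -0.5 * sum(
            math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
            for x, m, v in zip(frame, mu, var)
        )
        log_terms.append(math.log(w) + log_gauss)
    # log-sum-exp over components for numerical stability
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def identify_speaker(frames, models):
    """Pick the speaker whose GMM gives the highest total log-likelihood."""
    scores = {
        name: sum(diag_gmm_loglik(f, *params) for f in frames)
        for name, params in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical models: (weights, means, variances) per speaker, 2-D features.
MODELS = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
# Toy test frames (assumed); real frames would come from the voiced segments.
TEST_FRAMES = [[0.2, -0.1], [0.9, 1.1], [0.5, 0.4]]

best, scores = identify_speaker(TEST_FRAMES, MODELS)
print(best, scores)
```

Summing per-frame log-likelihoods treats frames as independent, which is the usual simplification in text-independent GMM speaker identification; in practice the model parameters would be trained (for example with EM) rather than written by hand.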

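Memo:
About the authors: Zeng Yumin (b. 1962), male, Ph.D.; Wu Zhenyang (corresponding author), male, professor and doctoral supervisor, zhenyang@seu.edu.cn.
Funding: National Basic Research Program of China (973 Program) (No. 2002CB312102); Natural Science Research Program of Jiangsu Higher Education Institutions (No. 07KJD510110).
Citation format: Zeng Yumin, Wu Zhenyang. Robust speaker recognition based on harmonic spectrum reconstruction of voiced speech[J]. Journal of Southeast University: Natural Science Edition, 2008, 38(6): 935-941.
Last Update: 2008-11-20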