Guo Haiyan, Li Xiaoxiong, Li Nijun, et al. Single-channel speech separation based on pitch state and interframe correlation[J]. Journal of Southeast University (Natural Science Edition), 2014, 44(6): 1099-1104. [doi:10.3969/j.issn.1001-0505.2014.06.001]

Single-channel speech separation based on pitch state and interframe correlation

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505 / CN: 32-1178/N]

Volume:
44
Issue:
2014, No. 6
Pages:
1099-1104
Section:
Information and Communication Engineering
Publication date:
2014-11-20

Article Info

Title:
Single-channel speech separation based on pitch state and interframe correlation
Author(s) (Chinese):
郭海燕1,2, 李枭雄1, 李拟珺1, 周琳1, 吴镇扬1
Author(s):
Guo Haiyan1,2, Li Xiaoxiong1, Li Nijun1, Zhou Lin1, Wu Zhenyang1
1School of Information Science and Engineering, Southeast University, Nanjing 210096, China
2College of Engineering, Nanjing Agricultural University, Nanjing 210031, China
Keywords:
speech separation; sparse decomposition; orthogonal matching pursuit (OMP); pitch frequency; data mining
CLC number:
TN912
DOI:
10.3969/j.issn.1001-0505.2014.06.001
Abstract:
A single-channel speech separation algorithm based on pitch state and interframe correlation is proposed. First, the pitches of the two simultaneously active speakers are tracked from the mixture over time and encoded as pitch states; on this basis, adaptive source-specific dictionaries are constructed so that the sources are distinguished at the dictionary level by their pitch information. Secondly, a frequent pattern mining method is used to extract the frequent 1-itemsets as atoms, reducing the size of the dictionaries generated for sources whose pitch state is 1. Thirdly, starting from the separated sources obtained by the orthogonal matching pursuit (OMP) algorithm, mixture frames with poor separation performance are detected; each such frame is summed with the shifted adjacent separated frame most correlated with it, and a soft-mask method is applied to perform a second, corrective separation. Experimental results show that the proposed algorithm outperforms two classical separation algorithms in terms of signal-to-noise ratio (SNR), and that the frequent pattern mining step greatly reduces the computational cost of the separation.
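The OMP decomposition over concatenated source dictionaries and the soft-mask second pass described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the pitch-state dictionary construction and the frequent 1-itemset pruning are omitted, the dictionaries here are random stand-ins, and all function names are illustrative.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: greedily pick the atom most correlated
    with the residual, then re-fit all selected coefficients by least squares."""
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef

def separate_frame(x, D1, D2, n_nonzero=6):
    """Decompose a mixture frame over the concatenated dictionaries [D1 | D2];
    each source estimate is rebuilt from its own dictionary's atoms only."""
    D = np.hstack([D1, D2])
    c = omp(D, x, n_nonzero)
    k = D1.shape[1]
    return D1 @ c[:k], D2 @ c[k:]

def soft_mask(x, s1, s2, eps=1e-12):
    """Wiener-style soft mask: reassign the mixture spectrum to each source
    in proportion to its estimated spectral energy."""
    X, S1, S2 = np.fft.rfft(x), np.fft.rfft(s1), np.fft.rfft(s2)
    m1 = np.abs(S1) ** 2 / (np.abs(S1) ** 2 + np.abs(S2) ** 2 + eps)
    y1 = np.fft.irfft(m1 * X, n=len(x))
    y2 = np.fft.irfft((1.0 - m1) * X, n=len(x))
    return y1, y2
```

By construction the two masked outputs sum back to the mixture frame, so the soft-mask pass redistributes, rather than discards, the mixture energy between the two estimates.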

References:

[1] Wang D L, Brown G J. Computational auditory scene analysis: principles, algorithms, and applications [M]. Wiley-IEEE Press, 2006: 1-395.
[2] Shao Y, Srinivasan S, Jin Z, et al. A computational auditory scene analysis system for speech segregation and robust speech recognition [J]. Computer Speech & Language, 2010, 24(1): 77-93.
[3] Stark M, Wohlmayr M, Pernkopf F. Source-filter-based single-channel speech separation using pitch information [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(2): 242-255.
[4] Weiss R J, Ellis D P W. Speech separation using speaker-adapted eigenvoice speech models [J]. Computer Speech & Language, 2010, 24(1): 16-29.
[5] Moussallam M, Richard G, Daudet L. Audio source separation informed by redundancy with greedy multiscale decompositions [C]//Proceedings of the 20th European Signal Processing Conference. Bucharest, Romania, 2012: 2644-2648.
[6] Schmidt M N, Olsson R K. Single-channel speech separation using sparse non-negative matrix factorization [C/OL]//International Conference on Spoken Language Processing. Pittsburgh, PA, USA, 2006. http://eprints.pascal-network.org/archive/00002722/01/imm4511-01.pdf.
[7] Virtanen T. Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(3): 1066-1074.
[8] 郭海燕,杨震,朱卫平. 一种新的基于稀疏分解的单通道混合语音分离方法 [J]. 电子学报, 2012, 40(4): 762-768.
  Guo Haiyan, Yang Zhen, Zhu Weiping. A new single-channel speech separation method based on sparse decomposition [J]. Acta Electronica Sinica, 2012, 40(4): 762-768. (in Chinese)
[9] Wohlmayr M, Stark M, Pernkopf F. A probabilistic interaction model for multipitch tracking with factorial hidden Markov models [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(4): 799-810.
[10] Han J W, Kamber M, Pei J. Data mining: concepts and techniques [M]. 3rd ed. Fan M, Meng X F, trans. Beijing: China Machine Press, 2012: 157-179. (in Chinese)
[11] Pati Y C, Rezaiifar R, Krishnaprasad P S. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition [C]//1993 Conference Record of the Twenty-Seventh Asilomar Conference on Signals, Systems and Computers. Pacific Grove, CA, USA, 1993: 40-44.
[12] Cooke M P, Barker J, Cunningham S P, et al. An audio-visual corpus for speech perception and automatic speech recognition [J]. The Journal of the Acoustical Society of America, 2006, 120(5): 2421-2424.

Memo
Received: 2014-06-10.
Biographies: Guo Haiyan (1983—), female, Ph.D., lecturer; Wu Zhenyang (corresponding author), male, professor, doctoral supervisor, zhenyang@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (No. 61302152, No. 61201345, No. 61271240), the Open Project of the Beijing Key Laboratory of Modern Information Science and Network Technology (No. XDXX1308).
Citation: Guo Haiyan, Li Xiaoxiong, Li Nijun, et al. Single-channel speech separation based on pitch state and interframe correlation[J]. Journal of Southeast University (Natural Science Edition), 2014, 44(6): 1099-1104. [doi:10.3969/j.issn.1001-0505.2014.06.001]
Last Update: 2014-11-20