[1] Tang Yibin,Wu Haiyang,Wu Zhenyang.1kbit/s waveform interpolation speech coding based on non-negative matrix factorization[J].Journal of Southeast University (Natural Science Edition),2010,40(4):670-675.[doi:10.3969/j.issn.1001-0505.2010.04.002]

1kbit/s waveform interpolation speech coding based on non-negative matrix factorization

Journal of Southeast University (Natural Science Edition) [ISSN:1001-0505/CN:32-1178/N]

Volume:
40
Issue:
No. 4, 2010
Pages:
670-675
Section:
Information and Communication Engineering
Publication date:
2010-07-20

Article Info

Title:
1kbit/s waveform interpolation speech coding based on non-negative matrix factorization
Author(s):
Tang Yibin, Wu Haiyang, Wu Zhenyang
School of Information Science and Engineering, Southeast University, Nanjing 210096, China
Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
Keywords:
non-negative matrix factorization; waveform interpolation; characteristic waveform surface
Classification number:
TN912.3
DOI:
10.3969/j.issn.1001-0505.2010.04.002
Abstract:
To further reduce the coding rate, a 1 kbit/s waveform interpolation speech coding algorithm based on non-negative matrix factorization (NMF) is proposed. The algorithm applies NMF to the magnitude matrix of the characteristic waveform surface to obtain a local feature matrix, then constrains and refines that matrix so that the optimized local features become more prominent. The corresponding basis vectors become sparser, which facilitates quantization of the weight vectors and allows the characteristic waveform surface to be coded efficiently. A voiced/unvoiced flag is also introduced to estimate the phase spectrum of the characteristic waveform surface, which further improves the quality of the synthesized speech. Experiments show that, at a coding rate of 1 kbit/s, the algorithm achieves synthesized speech quality close to that of the 1.2 kbit/s mixed excitation linear prediction (MELP) speech coding algorithm.
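
The factorization step described in the abstract can be illustrated with a minimal sketch. It uses the standard multiplicative update rules of Lee and Seung for the Frobenius-norm objective, not the constrained and refined variant the paper proposes, whose details are not given on this page. The function name `nmf`, the symbols `V`, `W`, `H`, the chosen rank, and the synthetic input matrix are all assumptions made for illustration; in a real coder `V` would hold the magnitudes of the characteristic waveform surface.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (n x m) as W @ H.

    Plain Lee-Seung multiplicative updates for ||V - W H||_F.
    This is an illustrative baseline, not the paper's constrained variant.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n, rank)) + eps   # basis vectors (columns)
    H = rng.random((rank, m)) + eps   # weight vectors (columns)
    for _ in range(n_iter):
        # Element-wise multiplicative updates preserve non-negativity.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(V, W, H):
    """Relative Frobenius reconstruction error of the factorization."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)

def demo():
    # Synthetic stand-in for a magnitude matrix (hypothetical sizes):
    # 32 frequency bins by 20 characteristic waveforms, non-negative by construction.
    rng = np.random.default_rng(1)
    V = rng.random((32, 4)) @ rng.random((4, 20))
    W, H = nmf(V, rank=4)
    return V, W, H, relative_error(V, W, H)

if __name__ == "__main__":
    V, W, H, err = demo()
    print("W shape:", W.shape, "H shape:", H.shape)
    print("relative reconstruction error:", err)
```

In a coder of this kind, only one factor would need to be quantized per frame if the other is fixed in advance or trained offline; which factor plays which role in the paper cannot be determined from the abstract alone.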

References:

[1] Kleijn W B.A speech coder based on decomposition of characteristic waveforms[C] //Proc ICASSP. Detroit, USA,1995:508-511.
[2] Kleijn W B,Shoham Y,Sen D,et al.A low-complexity waveform interpolation coder[C] //Proc ICASSP. Atlanta,USA,1996:212-215.
[3] Wang Guiping,Bao Changchun,Zhang Peng.Low bit rates waveform interpolation speech coding based on singular value decomposition[J].Acta Electronica Sinica,2006,34(1):135-140.(in Chinese)
[4] Wang Jing,Kuang Jingming,Xie Xiang.Waveform interpolation speech coding algorithm at 2.4kbit/s based on wavelet transform[J].Journal on Communications,2007,28(5):43-48.(in Chinese)
[5] Zhang Peng,Bao Changchun,Guo Lili.2kb/s waveform interpolation speech coding based on non-negative matrix factorization[J].Acta Electronica Sinica,2008,36(4):632-638.(in Chinese)
[6] Guo Lili,Bao Changchun.2kb/s Bayesian yin-yang waveform interpolative speech coding based on non-negative matrix factorization[J].Acta Electronica Sinica,2009,37(5):1146-1153.(in Chinese)
[7] Lee D D,Seung H S.Learning the parts of objects by non-negative matrix factorization[J].Nature,1999,401(6755):788-791.
[8] Zhang Taiping,Fang Bin,Tang Yuanyan,et al.Topology preserving non-negative matrix factorization for face recognition[J].IEEE Transactions on Image Processing,2008,17(4):574-584.
[9] Yuan Yuan,Li Xuelong,Pang Yanwei,et al.Binary sparse nonnegative matrix factorization[J].IEEE Transactions on Circuits and Systems for Video Technology,2009,19(5):772-777.
[10] Kameoka H,Ono N,Kashino K,et al.Complex NMF:a new sparse representation for acoustic signals[C] //Proc ICASSP.Taipei,China,2009:3437-3440.
[11] Lee D D,Seung H S.Algorithms for non-negative matrix factorization[M] //Advances in Neural Information Processing Systems.Cambridge,MA,USA:MIT Press,2001,13:556-562.
[12] Federal Information Processing Standards Publication.Specifications for the analog to digital conversion of voice by 2400 bit/second mixed excitation linear prediction[R].Federal Information Processing Standards Publication,1998.
[13] Tian Wang,Koishida K,Cuperman V,et al.A 1200 bps speech coder based on MELP[C] //Proc ICASSP. Istanbul,Turkey,2000:1375-1378.

Similar articles:

[1] Huang Gangshi,Zhang Yafei,Lu Jianjiang,et al.Constrained factorization method for non-negative matrix[J].Journal of Southeast University (Natural Science Edition),2004,34(2):189.[doi:10.3969/j.issn.1001-0505.2004.02.011]

Memo:
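Biographies: Tang Yibin (1982-), male, Ph.D. candidate; Wu Zhenyang (corresponding author), male, professor, doctoral supervisor, zhenyang@seu.edu.cn.
Funding: National Natural Science Foundation of China (No. 60971098).
Citation: Tang Yibin,Wu Haiyang,Wu Zhenyang.1kbit/s waveform interpolation speech coding based on non-negative matrix factorization[J].Journal of Southeast University (Natural Science Edition),2010,40(4):670-675. [doi:10.3969/j.issn.1001-0505.2010.04.002]
Last Update: 2010-07-20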