Single channel speech enhancement via time-frequency dictionary learning
HUANG Jianjun; ZHANG Xiongwei; ZHANG Yafei; ZOU Xia
Abstract:
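To address the sharp performance degradation of earlier speech enhancement algorithms in non-stationary noise, a new single-channel speech enhancement algorithm based on time-frequency dictionary learning is proposed. First, time-frequency dictionary learning is used to model prior information about the spectral structure of the noise, and this model is incorporated into the convolutive non-negative matrix factorization framework. Next, with the noise time-frequency dictionary held fixed, multiplicative iterative update rules are derived for the time-varying gains and the speech time-frequency dictionary. Finally, these update rules are used to update the time-varying gain coefficients of both speech and noise, together with the speech time-frequency dictionary; the speech magnitude spectrum is then reconstructed by convolving the speech time-frequency dictionary with its time-varying gains, and binary time-frequency masking is applied to remove noise interference. Experimental results show that the proposed algorithm achieves better results on multiple speech quality measures. In non-stationary noise and at low signal-to-noise ratios, it removes noise more effectively than multi-band spectral subtraction and non-negative sparse coding denoising, and the enhanced speech has better quality.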
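The processing chain summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the cost function, so the generalized Kullback-Leibler divergence commonly used with convolutive NMF is assumed here, and the names `shift`, `reconstruct`, `enhance`, `n_speech`, and `n_iter` are hypothetical. The input `V` stands for the noisy magnitude spectrogram (frequency bins by frames), and `W_noise` for a noise time-frequency dictionary assumed to have been learned beforehand from noise-only data.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def shift(X, t):
    """Shift the columns of X right by t frames (left if t < 0), zero-filling."""
    out = np.zeros_like(X)
    if t == 0:
        out[:] = X
    elif t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out


def reconstruct(W, H):
    """Convolutive model: sum over lags t of W[t] @ shift(H, t).

    W has shape (T, F, R): T lags, F frequency bins, R dictionary atoms.
    H has shape (R, N): time-varying gains over N frames.
    """
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))


def enhance(V, W_noise, n_speech=8, n_iter=100, seed=0):
    """Sketch of enhancement with a fixed noise dictionary (assumed KL cost)."""
    rng = np.random.default_rng(seed)
    T, F, n_noise = W_noise.shape
    N = V.shape[1]
    # Speech atoms are initialized randomly; noise atoms come from W_noise.
    W = np.concatenate([rng.random((T, F, n_speech)) + EPS, W_noise], axis=2)
    H = rng.random((n_speech + n_noise, N)) + EPS
    ones = np.ones_like(V)

    for _ in range(n_iter):
        # Multiplicative update of all gains (speech and noise).
        ratio = V / (reconstruct(W, H) + EPS)
        num = sum(W[t].T @ shift(ratio, -t) for t in range(T))
        den = sum(W[t].T @ shift(ones, -t) for t in range(T))
        H *= num / (den + EPS)

        # Multiplicative update of the speech atoms only; noise atoms stay fixed.
        ratio = V / (reconstruct(W, H) + EPS)
        for t in range(T):
            Ht = shift(H, t)
            upd = (ratio @ Ht.T) / (ones @ Ht.T + EPS)
            W[t, :, :n_speech] *= upd[:, :n_speech]

    # Reconstruct speech and noise magnitudes, then apply a binary mask.
    S = reconstruct(W[:, :, :n_speech], H[:n_speech])
    Nz = reconstruct(W[:, :, n_speech:], H[n_speech:])
    mask = S > Nz
    return mask * V, mask
```

Here the mask keeps a time-frequency bin of the noisy spectrogram whenever the reconstructed speech magnitude exceeds the reconstructed noise magnitude, which is one common way to define a binary mask; the paper's exact masking rule may differ. A time-domain signal would still have to be recovered from the masked magnitudes, for example by reusing the noisy phase.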
Funding: Supported by the Natural Science Foundation of Jiangsu Province (BK2009059)