Hu-Wang

Category: Matlab programming
Development tool: Matlab
File size: 1070 KB
Downloads: 88
Upload date: 2014-12-23 21:12:12
Uploader: xingdian11
Description: Speech separation based on computational auditory scene analysis (Matlab)

File list (size in bytes, date):
HuWang.dsp (3401, 2007-03-12)
HuWang.dsw (537, 2007-03-12)
HuWang.exe (163890, 2007-01-25)
HuWang.h (4776, 2007-01-25)
HuWang.ncb (41984, 2007-03-15)
HuWang.opt (48640, 2007-03-15)
HuWang.plg (2265, 2007-03-12)
Hu-Wang.tnn04.pdf (925593, 2007-01-25)
pitch.cpp (11393, 2007-01-25)
pitch.h (5218, 2007-01-25)
segment.cpp (4382, 2007-01-25)
segment.h (3399, 2007-01-25)
tool.cpp (2930, 2007-01-25)
tool.h (439, 2007-01-25)
Untitled.asv (147, 2007-01-25)
v3n3.dat (100883, 2007-01-25)
data.h (1863, 2007-01-25)
feature.cpp (13874, 2007-01-25)
feature.h (4644, 2007-01-25)
group.cpp (5876, 2007-01-25)
group.h (1283, 2007-01-25)
v3n3mask.dat (37152, 2007-01-25)
v3n3res1.dat (288333, 2007-01-25)
v3n3res.dat (288333, 2007-01-25)
HuWang.cpp (4358, 2007-01-25)

This C++ program is a software implementation of the voiced speech segregation algorithm described in detail in the appendix of the following book chapter:

Guoning Hu and DeLiang Wang (2006): "An auditory scene analysis approach to monaural speech segregation," in Topics in Acoustic Echo and Noise Control, edited by E. Hänsler and G. Schmidt. Springer, Heidelberg, pp. 485-515.

This algorithm is a simplified and slightly improved version of the one in their 2004 IEEE Transactions on Neural Networks article.

You may run the program with the following command:

    HuWang input output

"input" is a text file that contains the waveform of the input, i.e., a speech mixture. "output" is a text file that contains the waveform of the output, i.e., the resynthesized target speech.

If you want to store the final speech stream, i.e., the binary mask used for resynthesis (see the chapter for more details), run the program with the following command:

    HuWang input output mask

"mask" is a text file that stores the binary mask. Each line of the file corresponds to the time-frequency units in one time frame, from filter channel 1 to filter channel 128.

We have also included some sample files:

    "v3n3.dat": a mixture of a male voice and "cocktail party" noise.
    "v3n3res.dat": the corresponding resynthesized speech.
    "v3n3mask.dat": the corresponding mask.

We have also included the source code. This program was developed with Microsoft Visual C++ 6.0. 'HuWang.exe' is built with the following settings:

    SAMPLING_FREQUENCY: the sampling frequency of the input, set to 16000 Hz.
    MAX_SIG_LENGTH: the maximum number of samples in the input, set to 200000 (the actual input may be shorter than this, but not longer).

If you need to change these values, set them in "data.h" and then rebuild 'HuWang.exe'.

We distribute this program freely, but please cite the paper if you use this program.

------------------------------------------------------------
Dr. Guoning Hu
Perception and Neurodynamics Lab.
The Ohio State University
2015 Neil Ave.
Columbus, OH 43210-1277, U.S.A.
Email: hu.117@osu.edu
Phone: 614-292-7402
URL: http://www.cse.ohio-state.edu/~hu
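The README states that the input and output waveforms are plain text files and that the input may be shorter than MAX_SIG_LENGTH samples but not longer. The following is a minimal sketch, not code from the distributed source, of reading such a file while enforcing that limit. It assumes each sample is a number separated by whitespace (the exact layout of the .dat files is not documented here), and the names readWaveform, durationSeconds, kSamplingFrequency and kMaxSigLength are hypothetical; only the values 16000 and 200000 come from the README.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding text to readWaveform
#include <stdexcept>
#include <vector>

// Values the README gives for data.h; these constant names are illustrative.
constexpr double kSamplingFrequency = 16000.0;  // SAMPLING_FREQUENCY, in Hz
constexpr std::size_t kMaxSigLength = 200000;   // MAX_SIG_LENGTH, in samples

// Hypothetical reader for a text waveform file. Assumption: samples are
// numbers separated by whitespace (spaces or newlines). Throws if the file
// holds more than maxLength samples, since the input may be shorter than
// MAX_SIG_LENGTH but not longer.
std::vector<double> readWaveform(std::istream& in,
                                 std::size_t maxLength = kMaxSigLength) {
    std::vector<double> samples;
    double value = 0.0;
    while (in >> value) {
        if (samples.size() == maxLength) {
            throw std::length_error("input is longer than MAX_SIG_LENGTH");
        }
        samples.push_back(value);
    }
    // Extraction stopped before end of input: a token was not a number.
    if (!in.eof()) {
        throw std::runtime_error("non-numeric token in waveform file");
    }
    return samples;
}

// Signal duration in seconds at the given sampling frequency.
double durationSeconds(std::size_t numSamples,
                       double fs = kSamplingFrequency) {
    assert(fs > 0.0);
    return static_cast<double>(numSamples) / fs;
}
```

With the default settings, the longest accepted input lasts 200000 / 16000 = 12.5 seconds; changing either value in data.h changes this limit accordingly.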

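The mask file described in the README stores one time frame per line, with one entry per time-frequency unit from filter channel 1 to filter channel 128. A hypothetical reader for that layout might look like the sketch below. It assumes each unit is written as a whitespace-separated 0 or 1, which the README does not spell out, so check it against v3n3mask.dat before relying on it; readMask, targetFraction, Frame and kNumChannels are illustrative names, not part of the HuWang source.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Filter channels per time frame, as stated in the README (channels 1..128).
constexpr std::size_t kNumChannels = 128;

// One time frame of the binary mask: a 0/1 label per filter channel.
using Frame = std::vector<int>;

// Hypothetical reader for the file written by "HuWang input output mask".
// Assumption: each non-blank line holds `channels` whitespace-separated
// 0/1 values, ordered from filter channel 1 upward.
std::vector<Frame> readMask(std::istream& in,
                            std::size_t channels = kNumChannels) {
    assert(channels > 0);
    std::vector<Frame> frames;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);  // also treats a trailing '\r' as space
        Frame frame;
        int label = 0;
        while (fields >> label) {
            if (label != 0 && label != 1) {
                throw std::runtime_error("mask values must be 0 or 1");
            }
            frame.push_back(label);
        }
        if (!fields.eof()) {
            throw std::runtime_error("non-numeric token in mask file");
        }
        if (frame.empty()) {
            continue;  // skip blank lines
        }
        if (frame.size() != channels) {
            throw std::runtime_error("mask line has the wrong number of channels");
        }
        frames.push_back(frame);
    }
    return frames;
}

// Fraction of time-frequency units labeled 1 (kept in the target stream).
double targetFraction(const std::vector<Frame>& frames) {
    std::size_t ones = 0;
    std::size_t total = 0;
    for (const Frame& frame : frames) {
        for (int label : frame) {
            if (label == 1) {
                ++ones;
            }
        }
        total += frame.size();
    }
    return total == 0 ? 0.0
                      : static_cast<double>(ones) / static_cast<double>(total);
}
```

targetFraction gives a quick sanity check when comparing a mask you produce with the supplied v3n3mask.dat: two masks for the same mixture should label a broadly similar share of units as target.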