stft_up_tools-master (联合开发网, Pudn.com)

Category: Speech synthesis
Development tool: MATLAB
File size: 590 KB
Downloads: 1
Upload date: 2020-12-26 19:45:50
Uploader: 梦里Aug

Description: The Fourier transform is a method for analyzing signals. It decomposes a signal into its frequency components and can also synthesize a signal from those components. In analysis it is mainly suited to stationary signals: it reveals which frequencies a signal segment contains overall, but not the time at which each component occurs.
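The point that the Fourier transform does not reveal when each component occurs can be illustrated with a minimal sketch in plain Python. It is not part of the uploaded MATLAB package; the signal length and tone frequencies are arbitrary choices for the demonstration. Two signals contain the same two tones in opposite order, and a naive DFT gives them the same magnitude spectrum.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


N = 64  # total number of samples (arbitrary choice for this sketch)
half = N // 2

# Signal A: a low tone (bin 4) in the first half, a high tone (bin 12) in the second half.
low_then_high = [
    math.sin(2 * math.pi * 4 * t / N) if t < half else math.sin(2 * math.pi * 12 * t / N)
    for t in range(N)
]
# Signal B: the same samples played backwards, so the high tone now comes first.
high_then_low = low_then_high[::-1]

mag_a = [abs(c) for c in dft(low_then_high)]
mag_b = [abs(c) for c in dft(high_then_low)]

# The magnitude spectra agree up to rounding error although the tones occur at different times.
max_diff = max(abs(a - b) for a, b in zip(mag_a, mag_b))
print(f"largest magnitude difference: {max_diff:.2e}")
```

Recovering the order of the tones requires a time-localized analysis such as the short-time Fourier transform, which applies the transform to short, overlapping frames of the signal.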

File list:

DATA (0, 2018-07-16)
DATA\LICENSE (1413, 2018-07-16)
DATA\s29_pbiz6p.raw (79680, 2018-07-16)
DATA\s29_pbiz6p.wav (159404, 2018-07-16)
DATA\s29_pbiz6p_enhanced_noise.wav (159404, 2018-07-16)
DATA\s29_pbiz6p_enhanced_target.wav (159404, 2018-07-16)
DATA\s2_swwp2s.wav (150044, 2018-07-16)
example1.m (7345, 2018-07-16)
example1b.m (6330, 2018-07-16)
example2.m (8138, 2018-07-16)
mfcc_up (0, 2018-07-16)
mfcc_up\abs_up_Xi.m (8695, 2018-07-16)
mfcc_up\append_deltas_up.m (1583, 2018-07-16)
mfcc_up\cms_up.m (1508, 2018-07-16)
mfcc_up\get_delta_up.m (1279, 2018-07-16)
mfcc_up\init_mfcc_HTK.m (2140, 2018-07-16)
mfcc_up\linear_up.m (1103, 2018-07-16)
mfcc_up\log_up_logN.m (1447, 2018-07-16)
mfcc_up\log_up_ut.m (5078, 2018-07-16)
mfcc_up\mfcc_spars_up.m (3574, 2018-07-16)
mfcc_up\mfcc_up.m (2901, 2018-07-16)
mfcc_up\randcg.m (354, 2018-07-16)
speech_enhancement (0, 2018-07-16)
speech_enhancement\IMCRA.m (7302, 2018-07-16)
speech_enhancement\comp_mmse.m (910, 2018-07-16)
speech_enhancement\init_IMCRA.m (4381, 2018-07-16)
speech_enhancement\nlup.m (2216, 2018-07-16)
stft (0, 2018-07-16)
stft\framing.m (1322, 2018-07-16)
stft\iframing.m (891, 2018-07-16)
stft\init_stft_HTK.m (3861, 2018-07-16)
stft\istft_HTK.m (2424, 2018-07-16)
stft\istft_stft_HTK_up.m (4522, 2018-07-16)
stft\stft_HTK.m (5222, 2018-07-16)

stft_up_tools
=============

This is part of the code I used for speech enhancement and Short-Time Fourier Transform Uncertainty Propagation (STFT-UP) in my thesis and related papers. I developed the first versions while doing my Ph.D. at the Technische Universitaet Berlin. The code therefore benefited from the previous work of Prof. Dorothea Kolossa and her students, in particular Diep Huyn and Alexander Klimas. The latest version, or a link to an online repository, should be found here:

http://www.astudillo.com/ramon/research/stft-up/

Feel free to use it entirely or partially for your own code. Just reference the corresponding papers (and the repo URL for e.g. the IMCRA implementation).

The current version contains code for:

**Improved Minima Controlled Recursive Averaging noise variance estimator (IMCRA)**

example2.m

This is a noise variance estimator based on minimum statistics and intended for non-stationary additive noise. It also provides probabilistic voice activity detection, see

[1] I. Cohen, "Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging", IEEE Trans. on Speech and Audio Processing, Vol. 11 (5), pp. 466-475, 2003

**Uncertainty Propagation for the MFCC Features**

example1.m

This propagates an uncertain description of the STFT of a signal into the MFCC domain. The model used for STFT uncertainties is a complex Gaussian distribution. The example compares the analytic solutions for STFT-UP, using either the log-normal approximation or the Unscented Transform, with the Monte Carlo solution. To cite Uncertainty Propagation in general please use

[2] R. F. Astudillo, "Integration of short-time Fourier domain speech enhancement and observation uncertainty techniques for robust automatic speech recognition", Ph.D. dissertation, Technische Universitaet Berlin, 2010. [[pdf](https://d-nb.info/1005939284/34)]

**A MMSE-MFCC Estimator derived with STFT-UP**

example2.m

Here the previous STFT-UP approximation is used to propagate the residual mean square error (MSE) of the Wiener filter, thus attaining a minimum MSE (MMSE) estimator in the MFCC domain (MMSE-MFCC). For details, see

[3] R. F. Astudillo, R. Orglmeister, "Computing MMSE Estimates and Residual Uncertainty directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models", IEEE Transactions on Audio, Speech and Language Processing, Vol. 21 (5), pp. 1023-1034, 2013

This Matlab script is also intended to process a batch of files, making it possible to reproduce the front-end used in this paper. To reproduce experiments using HTK, you will also need the voicebox toolbox

http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

and the patches to modify HTK to perform Uncertainty Decoding or Modified Imputation, see

http://www.astudillo.com/ramon/research/stft-up

**Sparsity Based Uncertainty Model and Propagation**

example1b.m

The approach in [2] models the uncertainty over the value of each Fourier coefficient under an additivity assumption. Here we model uncertainty over either source or noise being active under a sparsity assumption. This leads to a scaled Bernoulli uncertainty model that can also be (approximately) propagated. This is described in

[4] F. Nesta, M. Matassoni, R. F. Astudillo, "A Flexible Spatial Blind Source Extraction Framework for Robust Speech Recognition in Noisy Environments", in 2nd International Workshop on Machine Listening in Multisource Environments (CHiME), pp. 33-38, June 2013

**Propagation Through ISTFT-STFT**

Propagation through a linear transformation for Gaussian variables is trivial. However, it can bring interesting advantages. Many speech processing pipelines include an intermediate ISTFT+STFT transformation separating the speech processing and machine listening stages. The approach presented in

[5] R. F. Astudillo, S. Braun, E. A. P. Habets, "A Multichannel Feature Compensation Approach for Robust ASR in Noisy and Reverberant Environments", in 1st REverberant Voice Enhancement and Recognition Benchmark (REVERB) Workshop, 2014

allows propagation through this transformation when different configurations are used at the speech processing and machine listening stages. This helps alleviate estimation errors and has a positive effect on recognition rates. No example code is available at this point. The idea can, however, be tested as part of the DIRHA-GRID baseline, see

https://github.com/ramon-astudillo/custom_fe

Future code releases will probably also be moved to custom_fe, since it provides a plug-and-play way of testing algorithms on HTK or Kaldi recipes.

Ramón F. Astudillo, last revision July 2014
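The remark that propagating a Gaussian through a linear transformation is trivial can be made concrete with a short sketch. It is written in plain Python rather than Matlab and is not part of this package. The matrix `A`, the input statistics, and the helper names `matvec` and `propagate_linear` are hypothetical, and real-valued variables with a diagonal input covariance are assumed for brevity, whereas the STFT uncertainties described above are complex Gaussian. For y = A x with x ~ N(mu, Sigma), the result is exactly Gaussian with mean A mu and covariance A Sigma A^T; the sketch checks this against a Monte Carlo estimate.

```python
import random


def matvec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def propagate_linear(a, mean_x, var_x):
    """Propagate x ~ N(mean_x, diag(var_x)) through y = a @ x.

    Returns the exact mean and full covariance of y:
    mean_y = a mean_x, cov_y = a diag(var_x) a^T.
    """
    mean_y = matvec(a, mean_x)
    cov_y = [
        [sum(ri[k] * var_x[k] * rj[k] for k in range(len(var_x))) for rj in a]
        for ri in a
    ]
    return mean_y, cov_y


# Hypothetical 2x3 linear map and input statistics, chosen only for illustration.
A = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]
mean_x = [1.0, -2.0, 0.5]
var_x = [0.2, 0.1, 0.4]

mean_y, cov_y = propagate_linear(A, mean_x, var_x)

# Monte Carlo check: draw samples of x, map them through A, estimate the statistics of y.
rng = random.Random(0)
samples = [
    matvec(A, [rng.gauss(m, v ** 0.5) for m, v in zip(mean_x, var_x)])
    for _ in range(20000)
]
mc_mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
mc_cov01 = sum((s[0] - mc_mean[0]) * (s[1] - mc_mean[1]) for s in samples) / len(samples)

print("analytic mean:", mean_y, "Monte Carlo mean:", mc_mean)
print("analytic cov[0][1]:", cov_y[0][1], "Monte Carlo cov[0][1]:", mc_cov01)
```

Note that the output covariance is generally not diagonal even when the input covariance is: the two outputs here share the middle input, so they become correlated.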
