speechtools

Category: Speech synthesis
Development tool: MATLAB
File size: 148KB
Downloads: 529
Upload date: 2006-11-02 19:29:38
Uploader: amber1026
Description: I work on speech recognition. This is a small endpoint-detection program I wrote; the archive also includes a toolkit.

File list:
新建文件夹\jieshuduandian.m (404, 2004-05-20)
新建文件夹\qishiduandian.m (419, 2006-09-06)
新建文件夹\speech_tool\ascii_read.m (397, 1996-12-17)
新建文件夹\speech_tool\atolsf.m (481, 1996-12-17)
新建文件夹\speech_tool\autocorr.c (1182, 1994-01-13)
新建文件夹\speech_tool\autocorr.m (377, 1996-12-17)
新建文件夹\speech_tool\avsmooth.m (215, 1996-12-17)
新建文件夹\speech_tool\bpf.m (576, 1996-12-17)
新建文件夹\speech_tool\column.m (72, 1996-12-17)
新建文件夹\speech_tool\cut_view.m (841, 1996-12-17)
新建文件夹\speech_tool\degrade.m (566, 1996-12-17)
新建文件夹\speech_tool\DR4_MLJH0_SX334.ADC (49154, 1996-12-17)
新建文件夹\speech_tool\durbin.c (1414, 1994-01-13)
新建文件夹\speech_tool\durbin.m (604, 1996-12-17)
新建文件夹\speech_tool\end_point.m (1498, 1997-07-15)
新建文件夹\speech_tool\end_point.m~ (1487, 1996-12-17)
新建文件夹\speech_tool\energy.m (167, 1996-12-17)
新建文件夹\speech_tool\enhance.m (4875, 1996-12-17)
新建文件夹\speech_tool\figs.m (147, 1996-12-17)
新建文件夹\speech_tool\formants2.m (1071, 1996-12-17)
新建文件夹\speech_tool\frame.m (967, 1996-12-17)
新建文件夹\speech_tool\fres.m (273, 1996-12-17)
新建文件夹\speech_tool\getword.m (473, 1996-12-17)
新建文件夹\speech_tool\hpf.m (448, 1996-12-17)
新建文件夹\speech_tool\input_area.m (854, 1996-12-17)
新建文件夹\speech_tool\input_call.m (3928, 1996-12-17)
新建文件夹\speech_tool\interpolate.m (855, 1996-12-17)
新建文件夹\speech_tool\is.m (2642, 1996-12-17)
新建文件夹\speech_tool\lpc_spectrum.m (587, 1996-12-17)
新建文件夹\speech_tool\lpf.m (450, 1996-12-17)
新建文件夹\speech_tool\lsftoa.m (473, 1997-08-18)
新建文件夹\speech_tool\lsf_gr.m (1283, 1996-12-17)
新建文件夹\speech_tool\matlab_file (101810, 1996-12-17)
新建文件夹\speech_tool\modify.m (415, 1996-12-17)
新建文件夹\speech_tool\modify_neutral.m (774, 1996-12-17)
新建文件夹\speech_tool\open.m (318, 1996-12-17)
新建文件夹\speech_tool\opensd.m (348, 1996-12-17)
新建文件夹\speech_tool\pitch_change.m (977, 1996-12-17)
新建文件夹\speech_tool\pitch_fft.m (451, 1996-12-17)
新建文件夹\speech_tool\pitch_sift.m (3229, 1996-12-17)
... ...

This directory contains a set of speech analysis functions written in MATLAB. It also provides a GUI script, "speech.m", which lets the user work on a specific speech file using the mouse. The menu structure of the speech.m script is described below.

If you do not want to read through this, you can just go ahead and type

>> speech

in your MATLAB window (it requires some MATLAB signal processing functions to run properly). After that you will see a menu. Select the "Open file" option under the "Workspace" menu item. You will see the DR4_MLJH0_SX334.ADC file at the top of the list if you are running MATLAB from the same directory that you ftp'd this software into. Select this binary speech file. Then you can play with it by choosing various speech analysis functions.

If you would like to print one of the figures that you created, select the "Print" option under the "Workspace" menu; you will be asked to click with your mouse on the figure that you would like to print. After you click, the figure is printed automatically on the default printer, which is "hp" in my case. If you would like to change your printer option, go to the "Options" menu. A new window will pop up showing the default parameter settings. You can change your print command or any other setting there.

Do not be scared; try out everything. If you have problems or questions, let me know.
My name and e-mail are: Levent Arslan, larslan@ee.duke.edu

Here is the menu structure, followed by short info about the functions.

The menu architecture is as follows:
Stars (*) indicate top level menu items
Dashes (-) indicate submenu items

*Workspace
        -Open file
        -Save file
        -Load original
        -Print file
        -Quit
*Analysis
        -Time Waveform
        -FFT-Mag
        -FFT-Phase
                --Wrapped
                --Unwrapped
        -Spectrogram
                --(2-D)
                --(3-D)
        -Welch Spectrum
        -Autocorrelation
        -Cepstrum
                --Complex
                --Real
        -Play Speech
                --(1 time)
                --(5 times)
        -Pitch Contour
        -Energy Contour
        -Formant Contour
        -Frame function
*Enhancement
        -Filter
        -Wiener filter
        -Spectral subtraction
        -Play enhanced
        -Play original
        -Load enhanced
        -Load original
        -Itakura-Saito(orig-deg)
        -Itakura-Saito(orig-enh)
*Speech-Production
        -Area function analysis
*View
        -Cut&View
        -Load cut piece
        -Load original
        -Play cut piece
*Frame Analysis
        -Time waveform
        -FFT Spectrum
        -Welch Spectrum
        -LPC Spectrum
        -LSF frequencies
        -Pitch(time)
        -Pitch(fft)
        -Play
*Options
        -Parameters
*Filters
        -Bandpass
        -Lowpass
        -Highpass

Most of the functions make use of selections with the mouse button on the graphics window. The environment is very user friendly. For example, each time you would like to save or print a graph that you generated, you are asked only to click on the graph window that you would like to save or print. Or, in the design of a lowpass filter for instance, you can specify the cut-off frequency just by clicking on the value on the frequency axis. You can also delete or overlay graphs with mouse clicks. You can analyze any particular region in the speech waveform by clicking on the region. You have the option of changing speech parameters like the LPC filter order or window length as you wish (they all have reasonable default values).

Table of functions:

% [lsf]=atolsf(a) computes the line spectral frequencies from a given set
%       of prediction coefficients

% [R] = autocorr(s,P)
%       where s is the input vector, and P is the order of prediction.
%       Function to compute the autocorrelation of the data
%       computes autocorrelation R(i) for i=1, .. ,P+1. (Mex function)

% [b] = bpf(CUT_OFF1,CUT_OFF2,ORDER)
%       The design of a band-pass filter with passband frequency range
%       of CUT_OFF1 - CUT_OFF2 and order of ORDER. The desired and actual
%       spectrums are displayed.
%       CUT_OFF1 and CUT_OFF2 are in terms of normalized frequencies.

% cut_view
%       A menu item for speech_gui graphics user interface for speech analysis
%       By clicking the mouse button twice you can determine the begin and end
%       samples for the extraction.

% [Y] = degrade(X,SNR,NOISE)
%       adds SNR dB NOISE to the speech signal X and returns the noisy signal in Y
%       By default SNR=10dB and NOISE is AWGN.

% [a]=durbin(R);
%       Function to calculate the linear predictive coefficients a, from
%       autocorrelation lags R. (Mex function)

% [e] = energy(s);
%       Function to calculate the energy in a signal s.

% figs(i) creates small size figures sequentially.

% [f] = formants2(x,RO,NUM_FORMANTS,LPC_ORDER)
%       Function to estimate the NUM_FORMANTS formants of voiced speech x,
%       with LPC_ORDER order LPC analysis and peak picking. RO is a parameter
%       that varies between 0 and 1 and it is multiplied by each lpc coeff. to
%       make the peaks clearer. By default it is 0.6.

% [y] = frame(x,func,l,step)
%       where y is output on a frame by frame basis, x is input speech,
%       and l is the window size. l and step are optional parameters,
%       by default l is 256, and step is 128.
%       func is a string e.g. 'pitch' that determines a function that you want
%       to apply to x on a short-time basis.

% [h] = fres(b)
%       This function calculates the frequency response of b and plots the
%       magnitude response from 0 to pi.

% [b] = hpf(CUT_OFF,ORDER)
%       The design of a high-pass filter with normalized cut-off frequency CUT_OFF
%       and order of ORDER. The desired and actual spectrums are displayed.

% [MAG] = input_area(DIV,VOC_LENGTH)
%       Function to calculate the frequency response of a tube from
%       its area function.
%       DIV is the number of divisions you want to have in the total
%       vocal tract length.
%       VOC_LENGTH is the total vocal tract length, by default it is 17.6cm
%       MAG is the magnitude of the frequency response.
%       Related routines: t_solve, t_line

% input_call
%       Graphic user interface for speech production
%       Usage of slider boxes, check boxes etc. in fast parameter adjustment

% [d,dw,ds,dws] = is(x,p)
%       returns the Itakura-Saito distance on a frame by frame basis
%       in variable d. The algorithm is based on the method described on p. 50
%       of Quackenbush (Objective Measures of Speech Quality).
%       x: original speech variable(file)
%       p: processed speech variable(file)
%       d: IS distance measure between x and p.
%       dw: energy weighted IS distance measure between x and p.
%       ds: speech only IS distance measure between x and p.
%       dws: energy weighted speech only IS distance measure between x and p.

% [H] = lpc_spectrum(x,LPC_ORDER)
%       where x is input speech, and LPC_ORDER is the order of LPC analysis.

% [b] = lpf(CUT_OFF,ORDER)
%       The design of a low-pass filter with normalized cut-off frequency CUT_OFF
%       and order of ORDER. The desired and actual spectrums are displayed.

% [a]=lsftoa(lsf) computes the prediction coefficients from a given set
%       of line spectral frequencies

% lsp(s,P,is_pre) plots the line-spectral frequencies along with LPC
%       spectrum where
%       s is the original speech
%       P is the number of line-spectral frequencies
%       is_pre[0] is whether you want pre-emphasis or not. If you want
%       to pre-emphasize use a value of 1.

% [f0] = pitch_sift(x,THRESHOLD,FRAME_SIZE,STEP_SIZE)
%       computes the pitch contour of the speech file x
%       THRESHOLD is an optional argument between 0 and 1 for voiced-unvoiced
%       decision
%       FRAME_SIZE is the frame size(non-overlapping) for the short time analysis.
%       STEP_SIZE is the step size between successive frames.
%       Pitch estimation by Simplified Inverse Filter Tracking(SIFT) algorithm.

% plt
%       Function to plot vectors
%       Unlike the PLOT function it automatically sets the x-axis limits such
%       that there will be no space left at the end of the graph

% [x] = read(infile,type)
%       This function reads the binary data of type format in the file specified
%       by infile to the vector x. Examples:

% say(x)
%       Where x is the speech variable that you want to listen to
%       If your computer does not have an audio output and if you have audio
%       equipment such as DATLINK or GRADIENT

% [y] = spec_sub(x,l)
%       where y is output speech, x is input speech, and l is the window size
%       Function that allows you to do spectral subtraction on a short time basis
%       using overlapping frames.

% [y] = spectrogram(x,NFFT,COLORS)
%       Plots the 3-D spectrogram of the variable x, with a frame size
%       of NFFT, and 50 percent overlap. For the frequency spectrum
%       Welch Power Spectral estimate is used.
%       COLORS is an optional argument. If you want to print the plot
%       you can specify this as [0 0 0] which corresponds to 'black' color,
%       which requires less memory for the plot.
%       In this case you will not be able to see the graph on the screen
%       since the background is also black, but when you print it out
%       it will come out O.K.

% speech.m
%       Graphic User interface menu for the speech analysis.

% [s] = syn_vowel(A,tract_length)
%       Synthesizes a vowel s with the given Area function A.
%       Tract length is optional. It is set default to 17.5 cm which is
%       the average vocal tract length for male speakers.

% [V] = t_filter(A,lip_rad)
%       Calculates the power spectrum corresponding to the area function A, and
%       lip radiation lip_rad.

% [A,B] = t_line(L,R,F)
%       R radius of the cross-section cm
%       L length of tube
%       F is the frequency vector for the freq axis e.g. 10:10:4000
%       will return T-line model parameters A and B
%       Relevant info in pp 29, 33, 37 of Acoustic Theory of Speech by Gunnar Fant

% [MAGH] = t_solve(A,L,VOC_LENGTH);
%       This function will generate the frequency response MAGH for concatenated
%       series of tubes of different lengths(L) and cross-section areas(A)
%       By default lengths of tubes are assumed equal.
%       VOC_LENGTH is an optional argument to specify vocal tract length
%       Reference: Fant, Acoustic Theory of Speech Production, pp37

% [y] = unconstrained(x,iternum,beta);
%       where x is degraded input signal, iternum is the number of iterations
%       to be performed, and y is the output signal.
%       beta is the noise suppression factor[0.2-2] the higher value means
%       more suppression
%       This function performs unconstrained iterative speech enhancement
%       based on the MAP estimate of the original signal.

% [y] = wiener(x,Ps_w,N_w,beta)
%       where x is the input signal, Ps_w is power spectrum of x, N_w is
%       power spectrum of noise, beta is the order of the filter, and
%       y is the output signal.
%       This function performs Wiener filtering operation on degraded input signal
%       to clear it from noise.

% write(outfile,x,type)
%       This function writes x vector to filename specified by outfile in type
%       binary data format. By default type is 'short'.

Copyright 1995 Levent M. Arslan

No guarantees are given with this software. If you would like to modify or change functions please let me know. If you have suggestions also let me know.

Levent M. Arslan
Duke University EE Grad Student.
email: larslan@ee.duke.edu
Phone: (919) 660-5247

Enjoy it!!!!
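
The autocorr and durbin entries in the table of functions describe the two steps of autocorrelation-method LPC analysis: compute the lags R(1..P+1), then solve for the prediction coefficients. Here is a minimal sketch of that pipeline in plain MATLAB. It is not the toolbox's Mex code. The test signal, the variable names and the sign convention (a(1) = 1, with a holding the prediction-error filter) are assumptions made for illustration, and durbin.m may use a different convention.

```matlab
% Sketch only: autocorrelation lags followed by the Levinson-Durbin recursion.
% Assumed convention: A(z) = 1 + a(2) z^-1 + ... + a(P+1) z^-P.

P = 2;                          % prediction order (assumed)
N = 200;                        % number of samples (assumed)

% Hypothetical test signal: impulse response of 1/A(z)
% with A(z) = 1 - 1.2 z^-1 + 0.6 z^-2.
a_true = [1 -1.2 0.6];
s = zeros(N, 1);
for n = 1:N
    acc = double(n == 1);       % unit impulse input
    for k = 1:P
        if n - k >= 1
            acc = acc - a_true(k + 1) * s(n - k);
        end
    end
    s(n) = acc;
end

% Step 1: autocorrelation, R(i) holds lag i-1 for i = 1..P+1
R = zeros(P + 1, 1);
for i = 1:P + 1
    R(i) = sum(s(1:N - i + 1) .* s(i:N));
end

% Step 2: Levinson-Durbin recursion on the lags
a = zeros(1, P + 1);
a(1) = 1;
E = R(1);                       % prediction error energy at order 0
for m = 1:P
    acc = R(m + 1);
    for j = 1:m - 1
        acc = acc + a(j + 1) * R(m - j + 1);
    end
    k_m = -acc / E;             % reflection coefficient of order m
    a_prev = a;
    for j = 1:m - 1
        a(j + 1) = a_prev(j + 1) + k_m * a_prev(m - j + 1);
    end
    a(m + 1) = k_m;
    E = E * (1 - k_m ^ 2);      % error energy after order m
end

disp(a);                        % should be close to a_true for this signal
```

Because the test signal is the impulse response of an all-pole filter of the same order, the recursion should recover that filter's coefficients up to rounding, which makes the sketch easy to check by eye.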
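
The degrade entry in the table of functions says it adds noise to a signal at a given SNR in dB, with 10 dB AWGN as the default. One common way to do that is to scale a white Gaussian noise vector so that the ratio of signal power to noise power hits the target. The sketch below shows that idea only. How degrade.m actually scales the noise is not stated in this README, and the sine input is a hypothetical stand-in for a speech vector.

```matlab
% Sketch only: add white Gaussian noise to x at a target SNR in dB.
snr_db = 10;                            % target SNR (the README's stated default)
n = (0:999)';
x = sin(2 * pi * 0.01 * n);             % hypothetical stand-in for speech

noise = randn(size(x));                 % white Gaussian noise, arbitrary scale
p_signal = mean(x .^ 2);                % average signal power
p_noise = mean(noise .^ 2);             % average noise power before scaling

% Choose gain so that 10*log10(p_signal / power of scaled noise) equals snr_db
gain = sqrt(p_signal / (p_noise * 10 ^ (snr_db / 10)));
y = x + gain * noise;                   % noisy signal

% Check the SNR that was actually obtained
snr_measured = 10 * log10(p_signal / mean((gain * noise) .^ 2));
disp(snr_measured);
```

Measuring the power of this particular noise realization, rather than assuming unit variance, keeps the obtained SNR on target even for short signals.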
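
The frame entry in the table of functions applies a function to the signal on a short-time basis, with a default window size of 256 and a step of 128. The sketch below shows that kind of frame-by-frame loop with a simple energy measure. The toolbox takes the function as a string name; a function handle is used here instead to keep the sketch self-contained. Dropping a trailing partial frame is also an assumption, since the README does not say how frame.m handles it.

```matlab
% Sketch only: apply a function to overlapping frames of a signal.
x = (1:1024)';                          % hypothetical input vector
frame_len = 256;                        % window size l (README default)
step = 128;                             % step between frames (README default)
func = @(seg) sum(seg .^ 2);            % short-time energy, as an example

% Number of full frames; a trailing partial frame is dropped (assumption)
n_frames = floor((length(x) - frame_len) / step) + 1;
y = zeros(n_frames, 1);
for i = 1:n_frames
    first = (i - 1) * step + 1;         % first sample of frame i
    y(i) = func(x(first:first + frame_len - 1));
end

disp(n_frames);
```

With a step of half the window size, successive frames overlap by 50 percent, so each sample away from the edges contributes to two frames.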
