qio

所属分类:Linux/Unix编程
开发工具:C/C++
文件大小:1910KB
下载次数:4
上传日期:2009-10-21 06:49:42
上 传 者handbroken
说明:  Audio cancellation source code

文件列表:
aurora-front-end (0, 2009-05-31)
aurora-front-end\qio (0, 2007-12-27)
aurora-front-end\qio\parameters (0, 2006-11-11)
aurora-front-end\qio\parameters\lda (0, 2006-11-11)
aurora-front-end\qio\parameters\lda\30-causal-noise_clean+ds_filter.fil (506, 2004-03-31)
aurora-front-end\qio\parameters\vad (0, 2006-11-11)
aurora-front-end\qio\parameters\vad\net.tim-fin-tic-it-spn-rand.54i+50h+2o.0-delay-wiener+dct+lpf.wts.head (27042, 2004-03-31)
aurora-front-end\qio\parameters\vad\tim-fin-tic-it-spn-rand.0-delay-wiener+dct+lpf.norms (117, 2004-03-31)
aurora-front-end\qio\parameters\vad\vad-lpf.fil (51, 2004-03-31)
aurora-front-end\qio\parameters\vad\tim-fin-tic-spn-rand.win20-mel-delay+dct+lpf.norms (120, 2004-10-30)
aurora-front-end\qio\parameters\vad\tim-fin-tic-spn-rand.mel-delay+dct+lpf.norms (120, 2004-04-02)
aurora-front-end\qio\parameters\vad\net.tim-fin-tic-spn-rand.54i+50h+2o.mel-delay+dct+lpf.wts.head (27086, 2004-04-02)
aurora-front-end\qio\parameters\vad\net.tim-fin-tic-spn-rand.54i+50h+2o.win20-mel-delay+dct+lpf.wts.head (27017, 2004-10-30)
aurora-front-end\qio\parameters\traps (0, 2006-08-10)
aurora-front-end\qio\parameters\traps\merger_norm_30 (1580, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb0.norms (575, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb0.weights (35655, 2004-03-31)
aurora-front-end\qio\parameters\traps\merger_weight_60 (184635, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb10.norms (570, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb10.weights (35613, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_60_1bands_crb0.weights (64434, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb11.norms (568, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb11.weights (35604, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_60_1bands_crb0.norms (1152, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb12.norms (572, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb12.weights (35610, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_60_1bands_crb1.weights (64438, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb13.norms (568, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb13.weights (35610, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_60_1bands_crb1.norms (1162, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb14.norms (573, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb14.weights (35587, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_60_1bands_crb2.weights (64397, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb1.norms (584, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb1.weights (35651, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_60_1bands_crb2.norms (1156, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb2.norms (576, 2004-03-31)
aurora-front-end\qio\parameters\traps\tnn_30_1bands_crb2.weights (35663, 2004-03-31)
... ...

The Qualcomm-ICSI-OGI (QIO) Aurora front end ============================================ The qio/ directory contains the code and parameter files of the QIO front end. This is the front end described in the ICSLP 2002 paper by Adami et al., which is available on the ICSI Speech Group publications page. The full front end produces 51 features including 6 TRAPS-based features. The version of the front end submitted to the ETSI committee in January 2002 was different: it did not have the 6 TRAPS-based features and so only produced 45 features. That version is referred to as QIO-NoTRAPS in the ICSLP 2002 paper. It is described in depth in the document QIO-NoTRAPS.pdf (Adobe PDF format), which is a copy of the documentation submitted to the ETSI Aurora committee. The document NoiseReduction.pdf goes into additional detail about the noise reduction component. To compile the software, cd to the qio/src directory and run 'make' or 'gmake'. The QIO front end consists of the auroracalc_front and auroracalc_back programs. These represent terminal-side and server-side processing in a distributed speech recognition (DSR) environment. A third tool, coder, was run between auroracalc_front and auroracalc_back in order to simulate compression and quantization for transmission. This is the historical reason for the split of the front end into auroracalc_front and auroracalc_back. For copyright reasons, the coder tool is not included in this archive. The recognition accuracy results in the ICSLP 2002 paper and the QIO-NoTRAPS.pdf document may be slightly affected by the quantization performed by the coder tool. The auroracalc_front and auroracalc_back tools have many command line options. Running the tools with no command line arguments will produce an options list. Unfortunately, some of these options are poorly tested, obsolete or ineffectual; this is a legacy of the fact that many options were created for experimental purposes during the development of these tools. Also, some options included in the lists for both the auroracalc_front and auroracalc_back tools are in fact only applicable to one of those tools. Please refer to the files in the scripts/ directory for guidance about normal options for running these tools. To obtain only the 45 QIO-NoTRAPS features, suppress the TRAPS-based features by running the auroracalc_back binary as "auroracalc_back -TRAPS 0 ...". To obtain all 51 features, enable the TRAPS-based features by running it as "auroracalc_back -TRAPS 1 ...". A KLT is applied to the length-51 feature vector before it is passed to the HMM back end; this step was accidentally omitted from the system description in the ICSLP 2002 paper. The ETSI Aurora benchmarking rules required the front end to process each user utterance independently. Thus the front end does not keep any information from one user utterance to the next. The ETSI Aurora rules also required that the front end have a low algorithmic latency, or in other words, that the algorithms used in the front end do not require more than a small delay between the arrival of input samples and the generation of the corresponding output frames. This has affected the design of some features such as the mean and variance normalization. See QIO-NoTRAPS.pdf for an analysis of the front end's algorithmic latency. Standalone VAD (Voice Activity Detection) and noise reduction ============================================================= It is sometimes useful to use the VAD and noise reduction components of the front end without using the entire front end. There are programs for this purpose under the qio/ directory: silence_flags (silence_flags.c), nr (nr.c), and drop (drop.c). silence_flags and nr -------------------- The program silence_flags performs VAD and outputs an ASCII file of '1' or '0' "silence" flags (one for every frame in the input audio file; frame step is 10 ms) which label frames as nonspeech or speech respectively (so '1' stands for a frame judged to be nonspeech and '0' stands for a frame judged to be speech). (The name "silence" is a misnomer for nonspeech since there may be noise even when there is no speech.) The nr program performs the Wiener filter noise reduction on an input waveform and outputs a noise-reduced waveform. This depends on a noise estimate made over frames judged to be nonspeech. In the QIO front end the noise estimate is initialized from the beginning of each utterance, under the assumption that each utterance starts with a period of nonspeech, and updated using later frames of the utterance judged to be nonspeech based on an energy threshold. (See section 2.7.1 of QIO-NoTRAPS.pdf.) Noise estimation is done in a causal fashion, so that each frame is noise-reduced using a noise estimate which does not depend on future frames. When the "-Ssilfile " command line option, this program performs noise estimation in a different way: it reads in a VAD flags file (as created by silence_flags.c) for each utterance and then noise-reduces the utterance using a single noise estimate calculated over all the frames labeled nonspeech by the silence flag file. This alternate style of noise estimation may be preferable if not every utterance starts with a sufficiently long period of nonspeech (see section 2.7.1 of QIO-NoTRAPS.pdf for information on what "sufficiently long" means). The Wiener filter imposes a spectral floor, but this floor is low, so it can be partly lost in 16-bit integer output. Past experience shows that the use of 16-bit output from the Wiener filter can lower ASR performance, perhaps because of the use of the logarithm function (which is volatile near 0) in ASR feature extraction. To output the resynthesized audio after noise reduction in (headerless) floating point, instead, use the command line option "-Ssynthfloat 1". In tests on the Aurora 2 task, using an MFCC front end with cepstral mean subtraction and Asela Gunawardana's "complex" back-end configuration, using 16-bit output from nr resulted in 72.9% average word accuracy and using floating point output resulted in 76.7% average word accuracy. Sometimes the output file from nr will be slightly shorter than the input file. The difference should always be less than the length of the frame step (10 ms). Memory usage for nr can be a problem with large input files. The tool does each stage of procesing for every frame in the input file before moving to the next stage of processing. Peak memory usage occurs when the input magnitude spectrogram, input phase spectrogram, noise estimate magnitude spectrogram, smoothed noise estimate magnitude spectrogram, and noise-reduced magnitude spectrogram are all held in memory at once. The memory usage for each spectrogram is approximately (number of seconds of input data) * (100 frames/second) * (NFFT/2 floats/frame) * (4 bytes/float) where NFFT = 256 at 8000 Hz sampling rate and 512 at the 11000 and 16000 Hz sampling rates. For example, a 10-minute file at 16000 Hz would result in an nr peak memory usage of approximately (600 seconds) * (100 frames/second) * (256 floats/frame) * (4 bytes/float) * (5 spectrograms) =~ 300 megabytes See scripts/silenceflag_script and scripts/noisered_script for examples of the use of these tools, including the use of command line options to set the sampling rate and big/little endianness of the input data. (Confusingly, there are some command line options for the silence_flags or nr tool which are just inherited from the auroracalc_front and auroracalc_back source code. Not all of these options have been tested with silence_flags and nr; some of them will may actually have no effect when used with silence_flag or nr.) silence_flags and nr were used for the ICSI-SRI-UW entry in the 2004 NIST Meeting Transcription evaluation, described in the ICSLP 2004 paper by Mirghafori et al. The scripts scripts/silenceflag_script and scripts/noisered_script are related to the ones used for that evaluation. Note that the VAD used by silence_flags has not been optimized for meeting data (the training data used for the VAD MLP was from the Aurora 2 corpus, which is TIDIGITS with noise and convolutional distortion added, and from SpeechDatCar) and that it is possible to use your own VAD techniques to create flag files for nr. Florian Metze of ISL improved his nr performance on meeting data by using nr with his own VAD which simply computed the power for all frames of the recording, smoothed the power curve by low-pass filtering across frames, and then marked the 5% or so frames with the lowest power as silence. drop ---- drop is a simple program which can be used with the output of silence_flags to apply frame dropping (dropping of nonspeech frames) to HTK feature files. It takes as input an HTK feature file and an ASCII file of flags as produced by silence_flags, and produces a new HTK feature file with the frames labeled as nonspeech removed. (The ICSI tool feacat, found in ICSI's SPRACHcore software package, can be used to convert between feature file formats such as HTK, SRI, and ICSI pfile.) silence_flags and drop were used to implement frame-dropping for the ICSLP 2002 paper by Kleinschmidt and Gelbart. AURORACALC environment variable =============================== Before using the tools (except for drop which doesn't require this) the AURORACALC environment variable must be set to the full path of the installation directory aurora-front-end/qio. This is so the tools can load parameter files (like multi-layer perceptron weights for VAD and TRAPS). Running the tools with no arguments will produce a list of command line options. If there is a space inside a directory name in AURORACALC (such as in the name of the "Documents and Settings" directory under Windows XP), that could cause problems with scripts/auroracalc_script and perhaps with other code as well. VAD training ============ The vad-training/ directory contains code, script, and label files used for training the VAD (voice activity detection) MLP (multi-layer perceptron) using ICSI's Quicknet MLP tools. These are only needed if you wish to re-train the VAD. Technical support for users =========================== This package has been tested under Red Hat Enterprise Linux 3.0 and 4.0 using an x86 CPU. Under Windows, the best option is probably to compile the software using the free Cygwin tools. This package has been tested under Windows XP using Cygwin. We hope this package is portable to other platforms, but some modification might be necessary. The file qio/src/libogi/fe_util.h defines the x_int32 and x_int16 types which are used in the definition of HTK feature file header fields. If it is not true on your system that sizeof(int) == 4 and sizeof(short) == 2, you must edit the definitions of x_int32 and x_int16. Note that we have not tested this code on systems where it is not true that sizeof(int) == 4 and sizeof(short) == 2, so be sure to test after editing (see below for information about testing). Users' questions about this software can be directed to the ICSI Speech Group's tools forum at http://groups.yahoo.com/group/icsi-speech-tools. There is no guarantee, however, that questions will be answered. Users should also be aware that these tools were originally created for internal use by the developers themselves, and as a result the user documentation consists only of some README files and the command line option information printed when the tools are run with no arguments. Users may find it necessary to refer to provided design documentation and source code in order to fully understand the use of these tools. Installation verification ========================= The test/ directory contains sample input and output files. The output files can be compared against after installation as a check that installation has proceeded correctly.

近期下载者

相关文件


收藏者