HTK-samples-3[1].3

Category: Speech compression
Development tool: C++
File size: 2319KB
Downloads: 461
Upload date: 2005-12-14 13:27:59
Uploader: wz90250
Description: Source code for the examples in Chapter 3 of HTK. Work through them following the HTKBook and you will quickly become familiar with HTK.

File list:
samples (0, 2004-03-24)
samples\HTKDemo (0, 2004-03-24)
samples\HTKDemo\accs (0, 2004-03-24)
samples\HTKDemo\accs.scr (63, 2000-09-15)
samples\HTKDemo\accs\hmms.2 (0, 2004-03-24)
samples\HTKDemo\ChangeLog (212, 2000-10-13)
samples\HTKDemo\codebooks (0, 2004-03-24)
samples\HTKDemo\codebooks\currentCodebook (43971, 2000-09-12)
samples\HTKDemo\configs (0, 2004-03-24)
samples\HTKDemo\configs\DcfFormat (1665, 2000-09-12)
samples\HTKDemo\configs\directAudio.dcf (601, 2000-09-12)
samples\HTKDemo\configs\lbiPlainS1.dcf (460, 2000-09-12)
samples\HTKDemo\configs\monDiscM64S1Lin.dcf (458, 2000-09-12)
samples\HTKDemo\configs\monDiscM64S1Tree.dcf (458, 2000-09-12)
samples\HTKDemo\configs\monDiscM64S3Lin.dcf (463, 2000-09-12)
samples\HTKDemo\configs\monDiscM64S3Tree.dcf (463, 2000-09-12)
samples\HTKDemo\configs\monPlainM1S1.dcf (461, 2000-09-12)
samples\HTKDemo\configs\monPlainM1S3.dcf (464, 2000-09-12)
samples\HTKDemo\configs\monPlainM1S3FullCov.dcf (460, 2000-09-12)
samples\HTKDemo\configs\monPlainM1S3HERestPell.dcf (521, 2000-09-12)
samples\HTKDemo\configs\monPlainM4S1.dcf (459, 2000-09-12)
samples\HTKDemo\configs\monSharedM1S3.dcf (461, 2000-09-12)
samples\HTKDemo\configs\monTiedMixS1.dcf (472, 2000-09-12)
samples\HTKDemo\configs\monTiedMixS3.dcf (496, 2000-09-12)
samples\HTKDemo\configs\rbiPlainS1.dcf (461, 2000-09-12)
samples\HTKDemo\configs\triDiscM64S3.dcf (462, 2000-09-12)
samples\HTKDemo\configs\triDiscM64S3HSmooth.dcf (563, 2000-09-12)
samples\HTKDemo\configs\triPlainS1.dcf (455, 2000-09-12)
samples\HTKDemo\configs\triSharedS1.dcf (536, 2000-09-12)
samples\HTKDemo\configs\triTiedMixS1.dcf (503, 2000-09-12)
samples\HTKDemo\configs\triTiedMixS1HSmooth.dcf (609, 2000-09-12)
samples\HTKDemo\configs\triTiedStateS1.dcf (504, 2000-09-12)
samples\HTKDemo\data (0, 2004-03-24)
samples\HTKDemo\data\store (0, 2004-03-24)
samples\HTKDemo\data\store\te1.mfc (12804, 2000-09-12)
samples\HTKDemo\data\store\te2.mfc (14468, 2000-09-12)
samples\HTKDemo\data\store\te3.mfc (21748, 2000-09-12)
samples\HTKDemo\data\store\tr1.mfc (15768, 2000-09-12)
samples\HTKDemo\data\store\tr2.mfc (19824, 2000-09-12)
samples\HTKDemo\data\store\tr3.mfc (13844, 2000-09-12)
... ...

The following document is a guide to building HTK-based systems for the ARPA RM task. The systems described have not necessarily been optimised, so the results should not be taken as 'the best that can be done' but rather as a baseline. Note that most of the HTK research work at CUED involves continuous density models; little effort has been put into optimising the performance of the discrete and tied mixture systems, so those are definitely naive baseline results. Although it may seem that a lot of preparation work has been done to keep this recipe short, most of the scripts are highly configurable and should be usable for many tasks. The more specific gawk and sed commands were relatively simple to produce, and the need to duplicate their functionality (for instance to produce the networks or model lists) should not discourage the user who wishes to attempt new tasks.

Note: This is not a guide to setting up HTK. For the recipe to work, HTK must be correctly installed and all the executables must be in the user's path.

---------------------------------------------------------------------

RM RECIPE
=========

The environment variables RMCD1, RMCD2, RMDATA, RMLIB, and RMWORK should be set as shown below, and the directories $RMDATA, $RMLIB, and $RMWORK should be created. The contents of RMHTK_V3.3/lib should be copied to $RMLIB and the contents of RMHTK_V3.3/work should be copied to $RMWORK. The various HTK scripts as well as all the HTK_V3.3 tools should be in the current path. The scripts are in the directory RMHTK_V3.3/scripts.

The recipe provides instructions for creating several types of systems.
 1) Single mixture monophones
 2) Multiple mixture monophones [from 1]
 3) Tied-mixture monophones [from 1 or 2]
 4) Discrete density monophones [with data aligned by 1 or 2]
 5) Single mixture word internal triphones [from 1]
 6) Tied mixture word internal triphones [from 2]
 7) Single mixture cross word triphones [from 1]
 8) Tied state triphones (data driven clustering) [from 5]
 9) Tied state triphones (decision tree clustering) [from 5 or 7]
10) Speaker adaptive training [from 9]

As this shows, there are many routes through this demo. These instructions will explain (and give results for) the sequences

  1 -> 2 -> 3,  1 -> 4,  1 -> 2 -> 6,  1 -> 5 -> 8,  1 -> 7 -> 9 -> 10

The following instructions should be followed exactly as given, with close attention paid to the naming conventions for files and directory hierarchies.

0. Setting Up
=============

0.1 Copy the NIST file lists from RMCD2

Copy the speaker independent file lists.

> cd $RMLIB
> mkdir ndx
# Change Here : replaced '_ind' so pick up spk dependent files
# find $RMCD2/rm1/doc -name '*_ind*.ndx' -exec cp {} ndx \;
> find $RMCD2/rm1/doc -name '*.ndx' -exec cp {} ndx \;
# Added greps to extract file lists for adaptation
> grep 'dms0' ndx/6a_deptr.ndx | head -100 > ndx/step0.ndx
> rm ndx/6a_deptr.ndx
> mv ndx/step0.ndx ndx/6a_deptr.ndx
> grep -h 'dms0' ndx/*deptst*.ndx > ndx/dms0_tst.ndx

0.2 Coding the data using the coderm script

This can code either the speaker independent or speaker dependent data and uses the file lists created in Step 0.1. Here just the speaker independent data is used. Allow approximately 100MB of disk space for this. The file lists created are stored in $RMLIB/flists.

> mkdir $RMDATA
> mkdir $RMLIB/flists
> cd $RMLIB
> coderm ndx $RMCD1 ind train $RMDATA configs/config.code flists/train.scp
> coderm ndx $RMCD2 ind dev_aug $RMDATA configs/config.code flists/dev_aug.scp

Now make script files for both the mfc data and label files.
> cd $RMLIB/flists
> cat train.scp dev_aug.scp > ind_trn109.scp
> sed 's:\.mfc:\.lab:' ind_trn109.scp > ind_trn109.lab

Also code the four speaker independent test sets and create script files for the complete test label files.

> cd $RMLIB
> coderm ndx $RMCD2 ind feb89 $RMDATA configs/config.code flists/ind_feb89.scp
> coderm ndx $RMCD2 ind oct89 $RMDATA configs/config.code flists/ind_oct89.scp
> coderm ndx $RMCD2 ind feb91 $RMDATA configs/config.code flists/ind_feb91.scp
> coderm ndx $RMCD2 ind sep92 $RMDATA configs/config.code flists/ind_sep92.scp
> cat flists/ind_???[89][912].scp | sed 's:\.mfc:\.lab:' > flists/ind_tst.lab

0.3 Fixing the coded data files that are corrupt on the CD-ROM

A very few files on the CD-ROM have a long series of identical time-domain values. These should be removed since their net effect is to cause the differential and second differential components to be zero with zero variance. The corrupted files were identified automatically as having a run of 250 or more identical time-domain samples. Use the script fixrm and the file list $RMLIB/corrupt to delete these frames. Note that $RMLIB/corrupt includes the corrupt frame numbers and therefore assumes that the default 10ms frames have been used by coderm. All the corrupt frames are in the silence portion of the files. Note that no speaker dependent data has been checked for corrupt files.

> fixrm $RMLIB/corrupt $RMDATA

0.4 Create word-level label files (and associated lists) for the test sets

This uses grep, gawk and sed to convert the SNOR word-level transcriptions distributed on the CD-ROMs into HTK format. gawk is the public domain GNU version of awk, which is specified because its built-in tolower function allows the upper-case file names in SNOR format to be converted to lower-case. Many versions of nawk also have this functionality; if gawk is not installed but nawk is, use nawk, otherwise use perl on this occasion and awk elsewhere.
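The shell glob used above to gather the test-set lists rewards a second look: `ind_???[89][912].scp` matches ind_feb89, ind_oct89, ind_feb91 and ind_sep92 but excludes ind_trn109. This can be checked with empty stand-in files (a sketch; the .scp files here are placeholders, not real coded-data lists):

```shell
cd "$(mktemp -d)"
# Empty placeholders named like the recipe's file lists
touch ind_feb89.scp ind_oct89.scp ind_feb91.scp ind_sep92.scp ind_trn109.scp

# ??? matches the three-letter month abbreviation and [89][912] the two
# year digits; ind_trn109.scp fails at the [89] position and is excluded.
ls ind_???[89][912].scp
```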
First we create generic label files based on the prompts.

> cd $RMLIB
> cp $RMCD2/rm1/doc/al_sents.snr sents.snr
> mkdir wlabs
> grep -v '^;' $RMCD2/rm1/doc/al_sents.snr | \
  gawk 'BEGIN { printf("#\!MLF\!#\n"); } \
        { printf("%c*%s.lab%c\n",34,tolower(substr($NF,2,length($NF)-2)),34); \
          for (i=1;i<NF;i++) printf("%s\n",$i); printf(".\n"); }' > wlabs/all.mlf

[ Alternatively use the following perl script (this was generated using a2p from the above and then a tolower equivalent was added, so apologies to any perl purists for the syntax etc.)

> cat mk_mlf.perl
#!/usr/local/bin/perl
eval "exec /usr/local/bin/perl -S $0 $*"
    if $running_under_some_shell;
        # this emulates #! processing on NIH machines.
        # (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
        # process any FOO=bar switches
$[ = 1;     # set array base to 1

printf (("#!MLF!#\n"));
while (<>) {
    chop;   # strip record separator
    @Fld = split(' ', $_, 9999);
    $name = substr($Fld[$#Fld], 2, length($Fld[$#Fld]) - 2);
    $name =~ tr/A-Z/a-z/;
    printf "%c*%s.lab%c\n", 34, $name, 34;
    for ($i = 1; $i < $#Fld; $i++) {
        printf "%s\n", $Fld[$i];
    }
    printf ((".\n"));
}

> grep -v '^;' $RMCD2/rm1/doc/al_sents.snr | ./mk_mlf.perl > wlabs/all.mlf ]

Now create specific labels for the actual files.

> cd $RMLIB
> touch wlabs/null.hled
> HLEd -A -D -V -l '*' -i wlabs/ind_trn109.mlf -I wlabs/all.mlf \
       -S flists/ind_trn109.lab wlabs/null.hled
> HLEd -A -D -V -l '*' -i wlabs/ind_tst.mlf -I wlabs/all.mlf \
       -S flists/ind_tst.lab wlabs/null.hled

1. Creating a Base-Line Monophone Set
=====================================

1.1 Basic dictionary and phone list creation

Copy the SRI dictionary from the CD-ROM. You should read the header of this file to determine the conditions of use. Dictionaries are stored in $RMLIB/dicts. Model lists are stored in $RMLIB/mlists.

> mkdir $RMLIB/mlists
> cd $RMLIB
> cp $RMCD2/score/src/rdev/pcdsril.txt dicts

Use the tool HDMan to transform the dictionary. The supplied HDMan script mono.ded should be in $RMLIB/dicts.
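Before running the full conversion of the prompts, it can help to see what the gawk script in step 0.4 produces for a single SNOR line. A minimal sketch (the sentence and utterance id are invented; any POSIX awk with tolower can stand in for gawk):

```shell
# One fake SNOR line: words followed by the utterance id in parentheses
echo 'SHOW THE DISPLAY (SA001)' | \
awk 'BEGIN { printf("#!MLF!#\n"); }
     { printf("%c*%s.lab%c\n",34,tolower(substr($NF,2,length($NF)-2)),34);
       for (i=1;i<NF;i++) printf("%s\n",$i); printf(".\n"); }'
```

This prints the MLF header, then a quoted "*sa001.lab" pattern line (34 is the ASCII code for the double quote), each word on its own line, and a terminating '.'.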
> cd $RMLIB/dicts
> HDMan -A -D -V -T 1 -a '*;' -n $RMLIB/mlists/mono.list \
        -g mono.ded mono.dct pcdsril.txt sil.txt

1.2 Training set phone label file creation

Create the training set phone label files, including optional inter-word silence (sp) models (the sp models are tee models) and forced silence (sil) at the ends of each utterance.

> cd $RMLIB
> HLEd -A -D -V -l '*' -i labs/mono.mlf -d dicts/mono.dct labs/mono.hled \
       wlabs/ind_trn109.mlf

1.3 Initial phone models

The supplied initial models should be in the directory $RMWORK/R1/hmm0. Note that the MODELS file also contains a varFloor vector which was calculated using HCompV.

> cd $RMWORK/R1
> HCompV -A -D -V -C $RMLIB/configs/config.basic -S $RMLIB/flists/ind_trn109.scp \
         -f 0.01 -m varProto

The supplied HTK Environment file HTE should also be in $RMWORK/R1. This contains a number of shell variables that are used by the various training and testing scripts. Edit the HTE file so that it reflects your particular directory setup.

1.4 Model training

Re-estimate the models in hmm0 for four iterations of HERest.

> cd $RMWORK/R1
> hbuild 1 4

This should create the directories $RMWORK/R1/hmm[1-4]. Each of these directories should contain a LOG file, and $RMWORK/R1 will contain a file called blog which records the build process. After running hbuild the various LOG files should be checked for error messages.

1.5 Building the recognition networks

Copy the word-pair grammar from the CD-ROM to $RMLIB.

> cd $RMLIB; mkdir nets
> cp $RMCD2/rm1/doc/wp_gram.txt $RMLIB/nets

Use the supplied gawk scripts to generate word level networks for both the word-pair and no-grammar test conditions. Note that the language model likelihoods are based solely on the number of successors for each word.
> cd $RMLIB/nets
> cat wp_gram.txt wp_gram.txt | fgrep -v '*' | \
  gawk -f $RMLIB/awks/wp.net.awk nn=993 nl=57704 > net.wp
> gawk -f $RMLIB/awks/ng.net.awk nn=994 nl=1***5 $RMLIB/wordlist > net.ng
> gzip net.wp net.ng

1.6 Testing the Monophone Models

Check that all the recognition variables in HTE are set up correctly. In particular, if you wish to use the NIST scoring package distributed on the CD-ROMs, make sure that NISTSCORE is set.

The recognition tests use the script htestrm. The recognition parameters are set in the HTE file, as for hbuild. htestrm allows the specification of the test set and the recognition mode (ng or wp), and automatically creates test directories for a number of runs (typically with different recognition parameters and perhaps run simultaneously on different machines). Test directories are created as sub-directories of the model directories. The first run on a particular hmm set with ng on the feb89 test set would be called test_ng_feb89.1, the second run test_ng_feb89.2, and tests with wp and other test sets are named according to the same convention.

A number of parameters can be tuned, including the grammar scale factor HVGSCALE (HVite -s) and the inter-word probability HVIMPROB (HVite -p). These flags both control the ratio of insertion and deletion errors. HVite pruning is controlled by the variables HVPRUNE (HVite -t) and HVMAXACTIVE (HVite -u). All of these values are supplied in an indexed list that corresponds to the different test types listed in the HTE variable TYPELIST (here either ng or wp). The HVPRUNE values should be set at a higher value for wp testing than for ng testing.

After htestrm runs HVite, HResults is executed. The equivalent sets (homophone lists) for the different test conditions are listed in files specified by the HREQSETS variable. These sets are defined by DARPA/NIST, and suitable files (eq.wp and eq.ng) are supplied with the distribution.
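The run-numbering convention for the test directories described in section 1.6 can be mimicked in a few lines of shell; a sketch (next_test_dir is a made-up helper for illustration — htestrm allocates run numbers itself):

```shell
# Echo the first unused test directory name for a given
# hmm set, recognition mode (ng/wp) and test set.
next_test_dir() {
  n=1
  while [ -d "$1/test_$2_$3.$n" ]; do n=$((n+1)); done
  echo "$1/test_$2_$3.$n"
}

cd "$(mktemp -d)"
mkdir -p hmm4/test_ng_feb89.1 hmm4/test_ng_feb89.2
next_test_dir hmm4 ng feb89   # runs 1 and 2 exist, so run 3 is next
```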
If the variable HRNIST is set, HResults operates in a manner that is compatible with the NIST scoring software, and results identical to those from the NIST software should be obtained. If the NIST scoring software has been compiled, then after HResults has been run the output of HVite can be converted to a NIST-compatible "hypothesis" file using the RM tool HLab2Hyp, followed by running the appropriate NIST scoring script. These steps will be performed by htestrm if the HTE variable NISTSCORE is set. Note that the NIST scoring tools should be in your path if you set NISTSCORE. Although the raw results obtained from the NIST software should be identical to the HResults output, the NIST output files can give additional information that is useful for understanding recognizer performance.

To test the models in $RMWORK/R1/hmm4 with the feb89 test set, a word-pair grammar, and the current recognition settings in HTE, execute the following

> cd $RMWORK/R1
> htestrm HTE wp feb89 hmm4

Run other tests under a number of ng/wp conditions on different test sets as desired.

2. Multiple Mixture Monophones
==============================

2.1 Mixture splitting

The models created in Step 1 will be converted to multiple mixture models by iterative "mixture splitting" and re-training. Mixture splitting is performed by HHEd: it takes an output distribution with M output components, chooses the components with the largest mixture weights, and makes two mixtures where there was one before by perturbing the mean values of the split mixture components. This can be done on a state-by-state basis, but here all states will be mixture split. A script interface to HHEd called hedit takes a script of HHEd commands and applies them.

First, two-mixture models will be created, with the initial models in $RMWORK/R2/hmm10 formed by mixture-splitting the models in $RMWORK/R1/hmm4.
First create the directory to hold the multiple mixture monophones and link all the files and directories to the new directory.

> mkdir $RMWORK/R2
> cd $RMWORK/R2
> ln -s $RMWORK/R1/* .

Now store the HHEd commands required in a file called edfile4.10 in $RMWORK/R2. This file need contain only the line

  MU 2 {*.state[2-4].mix}

This command will split into 2 mixtures all the states of all models in the HMMLIST defined in the HTE file. Now invoke hedit by

> cd $RMWORK/R2
> hedit 4 10

This creates 2 mixture initial hmms in $RMWORK/R2/hmm10.

2.2 Retraining and testing the two-mixture models

Run four iterations of HERest to re-train the models

> cd $RMWORK/R2
> hbuild 11 14

and then test them with HVite (again recognition settings may need adjusting).

> cd $RMWORK/R2
> htestrm HTE wp feb89 hmm14

and run other tests as required.

2.3 Iterative mixture-splitting

The mixture splitting and re-training can be done repeatedly, and it is suggested that after the 2 mixture models are trained, 3 mixture, 5 mixture, 7 mixture, 10 mixture and 15 mixture models are trained in turn. Each cycle requires forming a file containing the appropriate HHEd MU command, running hedit, running hbuild and then testing the models with htestrm. Note that it is quite possible to go straight from 1 mixture models to 5 mixture models. However, for a given number of mixture components, better performance usually results if the model complexity is increased in stages.

3. Tied Mixture Monophones
==========================

3.1 Tied Mixture Initialisation

The models created in step 2 will be converted to a set of tied-mixture models using HHEd. This conversion is performed in three phases. First the set of Gaussians that represent the models is pooled by choosing those with the highest weights. Then the models are rebuilt to share these Gaussians and the mixture weights are recalculated. Finally the representation of the models is changed to TIEDHS.
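Each pass of the iterative splitting described in section 2.3 needs only a one-line MU edit file before hedit and hbuild are run again. The edit files can be generated ahead of time; a sketch (the hmm directory numbers here are invented for illustration — continue whatever sequence your own blog file records — and the hedit/hbuild calls are left commented out):

```shell
cd "$(mktemp -d)"            # stand-in for $RMWORK/R2
prev=14                      # last trained 2-mixture directory (hmm14)
for m in 3 5 7 10 15; do
  next=$((prev + 6))         # illustrative numbering: room for 4 HERest passes
  echo "MU $m {*.state[2-4].mix}" > edfile$prev.$next
  # hedit $prev $next
  # hbuild $((next + 1)) $((next + 4))
  prev=$((next + 4))
done
ls edfile*
```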
First create the directory to hold the tied mixture monophones and copy over the HTE file.

> mkdir $RMWORK/R3
> cd $RMWORK
> cp R2/HTE R3

Create an HHEd edit file in $RMWORK/R3 called tied.hed containing the following commands.

  JO 128 2.0
  TI MIX_ {*.state[2-4].stream[1].mix}
  HK TIEDHS

Create a directory for the initial tied mixture models and run HHEd.

> cd $RMWORK/R3
> mkdir hmm0
> HHEd -A -D -V -T 1 -H $RMWORK/R2/hmm4/MODELS -w hmm0/MODELS tied.hed \
       $RMLIB/mlists/mono.list

3.2 Tied Mixture training and testing

In the HTE file the line

  set TMTHRESH=20

now controls tied mixture as well as forward pass mixture pruning. Finally build the models

> cd $RMWORK/R3
> hbuild 1 4

and then test them with HVite (recognition settings will need adjusting: try setting the grammar scale to 2.0 rather than 7.0, and speed things up a little by changing the pruning beam width from 200.0 to 100.0).

> cd $RMWORK/R3
> htestrm HTE wp feb89 hmm4

4. Discrete Density Monophones
==============================

4.1 Vector quantiser creation

Since all the observations have to be held in memory for this operation (which can be computationally expensive), we will perform it on a subset of the data. Please note that this process can take a considerable amount of time.

> mkdir $RMWORK/R4
> cd $RMWORK/R4
> gawk '{ if ((++i%10)==1) print }' $RMLIB/flists/ind_trn109.scp > ind_trn109.sub.scp
> HQuant -A -D -V -T 1 -d -n 1 128 -n 2 128 -n 3 128 -n 4 32 -s 4 \
         -S ind_trn109.sub.scp -C $RMLIB/configs/config.basic vq.table

4.2 Align data and initialise discrete models

To create some discrete models we are going to use HInit and HRest, both of which need data aligned at the phone level. We will use the monophones from either R1 or R2 to generate phone labels for the subset of the data used for the codebook.
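The awk one-liner used in step 4.1 to thin the training list keeps every tenth entry, starting with the first. Its behaviour can be seen on a synthetic list (the file names below are invented):

```shell
# Generate 25 fake .mfc entries, then keep lines 1, 11, 21, ...
awk 'BEGIN { for (j=1;j<=25;j++) printf("file%d.mfc\n",j) }' | \
awk '{ if ((++i%10)==1) print }'
# -> file1.mfc, file11.mfc, file21.mfc
```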
> HVite -A -D -V -T 1 -C $RMLIB/configs/config.basic -H ../R1/hmm4/MODELS \
        -a -m -o SW -i ind_trn109.sub.mlf -X lab -b '!SENT_START' \
        -l '*' -y lab -I $RMLIB/wlabs/ind_trn109.mlf -t 500.0 -s 0.0 -p 0.0 \
        -S ind_trn109.sub.scp $RMLIB/dicts/mono.dct $RMLIB/mlists/mono.list

4.3 Create a prototype hmm set

This can be done using MakeProtoHMMSet with discrete.pcf, or by creating a file like the one below for each of the models. The exception is sp, which should have only three states, so it is reproduced in full.

mkdir hmm0
for i in `cat $RMLIB/mlists/mono.list`
do
cat <<EOF > hmm0/$i
~o <VecSize> 39 <Discrete> <StreamInfo> 4 12 12 12 3
~h "${i}"
<BeginHMM>
<NumStates> 5
<State> 2 <NumMixes> 128 128 128 32
<Stream> 1 <DProb> 11508*128
<Stream> 2 <DProb> 11508*128
<Stream> 3 <DProb> 11508*128
<Stream> 4 <DProb> 8220*32
<State> 3 <NumMixes> 128 128 128 32
<Stream> 1 <DProb> 11508*128
<Stream> 2 <DProb> 11508*128
<Stream> 3 <DProb> 11508*128
<Stream> 4 <DProb> 8220*32
<State> 4 <NumMixes> 128 128 128 32
<Stream> 1 <DProb> 11508*128
<Stream> 2 <DProb> 11508*128
<Stream> 3 <DProb> 11508*128
<Stream> 4 <DProb> 8220*32
<TransP> 5
 0.000e+0 1.000e+0 0.000e+0 0.000e+0 0.000e+0
 0.000e+0 6.000e-1 4.000e-1 0.000e+0 0.000e+0
 0.000e+0 0.000e+0 6.000e-1 4.000e-1 0.000e+0
 0.000e+0 0.000e+0 0.000e+0 6.000e-1 4.000e-1
 0.000e+0 0.000e+0 0.000e+0 0.000e+0 0.000e+0
<EndHMM>
EOF
done

cat <<EOF > hmm0/sp
~o <VecSize> 39 <Discrete> <StreamInfo> 4 12 12 12 3
~h "sp"
<BeginHMM>
<NumStates> 3
<State> 2 <NumMixes> 128 128 128 32
<Stream> 1 <DProb> 11508*128
<Stream> 2 <DProb> 11508*128
<Stream> 3 <DProb> 11508*128
<Stream> 4 <DProb> 8220*32
<TransP> 3
 0.000e+0 6.000e-1 4.000e-1
 0.000e+0 6.000e-1 4.000e-1
 0.000e+0 0.000e+0 0.000e+0
<EndHMM>
EOF

Then initialise the models. Note that the following script commands a ... ...
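The 11508*128 and 8220*32 entries in the prototypes are uniform distributions written in HTK's quantised log-probability format. Assuming the scale constant is 2371.8 (as used in the HTK source; treat the exact constant as an assumption here), the stored value for a flat distribution over N codewords is round(2371.8 * ln N), which reproduces both numbers:

```shell
# Quantised discrete probability for a uniform distribution over n codewords,
# i.e. round(-scale * ln(1/n)) with an assumed scale of 2371.8.
dprob() { awk -v n="$1" 'BEGIN { printf "%.0f\n", 2371.8*log(n) }'; }

dprob 128    # streams 1-3 (codebook size 128) -> 11508
dprob 32     # stream 4 (codebook size 32)     -> 8220
```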
