HHTK-samples-i
Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: Visual C++
File size: 2105KB
Downloads: 5
Upload date: 2012-08-11 23:10:43
Uploader: imitates
Description: Sample files for using the Hidden Markov Model Toolkit (HTK); widely applicable to recognition and related fields.
File list:
HHTK-samples-i\samples\HTKDemo\accs.scr (63, 2000-09-15)
HHTK-samples-i\samples\HTKDemo\ChangeLog (212, 2000-10-13)
HHTK-samples-i\samples\HTKDemo\codebooks\currentCodebook (43971, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\DcfFormat (1665, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\directAudio.dcf (601, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\lbiPlainS1.dcf (460, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monDiscM64S1Lin.dcf (458, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monDiscM64S1Tree.dcf (458, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monDiscM64S3Lin.dcf (463, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monDiscM64S3Tree.dcf (463, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monPlainM1S1.dcf (461, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monPlainM1S3.dcf (464, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monPlainM1S3FullCov.dcf (460, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monPlainM1S3HERestPell.dcf (521, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monPlainM4S1.dcf (459, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monSharedM1S3.dcf (461, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monTiedMixS1.dcf (472, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\monTiedMixS3.dcf (496, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\rbiPlainS1.dcf (461, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\triDiscM64S3.dcf (462, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\triDiscM64S3HSmooth.dcf (563, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\triPlainS1.dcf (455, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\triSharedS1.dcf (536, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\triTiedMixS1.dcf (503, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\triTiedMixS1HSmooth.dcf (609, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\configs\triTiedStateS1.dcf (504, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\store\te1.mfc (12804, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\store\te2.mfc (14468, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\store\te3.mfc (21748, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\store\tr1.mfc (15768, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\store\tr2.mfc (19824, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\store\tr3.mfc (13844, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\store\tr4.mfc (12544, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\store\tr5.mfc (11140, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\store\tr6.mfc (11036, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\store\tr7.mfc (10100, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\test\te1.mfc (12804, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\test\te2.mfc (14468, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\test\te3.mfc (21748, 2000-09-12)
HHTK-samples-i\samples\HTKDemo\data\train\tr1.mfc (15768, 2000-09-12)
... ...
The following document is a guide to how to build HTK based systems
for the ARPA RM task.
The systems described have not necessarily been optimised and so the
results should not be taken as 'the best that can be done' but more as
a baseline. Note that most of the HTK research work at CUED involves
continuous density models and little effort has been put in to
optimising the performance of the discrete and tied mixture systems
and so these are definitely naive baseline results.
Although it may seem that a lot of preparation work has been done to
keep this recipe short, most of the scripts are highly configurable and
should be usable for many tasks. The more specific gawk and sed
commands were relatively simple to produce, and the need to duplicate
their functionality (for instance to produce the networks or model
lists) should not discourage the user who wishes to attempt new tasks.
A version of this document specific to the use on Windows NT can be
found in README.NT.
Note: this is not a guide to setting up HTK. For the recipe to work,
HTK must be correctly installed and all of its executables must be in
the user's path.
---------------------------------------------------------------------
RM RECIPE
=========
The environment variables RMCD1, RMCD2, RMDATA, RMLIB, and RMWORK
should be set as shown below, and the directories $RMDATA, $RMLIB, and
$RMWORK should be created. The contents of the RMHTK/lib should be
copied to $RMLIB and the contents of RMHTK/work should be copied to
$RMWORK.
The various HTK scripts as well as all the HTK tools should be in the
current path. The scripts are in the directory RMHTK/scripts.
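A minimal setup sketch, in Bourne-shell syntax with hypothetical mount points and directories (use setenv under csh, and substitute the locations of your own CD-ROMs and work area):

```shell
# All paths below are illustrative assumptions -- adjust to your site.
export RMCD1=/cdrom/rm1_cd1     # first RM CD-ROM (assumed mount point)
export RMCD2=/cdrom/rm1_cd2     # second RM CD-ROM (assumed mount point)
export RMDATA=$HOME/rm/data     # coded speech data
export RMLIB=$HOME/rm/lib       # dictionaries, file lists, networks, ...
export RMWORK=$HOME/rm/work     # model training/testing area
mkdir -p "$RMDATA" "$RMLIB" "$RMWORK"
# Then, from the root of the RMHTK distribution:
#   cp -r RMHTK/lib/.  "$RMLIB"
#   cp -r RMHTK/work/. "$RMWORK"
#   PATH="$PATH:`pwd`/RMHTK/scripts"
```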
The recipe provides instructions for creating several types of
systems.
1) Single mixture monophones
2) Multiple mixture monophones [from 1]
3) Tied-mixture monophones [from 1 or 2]
4) Discrete density monophones [with data aligned by 1 or 2]
5) Single mixture word internal triphones [from 1]
6) Tied mixture word internal triphones [from 2]
7) Single mixture cross word triphones [from 1]
8) Tied state triphones (data driven clustering) [from 5]
9) Tied state triphones (decision tree clustering) [from 5 or 7]
10) Bigram generation and testing.
As this shows there are many routes through this demo.
These instructions will explain (and give results for) the sequences
1 -> 2 -> 3,
1 -> 4,
1 -> 2 -> 6,
1 -> 5 -> 8,
1 -> 7 -> 9
The following instructions should be followed exactly as given, with
close attention paid to the naming conventions for files and directory
hierarchies.
0. Setting Up
=============
0.1 Copy the NIST file lists from RMCD2
Copy the speaker independent file lists.
> cd $RMLIB
> mkdir ndx
# Change here: match '*.ndx' rather than '*_ind*.ndx' so that speaker-dependent file lists are also picked up
# find $RMCD2/rm1/doc -name '*_ind*.ndx' -exec cp {} ndx \;
> find $RMCD2/rm1/doc -name '*.ndx' -exec cp {} ndx \;
# Added greps to extract files lists for adaptation
> grep 'dms0' ndx/6a_deptr.ndx | head -100 > ndx/6a_deptr.tmp
> mv ndx/6a_deptr.tmp ndx/6a_deptr.ndx
> grep -h 'dms0' ndx/*deptst*.ndx > ndx/dms0_tst.ndx
0.2 Coding the data using the coderm script
This can code either the speaker independent or speaker dependent data
and uses the file lists created in Step 0.1. Here just the speaker
independent data is used. Allow approximately 100MB of disk space for
this. The file-lists created are stored in $RMLIB/flists.
> mkdir $RMDATA
> mkdir $RMLIB/flists
> cd $RMLIB
> coderm ndx $RMCD1 ind train $RMDATA configs/config.code flists/train.scp
> coderm ndx $RMCD2 ind dev_aug $RMDATA configs/config.code flists/dev_aug.scp
Now make script files for both the mfc data and label files.
> cd $RMLIB/flists
> cat train.scp dev_aug.scp > ind_trn109.scp
> sed 's:\.mfc:\.lab:' ind_trn109.scp > ind_trn109.lab
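The sed command just rewrites each filename extension, so the label script mirrors the data script line for line; for example (the sample path is invented):

```shell
# Swap the .mfc extension for .lab, as used to build ind_trn109.lab
echo '/rmdata/train/tr1.mfc' | sed 's:\.mfc:\.lab:'
```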
Also code the four speaker independent test sets and create script files
for complete test label files.
> cd $RMLIB
> coderm ndx $RMCD2 ind feb89 $RMDATA configs/config.code flists/ind_feb89.scp
> coderm ndx $RMCD2 ind oct89 $RMDATA configs/config.code flists/ind_oct89.scp
> coderm ndx $RMCD2 ind feb91 $RMDATA configs/config.code flists/ind_feb91.scp
> coderm ndx $RMCD2 ind sep92 $RMDATA configs/config.code flists/ind_sep92.scp
> cat flists/ind_???[89][912].scp | sed 's:\.mfc:\.lab:' > flists/ind_tst.lab
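The `ind_???[89][912].scp` glob is just a compact way of naming the four test-set lists created above; a sanity check on empty stand-in files (the demo_flists directory is invented for illustration):

```shell
# The pattern matches feb89, oct89, feb91 and sep92, but not train.scp
mkdir -p demo_flists
touch demo_flists/ind_feb89.scp demo_flists/ind_oct89.scp \
      demo_flists/ind_feb91.scp demo_flists/ind_sep92.scp demo_flists/train.scp
echo demo_flists/ind_???[89][912].scp
```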
# Added Adaptation Coding
If adaptation is required, then the third CD-ROM (speaker-dependent
data) is needed. Get the adaptation training data:
> cd $RMLIB
> coderm ndx $RMCD3 dep traina $RMDATA configs/config.code flists/traina.scp
And the adaptation test data:
> coderm ndx $RMCD2 dep dms0_tst $RMDATA configs/config.code flists/dms0_tst.scp
> cd $RMLIB/flists
> sed 's:\.mfc:\.lab:' traina.scp > traina.lab
> sed 's:\.mfc:\.lab:' dms0_tst.scp > dms0_tst.lab
0.3 Fixing the coded data files that are corrupt on the CD-ROM
A very few files on the CD-ROM have a long series of identical
time-domain values. These should be removed since the net effect is to
cause the differential and second differential components to be zero
with zero variance. The files which are corrupted were identified
automatically as having a run of 250 or more identical time domain
samples. Use the script fixrm and the file list $RMLIB/corrupt to
delete these frames. Note that $RMLIB/corrupt includes the corrupt
frame numbers and this assumes that the default 10ms frames have been
used by coderm. All the corrupt frames are in the silence portion of
the files. Note that no speaker dependent data has been checked for
corrupt files.
> fixrm $RMLIB/corrupt $RMDATA
0.4 Create word-level label files (and associated lists) for the test sets
This uses grep, gawk and sed to convert the SNOR word-level
transcriptions distributed on the CD-ROMs into HTK format. gawk is the
public-domain GNU version of awk, specified because its built-in
tolower function allows the upper-case file names in SNOR format to be
converted to lower case. Many versions of nawk also have this
functionality: if gawk is not installed but nawk is, use nawk;
otherwise use perl on this occasion and awk elsewhere. First we create
generic label files based on the prompts.
> cd $RMLIB
> cp $RMCD2/rm1/doc/al_sents.snr sents.snr
> mkdir wlabs
> grep -v '^;' $RMCD2/rm1/doc/al_sents.snr | \
  gawk 'BEGIN { printf("#\!MLF\!#\n"); } \
        { printf("%c*%s.lab%c\n",34,tolower(substr($NF,2,length($NF)-2)),34); \
          for (i=1;i<NF;i++) printf("%s\n",$i); printf(".\n"); }' > \
  wlabs/all.mlf
[
Alternatively use the following perl script (this was generated from
the gawk script above using a2p, with a tolower equivalent added, so
apologies to any perl purists for the syntax):
> cat mk_mlf.perl
#!/usr/local/bin/perl
eval "exec /usr/local/bin/perl -S $0 $*"
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
$[ = 1; # set array base to 1
printf (("#!MLF!#\n"));
while (<>) {
chop; # strip record separator
@Fld = split(' ', $_, 9999);
$name = substr($Fld[$#Fld], 2, length($Fld[$#Fld]) - 2);
$name =~ tr/A-Z/a-z/;
printf "%c*%s.lab%c\n", 34, $name, 34;
for ($i = 1; $i < $#Fld; $i++) {
printf "%s\n", $Fld[$i];
}
printf ((".\n"));
}
> grep -v '^;' $RMCD2/rm1/doc/al_sents.snr | ./mk_mlf.perl > wlabs/all.mlf
]
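Either version performs the same per-line transformation; a quick check on a single synthetic SNOR line (the sentence id "(SA001)" is made up for illustration, and plain POSIX awk has tolower too):

```shell
# Prompt words become one label per line; the utterance id (last field,
# minus the parentheses) is lower-cased into the quoted .lab pattern.
echo 'SHOW THE SHIPS (SA001)' | \
awk '{ printf("%c*%s.lab%c\n",34,tolower(substr($NF,2,length($NF)-2)),34); \
       for (i=1;i<NF;i++) printf("%s\n",$i); printf(".\n"); }'
```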
Now create specific labels for the actual files.
> cd $RMLIB
> touch wlabs/null.hled
> HLEd -l '*' -i wlabs/ind_trn109.mlf -I wlabs/all.mlf \
-S flists/ind_trn109.lab wlabs/null.hled
> HLEd -l '*' -i wlabs/ind_tst.mlf -I wlabs/all.mlf \
-S flists/ind_tst.lab wlabs/null.hled
# Added label creation for adaptation
> HLEd -l '*' -i wlabs/traina.mlf -I wlabs/all.mlf \
-S flists/traina.lab wlabs/null.hled
> HLEd -l '*' -i wlabs/dms0_tst.mlf -I wlabs/all.mlf \
-S flists/dms0_tst.lab wlabs/null.hled
1. Creating a Base-Line Monophone Set
=====================================
1.1 Basic dictionary and phone list creation
Copy the SRI dictionary from the CD-ROM. You should read the header of
this file to determine the conditions of use. Dictionaries are stored
in $RMLIB/dicts. Model lists are stored in $RMLIB/mlists.
> mkdir $RMLIB/mlists
> cd $RMLIB
> cp $RMCD2/score/src/rdev/pcdsril.txt dicts
Use the tool HDMan to transform the dictionary. The supplied HDMan
script mono.ded should be in $RMLIB/dicts.
> cd $RMLIB/dicts
> HDMan -T 1 -a '*;' -n $RMLIB/mlists/mono.list -g mono.ded mono.dct pcdsril.txt sil.txt
1.2 Training set phone label file creation
Create the training set phone label files including optional
inter-word silence (sp) models (the sp models are tee models) and
forced silence (sil) at the ends of each utterance.
> cd $RMLIB
> HLEd -l '*' -i labs/mono.mlf -d dicts/mono.dct labs/mono.hled \
wlabs/ind_trn109.mlf
1.3 Initial phone models
The supplied initial models should be in the directory $RMWORK/R1/hmm0.
Note that the MODELS file also contains a varFloor vector which
was calculated by using HCompV:
> HCompV -C $RMLIB/configs/config.basic -S $RMLIB/flists/ind_trn109.scp \
-f 0.01 -m varProto
The supplied HTK Environment file HTE should also be in $RMWORK/R1.
This contains a number of shell variables that are used by the
various training and testing scripts. Edit the HTE file so that
it reflects your particular directory setup.
1.4 Model training
Re-estimate the models in hmm0 for four iterations of HERest.
> cd $RMWORK/R1
> hbuild 1 4
This should create the directories $RMWORK/R1/hmm[1-4]. Each of these
directories should contain a LOG file and also $RMWORK/R1 will contain
a file called blog which records the build process. After running
hbuild the various LOG files should be checked for error messages.
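One way to scan every LOG at once is with grep; since the real $RMWORK tree only exists after hbuild has run, the sketch below demonstrates it on a synthetic directory (demo_logs and the WARNING text are invented for illustration):

```shell
# List any LOG files that mention errors or warnings
mkdir -p demo_logs/hmm1 demo_logs/hmm2
echo 'Reestimation complete'           > demo_logs/hmm1/LOG
echo 'WARNING [-7032] example message' > demo_logs/hmm2/LOG
grep -ilE 'error|warning' demo_logs/hmm*/LOG
# On the real tree:  grep -ilE 'error|warning' $RMWORK/R1/hmm*/LOG
```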
1.5 Building the recognition networks
Copy the word-pair grammar from the CD-ROM to $RMLIB.
> cd $RMLIB; mkdir nets
> cp $RMCD2/rm1/doc/wp_gram.txt $RMLIB/nets
Use the supplied gawk scripts to generate word level networks for
both the word-pair and no-grammar test conditions. Note that the
language model likelihoods are based solely on the number of successors
for each word.
> cd $RMLIB/nets
> cat wp_gram.txt wp_gram.txt | fgrep -v '*' | \
gawk -f $RMLIB/awks/wp.net.awk nn=993 nl=57704 > net.wp
> gawk -f $RMLIB/awks/ng.net.awk nn=994 nl=1***5 $RMLIB/wordlist > net.ng
> gzip net.wp net.ng
1.6 Testing the Monophone Models
Check that all the recognition variables in HTE are set up correctly.
In particular if you wish to use the NIST scoring package distributed
on the CD-ROMs, make sure that NISTSCORE is set. The recognition tests
use the script htestrm. The recognition parameters are set in the HTE
file as for hbuild. htestrm allows the specification of the test set,
the recognition mode (ng or wp) and automatically creates test
directories for a number of runs (typically with different recognition
parameters and perhaps run simultaneously on different machines). Test
directories are created as sub-directories of the model directories.
The first run on a particular hmm set with ng on the feb89 test set
would be called test_ng_feb89.1, the second run test_ng_feb89.2, and
tests with wp and other test-sets named according to the above
convention.
A number of parameters can be tuned including the grammar scale factor
HVGSCALE variable (HVite -s) and inter-word probability HVIMPROB
(HVite -p). These flags both control the ratio of insertion and
deletion errors. HVite pruning is controlled by variables HVPRUNE
(HVite -t) and HVMAXACTIVE (HVite -u). All of these values are
supplied in an indexed list that correspond to the different test types
listed in the HTE variable TYPELIST (here either ng or wp). The
HVPRUNE values should be set at a higher value for wp testing than ng
testing.
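As an illustration, such entries in HTE might look like this (the numeric values are placeholders to show the indexed-list shape, not tuned settings; the supplied HTE file is authoritative):

```shell
# Placeholder values only -- one entry per test type, ordered as TYPELIST
set TYPELIST=(ng wp)
set HVGSCALE=(7.0 7.0)      # HVite -s : grammar scale factor
set HVIMPROB=(0.0 0.0)      # HVite -p : inter-word probability
set HVPRUNE=(150.0 200.0)   # HVite -t : beam width (wider for wp)
set HVMAXACTIVE=(0 0)       # HVite -u : max active models (0 = no limit)
```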
After htestrm runs HVite, HResults is executed. The equivalent sets
(homophone lists) for the different test conditions are listed in
files specified by the HREQSETS variable. These sets are defined by
DARPA/NIST and suitable files (eq.wp and eq.ng) are supplied with the
distribution. If the variable HRNIST is set HResults operates in a
manner that is compatible with the NIST scoring software and results
identical to those from the NIST software should be obtained.
If the NIST scoring software has been compiled, then after HResults
has been run the output of HVite can be converted to a NIST
compatible "hypothesis" file using the RM tool HLab2Hyp and then
running the appropriate NIST scoring script. These steps will be
performed by htestrm if the HTE variable NISTSCORE is set. Note that
the NIST scoring tools should be in your path if you set NISTSCORE.
Although the raw results obtained from the NIST software should be
identical to the HResults output, the NIST output files can give
additional information that can be useful for understanding recognizer
performance.
To test the models in $RMWORK/R1/hmm4 with the feb89 test set and with
a word-pair grammar and the current recognition settings in HTE,
execute the following
> cd $RMWORK/R1
> htestrm HTE wp feb89 hmm4
Run other tests under a number of ng/wp conditions on different test
sets as desired.
2. Multiple Mixture Monophones
==============================
2.1 Mixture splitting
The models created in Step 1 will be converted to multiple mixture
models by iterative "mixture splitting" and re-training.
Mixture-splitting is performed by HHEd by taking an output distribution
with M output components, choosing the components with the largest
mixture weights and making two mixtures where there was one before by
perturbing the mean values of the split mixture components. This can
be done on a state-by-state basis but here all states will be mixture
split.
A script interface to HHEd called hedit takes a script of HHEd
commands and applies them. First two-mixture models will be created,
with the initial models in $RMWORK/R2/hmm10 formed by
mixture-splitting the models in $RMWORK/R1/hmm4.
First create the directory to hold the multiple mixture monophones and
link all the files and directories to the new directory.
> mkdir $RMWORK/R2
> cd $RMWORK/R2
> ln -s $RMWORK/R1/* .
Now store the HHEd commands required in a file called edfile4.10 in
$RMWORK/R2. This file need contain only the line
MU 2 {*.state[2-4].mix}
This command will split into 2 mixtures all the states of all
models in the HMMLIST defined in the HTE file. Now invoke hedit by
> cd $RMWORK/R2
> hedit 4 10
This creates 2 mixture initial hmms in $RMWORK/R2/hmm10.
2.2 Retraining and testing the two-mixture models
Run four iterations of HERest to re-train the models
> cd $RMWORK/R2
> hbuild 11 14
and then test them with HVite (again recognition settings may need
adjusting).
> cd $RMWORK/R2
> htestrm HTE wp feb89 hmm14
and run other tests as required.
2.3 Iterative mixture-splitting
The mixture splitting and re-training can be done repeatedly, and it
is suggested that, after the 2-mixture models are trained, 3-mixture,
5-mixture, 7-mixture, 10-mixture and 15-mixture models are trained in
turn. Each cycle requires forming a file containing the
appropriate HHEd MU command, running hedit, running hbuild and then
testing the models with htestrm.
Note that it is quite possible to go straight from 1 mixture models to
5 mixture models. However usually better performance results from a
given number of mixture components if the model complexity is
increased in stages.
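The cycle can be scripted; the fragment below just prints one plausible sequence of commands for the suggested schedule (the directory numbering hmm14 -> hmm20 -> hmm24 -> ... is an assumption for illustration, not prescribed by the recipe):

```shell
# Print an edfile/hedit/hbuild sequence for 3,5,7,10,15 mixtures
start=14                          # last trained 2-mixture directory
for m in 3 5 7 10 15; do
  split=`expr $start + 6`         # directory for the freshly split models
  echo "echo 'MU $m {*.state[2-4].mix}' > edfile$start.$split"
  echo "hedit $start $split && hbuild `expr $split + 1` `expr $split + 4`"
  start=`expr $split + 4`
done
```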
3. Tied Mixture Monophones
==========================
3.1 Tied Mixture Initialisation
The models created in step 2 will be converted to a set of tied-mixture
models using HHEd. This conversion is performed in three phases.
First the set of Gaussians that represent the models are pooled by
choosing those with the highest weights. Then the models are rebuilt
to share these Gaussians and the mixture weights recalculated.
Finally, the representation of the models is changed to TIEDHS.
First create the directory to hold the tied mixture monophones and
copy over the HTE file.
> mkdir $RMWORK/R3
> cd $RMWORK
> cp R2/HTE R3
Create an HHEd edit file in $RMWORK/R3 called tied.hed containing the
following commands.
JO 128 2.0
TI MIX_ {*.state[2-4].stream[1].mix}
HK TIEDHS
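Here JO sets the size of the joined Gaussian pool (128 per stream) together with a mixture-weight floor factor of 2.0, TI performs the tying, and HK converts the HMM set kind to TIEDHS. The file can be written in one step with a here-document:

```shell
# Write tied.hed in one step; run this from $RMWORK/R3
cat > tied.hed <<'EOF'
JO 128 2.0
TI MIX_ {*.state[2-4].stream[1].mix}
HK TIEDHS
EOF
```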
Create a directory for the initial tied mixture models and run HHEd.
> cd $RMWORK/R3
> mkdir hmm0
> HHEd -T 1 -H $RMWORK/R2/hmm4/MODELS -w hmm0/MODELS tied.hed \
$RMLIB/mlists/mono.list
3.2 Tied Mixture training and testing
In the HTE file the line
set TMTHRESH=20
now controls tied mixture as well as forward pass mixture pruning.
Finally build the models
> cd $RMWORK/R3
> hbuild 1 4
and then test them with HVite (recognition settings will need adjusting.
Try setting the grammar scale to 2.0 rather than 7.0 and speed things up
a little by changing the pruning beam width from 200.0 to 100.0).
> cd $RMWORK/R3
> htestrm HTE wp feb89 hmm4
4. Discrete Density Monophones
==============================
4.1 Vector quantiser creation
Since all the observations have to be held in memory for this operation
(which can be computationally expensive) we will perform this on a
subset of the data. Please note that this process can take a considerable
amount of time.
> mkdir $RMWORK/R4
> cd $RMWORK/R4
> gawk '{ if ((++i%10)==1) print }' $RMLIB/flists/ind_trn109.scp > ind_trn109.sub.scp
> HQuant -A -T 1 -d -n 1 128 -n 2 128 -n 3 128 -n 4 32 -s 4 -S \
ind_trn109.sub.scp -C $RMLIB/configs/config.basic vq.table
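The gawk filter above selects every tenth line of the training script (the 1st, 11th, 21st, ...), giving roughly a 10% subset; plain awk behaves the same:

```shell
# Keep every 10th line, starting with the first
seq 1 25 | awk '{ if ((++i%10)==1) print }'
```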
4.2 Align data and initialise discrete models
To create some discrete models we are going to use HInit and HRest both
of which need data aligned at the phone level. We will use the monophones
from either R1 or R2 to generate phone labels for the subset of the data
used for the codebook.
> HVite -A -T 1 -C $RMLIB/configs/config.basic -H ../R1/hmm4/MODELS \
-a -m -o SW -i ind_trn109.sub.mlf -X lab -b '!SENT_START' \
-l '*' -y lab -I $RMLIB/wlabs/ind_trn109.mlf -t 500.0 -s 0.0 -p 0.0 \
-S ind_trn109.sub.scp $RMLIB/dicts/mono.dct $RMLIB/mlists/mono.list
4.3 Create a prototype hmm set
This can be done using MakeProtoHMMSet with discrete.pcf, or by
creating a file like the one below for each of the models (apart from
sp, which should have only three states). The aa prototype is
reproduced in full.
~o <STREAMINFO> 4 12 12 12 3 <VECSIZE> 39 <DISCRETE>
~h "aa"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2 <NUMMIXES> 128 128 128 32
<STREAM> 1
<DPROB> 11508*128
<STREAM> 2
<DPROB> 11508*128
<STREAM> 3
<DPROB> 11508*128
<STREAM> 4
<DPROB> 8220*32
<STATE> 3 <NUMMIXES> 128 128 128 32
<STREAM> 1
<DPROB> 11508*128
<STREAM> 2
<DPROB> 11508*128
<STREAM> 3
<DPROB> 11508*128
<STREAM> 4
<DPROB> 8220*32
<STATE> 4 <NUMMIXES> 128 128 128 32
<STREAM> 1
<DPROB> 11508*128
<STREAM> 2
<DPROB> 11508*128
<STREAM> 3
<DPROB> 11508*128
<STREAM> 4
<DPROB> 8220*32
<TRANSP> 5
 0.000e+0 1.000e+0 0.000e+0 0.000e+0 0.000e+0
 0.000e+0 6.000e-1 4.000e-1 0.000e+0 0.000e+0
 0.000e+0 0.000e+0 6.000e-1 4.000e-1 0.000e+0
 0.000e+0 0.000e+0 0.000e+0 6.000e-1 4.000e-1
 0.000e+0 0.000e+0 0.000e+0 0.000e+0 0.000e+0
<ENDHMM>