atk160
所属分类:Windows编程
开发工具:Visual C++
文件大小:18704KB
下载次数:36
上传日期:2011-04-12 15:07:13
上 传 者:
tianzhiyuan126
说明: ATK是以HTK為基礎,可用於開發實驗性應用程式所設計的一組API。它是HTK標準函式庫上層的一組C++中介層所組成。這個中介層可以讓目前的語音辨識開發者選擇不同版本的HTK並與ATK一起編譯後來開發應用程式。如同HTK一般,ATK也是具有跨兩種作業系統平台(Unixs與Windows)的特性。
(ATK is based on HTK can be used to develop experimental applications designed a set of API. It is the standard HTK library a group of upper intermediate C++ layer. The intermediate layer can allow the current developer of voice recognition to select a different version of the HTK and ATK will be compiled together with the subsequent development application. As HTK general, ATK is a cross of two operating system platforms (Unixs and Windows) features.)
文件列表:
atk160\atk.sln (14291, 2007-05-19)
atk160\atk160.tgz (45, 2007-05-20)
atk160\ATKApps\asds\ASDS.cpp (14478, 2007-05-19)
atk160\ATKApps\asds\asds.vcproj (4732, 2007-05-19)
atk160\ATKApps\asds\Makefile (1493, 2007-05-19)
atk160\ATKApps\asds\Test\asds.cfg (2427, 2007-05-19)
atk160\ATKApps\asds\Test\global.net (725, 2007-05-19)
atk160\ATKApps\asds\Test\howmany.net (455, 2007-05-19)
atk160\ATKApps\asds\Test\pizza.dict (379, 2007-05-19)
atk160\ATKApps\asds\Test\topping.net (699, 2007-05-19)
atk160\ATKApps\avite\AVite.cpp (20227, 2007-05-19)
atk160\ATKApps\avite\AVite.vcproj (4929, 2007-05-19)
atk160\ATKApps\avite\Makefile (1074, 2007-05-19)
atk160\ATKApps\avite\Test\arun (1926, 2007-05-19)
atk160\ATKApps\avite\Test\avite.cfg (640, 2007-05-19)
atk160\ATKApps\avite\Test\avite_ng.cfg (647, 2007-05-19)
atk160\ATKApps\avite\Test\bg (26833, 2007-05-19)
atk160\ATKApps\avite\Test\bg.net (51831, 2007-05-19)
atk160\ATKApps\avite\Test\reco.mlf (1071, 2007-05-19)
atk160\ATKApps\avite\Test\scpfile (36, 2007-05-19)
atk160\ATKApps\avite\Test\sjy0001.wav (144044, 2007-05-19)
atk160\ATKApps\avite\Test\sjy0008.wav (163244, 2007-05-19)
atk160\ATKApps\avite\Test\sjy0025.wav (124844, 2007-05-19)
atk160\ATKApps\avite\Test\sjy0200.wav (99244, 2007-05-19)
atk160\ATKApps\avite\Test\tg (261480, 2007-05-19)
atk160\ATKApps\avite\Test\wl.net (12107, 2007-05-19)
atk160\ATKApps\avite\Test\words.mlf (223, 2007-05-19)
atk160\ATKApps\ssds\Makefile (1171, 2007-05-19)
atk160\ATKApps\ssds\SSDS.cpp (12399, 2007-05-19)
atk160\ATKApps\ssds\ssds.vcproj (4972, 2007-05-19)
atk160\ATKApps\ssds\Test\global.net (725, 2007-05-19)
atk160\ATKApps\ssds\Test\howmany.net (455, 2007-05-19)
atk160\ATKApps\ssds\Test\pizza.dict (379, 2007-05-19)
atk160\ATKApps\ssds\Test\ssds.cfg (2133, 2007-05-19)
atk160\ATKApps\ssds\Test\topping.net (699, 2007-05-19)
atk160\ATKDoc\ATK_Manual.pdf (1097900, 2007-05-19)
atk160\ATKLib\ABuffer.cpp (4808, 2007-05-19)
atk160\ATKLib\ABuffer.h (3211, 2007-05-19)
... ...
ATK/HTK Speech Recognition Software
===================================
RELEASE 1.6 June 2007
== Windows/Visual Studio and Linux Version ==
LICENSE TERMS
"""""""""""""
This software is copyright Cambridge University. It may not be copied
or distributed to 3rd parties without the permission of the copyright
owner. It is distributed for research and non-commercial use only.
The full license terms are in the accompanying file "License.txt".
Although not a license condition, all users of this package are
requested to report all bugs by emailing sjy@eng.cam.ac.uk. Please
include ATK in the subject line - messages without ATK in the subject
line are likely to be ignored (see Reporting Problems below).
Contents
""""""""
ATK is distributed as a single tarball containing all of the source
files and a basic set of UK English recognition resources for testing.
a) ATKLib: the ATK source libraries for real-time applications of
HTK-based recognisers
b) HTKLib: an ATK compatible version of the HTK Libraries with various
extensions. This library is compatible with the functionality
of HTK3.4 but it can only be used with ATK.
c) SYNLib: an interface to Alan Black's flite synthesiser plus support for
a US male English voice: US_English, CMU_Lexicon, CMU_US_KAL16.
d) ATKApps: a set of example ATK applications.
ssds - a very simple speech-in/text-out dialog
***s - a speech in/speech out demo of the asynchronous audio
i/o support provided by ATK's AIO interface
avite - an ATK equivalent to HTK's HVite which is
provided for off-line testing of recognition
resources intended for us in ATK.
e) ATKDoc: documentation for ATK
f) Resources:
HMM models, dictionaries and support files for use in ATK
applications. See the Readme files in each directory for
more information.
1) UK_SI_MFCC - UK English, speaker independent, MFCC encoding.
2) UK_SI_ZMFCC - UK English, speaker independent, ZMFCC encoding.
Note that these resources are provided just to provide a
quick start for application developers. No claims are
made about the performance of these model sets. The main
point in using ATK is that it allows developers to build
their own resources using HTK.
Compile Instructions (Windows)
""""""""""""""""""""""""""""""
This distribution of ATK requires Visual Studio 2005. It can be built with
earlier versions, but the project files would need to be created from scratch
since VS 2005 project file format is not compatible with earlier versions.
It is strongly recommended that you have Cygwin installed and you run ATK
console programs from a bash shell rather than cmd.exe (see www.Cygwin.com).
A. Building the basic libraries and test Programs
Assuming that the entire ATK source tree is stored in a directory
called atk, open VS by double clicking on atk/atk.sln.
Set the configuration to Debug in the main tool bar (when everything
has been built and tested, release versions can be built).
The entire system can be built by selecting 'build solution' from the
build menu. Alternatively, if problems are encountered or
you wish to understand the structure of the software better, it
is recommended that you build ATK in stages as follows:
1. Using the solution explorer, right click on HTKLib and build it.
2. In the same way, build ATKLib.
3. In the solution explorer select TBase as the "startup project", and
build it. Then select 'Start (F5)' from the Debug menu. You should see
3 yellow windows with red balls in each. Click on a window and
check that the balls get transferred from window to window. This
tests the ATK base classes.
4. Now build TSource, TCode and TRec in the same way. TSource just
tests the audio source, TCode tests the audio plus coder and
TRec tests the full recognition system. TRec expects inputs of
the form "Dial one three nine seven" etc. All test programs
Txxx have a config file of the form Txxx.cfg and a Readme.txt file
giving more information. Note that it is easiest to run test programs
from the command line rather than from withing Visual Studio.
5. Build SYNLib and its dependent libraries, US_English, CMU_Lexicon,
CMU_US_KAL16. Then build TSyn and run it to test the synthesiser.
6. Build TIO. This program tests ATK's asynchronous input/output
facilities. Like TRec it expects input of the form
"Dial one three nine seven", but it provides spoken prompts,
and supports time-outs and barge-in.
B. The ATK Applications
ATK 1.6 includes three demonstration programs. ssds and ***s are
simple spoken dialog applications which can be built and tested in
the same way as the test programs above. Ssds provides speech input
but text output. Asds is very similar except that it also supports
speech output with barge-in. (Note ATK does not currently support
echo cancellation - hence to avoid feedback from speakers to microphone
do all initial testing with a headset earphones and close talking mic.)
AVite is really a tool, provided for off-line testing of HTK
resources (note that HVite uses a different front end to AVite and
does not support trigram LMs). AVite can be tested by opening a
shell in the Test directory and typing arun. The resulting help
message indicates the options that can then be tried. The
recognition accuracy for the cases which use a language model should
should be 100%.
Compiling instructions (Linux)
""""""""""""""""""""""""""""""
The environment variables for the compiler, compiler flags and link flags
(HTKCC, HTKCF, HTKCF) can be set, otherwise the defaults will be used.
ATK requires ALSA for the sound input and output. Some older linux
distributions (newer than 2002/pre-2.5 kernel) do not have the ALSA
drivers installed. If this is the
case, go to http://www.alsa-project.org/ and install.
http://alsa.opensrc.org contains more information about installing ALSA.
The linux build requires both X11 and the pthreads libraries. If the
location of the X11 library is non-standard (ie, not in
/usr/X11R6/lib), edit the Makefiles and/or the linker flag (HTKLF)
to reflect this.
The top-level directory contains a Makefile which will compile all of
the subcomponents in any particular combination desired. "make All"
will make the HTK, ATK and Flite (TTS) libraries, together with the
sample applications in ATKApps and the test applications in ATKLib.
Typing "make" in the top level ATK directory will show the various
options. Alternatively, it is possible to make each component by
visiting each directory and making the libraries individually.
Release Notes for 1.6
"""""""""""""""""""""
This is a major release. HAudio has been rewritten and a new AIO
module added to provide support for simultaneous speech out and audio
in. Linux audio has been switched from OSS to ALSA.
A synthesis interface called ASyn has been added and Alan Black's
flite synthesis software has been integrated in order to provide "out
of the box" speech output.
Support for N-best output has been implemented.
A simple data logging module has been added.
Support for building HTKTools with the ATK libaries has been dropped
since it provides no extra functionality and adds unnecessary
complexity. Hence, the ATK directive is no longer required and one
version of the HTKLib library is needed (but note that this is NOT
compatable with the version of HTKLib in the standard HTK distribution).
Various bugs have been fixed, especially a serious bug in the network
building routines which prevented proper implementation of duplicated
subnetworks.
Release Notes for 1.5
"""""""""""""""""""""
This was an internal CUED-only release.
Release Notes for 1.4.1
"""""""""""""""""""""""
This was a minor bug fix release.
Release Notes for 1.4
"""""""""""""""""""""
1) The main new features in this release are full support for trigram
language models and Linux support.
2) The use of Visual Studio V6 has been replaced by the .NET V7 version.
3) A variety of minor bugs have been fixed.
Release Notes for 1.3
"""""""""""""""""""""
1) Each directory has a Visual Studio .dsw. To compile a library or
application, open this file and then build using the Build menu (eg
F7) as normal.
2) HTKLib has two sets of settings labeled _ATK and _HTK. The _ATK
version defines the compilation flag ATK which changes the behaviour
of the standard HTK lib files. These changes are mainly concerned
with multithreading (HThreads) and the way that real time audio coding
is handled (ie HAudio and HParm).
3) HTKLib has also been enhanced in various other minor ways compared
to the current public HTK release (currently at version 3.2). These
changes include extensions to HRec/HLM to support trigram language
models and confidence scoring, HModel to support triphone synthesis and
HGraf to support multiple windows. HNet also has a trace option which allows
a visual plot of the phone level recognition network to be displayed.
4) Only VS6 settings for a "debug" build have been implemented.
5) Compiled HTK tools are left in the local Debug or Release
directories. To copy them to a 'bin' directory (specified by setting
the shell variable HBIN to the required path) run the 'install_tools'
script.
6) The ATKLib contains a number of simple test programs (named
Tlibname). These should be built and tested before attempting
to build any ATK based applications.
7) This release contains 1 set of 4-mix word internal MFCC
triphones (MFCC_0_D_A) and 1 set of 4-mix word internal MFCC triphones
with the cepstral mean removed (MFCC_0_D_A_Z). This latter set
requires running average cepstral mean removal to be enabled when
using ATK. Both these acoustic models were trained using WSJCAM0.
Both model sets include the decision tree used for state tying
embedded within them. This allows HModel to synthesise arbitrary
triphones on demand. The _Z set also have a background model
supplied with them.
Reporting Problems
""""""""""""""""""
All users of ATK are requested to report bugs by emailing
sjy@eng.cam.ac.uk. ATK must be included in the subject line otherwise
the message will almost certainly be treated as spam and ignored. All
bug reports should include version information. To obtain this in a
command line application, simply set the -V option as in HTK. For all
other applications, set PRINTVERSIONINFO=T in the configuration file.
In both cases, the release number and individual component versions
are printed on the console. If the ATK monitor is being used for
console output, then the display area will need to made large enough
to view the version information.
Last Updated
May 2007 by SJY
近期下载者:
相关文件:
收藏者: