KlustaKwik-R1-7

所属分类:其他
开发工具:Visual C++
文件大小:361KB
下载次数:108
上传日期:2010-02-23 15:30:28
上 传 者706507715706507715
说明:  基于监督法的图像的自动分类。该方法快速,效果好,好理解
(Monitoring method based on image automatic classification. This method is rapid, effective, easy to understand)

文件列表:
KlustaKwik (0, 2004-06-24)
KlustaKwik\Array.h (3280, 2004-06-24)
KlustaKwik\KK.cpp (24023, 2004-06-24)
KlustaKwik\KK.h (2118, 2004-06-24)
KlustaKwik\KlustaKwik.cpp (11266, 2004-06-24)
KlustaKwik\KlustaKwik.h (2118, 2004-06-24)
KlustaKwik\KlustaKwik.mcp (155885, 2004-06-24)
KlustaKwik\KlustaSave.h (1622, 2004-06-24)
KlustaKwik\ReleaseNotes.txt (379, 2004-06-24)
KlustaKwik\Windows (0, 2004-06-24)
KlustaKwik\Windows\KlustaKwik.exe (634768, 2004-06-24)
KlustaKwik\linux-pentium (0, 2004-06-24)
KlustaKwik\linux-pentium\KlustaKwik (75926, 2004-06-24)
KlustaKwik\macosx (0, 2004-06-24)
KlustaKwik\macosx\KlustaKwik (262428, 2004-06-24)
KlustaKwik\makefile (541, 2004-06-24)
KlustaKwik\param.c (2005, 2004-06-24)
KlustaKwik\param.h (544, 2004-06-24)
KlustaKwik\test (0, 2004-06-24)
KlustaKwik\test\test.fet.1 (2197, 2004-06-24)
KlustaKwik\test\test.model.1 (261, 2004-06-24)
KlustaKwik\test\test_res.clu.1 (404, 2004-06-24)
KlustaKwik\test\test_res.model.1 (261, 2004-06-24)

KlustaKwik version 1.6 ---------------------- KlustaKwik is a program for unsupervised classification of multidimensional continuous data. It arose from a specific need - automatic sorting of neuronal action potential waveforms (see KD Harris et al, Journal of Neurophysiology 84:401-414,2000), but works for any type of data. We needed a program that would: 1) Fit a mixture of Gaussians with unconstrained covariance matrices 2) Automatically choose the number of mixture components 3) Be robust against noise 4) Reduce the problem of local minima 5) Run fast on large data sets (up to 100000 points, 48 dimensions) Speed in particular was essential. KlustaKwik is based on the CEM algorithm of Celeux and Govaert (which is faster than the standard EM algorithm), and also uses several tricks to improve execution speed while maintaining good performance. On our data, it runs at least 10 times faster than Autoclass. Cluster splitting and deletion ------------------------------ The main improvement in version 1.6 is the ability to specify Bayesian information content or AIC information content as the penalty for a larger number of clusters or a mixture of these two. KlustaKwik allows for a variable number of clusters to be fit. The program periodically checks if splitting any cluster would improve the overall score. It also checks to see if deleting any cluster and reallocating its points would improve overall score. The splitting and deletion features allow the program to often escape from local minima, reducing sensitivity to the initial number of clusters, and reducing the total number of starts needed for a data set. Binaries -------- The current distribution contains pre-compiled binaries for Linux Pentium, Mac OS X, and Windows. These are located in separate subdirectories. Compilation ----------- The program is written in C++. To compile under unix, extract all files to a single directory and type make. That should be all you need to do. If it doesn't work, change the makefile to replace g++ with the name of your C++ compiler. A Metrowerks Codewarrior project is included for compilation on the Mac or Windows. To check it compiled properly type "KlustaKwik test 1 -MinClusters 2" to run the program on the supplied test file. The outputs, test.clu.1 and test.model.1, should match the supplied test_res.clu.1 and test_res.model.1 files (excepting line endings, which may vary by platform). Usage ----- The program takes a "feature file" as input, and produces two output files, the "cluster file", and a log file. The file formats and conventions may seem slightly strange. This is for historical reasons. If you want to change the code, go ahead, this is open source software. The feature file should have a name like FILE.fet.n, where FILE is any string, and n is a number. The program is invoked by running "KlustaKwik FILE n", and will create a cluster file FILE.clu.n and a log file FILE.klg.n. The number n doesn't serve any purpose other than to let you have several files with the same file base. The first line of the feature file should be the number of input dimensions. The following lines are the data, with each line being one data instance, consisting of a list of numbers separated by spaces. An example file test.fet.1 is provided. Please note that the features can be the sample values of a putative waveform event. The first line of the cluster file will be the number of classes that the program chose. The following lines will be the classes asigned to the data points. Class 1 is a "noise cluster" modelled by a uniform distribution, which should contain outliers, if there are any. Parameters ---------- It is possible to pass the program parameters by running "KlustaKwik FILE n params" etc. All parameters have default values. Here are the parameters you can use: -ChangedThresh f (default 0.05) All log-likelihoods are recalculated if the fraction of instances changing class exeeds f (see DistThresh) -Debug n (default 0) Miscellaneous debugging information (not recommended) -DistDump (default 0) Outputs a ridiculous amount of debugging information (definately not recommended). -DistThresh d (default 6.907755 = ln(1000) ) Time-saving paramter. If a point has log likelihood more than d worse for a given class than for the best class, the log likelihood for that class is not recalculated. This saves an awful lot of time. -fSaveModel n (default 1) If n is non-zero, save final model to .model file. -FullStepEvery n (default 10) All log-likelihoods are recalculated every n steps (see DistThresh) -help Prints a short message and then the default parameter values. -Log n (default 1) Produces .klg log file if non-zero. (to switch off use -Log 0) -MaxClusters n (default 10) The random initial assignment will have no more than n clusters. -MaxIter n (default 500) Don't try more than n iterations from any starting point. -MaxPossibleClusters n (default 100) Cluster splitting can produce no more than n clusters. -MinClusters n (default 2) The random intial assignment will have no less than n clusters. The final number may be different, since clusters can be split or deleted during the course of the algorithm. -PenaltyMix d (default 0.0) Amount of BIC to use a penalty for more clusters. Default of 0 sets to use all AIC. Use 1.0 to use all BIC (this generally produces fewer clusters). -nStarts n (default 1) The algorithm will be started n times for each inital cluster count between MinClusters and MaxClusters. -RandomSeed n (default 1) Specifies a seed for the random number generator -Screen n (default 1) Produces parameters and progress information on the console. Set to 0 to suppress output in batches. -SplitEvery n (default 50) Test to see if any clusters should be split every n steps. 0 means don't split. -StartCluFile STRING (default "") Treats the specified cluster file as a "gold standard". If it can't find a better cluster assignment, it will output this. -UseFeatures STRING (default 11111111111100001) Specifies a subset of the input features to use. STRING should consist of 1s and 0s with a 1 indicating to use the feature and a 0 to leave it out, or the string 'ALL' indicating that every column in the feature file should be used. NB The default value for this parameter is 11111111111100001 (because this is what we use in the lab). -Verbose n (default 1) Provide more diagnostic output if non-zero. Contact Information ------------------- This program is copyright Ken Harris (harris@axon.rutgers.edu), 2000-2003. It is distributed under the GNU General Public License (www.gnu.org) at http:klustakwik.sourceforge.net. If you make any changes or improvements, please let me know.

近期下载者

相关文件


收藏者