clustershiyongdaquan

Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: C/C++
File size: 154KB
Downloads: 123
Upload date: 2006-10-12 19:08:04
Uploader: linazyjch
Description: A complete collection of clustering algorithms with the data sets included, ready to use.

File list:
cluster\doc\copying (26428, 2004-08-13)
cluster\doc\norm.tex (7719, 2004-08-13)
cluster\doc\regular.tex (8864, 2004-08-13)
cluster\doc (0, 2006-09-19)
cluster\ex\iris.dom (296, 2005-06-08)
cluster\ex\iris.pat (2400, 2004-08-13)
cluster\ex\iris.tab (4610, 2004-08-13)
cluster\ex\iris2d.dom (226, 2004-08-13)
cluster\ex\run (1269, 2004-08-13)
cluster\ex\step (682, 2004-08-13)
cluster\ex\sym.dom (198, 2004-11-14)
cluster\ex\sym.fcm (480, 2004-11-14)
cluster\ex\sym.tab (72, 2004-11-14)
cluster\ex\test.dom (198, 2004-08-13)
cluster\ex\test.tab (66, 2004-08-13)
cluster\ex\wine.dom (236, 2005-05-11)
cluster\ex\wine.tab (12188, 2005-05-10)
cluster\ex (0, 2006-09-19)
cluster\fieee_03\abalone.pat (183519, 2004-05-15)
cluster\fieee_03\breast.pat (12800, 2004-05-15)
cluster\fieee_03\cluster (6623, 2004-05-15)
cluster\fieee_03\cluster.out (23256, 2004-05-15)
cluster\fieee_03\cluster.res (5123, 2004-05-15)
cluster\fieee_03\collect (1058, 2004-05-15)
cluster\fieee_03\fieee_03.res (5095, 2004-05-15)
cluster\fieee_03\iris.pat (2400, 2004-05-15)
cluster\fieee_03\wine.pat (11748, 2004-05-15)
cluster\fieee_03 (0, 2006-09-20)
cluster\src\clc.c (30860, 2006-08-01)
cluster\src\clc.dsp (5006, 2006-01-28)
cluster\src\cle.c (15896, 2006-01-30)
cluster\src\cle.dsp (5069, 2004-09-13)
cluster\src\cli.c (40095, 2006-07-21)
cluster\src\cli.dsp (5220, 2004-09-13)
cluster\src\cluster.dsw (1832, 2006-01-28)
cluster\src\cluster.h (13836, 2004-09-13)
cluster\src\cluster.mak (7352, 2006-07-21)
cluster\src\cluster1.c (49841, 2004-09-13)
... ...

mcli/mclx/mcle/cli/clx/cle - probabilistic and fuzzy clustering

This file provides some explanations on how to use the programs mcli, mclx, mcle, cli, clx, and cle to induce, execute, and evaluate a set of clusters. However, it does not explain all options of these programs. For a list of options, call mcli, mclx, mcle, cli, clx, and cle without any arguments.

Enjoy,
Christian Borgelt
e-mail: borgelt@iws.cs.uni-magdeburg.de
WWW:    http://fuzzy.cs.uni-magdeburg.de/~borgelt

------------------------------------------------------------------------

In this directory (cluster/ex) you can find the well-known iris data (measurements of the sepal length/width and the petal length/width of three types of iris flowers) in formats suitable for the clustering programs. There are two versions: a matrix version, iris.pat, which contains only a matrix of numbers, and a table version, iris.tab, which contains column names and an additional column with the iris type information.

The matrix version can be processed with the programs mcli and mclx. To induce a set of three clusters with the fuzzy c-means algorithm, type

  mcli -c3 iris.pat iris.cls

The option -c3 instructs the program to find three clusters. iris.pat is the input file containing the data, iris.cls the output file to which a description of the clusters will be written. The result of this program call should look like this (contents of iris.cls):

  function = cauchy(2,0);
  normmode = sum1;
  params   = {{ [-1.00478, 0.84***84, -1.28465, -1.23865] },
              { [-0.0383***5, -0.818721, 0.32297, 0.232151] },
              { [1.06925, 0.0374249, 0.970174, 1.02979] }};
  scales   = [5.84333, 1.21168], [3.05733, 2.30197],
             [3.758, 0.568374], [1.19933, 1.31632];

The first line states the membership function used (it is the same for all clusters). In this case it is the (generalized) Cauchy function f(d) = 1/(d^a + b), where d is the distance from the cluster center, with parameters a = 2 and b = 0. That is, the (unnormalized) degree of membership is computed as the inverse squared distance from the cluster center. An alternative is the (generalized) Gaussian function f(d) = exp(-0.5 * d^a), which can be selected with the option -G.

The second line states the normalization mode for the membership degrees. Here it is "sum1", which means that the membership degrees are scaled in such a way that they sum up to 1. (A small sketch of this computation is given at the end of this section.)

The line starting with "params" and the two lines following it specify the cluster parameters, which in this case (fuzzy c-means algorithm) are the coordinates of the cluster centers. Each section enclosed in curly braces specifies the center of one cluster.

The last two lines specify the scaling parameters (offset and scaling factor), which describe how the input data are scaled in order to achieve a distribution with mean 0 and variance 1 in each dimension. The reason for this scaling is to avoid a distortion of the clustering result due to considerably different ranges of values in the input dimensions. If such a normalization is not desired, it can be switched off with the option -q. For example,

  mcli -qc3 iris.pat iris.cls

(note how several options can be combined) yields

  function = cauchy(2,0);
  normmode = sum1;
  params   = {{ [5.00397, 3.41409, 1.48282, 0.253546] },
              { [5.88893, 2.76107, 4.36395, 1.39732] },
              { [6.77501, 3.05238, 5.***678, 2.05355] }};
  scales   = [0, 1], [0, 1], [0, 1], [0, 1];

Here the scaling parameters all specify the identity function, so that the clustering algorithm is executed directly in the input space.
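To make the membership computation just described concrete, here is a minimal sketch in C (the language the package itself is written in, although this code is not taken from its sources). It computes the unnormalized Cauchy memberships 1/d^2 of a single data point to three cluster centers and then applies the "sum1" normalization. The cluster centers and the data point are hypothetical example values, not the induced centers from iris.cls.

  /* Minimal sketch (not taken from the package sources) of a "sum1"
   * membership computation with the default radial function cauchy(2,0),
   * i.e. f(d) = 1/(d^a + b) with a = 2 and b = 0: the unnormalized
   * membership is the inverse squared distance to a cluster center,
   * and the degrees are then scaled so that they sum to 1.
   * Centers and the data point are hypothetical example values. */
  #include <stdio.h>

  #define DIM 4                   /* iris data: four numeric attributes */
  #define CLS 3                   /* three clusters, as induced with -c3 */

  int main(void)
  {
    double ctr[CLS][DIM] = {      /* hypothetical cluster centers */
      { 5.0, 3.4, 1.5, 0.25 },
      { 5.9, 2.8, 4.4, 1.40 },
      { 6.8, 3.1, 5.7, 2.05 } };
    double pat[DIM] = { 5.1, 3.5, 1.4, 0.2 };   /* one data point */
    double u[CLS], sum = 0.0;

    for (int c = 0; c < CLS; c++) {
      double d2 = 0.0;            /* squared Euclidean distance */
      for (int k = 0; k < DIM; k++) {
        double diff = pat[k] - ctr[c][k];
        d2 += diff * diff;
      }
      if (d2 < 1e-12) d2 = 1e-12; /* crude guard against a zero distance;
                                     the real programs handle this case
                                     more carefully */
      u[c] = 1.0 / d2;            /* cauchy(2,0): 1/(d^2 + 0) */
      sum += u[c];
    }
    for (int c = 0; c < CLS; c++) /* "sum1": scale the degrees to sum 1 */
      printf("cluster %d: %.2f\n", c, u[c] / sum);
    return 0;
  }

With the "sum1" mode the printed membership degrees always add up to 1; choosing the Gaussian radial function (option -G) would only change the line that turns the distance into an unnormalized membership.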
The induced set of clusters can then be executed on the data in order to compute the membership degrees for the different data points. This is done with the program mclx. For example,

  mclx iris.cls iris.pat iris.out

creates a table iris.out, which contains three additional columns, one for each cluster. These columns hold the degrees of membership, rounded to two decimal places. (If a higher or lower accuracy is desired, the output format of the membership degrees can be changed with the option -o.) If only the cluster with the highest degree of membership is desired, one may use the option -c, which produces only one additional column containing the index of the cluster with the highest degree of membership. A further column, containing the membership degree for this cluster, may be added with the option -m.

The programs cli and clx perform exactly the same tasks as the programs mcli and mclx, only on a different input format, namely the format of the file iris.tab. This format is processed in connection with a domain description file (here: iris.dom) that specifies which columns are to be used and the data types of these columns. In this way it is possible to execute the clustering algorithm on a subset of the attributes without changing the data file. It is also possible to handle symbolic attributes, which are coded with a simple 1-in-n code before they are presented to the clustering algorithm (a small sketch of this coding is given below). As a consequence, the output of the program cli contains, compared to the output of the program mcli, an additional section stating the domain information for the attributes.

Both programs, mcli as well as cli, are highly parameterizable, so that a large variety of clustering algorithms can be carried out. Here is a list of some options that lead to well-known algorithms:

  options    algorithm
  -jhard     hard c-means algorithm
  (none)     fuzzy c-means algorithm
  -v         axes-parallel Gustafson-Kessel algorithm
  -V         general Gustafson-Kessel algorithm
  -wvG       axes-parallel Gath-Geva (FMLE) algorithm
  -wVG       general Gath-Geva (FMLE) algorithm
  -wvGNx1    axes-parallel mixture of Gaussians (EM algorithm)
  -wVGNx1    general mixture of Gaussians (EM algorithm)

Explanation of the individual options:

  -j#   membership normalization mode
  -v    adaptable variances
  -V    adaptable covariances (covariance matrix)
  -Z    adaptable cluster sizes
  -w    adaptable weights/prior probabilities
  -G    Gaussian radial function (default: Cauchy function)
  -N    normalize to unit integral (probability density)
  -x    exponent for pattern weight

It is usually advisable to initialize the higher algorithms (like Gustafson-Kessel and Gath-Geva) with a few epochs of the fuzzy c-means algorithm. This can be achieved by exploiting the fact that the programs mcli and cli can read in a clustering result. That is, with a call like

  mcli -OV iris.pat iris.gk iris.cls

the fuzzy c-means result obtained with the program call stated above (stored in iris.cls) is further processed with the Gustafson-Kessel algorithm. The result is written to the file iris.gk. The option -O is necessary to overwrite the cluster type and radial function parameters read from the input file with the command line values.

The shape and size regularization options (-H and -R) are described briefly in the file ../doc/regular.tex.
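The 1-in-n coding of symbolic attributes mentioned above can be illustrated with a small, self-contained C sketch (again not the actual code of cli, and using hypothetical domain values): a symbolic value from a domain with n categories is mapped to n numeric columns, exactly one of which is 1.

  /* Minimal sketch (not the program's actual code) of the 1-in-n coding
   * that cli applies to symbolic attributes: a value from a domain with
   * n categories is replaced by n columns, exactly one of which is 1.
   * The domain values and the example value are hypothetical. */
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
    const char *dom[] = { "Iris-setosa", "Iris-versicolor", "Iris-virginica" };
    const int   n     = 3;                  /* size of the symbolic domain */
    const char *value = "Iris-versicolor";  /* attribute value to encode */
    double code[3]    = { 0.0, 0.0, 0.0 };

    for (int i = 0; i < n; i++)             /* set the matching position */
      if (strcmp(value, dom[i]) == 0) code[i] = 1.0;

    for (int i = 0; i < n; i++)             /* coded columns, here: 0 1 0 */
      printf("%s%g", (i > 0) ? " " : "", code[i]);
    printf("\n");
    return 0;
  }

cli applies this coding internally before the patterns are handed to the clustering algorithm, so the table file can keep its symbolic columns unchanged.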
