cluster
Category: Data mining / data warehousing
Development tool: Visual C++
File size: 114 KB
Downloads: 211
Upload date: 2006-07-06 15:55:46
Uploader: snowbye
Description: This is the source code of another clustering algorithm. I downloaded it and found it quite good. I hope to discuss the theory and applications of spatial clustering with others working in the same research area. Thank you for reading.
File list:
cluster\doc (0, 2006-02-01)
cluster\doc\copying (26428, 2004-08-13)
cluster\doc\norm.tex (7719, 2004-08-13)
cluster\doc\regular.tex (8864, 2004-08-13)
cluster\ex (0, 2006-02-01)
cluster\ex\iris.dom (296, 2005-06-08)
cluster\ex\iris.pat (2400, 2004-08-13)
cluster\ex\iris.tab (4610, 2004-08-13)
cluster\ex\iris2d.dom (226, 2004-08-13)
cluster\ex\run (1269, 2004-08-13)
cluster\ex\step (682, 2004-08-13)
cluster\ex\sym.dom (198, 2004-11-15)
cluster\ex\sym.fcm (480, 2004-11-15)
cluster\ex\sym.tab (72, 2004-11-15)
cluster\ex\test.dom (198, 2004-08-13)
cluster\ex\test.tab (66, 2004-08-13)
cluster\ex\wine.dom (236, 2005-05-11)
cluster\ex\wine.tab (12188, 2005-05-10)
cluster\src (0, 2006-04-26)
cluster\src\clc.c (27129, 2006-02-01)
cluster\src\clc.dsp (5006, 2006-01-28)
cluster\src\cle.c (15896, 2006-01-31)
cluster\src\cle.dsp (5069, 2004-09-14)
cluster\src\cli.c (40090, 2006-01-31)
cluster\src\cli.dsp (5220, 2004-09-14)
cluster\src\cluster.dsw (1832, 2006-01-28)
cluster\src\cluster.h (13836, 2004-09-14)
cluster\src\cluster.mak (7453, 2006-01-28)
cluster\src\cluster1.c (49841, 2004-09-14)
cluster\src\cluster2.c (55775, 2004-09-14)
cluster\src\cluster3.c (13173, 2004-09-14)
cluster\src\cluster4.c (698, 2004-09-14)
cluster\src\clx.c (21399, 2006-01-28)
cluster\src\clx.dsp (5006, 2004-09-14)
cluster\src\makefile (7473, 2006-01-31)
cluster\src\mcle.dsp (4533, 2004-09-14)
cluster\src\mcli.dsp (4684, 2004-09-14)
cluster\src\mclx.dsp (4470, 2004-09-14)
... ...
mcli/mclx/mcle/cli/clx/cle - probabilistic and fuzzy clustering
This file provides some explanations on how to use the programs mcli,
mclx, mcle, cli, clx, cle to induce, execute and evaluate a set of
clusters. However, it does not explain all options of these programs.
For a list of options, call mcli, mclx, mcle, cli, clx, and cle without
any arguments.
Enjoy,
Christian Borgelt
e-mail: borgelt@iws.cs.uni-magdeburg.de
WWW: http://fuzzy.cs.uni-magdeburg.de/~borgelt
------------------------------------------------------------------------
In this directory (cluster/ex) you can find the well-known iris data
(measurements of the sepal length / width and the petal length / width
of three types of iris flowers) in formats suitable for the clustering
programs. There are two versions: a matrix version iris.pat, which
contains only a matrix of numbers, and a table version iris.tab, which
contains column names and an additional column with the iris type
information.
The matrix version can be processed with the programs mcli and mclx.
To induce a set of three clusters with the fuzzy c-means algorithm, type

  mcli -c3 iris.pat iris.cls

The option -c3 instructs the program to find three clusters. iris.pat
is the input file containing the data, iris.cls the output file to
which a description of the clusters will be written. The result of
this program call should look like this (contents of iris.cls):

  function = cauchy(2,0);
  normmode = sum1;
  params   = {{ [-1.00478, 0.84***84, -1.28465, -1.23865] },
              { [-0.0383***5, -0.818721, 0.32297, 0.232151] },
              { [1.06925, 0.0374249, 0.970174, 1.02979] }};
  scales   = [5.84333, 1.21168], [3.05733, 2.30197],
             [3.758, 0.568374], [1.19933, 1.31632];

The first line states the membership function used (it is the same for
all clusters). In this case it is the (generalized) Cauchy function

  f(d) = 1/(d^a + b),

where d is the distance from the cluster center, with parameters a = 2
and b = 0. That is, the (unnormalized) degree of membership is computed
as the inverse squared distance from the cluster center. An alternative
is the (generalized) Gaussian function

  f(d) = exp(-0.5 * d^a),

which can be selected with the option -G.
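
The two radial functions above can be written out directly. The
following is only a minimal Python sketch of the formulas as stated
in this file, not the code used by the programs; the function names
and default arguments are chosen here for illustration.

```python
import math

def cauchy(d, a=2.0, b=0.0):
    """Generalized Cauchy function f(d) = 1 / (d^a + b).

    With b = 0 the value is undefined at d = 0 (division by zero),
    so a caller has to treat a point lying exactly on a center
    separately.
    """
    return 1.0 / (d ** a + b)

def gaussian(d, a=2.0):
    """Generalized Gaussian function f(d) = exp(-0.5 * d^a)."""
    return math.exp(-0.5 * d ** a)

# Unnormalized membership degrees at distance 2 from a cluster center.
cauchy_degree = cauchy(2.0)
gaussian_degree = gaussian(2.0)
```

With a = 2 and b = 0 the Cauchy variant reduces to the inverse squared
distance mentioned above.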
The second line states the normalization mode for the membership
degrees. Here it is "sum1", which means that the membership degrees
are scaled in such a way that they sum up to 1.
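
As an illustration of the "sum1" mode, the sketch below divides the
unnormalized membership degrees of one data point by their sum. It is
a plain Python rendering of the description above; the helper name
normalize_sum1 and the example distances are made up for this
example.

```python
def normalize_sum1(raw):
    """Scale unnormalized membership degrees so that they sum to 1."""
    total = sum(raw)
    return [m / total for m in raw]

# Hypothetical distances of one data point to three cluster centers.
distances = (1.0, 2.0, 4.0)

# Unnormalized degrees: inverse squared distance (Cauchy, a = 2, b = 0).
raw = [1.0 / d ** 2 for d in distances]

memberships = normalize_sum1(raw)
```

The closest center receives the largest share, and the three values
add up to 1.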
The line starting with "params" and the two lines following it specify
the cluster parameters, which in this case (fuzzy c-means algorithm)
are the coordinates of the cluster centers. Each section enclosed in
curly braces specifies the center of one cluster.
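
For readers who want to see how such cluster centers come about, here
is a sketch of one epoch of the textbook fuzzy c-means update with
fuzzifier 2 (memberships proportional to the inverse squared distance,
centers recomputed as means weighted with the squared memberships).
This is the standard formulation and is only assumed to correspond to
the default behavior of mcli; the function name and the toy data are
hypothetical.

```python
def fcm_epoch(points, centers):
    """One epoch of textbook fuzzy c-means (fuzzifier 2)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centers]
    weights = [0.0] * len(centers)
    for p in points:
        # Squared Euclidean distance of the point to every center.
        d2 = [sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers]
        if min(d2) == 0.0:
            # The point lies on a center: share full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            u = [h / sum(hits) for h in hits]
        else:
            raw = [1.0 / d for d in d2]
            total = sum(raw)
            u = [r / total for r in raw]   # memberships sum to 1
        for k, uk in enumerate(u):
            w = uk ** 2                    # weight for fuzzifier 2
            weights[k] += w
            for j in range(dim):
                sums[k][j] += w * p[j]
    # New center: weighted mean; keep the old one if it got no weight.
    return [[s / weights[k] for s in sums[k]] if weights[k] > 0.0
            else list(centers[k]) for k in range(len(centers))]

# Toy data: two well separated groups on a line.
points = [[0.0], [0.2], [4.0], [4.2]]
centers = [[1.0], [3.0]]
for _ in range(20):
    centers = fcm_epoch(points, centers)
```

After a few epochs the two centers settle close to the middle of each
group.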
The last two lines specify the scaling parameters (offset and scaling
factor), which describe how the input data are scaled in order to
achieve a distribution with mean 0 and variance 1 in each dimension.
The reason for this scaling is to avoid a distortion of the clustering
result due to considerably different ranges of values in the input
dimensions.
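
The scaling can be sketched as follows. The sketch assumes that each
pair in "scales" is applied as (x - offset) * factor, with the offset
being the mean and the factor the reciprocal of the standard deviation
of the column; this reading is an assumption, and the helper names are
made up.

```python
import math

def fit_scaling(column):
    """Return (offset, factor) giving the column mean 0 and variance 1."""
    n = len(column)
    mean = sum(column) / n
    # Population variance (divide by n); a constant column would
    # give variance 0 and is not handled in this sketch.
    var = sum((x - mean) ** 2 for x in column) / n
    return mean, 1.0 / math.sqrt(var)

def apply_scaling(column, offset, factor):
    """Scale every value as (x - offset) * factor."""
    return [(x - offset) * factor for x in column]

column = [1.0, 2.0, 3.0, 4.0]          # made-up values of one dimension
offset, factor = fit_scaling(column)
scaled = apply_scaling(column, offset, factor)
```

Under this reading, an offset of 0 and a factor of 1 leave the data
unchanged.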
If such a normalization is not desired, it can be switched off with
the option -q. For example

  mcli -qc3 iris.pat iris.cls

(note how several options can be combined) yields

  function = cauchy(2,0);
  normmode = sum1;
  params   = {{ [5.00397, 3.41409, 1.48282, 0.253546] },
              { [5.88893, 2.76107, 4.36395, 1.39732] },
              { [6.77501, 3.05238, 5.***678, 2.05355] }};
  scales   = [0, 1], [0, 1], [0, 1], [0, 1];

Here the scaling parameters all specify the identity function, so that
the clustering algorithm is executed directly in the input space.
The induced set of clusters can then be applied to the data in order
to compute the membership degrees for the different data points. This
is done with the program mclx. For example,

  mclx iris.cls iris.pat iris.out

creates a table iris.out, which contains three additional columns,
one for each cluster. These columns hold the degrees of membership,
rounded to two decimal places. If a higher or lower accuracy is
desired, the output format of the membership degrees can be changed
with the option -o.
If only the cluster with the highest degree of membership is desired,
one may use the option -c, which produces only one additional column
containing the index of the cluster with the highest degree of
membership. The option -m adds a further column containing the
membership degree for this cluster.
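
What the options -c and -m report for one data point can be pictured
like this. The sketch is illustrative only: the helper name is made
up, and whether mclx numbers the clusters from 0 or from 1 is not
stated in this file (the sketch counts from 0).

```python
def best_cluster(memberships):
    """Return (index, degree) of the cluster with the highest membership."""
    index = max(range(len(memberships)), key=memberships.__getitem__)
    return index, memberships[index]

# Made-up membership degrees of one data point for three clusters.
index, degree = best_cluster([0.10, 0.85, 0.05])
```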
The programs cli and clx perform exactly the same tasks as the programs
mcli and mclx, only on a different input format, namely the format of
the file iris.tab. This format is processed in connection with a domain
description file (here: iris.dom) that specifies which columns are to
be used and the data types of these columns. In this way it is possible
to execute the clustering algorithm on a subset of the attributes
without changing the data file. It is also possible to handle symbolic
attributes, which are coded by a simple 1-in-n code before they are
presented to the clustering algorithm. As a consequence, the output of
the program cli contains (compared to the output of the program mcli)
an additional section stating the domain information for the attributes.
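
A 1-in-n code replaces a symbolic value by a vector with one component
per possible value, exactly one of which is 1. The sketch below shows
the idea in Python; the function name and the attribute values are
illustrative and are not taken from iris.dom or iris.tab.

```python
def one_in_n(value, domain):
    """Encode a symbolic value as a 1-in-n vector over its domain."""
    if value not in domain:
        raise ValueError("value %r is not in the domain" % (value,))
    return [1.0 if value == v else 0.0 for v in domain]

# Hypothetical domain of a symbolic attribute with three values.
domain = ["setosa", "versicolor", "virginica"]
code = one_in_n("versicolor", domain)
```

A symbolic attribute with n values thus contributes n numeric
dimensions to the clustering.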
Both programs, mcli as well as cli, are highly parameterizable, so
that a large variety of clustering algorithms can be carried out.
Here is a list of some options that lead to well-known algorithms:

  options    algorithm
  -jhard     hard c-means algorithm
  none       fuzzy c-means algorithm
  -v         axes-parallel Gustafson-Kessel algorithm
  -V         general Gustafson-Kessel algorithm
  -wvG       axes-parallel Gath-Geva (FMLE) algorithm
  -wVG       general Gath-Geva (FMLE) algorithm
  -wvGNx1    axes-parallel mixture of Gaussians (EM algorithm)
  -wVGNx1    general mixture of Gaussians (EM algorithm)

Explanation of the individual options:

  -j#   membership normalization mode
  -v    adaptable variances
  -V    adaptable covariances (covariance matrix)
  -Z    adaptable cluster sizes
  -w    adaptable weights/prior probabilities
  -G    Gaussian radial function (default: Cauchy function)
  -N    normalize to unit integral (probability density)
  -x    exponent for pattern weight

It is usually advisable to initialize the more sophisticated algorithms
(such as Gustafson-Kessel and Gath-Geva) with a few epochs of the fuzzy
c-means algorithm. This is possible because the programs mcli and cli
can read in a clustering result. That is, with a call like

  mcli -OV iris.pat iris.gk iris.cls

the fuzzy c-means result obtained with the program call stated above
(stored in iris.cls) is processed further with the Gustafson-Kessel
algorithm. The result is written to the file iris.gk. The option -O
is necessary to overwrite the cluster type and radial function
parameters read from the input file with the command line values.
The shape and size regularization options (-H and -R) are described
briefly in the file ../doc/regular.tex