kmeans_mt
Category: Matlab programming
Development tool: Matlab
File size: 8KB
Upload date: 2014-09-05 13:10:42
Uploader: tariq
Description: Efficient Kmeans using Multiple Threads
File list:
kmeans_mt\kmeans_mt.cpp (2255, 2014-01-13)
kmeans_mt\kmeans_unix.h (5726, 2014-05-27)
kmeans_mt\kmeans_win.h (5799, 2014-05-27)
kmeans_mt\newdelete.cpp (756, 2014-01-03)
kmeans_mt\newdelete.h (487, 2014-01-03)
license.txt (1500, 2014-09-04)
Efficient Kmeans using Multiple Threads
-------------------------------
by Haw-Shiuan Chang
-------------------------------
Contact samkendi@hotmail.com if you have any questions about this code.
-------------------------------
Last updated: 2014/9/4
-------------------------------
About:
This code implements the basic kmeans algorithm using Euclidean distance and speeds up the computation with C/C++ and multiple threads.
When the number of samples and the feature dimension are large, this code can be significantly faster than the implementation in the Matlab toolbox and other efficient implementations such as litekmeans (http://www.cad.zju.edu.cn/home/dengcai/Data/code/litekmeans.m).
For example, for data with 17 dimensions and 154401 samples, the following are the running times of different implementations that produce the same result after 100 iterations on a PC with a 3.4 GHz Intel i7 CPU:
Matlab toolbox: 10.32 sec
litekmeans: 7.50 sec
This code: 2.92 sec
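The README does not say how the work is divided among threads. One common approach is sketched below under stated assumptions (C++11 `std::thread`, row-major data, and the hypothetical function name `assign_parallel`; none of these are confirmed by the source). It parallelizes the assignment step: each thread takes a contiguous block of samples, finds each sample's nearest center by squared Euclidean distance, and accumulates its own partial sum.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical sketch, not the author's code. Assumes row-major data:
// feature j of sample i is x[i * dim + j]; center c is centers[c * dim + j].
// Assigns each sample to its nearest center (squared Euclidean distance),
// splitting the samples into one contiguous chunk per worker thread.
// Writes 0-based labels and returns the total squared distance.
double assign_parallel(const std::vector<double>& x, const std::vector<double>& centers,
                       std::size_t n, std::size_t dim, std::size_t k,
                       std::vector<std::size_t>& labels, unsigned num_threads) {
    labels.assign(n, 0);
    num_threads = std::max(1u, num_threads);
    std::vector<double> partial(num_threads, 0.0);  // one accumulator per thread, so no locks
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);  // empty range if begin >= n
        workers.emplace_back([&, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                double best = std::numeric_limits<double>::max();
                std::size_t best_c = 0;
                for (std::size_t c = 0; c < k; ++c) {
                    double d = 0.0;
                    for (std::size_t j = 0; j < dim; ++j) {
                        const double diff = x[i * dim + j] - centers[c * dim + j];
                        d += diff * diff;
                    }
                    if (d < best) { best = d; best_c = c; }
                }
                labels[i] = best_c;  // threads write disjoint index ranges
                partial[t] += best;
            }
        });
    }
    for (auto& w : workers) w.join();

    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

Giving each thread its own slot in `partial` avoids locking. The assignment step costs O(n·k·dim) per iteration, while recomputing the centers as means costs O(n·dim), so the assignment step is the natural place to spend threads.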
Permission is granted to use or modify our source code for research purposes, but any form of commercial use is not allowed.
This code was originally written for building visual words in the following publication:
Haw-Shiuan Chang and Yu-Chiang Frank Wang, "Simple-to-Complex Discriminative Clustering for Hierarchical Image Segmentation", ACCV 2014
If you use this code and your research is related to ours, please consider citing our paper.
-------------------------------
Compilation:
You first need to compile this code. After mex has been set up, execute the following command:
mex kmeans_mt.cpp newdelete.cpp;
-------------------------------
Usage:
Execute the following command to run this code:
[IDX,final_k_centers]=kmeans_mt(X,init_k_centers,max_iter,1);
where "X" is the data, "init_k_centers" contains the locations of the k initial centers, "max_iter" is the maximum number of iterations of the EM algorithm, and the last input argument indicates whether the program displays, at each iteration, the sum of the distances between every sample and its closest center.
The first output ("IDX") is the cluster index of every sample, and the second output ("final_k_centers") contains the locations of the k final centers.
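To make the roles of `max_iter`, the display flag, `IDX` and `final_k_centers` concrete, here is a single-threaded sketch of the standard Lloyd iteration (assignment step, then mean update). It is not the author's code: the names `kmeans_lloyd` and `KmeansResult`, the row-major layout, stopping early once no sample changes cluster, and keeping the old center for an empty cluster are all assumptions. Distances are squared Euclidean, matching the default 'sqeuclidean' distance of Matlab's `kmeans`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <vector>

// Hypothetical result type: 1-based cluster indices (like IDX) and final centers.
struct KmeansResult {
    std::vector<int> idx;         // idx[i] in 1..k
    std::vector<double> centers;  // k x dim, row-major (assumption)
    int iterations = 0;
};

// Hypothetical sketch mirroring the README's interface: x is n x dim (row-major),
// centers holds the k x dim initial centers, max_iter caps the iterations, and a
// nonzero verbose prints the sum of squared distances after each assignment step.
KmeansResult kmeans_lloyd(const std::vector<double>& x, std::vector<double> centers,
                          std::size_t n, std::size_t dim, std::size_t k,
                          int max_iter, int verbose) {
    KmeansResult r;
    r.idx.assign(n, 0);
    std::vector<double> sums(k * dim);
    std::vector<std::size_t> counts(k);
    for (int it = 1; it <= max_iter; ++it) {
        // Assignment step: nearest center by squared Euclidean distance.
        bool changed = false;
        double total = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double best = std::numeric_limits<double>::max();
            int best_c = 0;
            for (std::size_t c = 0; c < k; ++c) {
                double d = 0.0;
                for (std::size_t j = 0; j < dim; ++j) {
                    const double diff = x[i * dim + j] - centers[c * dim + j];
                    d += diff * diff;
                }
                if (d < best) { best = d; best_c = static_cast<int>(c) + 1; }
            }
            if (r.idx[i] != best_c) { r.idx[i] = best_c; changed = true; }
            total += best;
        }
        r.iterations = it;
        if (verbose) std::printf("iter %d: sum of distances = %g\n", it, total);
        if (!changed) break;  // assumption: stop once no sample changes cluster

        // Update step: move each center to the mean of its assigned samples.
        std::fill(sums.begin(), sums.end(), 0.0);
        std::fill(counts.begin(), counts.end(), 0);
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t c = static_cast<std::size_t>(r.idx[i] - 1);
            ++counts[c];
            for (std::size_t j = 0; j < dim; ++j) sums[c * dim + j] += x[i * dim + j];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue;  // assumption: an empty cluster keeps its old center
            for (std::size_t j = 0; j < dim; ++j)
                centers[c * dim + j] = sums[c * dim + j] / counts[c];
        }
    }
    r.centers = centers;
    return r;
}
```

For example, with four 2-D samples (0,0), (0,1), (10,10), (10,11) and initial centers (0,0) and (0,1), the sketch settles on IDX = 1, 1, 2, 2 with centers (0, 0.5) and (10, 10.5) after three iterations.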
If you have the Matlab Statistics Toolbox, the command above should output the same result as:
opts = statset('Display','iter','MaxIter',max_iter);
[IDX,final_k_centers]=kmeans(X,[],'Options',opts,'start',init_k_centers);
The detailed formats of "X", "init_k_centers", "IDX" and "final_k_centers" are the same as those in the Matlab toolbox. You can find them at http://www.mathworks.com/help/stats/kmeans.html
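Matlab stores matrices in column-major order, so inside a MEX function the n-by-p matrix `X` (one sample per row) arrives as a single buffer in which element (i, j) sits at offset `i + j*n` (0-based). The README does not say whether kmeans_mt.cpp reorders the data. The hypothetical helpers `at_colmajor` and `to_row_major` below only illustrate the indexing and one possible conversion to a row-major copy; they are written without `mex.h` so they compile on their own.

```cpp
#include <cstddef>
#include <vector>

// Element (i, j) of an n x p column-major matrix, as Matlab lays it out in memory.
inline double at_colmajor(const double* a, std::size_t n, std::size_t i, std::size_t j) {
    return a[i + j * n];
}

// Hypothetical helper: copy an n x p column-major matrix into a row-major buffer
// so that each sample's p features are contiguous (one possible layout choice
// for the distance loops; not necessarily what kmeans_mt.cpp does).
std::vector<double> to_row_major(const double* a, std::size_t n, std::size_t p) {
    std::vector<double> out(n * p);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < p; ++j)
            out[i * p + j] = at_colmajor(a, n, i, j);
    return out;
}
```

For instance, the Matlab matrix `[1 2; 3 4; 5 6]` is stored as 1, 3, 5, 2, 4, 6, and `to_row_major` turns it into 1, 2, 3, 4, 5, 6.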
-------------------------------
Note:
The initialization of kmeans is not the speed bottleneck of the algorithm, so you can use Matlab code to perform this step according to your own needs.
The following is a simple example that directly uses evenly spaced samples as the initial centers when K clusters are needed:
sample_num=size(X,1);
init_k_centers=X(round(1:sample_num/K:sample_num),:);
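The same initialization can be reproduced outside Matlab. The sketch below uses the hypothetical names `evenly_spaced_rows` and `init_centers` and assumes row-major data with sample_num >= K >= 1. The 1-based index `round(1 + t*sample_num/K)` becomes the 0-based index `floor((2*t*sample_num + K) / (2*K))`, which can be computed exactly in integer arithmetic. In exact arithmetic this matches `round(1:sample_num/K:sample_num)`. Matlab's floating-point colon operator could still differ in rare rounding-tie cases.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: 0-based rows equal to round(1:n/K:n) - 1 (exact arithmetic),
// assuming n >= K >= 1. Uses round(1 + t*n/K) - 1 == floor((2*t*n + K) / (2*K)).
std::vector<std::size_t> evenly_spaced_rows(std::size_t n, std::size_t K) {
    std::vector<std::size_t> rows;
    rows.reserve(K);
    for (std::size_t t = 0; t < K; ++t)
        rows.push_back((2 * t * n + K) / (2 * K));
    return rows;
}

// Hypothetical helper: gather the selected rows of a row-major n x dim matrix
// as the K x dim initial centers.
std::vector<double> init_centers(const std::vector<double>& x, std::size_t n,
                                 std::size_t dim, std::size_t K) {
    std::vector<double> centers;
    centers.reserve(K * dim);
    for (std::size_t r : evenly_spaced_rows(n, K))
        for (std::size_t j = 0; j < dim; ++j) centers.push_back(x[r * dim + j]);
    return centers;
}
```

For sample_num = 10 and K = 3, Matlab's `round(1:10/3:10)` gives rows 1, 4, 8, and `evenly_spaced_rows(10, 3)` gives the corresponding 0-based rows 0, 3, 7.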