spectralclustering-1.1

所属分类:matlab编程
开发工具:matlab
文件大小:15437KB
下载次数:18
上传日期:2017-02-02 20:51:18
上 传 者fzh5224
说明:  基于Nystrom的约束谱聚类,通过Nystrom取样的方式有效降低维度,提高求解速度
(Nystrom spectral clustering based on constraints, sampling by Nystrom way to reduce the dimensions and improve the solution speed)

文件列表:
spectralclustering-1.1 (0, 2013-01-08)
spectralclustering-1.1\gen_nn_distance.m (4020, 2013-01-08)
spectralclustering-1.1\sc.m (2714, 2013-01-08)
spectralclustering-1.1\nystrom.m (4759, 2013-01-08)
spectralclustering-1.1\nystrom_no_orth.m (4463, 2013-01-08)
spectralclustering-1.1\nmi.m (1645, 2013-01-08)
spectralclustering-1.1\accuracy.m (750, 2013-01-08)
spectralclustering-1.1\hungarian.m (9229, 2013-01-08)
spectralclustering-1.1\k_means.m (4309, 2013-01-08)
spectralclustering-1.1\script_sc.m (2341, 2013-01-08)
spectralclustering-1.1\script_sc_selftune.m (2371, 2013-01-08)
spectralclustering-1.1\script_nystrom.m (2043, 2013-01-08)
spectralclustering-1.1\script_nystrom_no_orth.m (2059, 2013-01-08)
spectralclustering-1.1\script_kmeans.m (1267, 2013-01-08)
spectralclustering-1.1\data (0, 2013-01-08)
spectralclustering-1.1\data\corel_100_NN_sym_distance.mat (2626906, 2013-01-08)
spectralclustering-1.1\data\corel_10_NN_sym_distance.mat (258581, 2013-01-08)
spectralclustering-1.1\data\corel_150_NN_sym_distance.mat (3982453, 2013-01-08)
spectralclustering-1.1\data\corel_15_NN_sym_distance.mat (385884, 2013-01-08)
spectralclustering-1.1\data\corel_200_NN_sym_distance.mat (5315835, 2013-01-08)
spectralclustering-1.1\data\corel_20_NN_sym_distance.mat (515644, 2013-01-08)
spectralclustering-1.1\data\corel_50_NN_sym_distance.mat (1297736, 2013-01-08)
spectralclustering-1.1\data\corel_5_NN_sym_distance.mat (129836, 2013-01-08)
spectralclustering-1.1\data\corel_feature.mat (1292947, 2013-01-08)
spectralclustering-1.1\data\corel_label.mat (236, 2013-01-08)

Table of Contents ================= - Introduction - Usage - Examples - Hardware Requirement - Additional Information Introduction ============ This directory includes sources used in the following paper: Parallel Spectral Clustering in Distributed Systems Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and Edward Chang IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 33, No. 3, pp. 568-586, March 2011 This code has been tested under ***-bit Linux environment using MATLAB 7.4.0.287 (R2007a). You will be able to regenerate experiment results in the paper. However, results may be slightly different due to the randomness, the CPU speed, and the load of your computer. In the data/ directory, we include the Corel data set and its nearest neighbors files. For the RCV1 data set, due to its large size (~100MB), please check our project page for download link: http://alumni.cs.ucsb.edu/~wychen/sc.html#Download To generate nearest neighbors for RCV1, please refer to the Examples section. Please also note that we assume true/cluster labels are integers 1, 2, ..., num_labels. Usage ===== 1. Generate sparse symmetric distance matrix using t-nearest-neighbor method: matlab> gen_nn_distance(data, num_neighbors, block_size, save_type) -data: N-by-D data matrix, where N is the number of data, D is the number of dimensions. -num_neighbors: Number of nearest neighbors. -block_size: Block size for partitioning the data matrix. We process the data matrix in a divide-and-conquer manner to alleviate memory use. This is useful for processing very large data set when physical memory is limited. -save_type: 0 for .mat file, 1 for .txt file, 2 for both. [Note] The file format of .txt is as follows: data_id #_of_neighbors data_id:distance_value data_id:distance_value ... 2. Run spectral clustering using a sparse similarity matrix: matlab> [cluster_labels evd_time kmeans_time total_time] = sc(A, sigma, num_clusters); -A: N-by-N sparse symmetric distance matrix, where N is the number of data. -sigma: Sigma value used in similarity function S, where S_ij = exp(-dist_ij^2 / 2*sigma*sigma); if sigma is 0, apply self-tunning technique, where S_ij = exp(-dist_ij^2 /2*avg_dist_i*avg_dist_j). -num_clusters: Number of clusters. 3. Run sepctral clustering using Nystrom method with orthogonalization: matlab> [cluster_labels evd_time kmeans_time total_time] = nystrom(data, num_samples, sigma, num_clusters); -data: N-by-D data matrix, where N is the number of data, D is the number of dimensions. -num_samples: Number of random samples. -sigma: Sigma value used in similarity function S, where S = exp(-dist^2 / 2*sigma*sigma). -num_clusters: Number of clusters. 4. Run sepctral clustering using Nystrom method without orthogonalization: matlab> [cluster_labels evd_time kmeans_time total_time] = nystrom_no_orth(data, num_samples, sigma, num_clusters); -data: N-by-D data matrix, where N is the number of data, D is the number of dimensions. -num_samples: Number of random samples. -sigma: Sigma value used in similarity function S, where S = exp(-dist^2 / 2*sigma*sigma). -num_clusters: Number of clusters. 5. Run k-means clustering: matlab> cluster_labels = k_means(data, centers, num_clusters); -data: N-by-D data matrix, where N is the number of data, D is the number of dimensions. -centers: K-by-D centers matrix, where K is num_clusters, or 'random', random initialization, or [], empty matrix, orthogonal initialization -num_clusters: Number of clusters. 6. Evaludate clustering quality using NMI (Normalized Mutual Information): matlab> score = nmi(true_labels, cluster_labels) -true_labels: N-by-1 vector containing true labels. -cluster_labels: N-by-1 vector containing cluster labels. 7. Evaludate clustering quality using accuracy (Hungarian algorithm): matlab> score = accuracy(true_labels, cluster_labels) -true_labels: N-by-1 vector containing true labels. -cluster_labels: N-by-1 vector containing cluster labels. 8. Run scripts (sparse/sparse selftune/nystrom/nystrom without orthogonalization/kmeans): matlab> script_sc(dataset) # spectral clustering using sparse smilarity matrix with given sigma -dataset: data set number, 1 = Corel, 2 = RCV1. matlab> script_sc_selftune(dataset) # spectral clustering using sparse similarity matrix with selftune sigma -dataset: data set number, 1 = Corel, 2 = RCV1. matlab> script_nystrom(dataset) # spectral clustering using nystrom with orthogonalization -dataset: data set number, 1 = Corel, 2 = RCV1. matlab> script_nystrom_no_orth(dataset) # spectral clustering using nystrom without orthogonalization -dataset: data set number, 1 = Corel, 2 = RCV1. matlab> script_kmeans(dataset) # k-means clustering -dataset: data set number, 1 = Corel, 2 = RCV1. Examples ======== 1. Ggenerate sparse symmetric distance matrix using t-nearest-neighbor method: matlab> load data/corel_feature.mat; matlab> gen_nn_distance(feature, 50, 10, 2); This generates two sparse symmetric distance files: 50_NN_sym_distance.mat and 50_NN_sym_distance.txt 2. Run spectral clustering using a sparse similarity matrix: matlab> load data/corel_50_NN_sym_distance.mat; matlab> [cluster_labels evd_time kmeans_time total_time] = sc(A, 20, 18); matlab> load data/corel_label.mat; matlab> nmi_score = nmi(label, cluster_labels) matlab> accuracy_score = accuracy(label, cluster_labels) 3. Run spectral clustering using Nystrom method with orthogonalization: matlab> load data/corel_feature.mat; matlab> [cluster_labels evd_time kmeans_time total_time] = nystrom(feature, 200, 20, 18); matlab> load data/corel_label.mat; matlab> nmi_score = nmi(label, cluster_labels) matlab> accuracy_score = accuracy(label, cluster_labels) 4. Run spectral clustering using Nystrom method without orthogonalization: matlab> load data/corel_feature.mat; matlab> [cluster_labels evd_time kmeans_time total_time] = nystrom_no_orth(feature, 200, 20, 18); matlab> load data/corel_label.mat; matlab> nmi_score = nmi(label, cluster_labels) matlab> accuracy_score = accuracy(label, cluster_labels) 5. Run k-means clustering: matlab> load data/corel_feature.mat; matlab> cluster_labels = k_means(feature, 'random', 18); matlab> nmi_score = nmi(label, cluster_labels) matlab> accuracy_score = accuracy(label, cluster_labels) 6. Run scripts for Corel data: matlab> script_sc(1); matlab> script_sc_selftune(1); matlab> script_nystrom(1); matlab> script_nystrom_no_orth(1); matlab> script_kmeans(1); Hardware Requirement ==================== Note that when running Nystrom with RCV1 data (193,844 instances), it may consume more than 3GB memory with 2000 random samples (193,844 * 2,000 * 8Bytes > 3GB). If you want to run with more number of samples, please make sure you have enough memory on your machine. Frequent Aasked Questions (FAQ) =============================== Please check our project page for FAQ: http://alumni.cs.ucsb.edu/~wychen/sc.html#FAQ Additiona Information ===================== If you find this tool useful, please cite it as Parallel Spectral Clustering in Disributed Systems Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and Edward Chang IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 33, No. 3, pp. 568-586, March 2011 The bibtex format is @article{Chen11, author = {Wen-Yen Chen and Yangqiu Song and Hongjie Bai and Chih-Jen Lin and Edward Y. Chang}, title = {Parallel Spectral Clustering in Distributed Systems}, journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence}, volume = {33}, number = {3}, pages = {568-586}, year = {2011} } For any question, please contact Wen-Yen Chen and Chih-Jen Lin .

近期下载者

相关文件


收藏者