Isomapfiles

所属分类:模式识别(视觉/语音等)
开发工具:matlab
文件大小:576KB
下载次数:108
上传日期:2007-10-26 13:45:30
上 传 者country2008
说明:  此程序是进行人脸识别时,进行非线性降维后提取特征的一个典型方法,想做非线性降维特征提取的朋友可以研究,本人看后收获颇多
(This procedure is carried out face recognition, the non-linear dimensionality reduction after the extraction of the characteristics of a typical method to do non-linear dimensionality reduction Friend feature extraction can be studied, I watch a lot of post-harvest)

文件列表:
Isomapfiles\Isomap.m (7476, 2007-10-09)
Isomapfiles\IsomapII.m (13395, 2007-10-09)
Isomapfiles\L2_distance.m (1956, 2007-10-09)
Isomapfiles\dfun.m (85, 2007-10-09)
Isomapfiles\dijk.m (3244, 2007-10-09)
Isomapfiles\dijkstra.cpp (28120, 2007-10-09)
Isomapfiles\dijkstra.dll (57344, 2007-10-09)
Isomapfiles\dijkstra.m (1822, 2007-10-09)
Isomapfiles\fibheap.h (3046, 2007-10-09)
Isomapfiles\swiss_roll_data.mat (800256, 2007-10-09)
Isomapfiles (0, 2007-10-08)

Instructions for Isomap code, Release 1 --------------------------------------- Author: Josh Tenenbaum (jbt@psych.stanford.edu) [Dijkstra code by Mark Steyvers (msteyver@psych.stanford.edu)] Date: December 22, 2000 Website for updates: http://isomap.stanford.edu (Last update: 12/28/00, added "fibheap.h" to code package.) 0. Copyright notice Isomap code -- (c) 19***-2000 Josh Tenenbaum This code is provided as is, with no guarantees except that bugs are almost surely present. Published reports of research using this code (or a modified version) should cite the article that describes the algorithm: J. B. Tenenbaum, V. de Silva, J. C. Langford (2000). A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500): 2319-2323, 22 December 2000. Comments and bug reports are welcome. Email to jbt@psych.stanford.edu. I would also appreciate hearing about how you used this code, improvements that you have made to it, or translations into other languages. You are free to modify, extend or distribute this code, as long as this copyright notice is included whole and unchanged. 1. Contents This package of Matlab (version 5.3) code implements the Isomap algorithm of Tenenbaum, de Silva, and Langford (2000) [TdSL]. The contents of the file "IsomapR1.tar" are: Isomap.m IsomapII.m L2_distance.m Readme dfun.m dijk.m dijkstra.cpp dijkstra.dll (Windows binary produced by "mex -O dijkstra") dijkstra.m fibheap.h (header file for Fibonacci heaps, used by dijkstra.cpp) swiss_roll_data.mat The data file "swiss_roll_data" contains the input-space coordinates ("X_data") and the known low-dimensional manifold coordinates ("Y_data") for 20,000 data points on the Swiss roll. Separately available at isomap.stanford.edu is a large data file of synthetic face images, "face_data.mat.Z". The face data file contains images of 6*** faces ("images"), the projections of those faces onto the first 240 (scaled) principal components ("image_pcs"), and the known pose and lighting parameters for each face ("poses", "lights"). 2. Getting started (Isomap.m) Isomap.m implements the basic version of the algorithm described in [TdSL]. The algorithm takes as input the distances between points in a high-dimensional observation space, and returns as output their coordinates in a low-dimensional embedding that best preserves their intrinsic geodesic distances. Try it out on N=1000 points from the "Swiss roll" data set: >> load swiss_roll_data >> D = L2_distance(X_data(:,1:1000), X_data(:,1:1000), 1); To run Isomap with K = 7 neighbors/point, and produce embeddings in dimensions 1, 2, ..., 10, type these commands: >> options.dims = 1:10; >> [Y, R, E] = Isomap(D, 'k', 7, options); The other options for the code are explained in the header of Isomap.m. The only subtle option is "option.comp", which specifies which connected component to embed in the final step of the algorithm when more than one component has been detected in the neighborhood graph. The default is to embed the largest component. This code should work reasonably well for data sets with 1000 or fewer points. For larger data sets, consider using the advanced code described below. 3. Advanced code (IsomapII.m) IsomapII.m implements a more advanced version of the algorithm that exploits the sparsity of the neighborhood graph in computing shortest-path distances, and can also exploit the redundancy of the distances in constructing a low-dimensional embedding. IsomapII uses Dijkstra's algorithm to compute graph distances. IsomapII works optimally when the file "dijkstra.cpp" (which uses Fibonacci heap data structures) has been compiled (with the command "mex -O dijkstra.cpp") to produce "dijkstra.dll". If IsomapII can't find a file called dijkstra.dll, it will default to a much slower Matlab implementation of Dijkstra's algorithm in dijk.m. Alternatively, setting "option.dijkstra = 0" tells IsomapII to use a Matlab implementation of Floyd's algorithm (also used in Isomap.m), which is generally faster than dijk.m for small-to-medium-size data sets but does not exploit sparsity to achieve better time and space efficiency for large data sets. Both dijk.m and the Floyd algorithm are much much slower than dijkstra.dll, so the latter should be used if at all possible. This code package includes a version of dijkstra.dll suitable for running on Windows platforms. For very large data sets, it is impractical to store in memory a full N x N distance matrix, as Isomap produces after step 2 (computing shortest-path distances), or to calculate its eigenvectors, as Isomap does in step 3 (constructing a low-dimensional embedding). However, in many cases where the data lie on a low-dimensional manifold, the distances computed in step 2 are heavily redundant and most of them can be ignored with little effect on the final embedding. IsomapII constructs embeddings that, rather than trying to preserve distances between all pairs of points, preserve only the distances between all points and a subset of "landmark" points. The user specifies which points to use as landmarks (in the options.landmarks field). Setting "options.landmarks = 1:N" (i.e. the entire data set) makes step 3 of IsomapII equivalent to classical multidimensional scaling (MDS); this is just as in Isomap.m, and it is the default mode for IsomapII. Choosing a much smaller set of landmarks (e.g. by sampling randomly, or by using some subset of the data that is known a priori to be representative) can often be quite a good approximation. Try this version of the Swiss roll example with N=1000 data points but only 50 (random) landmark points: >> load swiss_roll_data >> D = L2_distance(X_data(:,1:1000), X_data(:,1:1000), 1); >> options.dims = 1:10; >> options.landmarks = 1:50; >> [Y, R, E] = IsomapII(D, 'k', 7, options); We do not know of any prior studies of the use of landmark points with classical MDS, and we have only just begun to explore this approach as an extension to Isomap. A more detailed discussion of the use of landmark points in Isomap and classical MDS will be presented in Steyvers, de Silva, and Tenenbaum (in preparation, at http://isomap.stanford.edu). The use of landmark points in nonmetric MDS for data visualization is discussed briefly in a paper by Buja, Swayne, Littman, and Dean ("XGvis: Interactive Data Visualization with Multidimensional scaling", J. Comp. & Graph. Statistics, http://www.research.att.com/areas/stat/xgobi/#xgvis-paper). To facilitate working with large data sets, IsomapII can take as input distances in one of three formats: Mode 1: A full N x N matrix Mode 2: A sparse N x N matrix Mode 3: A distance function (see sample: "dfun.m") Mode 1 is equivalent to Isomap.m (except for the use of dijkstra.dll, which will usually be much faster than Isomap.m). Mode 2 is designed for cases where the distances between nearby points are known, but the differences between faraway points are not known. The sparse input matrix is assumed to contain distances between each point and a set of neighboring points, from which the graph neighbors will be chosen using the standard K or epsilon methods. Any missing entries in the sparse input matrix are effectively assumed to be infinite (NOT zero) -- these represents non-neighboring pairs of points. Except for the sparsity of the input matrix, Mode 2 does not differ in any noticeable way from Mode 1. Mode 3 is designed for cases where the input-space distances have not already been computed explicitly. The user provides a distance function, which takes as input one argument, i, and returns as output a row vector containing the input-space distances from all N points to point i. A sample distance function, "dfun.m", is provided. Note that the distance function may have to use a global variable in order to encode the information necessary to compute the appropriate distances. For example, dfun.m assumes that the coordinates of points in the high-dimensional input space are encoded in the global variable X. It then uses these coordinates to compute Euclidean distances in input space. Try this example: >> load swiss_roll_data >> global X >> X = X_data(:,1:1000); >> options.dims = 1:10; >> options.landmarks = 1:size(X,2); >> [Y, R, E] = IsomapII('dfun', 'k', 7, options); This should give exactly the same results (subject to numerical error and sign changes) as the first example in Section 2, but much more quickly! Using a distance function rather than a distance matrix allows IsomapII to handle much larger data sets (we have tested it up to N=20,000) than the simple Isomap code can handle. Also, note that the distance function need not be Euclidean. Depending on the application, domain-specific knowledge may be useful for designing a more sophisticated distance function (as in Fig. 1B of [TdSL]).

近期下载者

相关文件


收藏者