GP

Category: Numerical algorithms / Artificial intelligence
Development tool: Visual C++
File size: 20405KB
Downloads: 5
Upload date: 2016-11-22 17:34:40
Uploader: 笑看星辰
Description: Gaussian process code based on Bayesian theory, including Gaussian process regression with associated noise handling, Gaussian process classification, and data sets for testing.

File list (size in bytes, date):
ard_gaussian_clusters_plot.png (7272, 2015-05-17)
CClctrl.cpp (4971, 2015-05-17)
CClctrl.h (4521, 2015-05-17)
CClctrl.txt (326, 2015-05-17)
CDataModel.h (3031, 2015-05-17)
CDataModel.txt (0, 2015-05-17)
CDist.cpp (8761, 2015-05-17)
CDist.h (9400, 2015-05-17)
CDist.txt (0, 2015-05-17)
CGp.cpp (47389, 2015-05-17)
CGp.h (13969, 2015-05-17)
CGplvm.cpp (26299, 2015-05-17)
CGplvm.h (9566, 2015-05-17)
CGplvm.txt (0, 2015-05-17)
CIvm.cpp (27420, 2015-05-17)
CIvm.h (7325, 2015-05-17)
CIvm.txt (460, 2015-05-17)
CKern.cpp (120848, 2015-05-17)
CKern.h (44021, 2015-05-17)
CKern.txt (1235, 2015-05-17)
CMakeLists.txt (1840, 2015-05-17)
CMatlab.h (7670, 2015-05-17)
CMatrix.cpp (35042, 2015-05-17)
CMatrix.h (42066, 2015-05-17)
CMatrix.txt (794, 2015-05-17)
CMltools.cpp (23799, 2015-05-17)
CMltools.h (5855, 2015-05-17)
CMltools.txt (96, 2015-05-17)
CNdlInterfaces.h (15364, 2015-05-17)
CNoise.cpp (54720, 2015-05-17)
CNoise.h (22148, 2015-05-17)
CNoise.txt (886, 2015-05-17)
COptimisable.cpp (17962, 2015-05-17)
COptimisable.h (5749, 2015-05-17)
COptimisable.txt (0, 2015-05-17)
CPpa.cpp (18385, 2015-05-17)
CPpa.h (5602, 2015-05-17)
CTransform.cpp (4769, 2015-05-17)
CTransform.h (11583, 2015-05-17)
... ...

GPc
===

Gaussian process code in C++, including some implementations of the GP-LVM and IVM.

Gaussian Process Software
=========================

This page describes how to compile the C++ Gaussian Process code and gives some examples of its use.

### Release Information

The current release is 0.001.

### Design Philosophy

The software is written in C++ to try to achieve a degree of flexibility in the models that can be used without a serious performance hit. This was difficult to do in MATLAB, as users who have tried version 1 (which was fast but inflexible) and version 2 (which was flexible but slow) of the MATLAB software will appreciate. The software is mainly written in C++ but relies for some functions on FORTRAN code by other authors and on the LAPACK and BLAS libraries. As well as the C++ code, some utilities are supplied in the corresponding MATLAB code for visualising the results.

## Compiling the Software

The software was written with gcc on Ubuntu. Part of the reason for using gcc is the ease of interoperability with FORTRAN: the code base makes fairly extensive use of FORTRAN, so you need to have g77 installed. The software is compiled by writing

```sh
$ make gp
```

at the command line. Architecture-specific options are included in the `make.ARCHITECTURE` files. Rename the file for the relevant architecture to `make.inc` for it to be included.

### Optimisation

One of the advantages of interfacing to the LAPACK and BLAS libraries is that they are often optimised for particular architectures.

## General Information

The software operates through the command line. There is one executable, `gp`. Help can be obtained by writing

```sh
$ ./gp -h
```

which lists the commands available under the software. Help for each command can then be obtained by writing, for example,

```sh
$ ./gp learn -h
```

All the suggested tutorial optimisations take less than half an hour to run on my sub-2GHz Pentium IV machine; the first oil example runs in a couple of minutes. Below I suggest using the highest verbosity option, `-v 3`, in each of the examples so that you can track the iterations.

## Examples

The software loads data in the SVM light format, which provides compatibility with other Gaussian process software. Anton Schwaighofer has written a package which can write from MATLAB to the SVM light format.

## One Dimensional Data

Provided with the software, in the `examples` directory, is a one dimensional regression problem. The file is called `sinc.svml`. First we will learn the data using the following command:

```sh
$ ./gp -v 3 learn -# 100 examples/sinc.svml sinc.model
```

The flag `-v 3` sets the verbosity level to 3 (the highest level), which causes the iterations of the scaled conjugate gradient algorithm to be shown. The flag `-# 100` terminates the optimisation after 100 iterations so that you can quickly move on with the rest of the tutorial. The software will load the data in `sinc.svml`. The labels are included in this file but they are not used in the optimisation of the model; they are for visualisation purposes only.

### Gnuplot

The learned model is saved in a file called `sinc.model`. This file has a plain-text format to make it human readable. Once training is complete, the learned covariance function parameters of the model can be displayed using

```sh
$ ./gp display sinc.model
```

```sh
Loading model file.
... done.
Standard GP Model:
Optimiser: scg
Data Set Size: 40
Kernel Type:
Scales learnt: 0
X learnt: 0
Bias: 0.106658
Scale: 1
Gaussian Noise:
Bias on process 0: 0
Variance: 1e-06
compound kernel:
rbfinverseWidth: 0.1***511
rbfvariance: 0.0751124
biasvariance: 1.6755e-05
whitevariance: 0.00204124
```

Notice that the covariance function is composed of an RBF kernel (also known as the squared exponential or Gaussian kernel), a bias kernel (which is just a constant) and a white noise kernel (which is a diagonal term). This is the default setting; it can be changed to other covariance function types with flags, see `./gp learn -h` for details.

For your convenience a `gnuplot` file may be generated to visualise the data. First run

```sh
$ ./gp gnuplot -r 400 examples/sinc.svml sinc.model sinc
```

The `sinc` supplied as the last argument acts as a stub for gnuplot to create file names from, so, for example (using gnuplot version 4.0 or above), you can write

```sh
$ gnuplot sinc_plot.gp
```

Note: for this to work on OSX you may have to run

```sh
$ brew install gnuplot --with-x
```

You should then obtain the plot shown below.

Gaussian process applied to sinc data.
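The displayed covariance function is the sum of the three kernels named above. As a minimal sketch in standard notation, assuming the parameterisation suggested by the displayed parameter names (`rbfinverseWidth` as an inverse width, the variances as the sigma-squared terms); the code's exact conventions may differ:

```latex
% Compound kernel: RBF (squared exponential) + bias + white noise.
% Parameterisation assumed from the `gp display` output above.
k(\mathbf{x},\mathbf{x}') =
  \sigma_{\mathrm{rbf}}^{2}\exp\!\Bigl(-\tfrac{\gamma}{2}\,\lVert\mathbf{x}-\mathbf{x}'\rVert^{2}\Bigr)
  + \sigma_{\mathrm{bias}}^{2}
  + \sigma_{\mathrm{white}}^{2}\,\delta_{\mathbf{x}\mathbf{x}'}
```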

The other files created are `sinc_error_bar_data.dat`, which produces the error bars, `sinc_line_data.dat`, which produces the mean, and `sinc_scatter_data.dat`, which shows the training data.

### Other Data

You might also want to try a larger data set:

```sh
$ ./gp -v 3 learn -# 100 examples/spgp1d.svml spgp1d.model
```

### MATLAB and OCTAVE

While MATLAB can be slow (and very expensive for non-academic users) it can still be a lot easier to code the visualisation routines by building on MATLAB's graphics facilities. To this end you can load the results into MATLAB/OCTAVE with the GPmat toolbox for further manipulation. You can download the toolbox from here. Once the relevant toolboxes are downloaded (you need all the dependent toolboxes), you can visualise the results in MATLAB using

```matlab
>> [y, X] = svmlread('sinc.svml')
>> gpReadFromFile('sinc.model', X, y)
```

where we have used the SVML toolbox of Anton Schwaighofer to load in the data.

IVM Software
============

This page describes how to compile and gives some examples of use of the C++ Informative Vector Machine (IVM) software.

### Design Philosophy

The software is written in C++ to try to achieve a degree of flexibility in the models that can be used without a serious performance hit. The software is mainly written in C++ but relies for some functions on FORTRAN code by other authors and on the LAPACK and BLAS libraries.

## Compiling the Software

The software was written with gcc version 3.2.2. There are definitely Standard Template Library issues on Solaris with gcc 2.95, so I suggest that at least version 3.2 is used. Part of the reason for using gcc is the ease of interoperability with FORTRAN: the code base makes fairly extensive use of FORTRAN, so you need to have g77 installed. The software is compiled by writing

```sh
$ make ivm
```

at the command line. Architecture-specific options are included in the `make.ARCHITECTURE` files. Rename the file for the relevant architecture to `make.inc` for it to be included.

### Optimisation

One of the advantages of interfacing to the LAPACK and BLAS libraries is that they are often optimised for particular architectures. The file `make.atlas` includes options for compiling against the ATLAS-optimised versions of LAPACK and BLAS that are available on a server I have access to. These options may vary for particular machines.

### Cygwin

For Windows users the code compiles under Cygwin. However, you will need versions of the LAPACK and BLAS libraries available (see www.netlib.org). These can take some time to compile and, in the absence of any pre-compiled versions on the web, I've provided some pre-compiled versions you may want to make use of (see the cygwin directory). Note that these pre-compiled versions are not optimised for the specific architecture and therefore do not give the speed-up you would hope for from using LAPACK and BLAS.

### Microsoft Visual C++

As of release 0.101 the code compiles under Microsoft Visual Studio 7.1. A project file is provided in the current release in the directory `MSVC/ivm`. The compilation makes use of f2c versions of the FORTRAN code and the C version of LAPACK/BLAS, CLAPACK. Detailed instructions on how to compile are in the readme.msvc file. Much of the work to convert the code (which included ironing out several bugs) was done by William V. Baxter for the GPLVM code.

## General Information

The software operates through the command line. There is one executable, `ivm`.
Help can be obtained by writing

```sh
$ ./ivm -h
```

which lists the commands available under the software. Help for each command can then be obtained by writing, for example,

```sh
$ ./ivm learn -h
```

All the suggested tutorial optimisations take less than half an hour to run on my sub-2GHz Pentium IV machine; the first oil example runs in a couple of minutes. Below I suggest using the highest verbosity option, `-v 3`, in each of the examples so that you can track the iterations.

## Bugs

Victor Cheng writes:

> "... I've tested your IVM C++ Gaussian Process tool (IVMCPP0p12 version). It is quite useful. However, the gnuplot function seems has a problem. Every time I type the command: "Ivm gnuplot traindata name.model", an error comes out as: "Unknown noise model!". When I test this function with IVMCPP0p11 IVM, its fine, but IVMCPP0p11 has another problem that it gives "out of memory" error in test mode! So I use two vesions simultaneously."

I'm working (as of 31/12/2007) on a major rewrite, so it's unlikely that these bugs will be fixed in the near future; however, if anyone makes a fix I'll be happy to incorporate it! Please let me know.

## Examples

The software loads data in the SVM light format. Anton Schwaighofer has written a package which can write from MATLAB to the SVM light format.

## Toy Data Sets

In this section we present some simple examples. The results will be visualised using `gnuplot`; it is suggested that you have access to `gnuplot` version 4.0 or above. Provided with the software, in the `examples` directory, are some simple two dimensional problems. We will first try classification with these examples.

The first example is data sampled from a Gaussian process with an RBF kernel function with an inverse width of 10. The input data is sampled uniformly from the unit square. This data can be learnt with the following command:

```sh
$ ./ivm -v 3 learn -a 200 -k rbf examples/unitsquaregp.svml unitsquaregp.model
```

The flag `-v 3` sets the verbosity level to 3 (the highest level), which causes the iterations of the scaled conjugate gradient algorithm to be shown. The flag `-a 200` sets the active set size. The kernel type is selected with the flag `-k rbf`.

### Gnuplot

The learned model is saved in a file called `unitsquaregp.model`. This file has a plain-text format to make it human readable. Once training is complete, the learned kernel parameters of the model can be displayed using

```sh
$ ./ivm display unitsquaregp.model
```

```sh
Loading model file.
... done.
IVM Model:
Active Set Size: 200
Kernel Type:
compound kernel:
rbfinverseWidth: 12.1211
rbfvariance: 0.136772
biasvariance: 0.000229177
whitevariance: 0.0784375
Noise Type:
Probit noise:
Bias on process 0: 0.237516
```

Notice that the kernel is composed of an RBF kernel (also known as the squared exponential or Gaussian kernel), a bias kernel (which is just a constant) and a white noise kernel (which is a diagonal term). The bias kernel and the white kernel are automatically added to the RBF kernel. Other kernels may also be used, see `ivm learn -h` for details.

For this model the input data is two dimensional, so you can visualise the decision boundary using

```sh
$ ./ivm gnuplot examples/unitsquaregp.svml unitsquaregp.model unitsquaregp
```

The `unitsquaregp` supplied as the last argument acts as a stub for gnuplot to create file names from, so, for example (using gnuplot version 4.0 or above), you can write

```sh
$ gnuplot unitsquaregp_plot.gp
```

and obtain the plot shown below.

The decision boundary learnt for data sampled from a Gaussian process classification model. Note that the active points (blue stars) typically lie along the decision boundary.
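The `Probit noise` entry in the model display is the classification likelihood linking the latent Gaussian process function f to a label y in {-1, +1}. A minimal sketch, assuming the standard probit form with the displayed per-process bias b:

```latex
% Probit noise model (standard form assumed); Phi is the cumulative
% Gaussian and b is the displayed "Bias on process 0".
p(y \mid f) = \Phi\bigl(y\,(f + b)\bigr)
```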

The other files created are `oil100_variance_matrix.dat`, which produces the grayscale map of the log precisions, and `oil100_latent_data1-3.dat`, which are files containing the latent data positions associated with each label.

### Feature Selection

Next we consider a simple ARD kernel. The toy data in this case is sampled from three Gaussian distributions; to separate the data only one input dimension is necessary. The command is run as follows:

```sh
$ ./ivm learn -a 100 -k rbf -i 1 examples/ard_gaussian_clusters.svml ard_gaussian_clusters.model
```

Displaying the model, it is clear that it has selected one of the input dimensions:

```sh
Loading model file.
... done.
IVM Model:
Active Set Size: 100
Kernel Type:
compound kernel:
rbfardinverseWidth: 0.12293
rbfardvariance: 2.25369
rbfardinputScale: 5.88538e-08
rbfardinputScale: 0.935148
biasvariance: 9.10663e-07
whitevariance: 2.75252e-08
Noise Type:
Probit noise:
Bias on process 0: 0.7450***
```

Once again the results can be displayed as a two dimensional plot:

```sh
$ ./ivm gnuplot examples/ard_gaussian_clusters.svml ard_gaussian_clusters.model ard_gaussian_clusters
```
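The tutorial stops at generating the gnuplot files here; assuming the same stub naming convention as the earlier examples (and consistent with the `ard_gaussian_clusters_plot.png` in the file list above), the plot itself would then be produced with

```sh
$ gnuplot ard_gaussian_clusters_plot.gp
```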

The IVM learnt with an ARD RBF kernel. One of the input directions has been recognised as not relevant.
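A minimal sketch of the ARD (automatic relevance determination) RBF kernel, assuming the parameterisation suggested by the displayed names, with a shared inverse width and a per-dimension input scale w_k:

```latex
% ARD RBF kernel (parameterisation assumed from the displayed names);
% an input scale w_k near zero removes dimension k from the covariance.
k(\mathbf{x},\mathbf{x}') =
  \sigma_{\mathrm{rbf}}^{2}\exp\!\Bigl(-\tfrac{\gamma}{2}\sum_{k} w_{k}\,(x_{k}-x'_{k})^{2}\Bigr)
```

One displayed scale is essentially zero (5.88538e-08) while the other is of order one (0.935148), so the learned covariance effectively depends on a single input, which is what the plot shows.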
## Semi-Supervised Learning

The software also provides an implementation of the null category noise model described in Lawrence and Jordan. The toy example given in the paper is reconstructed here. To run it, type

```sh
$ ./ivm learn -a 100 -k rbf examples/semisupercrescent.svml semisupercrescent.model
```

The result of learning is

```sh
Loading model file.
... done.
IVM Model:
Active Set Size: 100
Kernel Type:
compound kernel:
rbfinverseWidth: 0.0716589
rbfvariance: 2.58166
biasvariance: 2.03635e-05
whitevariance: 3.9588e-06
Noise Type:
Ncnm noise:
Bias on process 0: 0.237009
Missing label probability for -ve class: 0.9075
Missing label probability for +ve class: 0.9075
```

and can be visualised using

```sh
$ ./ivm gnuplot examples/semisupercrescent.svml semisupercrescent.model semisupercrescent
```

followed by

```sh
$ gnuplot semisupercrescent_plot.gp
```

The result of the visualisation is shown below.

The result of semi-supervised learning on the crescent data. At the top is the result from the null category noise model. The bottom shows the result from training on the labelled data only with the standard probit noise model. Purple squares are unlabelled data, blue stars are the active set.
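As a rough sketch of the idea behind the null category noise model (a paraphrase of Lawrence and Jordan, not necessarily the code's exact parameterisation): the probit likelihood is given a "null" band of half-width a around the decision boundary that no observed label may occupy, so unlabelled points, which may fall in the null category, push the boundary away from dense regions:

```latex
% Null category noise model, rough sketch (see Lawrence & Jordan for the
% exact form): an unobservable "null" band of half-width a around f = 0.
p(y=1 \mid f) = \Phi(f - a), \qquad
p(y=-1 \mid f) = \Phi(-f - a), \qquad
p(\mathrm{null} \mid f) = \Phi(f + a) - \Phi(f - a)
```

The "missing label probability" values in the display output are the prior probabilities that a point from each class has its label unobserved.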
GP-LVM Software
===============

This page describes how to compile and gives some examples of use of the C++ Gaussian Process Latent Variable Model (GP-LVM) software available for download here.

### Release Information

#### Release 0.201

Fixed a bug which meant that the back constraint wasn't working, due to a failure to initialise `lwork` properly for `dsysv`. Fixed a bug in gplvm.cpp which meant dynamics wasn't working properly, because the initialisation of the dynamics model learning parameter wasn't set to zero. Thanks to Gustav Henter for pointing out these problems.

#### Release 0.2

In this release we updated the class structure of the GP-LVM model and made some changes in the way in which files are stored. This release is intended as a stopgap before a release in which the fitc, dtc and variational dtc approximations will be available.

In this release the dynamics model of Wang et al. has been included. The initial work was done by William V. Baxter, with modifications by me to include the unusual prior Wang suggests in his MSc thesis, scaling of the dynamics likelihood and the ability to set the signal-to-noise ratio. A new example has been introduced for this model below. As part of the dynamics introduction a new MATLAB toolbox for the GP-LVM has been released. This toolbox, download here, is expected to be the main development toolbox for the GP-LVM in MATLAB.

Version 0.101 was released 21st October 2005. This release contained modifications by William V. Baxter to enable the code to work with Visual Studio 7.1.

Version 0.1 was released in late July 2005. This was the original release of the code.

### Design Philosophy

The software is written in C++ to try to achieve a degree of flexibility in the models that can be used without a serious performance hit. This was difficult to do in MATLAB, as users who have tried version 1 (which was fast but inflexible) and version 2 (which was flexible but slow) of the MATLAB software will appreciate. The sparsification algorithm has not been implemented in the C++ software, so this software is mainly for smaller data sets (up to around a thousand points). The software is mainly written in C++ but relies for some functions on FORTRAN code by other authors and on the LAPACK and BLAS libraries. As well as the C++ code, some utilities are supplied in MATLAB code for visualising the results.

## Compiling the Software

The software was written with gcc version 3.2.2. There are definitely Standard Template Library issues on Solaris with gcc 2.95, so I suggest that at least version 3.2 is used. Part of the reason for using gcc is the ease of interoperability with FORTRAN: the code base makes fairly extensive use of FORTRAN, so you need to have g77 installed. The software is compiled by writing

```sh
$ make gplvm
```

at the command line. Architecture-specific options are included in the `make.ARCHITECTURE` files. Rename the file for the relevant architecture to `make.inc` for it to be included.

### Optimisation

One of the advantages of interfacing to the LAPACK and BLAS libraries is that they are often optimised for particular architectures. The file `make.atlas` includes options for compiling against the ATLAS-optimised versions of LAPACK and BLAS that are available on a server I have access to. These options may vary for particular machines.

### Cygwin

For Windows users the code compiles under Cygwin. However, you will need versions of the LAPACK and BLAS libraries available (see www.netlib.org).
These can take some time to compile and, in the absence of any pre-compiled versions on the web, I've provided some pre-compiled versions you may want to make use of (see the cygwin directory). Note that these pre-compiled versions are not optimised for the specific architecture and therefore do not give the speed-up you would hope for from using LAPACK and BLAS.

### Microsoft Visual C++

As of release 0.101 the code compiles under Microsoft Visua ... ...
