jsgd-61
所属分类:matlab编程
开发工具:matlab
文件大小:171KB
下载次数:11
上传日期:2013-09-03 16:42:11
上 传 者:
diediezhuangzhuang
说明: 该代码采用matlab和C混合编写,是采用随机梯度的方法训练线性分类器,它可用于数据、图像等的多类分类情况。参考论文:Good Practice in Large-Scale Learning for Image Classification
(This a Stochastic Gradient Descent algorithm used to train linear multiclass classifiers. It is biased towards large classification problems (many classes, many examples, high dimensional data).)
文件列表:
jsgd-61 (0, 2013-08-23)
jsgd-61\c (0, 2013-08-23)
jsgd-61\c\jsgd.c (62127, 2013-08-23)
jsgd-61\c\jsgd.h (6545, 2013-08-23)
jsgd-61\c\x_matrix.c (13116, 2013-08-23)
jsgd-61\c\x_matrix.h (5452, 2013-08-23)
jsgd-61\Makefile (2372, 2013-08-23)
jsgd-61\matlab (0, 2013-08-23)
jsgd-61\matlab\jsgd_train.c (7215, 2013-08-23)
jsgd-61\matlab\jsgd_train.m (671, 2013-08-23)
jsgd-61\matlab\jsgd_train.mexa64 (210646, 2013-08-23)
jsgd-61\matlab\pq_decode.m (226, 2013-08-23)
jsgd-61\matlab\test_jsgd_train.m (2817, 2013-08-23)
jsgd-61\matlab\test_pq_encoded.m (2411, 2013-08-23)
jsgd-61\python (0, 2013-08-23)
jsgd-61\python\crossval.py (7644, 2013-08-23)
jsgd-61\python\example_BOW.py (4069, 2013-08-23)
jsgd-61\python\jsgd.py (5982, 2013-08-23)
jsgd-61\python\jsgd_wrap.py (31056, 2013-08-23)
jsgd-61\python\jsgd_wrap.swg (5957, 2013-08-23)
jsgd-61\python\jsgd_wrap_wrap.c (342538, 2013-08-23)
jsgd-61\python\test_jsgd_train.py (2019, 2013-08-23)
jsgd-61\ref (0, 2013-08-23)
jsgd-61\ref\generate_toy_data.m (791, 2013-08-23)
jsgd-61\ref\plot_with_labels.m (396, 2013-08-23)
jsgd-61\ref\read_svmlight.m (671, 2013-08-23)
jsgd-61\ref\sgd_simple.m (4182, 2013-08-23)
jsgd-61\ref\sgd_simple_toy_data.m (1498, 2013-08-23)
jsgd-61\ref\write_svmlight.m (668, 2013-08-23)
=============================================================
==== README for JSGD
=============================================================
What is this?
=============
This a Stochastic Gradient Descent algorithm used to train linear
multiclass classifiers. It is biased towards large classification
problems (many classes, many examples, high dimensional
data).
The reference paper is:
@article{akata:hal-00835810,
AUTHOR = {Akata, Zeynep and Perronnin, Florent and Harchaoui, Zaid and Schmid, Cordelia},
TITLE = {{Good Practice in Large-Scale Learning for Image Classification}},
JOURNAL = {{IEEE Transactions on Pattern Analysis and Machine Intelligence}},
PUBLISHER = {David A. Forsyth},
YEAR = {2013},
MONTH = Jun,
URL = {http://hal.inria.fr/hal-00835810}
}
@inproceedings{perronnin:hal-00690014,
AUTHOR = {Perronnin, Florent and Akata, Zeynep and Harchaoui, Zaid and Schmid, Cordelia},
TITLE = {{Towards Good Practice in Large-Scale Learning for Image Classification}},
BOOKTITLE = {{Computer Vision and Pattern Recognition}},
YEAR = {2012},
MONTH = Jun,
ADDRESS = {Providence, Rhode Island, United States},
URL = {http://hal.inria.fr/hal-00690014}
}
The code uses similar acceleration tricks, yet adapted to large-scale image classification, than the ones used in L. Bottou's package, available at
http://leon.bottou.org/projects/sgd
Files
=====
The directories in the package are:
ref/ contains a basic Matlab implementation of the SGD algorithm based on the paper.
c/ contains the C implementation.
matlab/ contains the MEX wrappers for Octave/Matlab. There is a
precompiled version for ***-bit Linux
python/ contains the Python wrappers, and the optimization code that
uses cross-validation to find the best parameters of the
method.
Compiling / requirements
========================
Requirements:
- ***-bit Linux. The code should be reasonably portable, but no other
platform was tested.
- an implementation of the Basic Linear Algebra Subroutines (BLAS),
that are interfaced with the Fortran conventions.
- SSE vector operations. They are declared using the gcc intrinsics.
The library is compiled using a makefile. First try the following
commands to compile. If they don't work, edit the makefile to adjust
the configuration.
To compile the Matlab interface, do
make mex
For the Octave (Matlab work-alike) interface:
make octave
For the Python interface:
make py
On Ubuntu, sudo apt-get install liboctave-dev should install both
octave and atlas. Be sure to change the line in the Makefile
concerning octave to the required version.
Additional software
--------------------
Although the library is self contained, the examples rely on:
- Yael (https://gforge.inria.fr/projects/yael/), and its Matlab
interface.
- H. Jegou's Matlab implementation of PQcodes
(http://www.irisa.fr/texmex/people/jegou/ann.php) for the PQ encoding.
- some datafiles. See the JSGD homepage
(http://lear.inrialpes.fr/src/jsgd/). All examples are provided in a
simplistic format: a file containing the descriptor matrix + one with
the label vector, for train and test (4 files in total). The files are
in the .fvecs and .ivecs formats.
The examples assume that they are installed in the jsgd directory,
next to c/, matlab/ etc. On a normal Linux installation, the following
commands download and install all the required files (run from the
jsgd-xx directory):
# get Yael (need svn version...)
svn checkout --username anonsvn --password anonsvn --no-auth-cache https://scm.gforge.inria.fr/svn/yael/trunk yael
# compile
cd yael
./configure.sh --enable-numpy
make
# compile matlab interface of Yael
cd matlab
make
cd ../..
# get pqcodes
wget http://www.irisa.fr/texmex/people/jegou/src/pqsearch_2188.tar.gz
tar xvzf pqsearch_2188.tar.gz
Interfaces
==========
There are two high-level interfaces to jsgd_train, a Matlab and a
Python interface. They provide a Matlab/Python jsgd_train function
that takes Matlab/Python matrices as input. The algorithm parameters
are passed as optional arguments to the functions.
Matlab interface
----------------
The Matlab interface uses single matrices for floating-point data and
int32 matrices for indices, labels, etc.
Python interface
----------------
The Python interface uses numpy matrices for i/o. The matrices should
be "c_compact", which means that all matrix dimensions in the
documentation should be transposed. The matrices should be of type
numpy.float32 or numpuy.int32 for floating-point and integer matrices
respectively. You should also set the PYTHONPATH to point on
jsgd-xx/yael, eg.
export PYTHONPATH=../yael
Examples
========
Examples are provided in Matlab and Python for readability.
After installing the package (see "Compiling" above) and its optional
dependencies (see "Additional software" above), you can get started
with some examples. Download a dataset available on
lear.inrialpes.fr/src/jsgd/.
E.g. wget http://pascal.inrialpes.fr/data2/douze/jsgd_data/groupFungus_k***_nclass10_nex10.tgz.
Uncompress it to your local folder:
E.g. tar xzf groupFungus_k***_nclass10_nex10.tgz
To train a One-versus-Rest classifier, use
cd matlab;
octave --eval "test_jsgd_train"
matlab -nojvm -nodesktop -r "test_jsgd_train; exit"
Other useful options and datasets are shown commented in the
test_jsgd_train.m file.
Product quantizer encoding
--------------------------
The script
matlab/test_pq_encoded.m
shows how to learn a product quantizer, how to encode the training set
and how to train a classifier on the encoded training data.
#Follow the additional software steps
wget http://pascal.inrialpes.fr/data2/douze/jsgd_data/groupFungus_k***_nclass134_nex50.tgz
tar xzf groupFungus_k***_nclass134_nex50.tgz
cd matlab
matlab -nojvm -nodesktop -r "test_pq_encoded; exit;"
Cross-validation of parameters
------------------------------
The classifier output by SGD is only as good as the parameters it is
computed with. The default parameters will produce a suboptimal
classifier. The main parameters are:
lambda penalty term for W norm
bias_term this term is appended to each training vector
eta0 intial value of the gradient step
beta for OVR, how many negatives to sample per training vector
temperature for STF, the stiffness of the loss
The script
python/crossval.py
Optimizes the parameters by stepping from a parameter set to another,
monitoring the cross-validated accuracy in the meantime. Unfortunately,
this relatively exaustive exploration of the parameter space is slow
(see below for a few tips on determining optimal parameters).
Comparison with supplementary material of the CVPR paper
--------------------------------------------------------
The script
python/example_BOW.py
reproduces the results from the supplementary material of the CVPR
paper (table 2, Ungulate with 50 examples per class). It shows how to
optimize a subet of parameters and how to use sparse descriptors (it
is based on a BOW in 4096D).
#Additional softwares
wget http://pascal.inrialpes.fr/data2/douze/jsgd_data/groupUngulate_BOW4096.tgz
tar xzf groupUngulate_BOW4096.tgz
cd python
python example_BOW.py
Library layout
==============
The main entry point for the library, both in C, Matlab and Python is
jsgd_train
There are a few mandatory parameters to jsgd_train, and many optional
ones. Dense matrices are stored contiguously in memory with a
column-major (Fortran/Matlab) convention, hence
x(d, n)
means that matrix x has d lines and n columns.
Mandatory parameters
--------------------
The examples should already be shuffled randomly on input.
x(d, n) is a floating-point matrix containing the n example
vectors of dimension d. It can be encoded with a product
quantizer if necessary (see below).
y(n) is an interger array that contains the label associated with
each column of x. Labels are 0-based in C and Python,
1-based in Matlab.
Return values
-------------
The function returns
w(d + 1, nclass) a floating-point array such that
max(w(:, i)' * [x ; 1])
gives the class of vector x
stats a structure (Matlab) or class (Python) that contain
statistics on the resolution (timings, evaluations
on the validation set, etc.)
Optional parameters
-------------------
In Matlab, optional parameters are passed as
'option_name', option_value
pairs. In Python, use
option_name = option_value
In C, just set the fields of the jsgd_params_t structure.
The most important ones:
lambda penalty term for W norm (_lambda in Python)
bias_term this term is appended to each training vector
eta0 intial value of the gradient step
beta for OVR, how many negatives to sample per training vector
temperature for STF, the stiffness of the loss (beta on the website)
fixed_eta (boolean) use a fixed step
verbose verbosity level (integer, default 0)
n_thread if non-zero, try to use threads (does not set the number of threads)
random_seed running with the same random seed should give the same results
algo string, 'ovr', 'mul', 'rnk', 'war', etc, selecting the
algorithm (in C, use the JSGD_ALGO_* constants)
n_epoch number of passes through the data
valid matrix of validation examples
valid_labels corresponding labels
eval_freq evaluate on validation set every that many epochs
stop_valid_threshold
stop if the accuracy is lower than the accuracy
at the previous evaluation + this much. Set to
something like -0.05 to stop iterating when all
hope is lost to obtain a better operating point.
Documentation
=============
Matlab codes for didactical purpose are provided for better understanding of the mex-files in the /ref folder.
They provide a straightforward translation of the methods described in both
articles.
Do not use these codes to run your large-scale experiments. Use the codes in the matlab folder instead.
Comprehensive Matlab implementation
-----------------------------------
sgd_simple.m: SGD implementation
generate_toy_data.m: randomly generate a training and a test set.
sgd_simple_toy_data.m: runs both, computes the accuracy and shows
results graphically.
Determining optimal parameters of SGD
=====================================
Here a few practical remarks on setting the parameters.
The optimization routine implemented in crossval.py normally outputs
optimal parameters. Unfortunately, it is very slow. Below are a few
notes on how to choose reasonable parameters in finite time.
It is possible to fix some parameters and optimize on the others,
which speeds up the estimation. See example_BOW.py for an
example.
An intermediate result of optimize() can be used (see the "keep"
messages of the cross-validation).
For OVR, beta is an important parameter. Setting it to the square root
of the number of classes seems a good starting point (if beta = 0 on
input, the algorithm sets this automatically).
L. Bottou's optimization of eta0 (choosing eta0 that minimizes the
loss on a subset) is implemented in jsgd_train. It will be used if the
passed-in eta0 is 0.
It is not useful to fine-tune the parameters very precisely. The
optimization criterion (top-1 accuracy on cross-validated datasets) is
quite noisy, so choosing small steps on the parameters leads to local
minima. Therefore, crossval.py optimizes lambda, eta0, temperature and
bias_term by powers of 10.
If a validation set is available, do early stopping, and evaluate
every 10 or so epochs. For this, set stop_valid_threshold = -0.05,
which will abandon when the current accuracy is 5 point below the best
one seen so far.
The CVPR paper is also a good source on the influence of the various
parameters.
Low-level library
=================
The C library has one main function:
int jsgd_train(int nclass,
const x_matrix_t *x,
const int *y,
float * w,
float * bias,
jsgd_params_t *params);
Where:
nclass nb of classes
x matrix of train vectors
y(n) labels of each vector. Should be in 0:nclass-1
w(d,nclass) output w matrix
bias(nclass) output bias coefficients
params all other parameters
A matrix a(m, n) of n float vectors of dimension m is represented as
float *a
The (i, j)'th element of the matrix, where 0 <= i < m and 0 <= j < n
can be accessed with
a[ j * m + i ]
x_matrix_t
----------
The matrix of training vectors is a x_matrix_t structure, because it
is not necessarily a dense matrix. Several encodings are used for x_matrix_t's:
** encoding = JSGD_X_FULL
This is a plain dense matrix. The array
data(d, n)
contains the elements.
** encoding = JSGD_X_PQ
This is a product quantizer matrix. There are nsq
subquantizers. Quantization centroids are given by the 3D matrix
centroids(d / nsq, 256, nsq)
The corresponding codes are in array
codes(nsq, n)
** encoding = JSGD_X_SPARSE
The matrix is a sparse column matrix (a la Matlab). Compressed Sparse Columuns (CSC)
indptr(n + 1) points to the the first non-0 cell in column j
indices(nnz) indices[indptr[j]] .. indices[indptr[j]] are
the non-0 rows in column j (nnz = indptr[n])
sparse_data(nnz) sparse_data[indptr[j]] .. sparse_data[indptr[j]] are
the values of the cells in column j.
This is similar to the Matlab encoding and that of CSR in
scipy.sparse. See http://www.mathworks.fr/help/techdoc/apiref/mxsetir.
jsgd_params_t
-------------
The jsgd_params_t contains all the parameters of the training
algorithm, as well as output statistics produced when running it. See
above and c/jsgd.h for an explanation of its fields.
Optimizations
=============
All algorithms can be expressed as (see ref/sgd_simple.m)
for t = 1 to n
# draw a_set_of_classes more or less randomly
for c in a_set_of_classes
# a dot product
score(t, c) := W(:, c)' * x(:, t)
end for
# compute factor from the score table
for c in a_subset_of_classes
# a W update
W(:, c) := scalar * W(:, c) + factor(c) * x(:, t)
end for
end for
The operations that depend on d (the dimension of the vectors) are the
dot products and the W updates. Depending on the algorithm, each
iteration does
Nb of dot products Nb of W updates
OVR 1 + beta 1 + beta
MUL C 0 or 2 (1.08)
RNK 2 0 or 2 (0.43)
WAR 2 to C (6) 0 or 2 (0.72)
where C is the number of classes, the numbers in parentheses are
measured on a run with C = 10 classes.
The library (especially the OVR version) is optimized in several ways:
- by using multipliers. In this case, w is represented as (w_factor *
w). Updates like
w := scalar * w
are never actually performed, because only w_factor is updated. This
optimization is borrowed from Leon Bottou's reference implementation.
- by using SSE instructions. Operations are performed on 4 vectors
components at a time for dense and PQ-encoded matrices. If addresses
are not aligned on a 16-byte boundary or the vector size is not a
multiple of 4, the library falls back on the slow implementation. For
the caller, this means that if the vector dimensions are not a
multiple of 4, it is a good idea to pad them with 0s to the next
multiple of 4.
- for dense matrix-matrix and matrix-vector multiplications, BLAS is
used. Unfortunately, most operations cannot be expressed in these
terms (else the Matlab implementation would be fast enough).
- the performance bottleneck that remains is cache accesses. Assuming
that there is 1MB of L3 cache per processor core, with 4096D single
precision descriptors, only *** of them fit in cache. If
a_set_of_classes is all classes (like with MUL) and if C > ***, then
there will be cache misses at each iteration, which cost more than the
operations performed on the vectors. Therefore, for OVR, the code is
laid out in blocks like
for t0 = 1 to n by t_step
for c = 1 to C
for t = t_block to t_block + t_step - 1
# compute all scores depending on t, c
# perform updates on W(:, c)
end for
end for
end for
where t_step is chosen so that all x(:, t) accessed in the inner loop
fit in cache.
A split in blocks around the d dimension is also implemented. It works
by precomputing
self_dp(t1, t2) = x(:, t1)' * x(:, t2)
for all t0 <= t1, t2 < t0 + t_step. Then, at each t0 block start, we
compute
score2(t, c) := W_t0(:, c)' * x(:, t)
for all c and t0 <= t < t0 + t_step
where W_t0 means "the state of W at t = t0". This operation is easily
split around the dimension d. The score2 table is not the same as the
score table because it is computed with W's state at the start of the
t block. However W_t can be represented as
W_t(:, c) = W_t0(:, c) + alpha_t0 * x(:, t0) + ... + alpha_{t-1} * x(:, t - 1)
Where the alpha_t's are computed at each W update. Therefore the
current score expands to
score(t, c) = W_t(:, c)' * x(:, t)
= score2(t, c) + alpha_t0 * self_dp(t0, t) + ... + alpha_{t-1} * self_dp(t, t-1)
which makes it possible to compute alpha_t. Computations of scores and
alpha's do not involve any operation in d. The actual update of W is
performed at the end of the t block using the operation above.
This method is implemented (use_self_dotprods = 1 and set
d_step). Unfortunately, it is quite complicated, and even worse, it
does not seem to improve the speed, so it is disabled by default.
- when possible, multithreading is used. This is not easy when there
are dependencies between the factors and the scores across
classes. This is not the case for OVR: for OVR multithreading is
peformed along the classes (ie. each thread takes care of a subset of
classes).
Legal, contact
==============
This code is copyright (C) INRIA 2012
Homepage: http://lear.inrialpes.fr/src/jsgd/
License: Cecill (see http://www.cecill.info/licences/Licence_CeCILL_V1.1-US.txt), similar to GPL
Contact:
Mattis Paulin, mattis.paulin@inria.fr
Matthijs Douze, matthijs.douze@inria.fr
Last update 2013-06-17
近期下载者:
相关文件:
收藏者: