gcforest
Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: Python
File size: 100KB
Downloads: 74
Upload date: 2017-06-01 18:23:31
Uploader: 沙霍特
Description: Code for Prof. Zhi-Hua Zhou's deep forest (gcForest) algorithm, used for classification, with accuracy close to that of deep learning algorithms.
File list:
datasets (0, 2017-05-22)
datasets\gtzan (0, 2017-05-22)
datasets\gtzan\get_data.sh (1033, 2017-05-26)
datasets\gtzan\splits (0, 2017-05-22)
datasets\gtzan\splits\blues.train (1680, 2017-05-22)
datasets\gtzan\splits\blues.trainval (2400, 2017-05-22)
datasets\gtzan\splits\blues.val (720, 2017-05-22)
datasets\gtzan\splits\classical.train (2240, 2017-05-22)
datasets\gtzan\splits\classical.trainval (3200, 2017-05-22)
datasets\gtzan\splits\classical.val (960, 2017-05-22)
datasets\gtzan\splits\country.train (1960, 2017-05-22)
datasets\gtzan\splits\country.trainval (2800, 2017-05-22)
datasets\gtzan\splits\country.val (840, 2017-05-22)
datasets\gtzan\splits\disco.train (1680, 2017-05-22)
datasets\gtzan\splits\disco.trainval (2400, 2017-05-22)
datasets\gtzan\splits\disco.val (720, 2017-05-22)
datasets\gtzan\splits\genre.train (17360, 2017-05-22)
datasets\gtzan\splits\genre.trainval (24800, 2017-05-22)
datasets\gtzan\splits\genre.val (7440, 2017-05-22)
datasets\gtzan\splits\genres.trainval (22800, 2017-05-22)
datasets\gtzan\splits\hiphop.train (1820, 2017-05-22)
datasets\gtzan\splits\hiphop.trainval (2600, 2017-05-22)
datasets\gtzan\splits\hiphop.val (780, 2017-05-22)
datasets\gtzan\splits\jazz.train (1540, 2017-05-22)
datasets\gtzan\splits\jazz.trainval (2200, 2017-05-22)
datasets\gtzan\splits\jazz.val (660, 2017-05-22)
datasets\gtzan\splits\metal.train (1680, 2017-05-22)
datasets\gtzan\splits\metal.trainval (2400, 2017-05-22)
datasets\gtzan\splits\metal.val (720, 2017-05-22)
datasets\gtzan\splits\pop.train (1400, 2017-05-22)
datasets\gtzan\splits\pop.trainval (2000, 2017-05-22)
datasets\gtzan\splits\pop.val (600, 2017-05-22)
datasets\gtzan\splits\reggae.train (1820, 2017-05-22)
datasets\gtzan\splits\reggae.trainval (2600, 2017-05-22)
datasets\gtzan\splits\reggae.val (780, 2017-05-22)
datasets\gtzan\splits\rock.train (1540, 2017-05-22)
datasets\gtzan\splits\rock.trainval (2200, 2017-05-26)
datasets\gtzan\splits\rock.val (660, 2017-05-22)
datasets\uci_adult (0, 2017-05-22)
datasets\uci_adult\features (1156, 2017-05-22)
... ...
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Description: A Python 2.7 implementation of gcForest, proposed in [1]. %
%A demo implementation of the gcForest library, together with demo client scripts that demonstrate how to use the code. %
%The implementation is flexible enough to modify the model or to fit your own datasets. %
% %
%Reference: [1] Z.-H. Zhou and J. Feng. Deep Forest: Towards an Alternative to Deep Neural Networks. %
% In IJCAI-2017. (https://arxiv.org/abs/1702.08835v2 ) %
% %
%Requirements: This package was developed with Python 2.7. Please make sure all the dependencies, %
%which are specified in requirements.txt, are installed. %
% %
%ATTN: This package is free for academic usage. %
% You can run it at your own risk. %
% For other purposes, please contact Prof. Zhi-Hua Zhou(zhouzh@lamda.nju.edu.cn) %
% %
%ATTN2: This package was developed by Mr. Ji Feng (fengj@lamda.nju.edu.cn). %
% The readme file and demo roughly explain how to use the code. %
% For any problem concerning the code, please feel free to contact Mr. Feng. %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Package Official Website: http://lamda.nju.edu.cn/code_gcForest.ashx
This package is provided "AS IS" and free for academic usage.
You can run it at your own risk. For other purposes, please contact Prof. Zhi-Hua Zhou (zhouzh@lamda.nju.edu.cn).
Before running the demo, make sure all the dependencies are installed,
for example by running the following command:
```pip install -r requirements.txt```
===================================
Outline for README
====================================
* Package Overview
* Notes on Demo Scripts
* Notes on Config Files
* Examples and Demos
* Using Your Own Dataset
==================================
Package Overview
==================================
* lib/gcforest
- the implementation of gcForest
* tools/train_fg.py
- the demo script used for training fine-grained layers
* tools/train_cascade.py
- the demo script used for training cascade layers
* models/
- folder for saved models, which can be used by tools/train_fg.py and tools/train_cascade.py
- the gcForest structure is saved in JSON format
* logs
- the folder logs/gcforest is used to save the log files produced by the demo scripts
============================
Notes on Demo Scripts
============================
Below is a brief description of the arguments needed by the demo scripts.
%%%%%%%%%%%%%%%%%%%%
tools/train_fg.py
%%%%%%%%%%%%%%%%%%%%
* --model: str
- The config file path for fine-grained models (in JSON format)
* --save_outputs: bool
- If set, the output predictions produced by the fine-grained model
are saved in model_cache_dir, which is specified in the model config.
These outputs are used when training the cascade layers.
- The default value is false.
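To make the argument semantics concrete, here is a minimal sketch of a parser that accepts the flags described above. It is not the package's own code: the `build_parser` name, the `required=True` setting, and the `None` default for `--log_dir` (a flag that appears in the demo commands of this README) are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented flags; not taken from tools/train_fg.py.
    parser = argparse.ArgumentParser(description="Sketch of the train_fg.py command line")
    # Path to the fine-grained model config (JSON). Marking it required is an assumption.
    parser.add_argument("--model", type=str, required=True,
                        help="config file path for the fine-grained model (JSON)")
    # A store_true flag: absent means False, which matches the documented default.
    parser.add_argument("--save_outputs", action="store_true", default=False,
                        help="save the fine-grained outputs for cascade training")
    # --log_dir is used in the demo commands; its default here is an assumption.
    parser.add_argument("--log_dir", type=str, default=None,
                        help="directory for log files")
    return parser


args = build_parser().parse_args(
    ["--model", "models/mnist/gcforest/fg-tree500-depth100-3folds.json", "--save_outputs"]
)
print(args.model, args.save_outputs, args.log_dir)
```

With this layout, `--save_outputs` takes no value: passing the flag turns it on, and leaving it out keeps the default of false.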
%%%%%%%%%%%%%%%%%%%%%%
tools/train_cascade.py
%%%%%%%%%%%%%%%%%%%%%%
* --model: str
- The model config filepath for cascade training (in json format)
======================
Notes on Config Files
======================
Below is a brief introduction to the model specification files, namely
* the model specification for the fine-grained scanning structure.
* the model specification for cascade forests.
All the model specifications (JSON files) are saved in models/
For instance, all the model specification files needed for MNIST are stored in models/mnist/gcforest
* ca is short for cascade structure specifications
* fg is short for fine-grained structure specifications
You can define your own structure by writing similar JSON files.
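The naming convention can be illustrated with a small helper that classifies a specification file by its prefix and suffix. This is only a sketch: `config_kind` is a hypothetical name, and the reading of `fg-...-ca.json` as "cascade trained on fine-grained outputs" follows the example file names used in this README rather than any check in the package.

```python
import os


def config_kind(path):
    """Classify a model specification file by the ca / fg naming convention."""
    name = os.path.basename(path)
    stem = name[:-len(".json")] if name.endswith(".json") else name
    # fg-...-ca: a cascade specification that consumes fine-grained outputs.
    if stem.startswith("fg-") and stem.endswith("-ca"):
        return "cascade on fine-grained outputs"
    if stem.startswith("fg-"):
        return "fine-grained"
    if stem.startswith("ca-"):
        return "cascade"
    return "unknown"


for example in (
    "models/mnist/gcforest/fg-tree500-depth100-3folds.json",
    "models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json",
    "models/mnist/gcforest/ca-tree500-n4x2-3folds.json",
):
    print(example, "->", config_kind(example))
```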
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
FineGrained model's config (dataset)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* dataset.train, dataset.test: [dict]
- corresponds to the particular datasets defined in lib/datasets
- type [str]: see lib/datasets/__init__.py for a reference
- You can use your own dataset by writing similar wrappers.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
FineGrained model's config (train)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* train.keep_model_in_mem: [bool] default=0
- if 0, the forest is released from RAM instead of being kept in memory
* train.data_cache : [dict]
- corresponds to the DataCache in lib/dataset/data_cache.py
* train.data_cache.cache_dir (str)
- make sure to change "/mnt/raid/fengji/gcforest/cifar10/fg-tree500-depth100-3folds/datas" to your own path
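The dotted names above (for example train.data_cache.cache_dir) suggest nested JSON objects. Assuming that layout, the sketch below overrides one entry in a loaded config. The helper name `set_by_dotted_key`, the stand-in config contents, and the replacement path `/tmp/gcforest/cache` are all hypothetical.

```python
import json


def set_by_dotted_key(config, dotted_key, value):
    """Set config["a"]["b"]["c"] = value given the dotted key "a.b.c"."""
    node = config
    parts = dotted_key.split(".")
    for part in parts[:-1]:
        # Create intermediate dicts on demand so a missing level does not fail.
        node = node.setdefault(part, {})
    node[parts[-1]] = value
    return config


# Stand-in for a config read with json.load(); the real files contain more keys.
config = {
    "train": {
        "keep_model_in_mem": 0,
        "data_cache": {
            "cache_dir": "/mnt/raid/fengji/gcforest/cifar10/fg-tree500-depth100-3folds/datas"
        },
    }
}

# Hypothetical local path; replace it with a directory on your own machine.
set_by_dotted_key(config, "train.data_cache.cache_dir", "/tmp/gcforest/cache")
print(json.dumps(config, indent=2, sort_keys=True))
```

Writing the result back with `json.dump` would give a config file that no longer points at the original author's directory.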
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
FineGrained model's config (net)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* net.outputs: [list]
- List of the data names output by this model
* net.layers: [List of Layers]
- Layer's Config, see lib/gcforest/layers for a reference
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Cascade model's config (dataset)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Same as the FineGrained model's config (dataset)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Cascade model's config (cascade)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
see lib/gcforest/cascade/cascade_classifier.py __init__ for a reference
=============================
Examples and Demos
=============================
Before running the scripts, make sure to change
* train.data_cache.cache_dir in the fine-grained model config (e.g. models/xxx/fg-xxxx.json)
* train.cascade.dataset.{train,test}.data_path in the fine-grained-cascade model config (e.g. models/xxx/fg-xxxx-ca.json)
* train.cascade.cascade.data_save_dir in the cascade model configs (e.g. models/xxx/ca-xxxx.json and models/xxx/fg-xxxx-ca.json)
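One way to spot paths that still need changing is to walk a loaded config and report every string value that begins with the original author's directory. The sketch below is written for Python 3 and assumes the config is an ordinary nested structure of dicts and lists; `find_values_with_prefix` and the stand-in config are hypothetical and not part of the package.

```python
def find_values_with_prefix(node, prefix, path=""):
    """Yield (dotted_key, value) for every string value that starts with prefix."""
    if isinstance(node, dict):
        for key, child in node.items():
            child_path = key if not path else path + "." + key
            for hit in find_values_with_prefix(child, prefix, child_path):
                yield hit
    elif isinstance(node, list):
        for index, child in enumerate(node):
            for hit in find_values_with_prefix(child, prefix, "%s[%d]" % (path, index)):
                yield hit
    elif isinstance(node, str) and node.startswith(prefix):
        yield path, node


# Stand-in for a config read with json.load(); only cache_dir comes from this README.
config = {
    "train": {
        "keep_model_in_mem": 0,
        "data_cache": {
            "cache_dir": "/mnt/raid/fengji/gcforest/cifar10/fg-tree500-depth100-3folds/datas"
        },
    },
    "net": {"outputs": []},
}

hits = list(find_values_with_prefix(config, "/mnt/raid/fengji"))
for dotted_key, value in hits:
    print("needs changing:", dotted_key, "=", value)
```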
To train a gcForest (with fine-grained scanning), you need to run two scripts:
* Fine Grained Scanning: 'tools/train_fg.py'
* Cascade Training: 'tools/train_cascade.py'
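The two-step order can be sketched as a helper that builds both command lines, fine-grained scanning first and cascade training second. The helper name `build_commands` is hypothetical; the script names and flags are the ones used in this README, and the MNIST config paths are taken from the MNIST demo.

```python
def build_commands(fg_config, cascade_config):
    """Return the two argv lists in the order they have to run."""
    # Step 1: fine-grained scanning; --save_outputs keeps the outputs for step 2.
    fg_cmd = ["python", "tools/train_fg.py", "--model", fg_config, "--save_outputs"]
    # Step 2: cascade training on the saved fine-grained outputs.
    cascade_cmd = ["python", "tools/train_cascade.py", "--model", cascade_config]
    return [fg_cmd, cascade_cmd]


commands = build_commands(
    "models/mnist/gcforest/fg-tree500-depth100-3folds.json",
    "models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json",
)
for command in commands:
    print(" ".join(command))
```

To actually run the steps, each list could be passed to `subprocess.check_call` from the package root, so that the second step only starts if the first one succeeds.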
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[UCI Letter](http://archive.ics.uci.edu/ml/datasets/Letter+Recognition)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Get Data: you need to download the data yourself by running the following commands:
```Shell
cd datasets/uci_letter
sh get_data.sh
cd ../..
```
* Since fine-grained scanning is not needed for this dataset, we only train a cascade forest, as follows:
- `python tools/train_cascade.py --model models/uci_letter/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/uci_letter/ca`
* Adult and YEAST can be trained with a similar procedure.
%%%%%%%%%%%%%%%%%%%%%
MNIST
%%%%%%%%%%%%%%%%%%%%%
* Get the data: the data is downloaded automatically by 'lib/datasets/mnist.py'; you do not need to do it yourself
* First, train the fine-grained forest:
- Run `python tools/train_fg.py --model models/mnist/gcforest/fg-tree500-depth100-3folds.json --log_dir logs/gcforest/mnist/fg --save_outputs`
- This means:
1. Train a fine-grained model for the MNIST dataset,
2. using the structure defined in models/mnist/gcforest/fg-tree500-depth100-3folds.json.
3. Save the log files in logs/gcforest/mnist/fg.
4. The output of the fine-grained scanning predictions is saved in train.data_cache.cache_dir.
* Then, train the cascade forest (Note: make sure you run the train_fg.py first)
- run `python tools/train_cascade.py --model models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json`
- This means:
1. Train the cascade structure on the fine-grained scanning results.
2. The cascade model specification is defined in 'models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json'
* You could also train a Cascade Forest without fine-grained scanning (but the accuracy will be much lower):
- Run `python tools/train_cascade.py --model models/mnist/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/mnist/ca`
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[UCI sEMG](http://archive.ics.uci.edu/ml/datasets/sEMG+for+Basic+Hand+movements)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Get Data
```Shell
cd datasets/uci_semg
sh get_data.sh
cd ../..
```
* First Train the Fine Grained Forest:
- `python tools/train_fg.py --model models/uci_semg/gcforest/fg-tree500-depth100-3folds.json --save_outputs --log_dir logs/gcforest/uci_semg/fg`
* Then, train the cascade forest (Note: make sure you run the train_fg.py first)
- `python tools/train_cascade.py --model models/uci_semg/gcforest/fg-tree500-depth100-3folds-ca.json --log_dir logs/gcforest/uci_semg/gc`
* You could also train a Cascade Forest without fine-grained scanning (but the accuracy will be much lower):
- `python tools/train_cascade.py --model models/uci_semg/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/uci_semg/ca`
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[GTZAN](http://marsyasweb.appspot.com/download/data_sets/)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Requirements (you need to install the following package):
librosa
* Get the data yourself by running the following commands:
```Shell
cd datasets/gtzan
sh get_data.sh
cd ../..
python tools/audio/cache_feature.py --dataset gtzan --feature mfcc --split genre.trainval
```
* First Train the Fine Grained Forest:
- `python tools/train_fg.py --model models/gtzan/gcforest/fg-tree500-depth100-3folds.json --save_outputs --log_dir logs/gcforest/gtzan/fg`
* Then, train the cascade forest (Note: make sure you run the train_fg.py first)
- `python tools/train_cascade.py --model models/gtzan/gcforest/fg-tree500-depth100-3folds-ca.json --log_dir logs/gcforest/gtzan/gc`
* You could also train a Cascade Forest without fine-grained scanning (but the accuracy will be much lower):
- `python tools/train_cascade.py --model models/gtzan/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/gtzan/ca --save_outputs`
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
IMDB
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Cascade Forest:
- `python tools/train_cascade.py --model models/imdb/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/imdb/ca`
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
CIFAR10
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* First Train the Fine Grained Forest:
- `python tools/train_fg.py --model models/cifar10/gcforest/fg-tree500-depth100-3folds.json --save_outputs`
* Then, train the cascade forest (Note: make sure you run the train_fg.py first)
- `python tools/train_cascade.py --model models/cifar10/gcforest/fg-tree500-depth100-3folds-ca.json`
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
For Your Own Datasets
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Data Format:
0. Please refer to lib/datasets/mnist.py as an example
1. the dataset should have the attributes X and y to represent the data and the labels
2. y should be a 1-d array
3. For fine-grained scanning, X should be a 4-d array (N x channel x H x W).
(e.g. cifar10 should be Nx3x32x32, mnist should be Nx1x28x28, uci_semg should be Nx1x3000x1)
* Model Specifications:
1. Save the json file in models/$dataset_name (recommended)
2. for a detailed description, see the section 'Notes on Config Files'
* If you only need to train a cascade forest, run tools/train_cascade.py.
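The data format rules above can be turned into a quick sanity check before training. The sketch below is written for Python 3 and only looks at the `.shape` of `dataset.X` and `dataset.y`, so it should work with NumPy arrays as well as with the small stand-in objects used here. The name `check_dataset`, the stand-ins, and the requirement that X and y agree on N are assumptions for illustration, not part of the package.

```python
from types import SimpleNamespace


def check_dataset(dataset, fine_grained=True):
    """Raise ValueError if dataset.X / dataset.y break the documented format."""
    x_shape, y_shape = tuple(dataset.X.shape), tuple(dataset.y.shape)
    if len(y_shape) != 1:
        raise ValueError("y should be 1-d, got shape %r" % (y_shape,))
    # Assumption: every sample has exactly one label, so N must match.
    if x_shape[0] != y_shape[0]:
        raise ValueError("X and y disagree on N: %d vs %d" % (x_shape[0], y_shape[0]))
    if fine_grained and len(x_shape) != 4:
        raise ValueError("X should be N x channel x H x W, got shape %r" % (x_shape,))
    return x_shape


def fake_array(*shape):
    # Stand-in exposing only .shape, so the sketch runs without NumPy.
    return SimpleNamespace(shape=shape)


mnist_like = SimpleNamespace(X=fake_array(8, 1, 28, 28), y=fake_array(8))
semg_like = SimpleNamespace(X=fake_array(8, 1, 3000, 1), y=fake_array(8))
flat = SimpleNamespace(X=fake_array(8, 784), y=fake_array(8))

print(check_dataset(mnist_like))
print(check_dataset(semg_like))
print(check_dataset(flat, fine_grained=False))
```

A flat N x D array such as `flat` only passes with `fine_grained=False`, which corresponds to training a cascade forest on its own.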
Happy Hacking.
Reference:
[1] Z.-H. Zhou and J. Feng. Deep Forest: Towards an Alternative to Deep Neural Networks. In IJCAI-2017.
(https://arxiv.org/abs/1702.08835v2 )