20newsgroups_GBDT

Category: Numerical algorithms / Artificial intelligence
Development tool: Python
File size: 69098 KB
Downloads: 0
Upload date: 2019-05-25 08:45:05
Uploader: sh-1993
Description: 20newsgroups_GBDT

File list:
.idea (0, 2019-05-25)
.idea\20newsgroups_GBDT.iml (398, 2019-05-25)
.idea\misc.xml (288, 2019-05-25)
.idea\modules.xml (286, 2019-05-25)
.idea\vcs.xml (180, 2019-05-25)
.idea\workspace.xml (18144, 2019-05-25)
__pycache__ (0, 2019-05-25)
__pycache__\data_process.cpython-36.pyc (4278, 2019-05-25)
__pycache__\gbdt_model.cpython-36.pyc (1058, 2019-05-25)
data (0, 2019-05-25)
data\x_test.pkl (9600162, 2019-05-25)
data\x_train.pkl (38392962, 2019-05-25)
data\y_test.pkl (32159, 2019-05-25)
data\y_train.pkl (128135, 2019-05-25)
data_process.py (4606, 2019-05-25)
data_tfidf (0, 2019-05-25)
data_tfidf\x_test.pkl (6642354, 2019-05-25)
data_tfidf\x_train.pkl (27135198, 2019-05-25)
data_tfidf\y_test.pkl (32159, 2019-05-25)
data_tfidf\y_train.pkl (128135, 2019-05-25)
gbdt_model.py (1631, 2019-05-25)
img (0, 2019-05-25)
img\default_fasttext.png (196190, 2019-05-25)
img\default_tfidf.png (198918, 2019-05-25)
img\default_tfidf_new.png (76846, 2019-05-25)
main.py (1455, 2019-05-25)

# 20newsgroups_GBDT

This is an implementation of GBDT multi-class classification with scikit-learn. The dataset is [20news-19997.tar.gz](http://qwone.com/~jason/20Newsgroups/20news-19997.tar.gz).

## Usage

### Requirements

- python 3.6
- nltk
- scikit-learn
- numpy
- pickle (part of the Python standard library)

## Run

### 1. Generate train data & test data

Download 20news-19997.tar.gz and unzip it into the root directory (e.g. './20_newsgroups'), then download crawl-300d-2M.vec and put it in './vectors/'.

```
python data_process.py
```

You will get 2 directories, each with 4 pickle files:

- ./data_tfidf: words weighted with TF-IDF
- ./data: word embeddings using fasttext, pretrained file: [crawl-300d-2M.vec](https://www.kaggle.com/yekenot/fasttext-crawl-300d-2m)

### 2. Start training GBDT

```
python main.py
```

## Result

### 1. GBDT + fasttext:

![image](https://github.com/hanyc0914/20newsgroups_GBDT/blob/master/img/default_fasttext.png)

### 2. GBDT + TF-IDF:

![image](https://github.com/hanyc0914/20newsgroups_GBDT/blob/master/img/default_tfidf.png)

### 3. GBDT(n_estimators: 60, max_depth: 6) + TF-IDF

![image](https://github.com/hanyc0914/20newsgroups_GBDT/blob/master/img/default_tfidf_new.png)
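
The contents of gbdt_model.py and main.py are not shown here, so the following is only a minimal sketch of what a GBDT + TF-IDF run with n_estimators 60 and max_depth 6 might look like. It assumes scikit-learn's `TfidfVectorizer` and `GradientBoostingClassifier`. The function name `train_gbdt_tfidf`, the toy corpus, and the split ratio are hypothetical stand-ins, not the repository's code or data.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-in corpus (assumption): the real project reads the 20 Newsgroups posts.
topics = {
    "sci.space": ["orbit", "rocket", "launch", "satellite"],
    "rec.autos": ["engine", "car", "brake", "tire"],
    "comp.graphics": ["pixel", "render", "image", "shader"],
}
docs, labels = [], []
for label, words in topics.items():
    for i in range(8):
        # Each toy document is three words drawn from its topic's vocabulary.
        docs.append(" ".join(words[(i + j) % 4] for j in range(3)))
        labels.append(label)


def train_gbdt_tfidf(docs, labels, n_estimators=60, max_depth=6, seed=0):
    """Hypothetical helper: TF-IDF features followed by a GBDT classifier."""
    x_train, x_test, y_train, y_test = train_test_split(
        docs, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    # Fit the vocabulary on the training split only, then reuse it for the test split.
    vectorizer = TfidfVectorizer()
    x_train_vec = vectorizer.fit_transform(x_train)
    x_test_vec = vectorizer.transform(x_test)

    model = GradientBoostingClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=seed
    )
    model.fit(x_train_vec, y_train)
    acc = accuracy_score(y_test, model.predict(x_test_vec))
    return vectorizer, model, acc


vectorizer, model, acc = train_gbdt_tfidf(docs, labels)
print(f"test accuracy on the toy corpus: {acc:.3f}")
```

On the real dataset, the `docs` and `labels` lists would be replaced by the newsgroup posts and their group names, or the vectorizing step skipped in favour of the pickled features that data_process.py writes to ./data_tfidf.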
