10kGNAD

Category: Clustering algorithms
Development tool: Python
File size: 20315 KB
Downloads: 0
Upload date: 2022-11-07 22:18:43
Uploader: sh-1993
Description: Ten Thousand German News Articles Dataset for Topic Classification

File list (size in bytes, date):
LICENSE (1067, 2019-10-17)
articles.csv (27160809, 2019-10-17)
code (0, 2019-10-17)
code/extract_dataset_from_sqlite.py (2448, 2019-10-17)
code/generate_lowshot_sets.py (2506, 2019-10-17)
code/split_articles_into_train_test.py (1964, 2019-10-17)
requirements.txt (56, 2019-10-17)
test.csv (2755020, 2019-10-17)
train.csv (24405789, 2019-10-17)
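The CSV files in the list hold the dataset itself. The following is a minimal loader sketch. It assumes, without confirmation from this page, that each row has the label in the first column and the article text in the second, that `;` is the delimiter and `'` the quote character, and that the files are UTF-8 encoded with no header row. `parse_rows` and `load_split` are hypothetical helper names.

```python
import csv
from typing import Iterable, List, Tuple

# Assumed on-disk format (not confirmed by this page): label in column 1,
# article text in column 2, ';' as delimiter, "'" as quote character.
DELIMITER = ";"
QUOTECHAR = "'"


def parse_rows(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse CSV lines into (label, text) pairs, skipping empty rows."""
    reader = csv.reader(lines, delimiter=DELIMITER, quotechar=QUOTECHAR)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def load_split(path: str) -> List[Tuple[str, str]]:
    """Load a file such as train.csv or test.csv from disk."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, encoding="utf-8", newline="") as f:
        return parse_rows(f)
```

For example, `load_split("train.csv")` would return a list of `(label, text)` pairs. The sketch uses the `csv` module rather than a plain `str.split(";")` because article text can itself contain the delimiter inside quoted fields.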

# Ten Thousand German News Articles Dataset

For more information, visit the detailed [project page](https://tblock.github.io/10kGNAD/).

1. Install the required Python packages: `pip install -r requirements.txt`.
2. Download the `corpus.sqlite3` file into the project root, either from [here (compressed)](https://github.com/OFAI/million-post-corpus/releases/download/v1.0.0/million_post_corpus.tar.bz2) or directly from [here](https://github.com/tblock/10kGNAD/releases/download/v1.0/corpus.sqlite3).
3. Run `python code/extract_dataset_from_sqlite.py corpus.sqlite3 articles.csv` to extract the articles.
4. Run `python code/split_articles_into_train_test.py` to split the dataset into training and test sets.

## License

All code in this repository is licensed under an MIT License. The dataset is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/).
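Steps 3 and 4 can be sketched with the Python standard library alone. This is not the repository's code. It assumes a hypothetical `Articles` table with a `Path` column (for example `Newsroom/Sport/...`, with the label taken from the second segment) and an HTML `Body` column. For step 4, it uses a per-label (stratified) shuffle with a 10% test fraction and an arbitrary seed. `html_to_text`, `extract_articles`, `stratified_split` and `write_csv` are hypothetical names. The real scripts may filter categories, keep other fields, or clean the HTML differently.

```python
import csv
import random
import sqlite3
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List, Tuple

Article = Tuple[str, str]  # (label, text)


class _TextExtractor(HTMLParser):
    """Collects the text of an HTML fragment, dropping the tags."""

    # Block-level tags get a space so paragraphs do not run together.
    _BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        if tag in self._BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag) -> None:
        if tag in self._BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs left behind by removed tags.
    return " ".join("".join(parser.parts).split())


def extract_articles(conn: sqlite3.Connection) -> List[Article]:
    """Read (label, text) pairs from an assumed Articles(Path, Body) table."""
    articles = []
    for path, body in conn.execute("SELECT Path, Body FROM Articles"):
        segments = (path or "").split("/")
        if len(segments) < 2 or not body:
            continue  # skip rows without a category segment or a body
        articles.append((segments[1], html_to_text(body)))
    return articles


def stratified_split(
    articles: List[Article], test_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Article], List[Article]]:
    """Shuffle each label's articles and move test_fraction of them to test."""
    by_label: Dict[str, List[Article]] = defaultdict(list)
    for article in articles:
        by_label[article[0]].append(article)
    rng = random.Random(seed)
    train: List[Article] = []
    test: List[Article] = []
    for label in sorted(by_label):  # fixed order keeps the split reproducible
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def write_csv(path: str, rows: List[Article]) -> None:
    # Assumed output format: ';' delimiter, "'" quote character, no header.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f, delimiter=";", quotechar="'").writerows(rows)
```

Against the real database, a run might look like `train, test = stratified_split(extract_articles(sqlite3.connect("corpus.sqlite3")))`, followed by `write_csv("train.csv", train)` and `write_csv("test.csv", test)`. Splitting per label keeps each category's share roughly equal in both files. The 10% fraction is chosen to match the file list, where `test.csv` accounts for about 10% of the bytes of `articles.csv`.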
