
Development tool: Jupyter Notebook
Upload date: 2022-09-06 16:51:33
Uploader: sh-1993
Description: News classification neural network model to classify a hierarchical news dataset. Algorithm to detect class overlap and

LICENSE (1069, 2022-09-07)
RNNModel (0, 2022-09-07)
RNNModel\ (9936, 2022-09-07)
RNNModel\RNN_model.ipynb (79762, 2022-09-07) (5266, 2022-09-07)
data_figs_base (0, 2022-09-07)
data_figs_base\F1-bert.png (27221, 2022-09-07)
data_figs_base\Number_of_articles_per_category.png (49322, 2022-09-07)
data_figs_base\Words_per_category.png (46668, 2022-09-07)
data_figs_base\acc-bert.png (24417, 2022-09-07)
data_figs_base\confusion_matrix_unweighted.png (410637, 2022-09-07)
data_figs_base\jenshan_hist_full.png (41867, 2022-09-07)
data_figs_base\jsdiv_full.png (638558, 2022-09-07)
data_figs_base\loss-bert.png (33954, 2022-09-07)
data_figs_new (0, 2022-09-07)
data_figs_new\F1-bert.png (24486, 2022-09-07)
data_figs_new\Number_of_articles_per_category.png (49322, 2022-09-07)
data_figs_new\Words_per_category.png (39325, 2022-09-07)
data_figs_new\acc-bert.png (23215, 2022-09-07)
data_figs_new\loss-bert.png (27259, 2022-09-07) (7337, 2022-09-07) (17246, 2022-09-07) (14679, 2022-09-07) (2631, 2022-09-07) (5244, 2022-09-07) (2270, 2022-09-07)
requirements.txt (161, 2022-09-07)
slides_final.pdf (2194809, 2022-09-07)
tSNE_plots (0, 2022-09-07)
tSNE_plots\ARTS & CULTURE.png (97162, 2022-09-07)
tSNE_plots\BLACK VOICES.png (109844, 2022-09-07)
tSNE_plots\BUSINESS.png (130398, 2022-09-07)
tSNE_plots\COLLEGE.png (71567, 2022-09-07)
... ...

# Attentive-neural-networks-for-news-classification

* The goal of this project is to build a news classification neural network model that can classify a hierarchical news dataset. Along with this, we propose a novel algorithm to detect class overlap in news categories, which can be learnt from computation on our model's intermediate stages.

## Requirements to run:

Inside the created Python environment (conda, venv, py-env etc.), run `pip install -r requirements.txt`

## GPU requirements:

* A GPU (CUDA enabled) is required to run the project. For us, the average running time of the main model on a single 12 GB GPU was ~7 hours.
* We provide a detailed method to set up PyTorch and Torchvision inside a Conda environment with CUDA support.
* Follow these steps:
  + Use the script [``](
  + Make it executable with `chmod +x ./`
  + Run with `./`
* The installation creates the following two directories in the install location:
  + `conda`: Contains the miniconda installation
  + `conda_pkgs`: Contains the cache for downloaded and decompressed packages
* Creating the first environment creates an additional directory in the install location:
  + `conda_envs`: Contains the created environment(s)
* In `conda/bin/` run `./conda init`
* Now run `conda create --name pytorch torchvision cudatoolkit=10.1 --channel pytorch` (this is our specific version)
* Run `conda activate ` to activate the environment
* Run `conda deactivate` to deactivate the current environment
* Install missing requirements mentioned above with `pip install >= `

## Dataset

* [``]( generates the required data files.
* The base dataset with all news categories is stored in `news_data_test` in the form of `.txt` files for every category.
* The updated dataset with fewer categories is stored in `news_data_test2` in the same way.
* These folders are no longer available in the repository, to reduce storage size. They can be downloaded from [`here`]( on request.
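
The per-category `.txt` layout described in the Dataset section could be read with a short loader like the one below. This is a minimal sketch, not the repository's own loading code: it assumes one `.txt` file per category inside the dataset folder, with the file name as the label and one article per line. `load_category_files` is a hypothetical helper name.

```python
from pathlib import Path


def load_category_files(data_dir):
    """Read every `<category>.txt` file in `data_dir`.

    Assumption (not confirmed by the README): each file holds one
    article per line and the file stem is the category label.
    Returns two parallel lists: texts and labels.
    """
    texts, labels = [], []
    # Sort the files so the category order is the same on every run.
    for path in sorted(Path(data_dir).glob("*.txt")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    texts.append(line)
                    labels.append(path.stem)
    return texts, labels
```

Under these assumptions, `texts, labels = load_category_files("news_data_test")` would load the base dataset, and passing `"news_data_test2"` would load the reduced one.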
## Training and validating the model:

* To train the base model on all news classes, run [``]( If there is no saved model, it will train and automatically save the model.
* To train the base model on reduced news classes, run [``]( If there is no saved model, it will train and automatically save the model. In the next run, it will automatically load the saved model instead of retraining.
* Training and validation metrics are saved in [`values_base.txt`]( for the base model.
* Training and validation metrics are saved in [`values_new.txt`]( for the new model.
* Plots for the base model are in [`data_figs_base`](
* Plots for the updated model are in [`data_figs_new`](

## Visualizing the similarity algorithm:

* Run [``]( to visualize the class overlap on the base dataset. tSNE values are generated from `` and will be stored in `tSNE_values.csv`. The results are available [`here`](
* Thresholds can be changed in the file to adapt the algorithm.

## Other files:

* [``]( contains our transformer model definition.
* [``]( contains utility methods for dataset loading for the model, evaluation metric methods etc.

## Presentation slides:

* [`slides_final`](

## Written thesis:

* [`thesis.pdf`](

## Note:

* To change the dataset from base to updated, you need to change the dataset source in `` from `news_data_test` to `news_data_test2` and vice-versa.
* We have the preliminary implementation of our RNN model in `RNNModel`. This requires some prerequisites to be installed which are not standard `PyPI` packages. To install those, we recommend you visit the original source [InferSent](
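
The train-once, load-afterwards behaviour described under "Training and validating the model" follows a common caching pattern. The sketch below illustrates it under stated assumptions: `load_or_train` and `train_fn` are hypothetical names, and a pickle file stands in for whatever checkpoint format the actual training scripts use, which this README does not show.

```python
import pickle
from pathlib import Path


def load_or_train(checkpoint_path, train_fn):
    """Return a saved model if one exists, otherwise train and save it.

    `train_fn` is a hypothetical zero-argument callable that returns any
    picklable model object; it stands in for the real training loop.
    """
    path = Path(checkpoint_path)
    if path.exists():
        # A saved model was found: skip training entirely.
        with path.open("rb") as handle:
            return pickle.load(handle)
    model = train_fn()
    # Create the checkpoint directory on first use, then persist the model.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return model
```

With this pattern, deleting the checkpoint file is enough to force a fresh training run.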

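The README does not spell out the class-overlap algorithm or its thresholds. The figure names `jsdiv_full.png` and `jenshan_hist_full.png` suggest that Jensen-Shannon divergence is involved, so the sketch below only illustrates how a thresholded pairwise JS divergence could flag overlapping classes. It is an assumption for illustration, not the authors' method; `class_dists`, `overlapping_pairs` and the default threshold are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Normalise so that inputs may be raw counts or unnormalised scores.
    p_sum, q_sum = sum(p), sum(q)
    p = [x / p_sum for x in p]
    q = [x / q_sum for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing by convention.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def overlapping_pairs(class_dists, threshold=0.1):
    """List class pairs whose JS divergence falls below `threshold`.

    `class_dists` maps a class name to a distribution (hypothetical
    input, e.g. averaged model outputs per class). The default
    threshold is illustrative only.
    """
    names = sorted(class_dists)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if js_divergence(class_dists[a], class_dists[b]) < threshold:
                pairs.append((a, b))
    return pairs
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (disjoint support), so a lower threshold makes the overlap test stricter.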

