ExplainableNRS
Category: Numerical algorithms / Artificial intelligence
Development tool: Python
Upload date: 2023-10-02 14:22:09
Uploader: sh-1993
Description: A news recommendation framework implemented in PyTorch.
File list:
analysis/ (0, 2023-11-15)
analysis/__init__.py (52, 2023-11-15)
analysis/coherence_scores_plot.py (12616, 2023-11-15)
analysis/convert_data.py (1717, 2023-11-15)
analysis/perform_plot.py (2157, 2023-11-15)
analysis/perform_stat.py (3968, 2023-11-15)
analysis/plot_all.py (3608, 2023-11-15)
analysis/simple_baselines.py (7727, 2023-11-15)
analysis/stack_df.py (1348, 2023-11-15)
analysis/to_latex.py (1017, 2023-11-15)
analysis/topic_coherence_plot.py (10897, 2023-11-15)
analysis/trash/ (0, 2023-11-15)
analysis/trash/__init__.py (0, 2023-11-15)
analysis/trash/perform_plot.py (5382, 2023-11-15)
config.yaml (260, 2023-11-15)
dataset/ (0, 2023-11-15)
dataset/utils/ (0, 2023-11-15)
dataset/utils/MIND/ (0, 2023-11-15)
dataset/utils/MIND/nid_small.json (1347788, 2023-11-15)
dataset/utils/MIND/uid_small.json (1952987, 2023-11-15)
dataset/utils/embed_dict/ (0, 2023-11-15)
dataset/utils/embed_dict/MIND_40910.npy (98184128, 2023-11-15)
dataset/utils/word_dict/ (0, 2023-11-15)
dataset/utils/word_dict/MIND_40910.json (892634, 2023-11-15)
dataset/utils/word_dict/post_process/ (0, 2023-11-15)
dataset/utils/word_dict/post_process/PP10.json (713037, 2023-11-15)
dataset/utils/word_dict/post_process/PP100.json (372692, 2023-11-15)
dataset/utils/word_dict/post_process/PP30.json (578984, 2023-11-15)
dataset/utils/word_dict/post_process/PP60.json (464909, 2023-11-15)
modules/ (0, 2023-11-15)
modules/__init__.py (0, 2023-11-15)
modules/base/ (0, 2023-11-15)
modules/base/__init__.py (0, 2023-11-15)
modules/base/base_model.py (838, 2023-11-15)
modules/base/base_trainer.py (8767, 2023-11-15)
modules/base/nc_dataset.py (3161, 2023-11-15)
modules/case_study.py (13222, 2023-11-15)
modules/commom/ (0, 2023-11-15)
modules/commom/__init__.py (48, 2023-11-15)
... ...
# Explainable News Recommender System
## Introduction
This repository contains the code for the papers [A Novel Perspective to Look At Attention: Bi-level Attention-based
Explainable Topic Modeling for News Classification](https://arxiv.org/pdf/2203.07216.pdf)
and [Topic-Centric Explanations for News Recommendation](https://arxiv.org/pdf/2306.07506.pdf).
The implementation is based on PyTorch and covers both _news classification_ and _news recommendation_ tasks. We also
provide a procedure for generating _explainable topics_ for both the news classification task and the news
recommender system.
## Usage
Clone the repository and install the dependencies.
```bash
git clone https://github.com/Ruixinhua/ExplainableNRS
cd ExplainableNRS
pip install -r requirements.txt
```
Download the MIND dataset from [here](https://msnews.github.io/). The dataset is licensed under the
[Microsoft Research License Terms](https://go.microsoft.com/fwlink/?LinkID=206977). We suggest using the following
commands to download the preprocessed dataset and the pre-trained GloVe word embeddings.
```bash
cd dataset
# Download the pre-trained GloVe word embeddings
wget https://nlp.stanford.edu/data/glove.840B.300d.zip
apt install unzip # skip if unzip is already installed; may require sudo
unzip glove.840B.300d.zip -d glove
rm glove.840B.300d.zip
# Download the preprocessed MIND-small dataset (gdown is a Python tool for Google Drive downloads)
mkdir MIND && cd MIND
gdown "https://drive.google.com/uc?id=1JUq6UzeGVYifpyupjOD11aKZjaG0Elur"
unzip small.zip -d small
rm small.zip
# Download two additional files into dataset/utils
cd ../utils
gdown "https://drive.google.com/uc?id=1bC4WgcVrDOAmjVu2o2jR-aETGqbLbveI"
gdown "https://drive.google.com/uc?id=15vV-dBLXnleRvT_011VDGizmSjk6o-yF"
```
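The GloVe archive downloaded above is a plain-text file with one token per line followed by its 300 vector components. The sketch below shows one way such a file could be combined with a word dictionary (such as `word_dict/MIND_40910.json`) to produce an embedding table. It is only an illustration: the function name `build_embedding_matrix`, the assumption that the dictionary maps each word to a row index, and the random initialisation for missing words are all hypothetical and not taken from the repository. The listed size of `embed_dict/MIND_40910.npy` (98,184,128 bytes) is consistent with a 40910 × 300 float64 array plus a small header, which suggests the repository already ships a precomputed table of this kind.

```python
import random


def build_embedding_matrix(word_dict, glove_lines, dim=300, seed=0):
    """Build a row-per-word embedding table from GloVe text lines.

    word_dict   : hypothetical mapping word -> row index (assumed layout)
    glove_lines : iterable of lines "token v1 v2 ... v_dim"
    Returns (matrix, found) where found counts words covered by GloVe.
    """
    rng = random.Random(seed)
    n_rows = max(word_dict.values()) + 1
    # Words missing from GloVe keep a small random vector (an assumption).
    matrix = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_rows)]
    found = 0
    for line in glove_lines:
        parts = line.rstrip().split(" ")
        # A few glove.840B tokens contain spaces, so read the vector from the right.
        word = " ".join(parts[:-dim])
        if word in word_dict:
            matrix[word_dict[word]] = [float(x) for x in parts[-dim:]]
            found += 1
    return matrix, found


# Tiny synthetic example with 3-dimensional vectors.
word_dict = {"news": 1, "sports": 2, "zzz-unknown": 3}
glove_lines = ["news 0.1 0.2 0.3", "sports 0.4 0.5 0.6", "other 0.7 0.8 0.9"]
matrix, found = build_embedding_matrix(word_dict, glove_lines, dim=3)
```

With the real file, `glove_lines` would be an open file handle on the extracted GloVe text file, so the vectors are streamed rather than loaded into memory at once.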
To train the basic BATMRS model and evaluate its performance, run the following commands:
```bash
cd ../../ # back to the root directory
export PYTHONPATH=$PYTHONPATH:./:./modules # add the current directory and the modules directory to PYTHONPATH
accelerate launch modules/experiment/runner/run_baseline.py --task=RS_BATM --arch_type=BATMRSModel --subset_type=small --news_info=use_all --news_lengths=100 --word_dict_file=MIND_40910.json --ref_data_path=dataset/utils/ref.dtm.npz --topic_evaluation_method=fast_eval,w2v_sim
# This uses accelerate's default configuration; add "--config_file config.yaml" after "launch" to use the configuration file
# See the accelerate documentation (https://huggingface.co/docs/accelerate/) for more details
```
## Evaluation
The performance of the baselines is summarized below. These results can be obtained by running the code with
the default configuration:
![Baselines performance](./plots/baselines_performance.png)
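The recommendation tables in this section report `group_auc`, `mean_mrr`, `ndcg_5` and `ndcg_10`. As a rough guide to what these columns measure, the sketch below computes them per impression and averages over impressions, following the definitions commonly used for MIND evaluation. It is an assumption that the repository computes them in exactly this way; the function names are illustrative, and the tabulated values appear to be these scores multiplied by 100.

```python
import math


def _ranked_labels(labels, scores):
    """Return the labels reordered by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mrr_score(labels, scores):
    """Sum of reciprocal ranks of clicked items, divided by the number of clicks."""
    ranked = _ranked_labels(labels, scores)
    return sum(y / (rank + 1) for rank, y in enumerate(ranked)) / sum(labels)


def dcg_score(labels, scores, k):
    """Discounted cumulative gain over the top-k ranked items."""
    ranked = _ranked_labels(labels, scores)[:k]
    return sum((2 ** y - 1) / math.log2(rank + 2) for rank, y in enumerate(ranked))


def ndcg_score(labels, scores, k):
    """DCG normalised by the DCG of the ideal ordering."""
    return dcg_score(labels, scores, k) / dcg_score(labels, labels, k)


def auc_score(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(impressions):
    """Average each metric over impressions given as (labels, scores) pairs."""
    n = len(impressions)
    return {
        "group_auc": sum(auc_score(y, s) for y, s in impressions) / n,
        "mean_mrr": sum(mrr_score(y, s) for y, s in impressions) / n,
        "ndcg_5": sum(ndcg_score(y, s, 5) for y, s in impressions) / n,
        "ndcg_10": sum(ndcg_score(y, s, 10) for y, s in impressions) / n,
    }


# Two toy impressions: binary click labels and model scores per candidate.
impressions = [([1, 0, 0], [0.2, 0.5, 0.1]), ([0, 1, 0, 1], [0.1, 0.9, 0.3, 0.4])]
results = evaluate(impressions)
```

Each impression must contain at least one clicked and one non-clicked candidate, otherwise the AUC and the nDCG normaliser are undefined.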
Recommendation performance and topic quality results for different configurations of the BATM-ATT model on the
news recommendation (NR) task are shown in the following two tables:
| #Topic | variant | test\_group\_auc | test\_mean\_mrr | test\_ndcg\_5 | test\_ndcg\_10 |
|:-------|:--------|:-----------------|:----------------|:--------------|:---------------|
| 10 | base | 67.43±0.21 | 32.12±0.34 | 35.78±0.32 | 42.04±0.27 |
| 30 | base | 67.51±0.14 | 32.16±0.22 | 35.88±0.21 | 42.10±0.18 |
| 50 | base | 67.65±0.22 | 32.36±0.15 | 36.04±0.21 | 42.26±0.19 |
| 70 | base | 67.56±0.17 | 32.16±0.31 | 35.85±0.33 | 42.10±0.29 |
| 100 | base | 67.47±0.19 | 32.14±0.26 | 35.90±0.26 | 42.01±0.25 |
| 150 | base | 67.38±0.39 | 32.24±0.30 | 35.88±0.37 | 42.08±0.34 |
| 200 | base | 67.10±0.37 | 32.01±0.44 | 35.56±0.54 | 41.85±0.47 |
| 300 | base | 66.97±0.39 | 31.94±0.28 | 35.42±0.33 | 41.64±0.34 |
| 500 | base | 67.22±0.37 | 32.15±0.25 | 35.71±0.30 | 41.94±0.22 |

| #Topic | variant | original\_c\_npmi | PP60\_c\_npmi | original\_w2v\_sim | PP60\_w2v\_sim |
|:-------|:--------|:------------------|:------------------|:-------------------|:------------------|
| 10 | base | 0.0852±0.0075 | 0.1179±0.0221 | 0.2579±0.0184 | 0.2743±0.0218 |
| 30 | base | 0.0735±0.0273 | 0.1039±0.0250 | 0.2360±0.0268 | 0.2526±0.0272 |
| 50 | base | 0.0796±0.0162 | 0.1116±0.0161 | 0.2444±0.0125 | 0.2589±0.0183 |
| 70 | base | 0.0838±0.0261 | 0.1114±0.0311 | 0.2427±0.0259 | 0.2560±0.0324 |
| 100 | base | 0.0681±0.0059 | 0.1002±0.0073 | 0.2199±0.0057 | 0.2364±0.0095 |
| 150 | base | 0.0872±0.0126 | 0.1172±0.0146 | 0.2536±0.0083 | 0.2659±0.0117 |
| 200 | base | 0.0988±0.0101 | 0.1248±0.0110 | 0.2778±0.0066 | 0.2874±0.0079 |
| 300 | base | 0.1044±0.0085 | 0.1270±0.0101 | 0.2978±0.0048 | 0.2989±0.0062 |
| 500 | base | **0.1140±0.0054** | **0.1382±0.0057** | **0.3035±0.0078** | **0.3108±0.0064** |
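The `c_npmi` columns report topic coherence based on normalized pointwise mutual information (NPMI). The sketch below illustrates the idea with document-level co-occurrence counts over a reference corpus: for each pair of top words in a topic, NPMI compares how often the words appear together with how often they would under independence, and the topic score is the mean over all pairs. This is a simplified illustration, not the repository's evaluation code. The function names are hypothetical, and standard `c_npmi` implementations may use sliding windows and smoothing instead of whole-document counts.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, docs):
    """NPMI of two words, using document-level co-occurrence.

    docs is a list of sets of words (the reference corpus).
    The result lies in [-1, 1]; -1 means the words never co-occur.
    """
    n = len(docs)
    p_a = sum(word_a in d for d in docs) / n
    p_b = sum(word_b in d for d in docs) / n
    p_ab = sum(word_a in d and word_b in d for d in docs) / n
    if p_ab == 0:
        return -1.0
    if p_ab == 1:
        # Both words occur in every document: NPMI is undefined (0/0),
        # so treat the pair as carrying no association.
        return 0.0
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_coherence(top_words, docs):
    """Mean NPMI over all unordered pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, docs) for a, b in pairs) / len(pairs)


# Toy reference corpus of four documents.
docs = [{"a", "b"}, {"a", "b"}, {"c"}, {"a"}]
score_ab = npmi("a", "b", docs)
score_ac = npmi("a", "c", docs)
coherence = topic_coherence(["a", "b", "c"], docs)
```

Higher values indicate that a topic's top words tend to appear in the same documents, which is the sense in which the 500-topic base model scores best in the table above.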
BATM-ATT model with variational inference:
| #Topic | variant | test\_group\_auc | test\_mean\_mrr | test\_ndcg\_5 | test\_ndcg\_10 |
|:-------|:------------|:-----------------|:----------------|:---------------|:---------------|
| 30 | variational | **67.63±0.09** | **32.44±0.06** | **36.14±0.03** | **42.31±0.02** |
| 50 | variational | 67.51±0.24 | 32.16±0.07 | 35.89±0.22 | 42.06±0.13 |
| 70 | variational | 67.16±0.36 | 31.79±0.33 | 35.46±0.37 | 41.71±0.33 |
| 100 | variational | 66.35±0.30 | 31.34±0.29 | 34.83±0.37 | 41.07±0.32 |

| #Topic | variant | original\_c\_npmi | PP60\_c\_npmi | original\_w2v\_sim | PP60\_w2v\_sim |
|:-------|:------------|:------------------|:--------------|:-------------------|:---------------|
| 30 | variational | 0.0624±0.0157 | 0.0917±0.0148 | 0.2274±0.0132 | 0.2412±0.0208 |
| 50 | variational | 0.0886±0.0284 | 0.1178±0.0358 | 0.2549±0.0231 | 0.2648±0.0342 |
| 70 | variational | 0.1022±0.0025 | 0.1392±0.0072 | 0.2675±0.0023 | 0.2903±0.0047 |
| 100 | variational | 0.1022±0.0092 | 0.1271±0.0018 | 0.2748±0.0181 | 0.2805±0.0114 |
BATM-ATT model with entropy regularization:
| #Topic | variant | test\_group\_auc | test\_mean\_mrr | test\_ndcg\_5 | test\_ndcg\_10 |
|:-------|:-------------------------|:-----------------|:----------------|:--------------|:---------------|
| 30 | with-entropy-static-1e-5 | 67.50±0.21 | 32.01±0.24 | 35.65±0.27 | 41.81±0.26 |
| 50 | with-entropy-static-1e-5 | 67.21±0.18 | 32.09±0.28 | 35.65±0.26 | 41.80±0.28 |
| 70 | with-entropy-static-1e-5 | 67.18±0.25 | 32.32±0.13 | 35.80±0.22 | 41.92±0.20 |

| #Topic | variant | original\_c\_npmi | PP60\_c\_npmi | original\_w2v\_sim | PP60\_w2v\_sim |
|:-------|:-------------------------|:------------------|:--------------|:-------------------|:---------------|
| 30 | with-entropy-static-1e-3 | 0.0800±0.0228 | 0.1027±0.0198 | 0.3411±0.0262 | 0.3555±0.0368 |
| 50 | with-entropy-static-1e-3 | 0.0913±0.0100 | 0.1130±0.0051 | 0.3379±0.0211 | 0.3348±0.0281 |
| 70 | with-entropy-static-1e-3 | 0.0997±0.0052 | 0.1149±0.0043 | 0.3310±0.0143 | 0.3213±0.0141 |

| #Topic | variant | test\_group\_auc | test\_mean\_mrr | test\_ndcg\_5 | test\_ndcg\_10 |
|:-------|:--------------------------|:-----------------|:----------------|:--------------|:---------------|
| 30 | with-entropy-dynamic-1e-3 | 67.62±0.21 | 31.96±0.29 | 35.72±0.36 | 41.87±0.26 |
| 50 | with-entropy-dynamic-1e-3 | 67.61±0.14 | 32.06±0.30 | 35.83±0.30 | 41.91±0.27 |
| 70 | with-entropy-dynamic-1e-3 | 67.62±0.12 | 32.12±0.08 | 35.86±0.20 | 42.03±0.10 |
| 100 | with-entropy-dynamic-1e-3 | 67.55±0.18 | 32.22±0.23 | 35.90±0.28 | 42.13±0.21 |
| 150 | with-entropy-dynamic-1e-3 | 67.41±0.24 | 32.20±0.40 | 35.87±0.40 | 42.01±0.35 |
| 200 | with-entropy-dynamic-1e-3 | 67.29±0.04 | 32.16±0.17 | 35.78±0.17 | 41.86±0.13 |

| #Topic | variant | original\_c\_npmi | PP60\_c\_npmi | original\_w2v\_sim | PP60\_w2v\_sim |
|:-------|:--------------------------|:------------------|:--------------|:-------------------|:---------------|
| 30 | with-entropy-dynamic-1e-3 | 0.0634±0.0164 | 0.0875±0.0167 | 0.2393±0.0194 | 0.2501±0.0161 |
| 50 | with-entropy-dynamic-1e-3 | 0.0581±0.0072 | 0.0881±0.0028 | 0.2186±0.0118 | 0.2255±0.0143 |
| 70 | with-entropy-dynamic-1e-3 | 0.0625±0.0070 | 0.0917±0.0114 | 0.2258±0.0095 | 0.2392±0.0199 |
| 100 | with-entropy-dynamic-1e-3 | 0.0943±0.0140 | 0.1201±0.0121 | 0.2580±0.0044 | 0.2671±0.0045 |
| 150 | with-entropy-dynamic-1e-3 | 0.0962±0.0039 | 0.1212±0.0067 | 0.2903±0.0250 | 0.2946±0.0151 |
| 200 | with-entropy-dynamic-1e-3 | 0.1054±0.0095 | 0.1217±0.0135 | 0.3068±0.0065 | 0.3000±0.0059 |
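The variant names in the entropy tables (for example `with-entropy-static-1e-3`) suggest an entropy term weighted by a small coefficient, but this README does not define it. As a purely illustrative sketch of one common formulation, the code below computes the Shannon entropy of each topic's attention distribution over words and scales the mean by a coefficient; adding such a penalty to the training loss favours sharper, more peaked topics. The function names, the choice of distribution, and how the penalty enters the loss are all assumptions, and the difference between the static and dynamic variants is not modelled here.

```python
import math


def entropy(dist, eps=1e-12):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p + eps) for p in dist)


def entropy_penalty(topic_word_weights, coefficient=1e-3):
    """Coefficient times the mean entropy over topics.

    topic_word_weights : list of rows, one per topic, each a distribution
                         over words (hypothetical layout).
    coefficient        : weight of the penalty, e.g. 1e-3 or 1e-5.
    """
    mean_entropy = sum(entropy(row) for row in topic_word_weights) / len(topic_word_weights)
    return coefficient * mean_entropy


# A uniform topic has maximal entropy, a one-hot topic has (near) zero entropy.
uniform_topic = [0.25, 0.25, 0.25, 0.25]
peaked_topic = [1.0, 0.0, 0.0, 0.0]
penalty = entropy_penalty([uniform_topic, peaked_topic], coefficient=1e-3)
```

Under this formulation the total objective would be the recommendation loss plus `penalty`, so a larger coefficient trades some ranking accuracy for more concentrated topics.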
## Credits
- Dataset: **MI**crosoft **N**ews **D**ataset (MIND).
- Reference news recommendation repository by _yusanshi_.