LDA-HAN
Category: feature extraction
Language: Python
File size: 487 KB
Downloads: 0
Upload date: 2019-11-09 09:54:40
Uploader: sh-1993
Description: Topic-Aware Hierarchical Document Representation for News Biased Detection
File list:
app (0, 2019-11-09)
app\__init__.py (0, 2019-11-09)
app\__pycache__ (0, 2019-11-09)
app\__pycache__\__init__.cpython-36.pyc (141, 2019-11-09)
app\__pycache__\app.cpython-36.pyc (1529, 2019-11-09)
app\app.py (1779, 2019-11-09)
app\templates (0, 2019-11-09)
app\templates\index.html (5806, 2019-11-09)
images (0, 2019-11-09)
images\Discursive.png (252255, 2019-11-09)
images\Tendentious.png (236814, 2019-11-09)
lda_han.py (10584, 2019-11-09)
main.py (469, 2019-11-09)
topic_gen.py (2406, 2019-11-09)
utils (0, 2019-11-09)
utils\__pycache__ (0, 2019-11-09)
utils\__pycache__\datahelper.cpython-36.pyc (4039, 2019-11-09)
utils\__pycache__\lda_gen.cpython-36.pyc (2739, 2019-11-09)
utils\__pycache__\load_embedding.cpython-36.pyc (1170, 2019-11-09)
utils\__pycache__\normalize.cpython-36.pyc (769, 2019-11-09)
utils\datahelper.py (4671, 2019-11-09)
utils\lda_gen.py (3704, 2019-11-09)
utils\load_embedding.py (1029, 2019-11-09)
utils\normalize.py (704, 2019-11-09)
visulization.py (127, 2019-11-09)
# Topic-Aware Hierarchical Document Representation for News Biased Detection
This is a Keras implementation of the Hierarchical Attention Network architecture [(Yang et al., 2016)](http://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf).
Instead of using a standard word embedding alone, this work applies a topic-aware word embedding that combines word vectors with the transposed topic-word distribution; that distribution acts as a global weighting over the dimensions of each word-topic vector. Once the model has obtained the document representation, it also combines it with the document-topic distribution before feeding it to the softmax for the final prediction.
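The embedding combination described above can be sketched as follows. The shapes and the particular weighting scheme are assumptions for illustration (the actual logic lives in `lda_han.py`), so treat this as a sketch rather than the repo's exact method:

```python
import numpy as np

def topic_aware_embeddings(word_vectors, topic_word_dist):
    """Combine word vectors with the transposed topic-word distribution.

    word_vectors:    (vocab_size, embed_dim) standard embeddings (e.g. GloVe)
    topic_word_dist: (num_topics, vocab_size) LDA topic-word distribution
    Returns a (vocab_size, embed_dim + num_topics) matrix: each word's
    embedding concatenated with its globally weighted word-topic vector.
    """
    # Transpose so each row is one word's distribution over topics.
    word_topic = topic_word_dist.T  # (vocab_size, num_topics)
    # One possible "global weighting of the dimensions": the average
    # probability mass each topic carries across the whole vocabulary.
    global_weight = word_topic.mean(axis=0, keepdims=True)  # (1, num_topics)
    weighted = word_topic * global_weight
    return np.concatenate([word_vectors, weighted], axis=1)
```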
## Experiments
All results are the mean of five runs of 10-fold cross-validation on the [Hyperpartisan News Detection](https://pan.webis.de/semeval19/semeval19-web/) by-article training set (***5 news articles in total).
| Model | Accuracy |
| --- | --- |
| Transformer | 72.12% |
| LDA-Transformer | 71.56% |
| Kim-CNN | 72.95% |
| LDA-Kim-CNN | **73.47%** |
| RNN-Attention | 73.63% |
| LDA-RNN-Attention | **73.75%** |
| ESRC | 71.81% |
| LDA-ESRC | **73.69%** |
| HAN | 75.69% |
| LDA-HAN | **76.52%** |
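The evaluation protocol (five repetitions of 10-fold cross-validation, averaged over all folds) can be sketched with scikit-learn. Note that scikit-learn is not in the requirements list below, and the features here are synthetic stand-ins for the real document representations, so this only illustrates the protocol:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for document features and bias labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 2, size=200)

# Five repetitions of 10-fold CV: the reported number is the mean
# accuracy over all 50 resulting folds.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.2%}")
```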
## Preparation / Requirements
* Python 3.6 (Anaconda works best)
* TensorFlow 1.13.0
* Keras 2.2.4
* Gensim 3.8.0
* spaCy 2.1.16
* Flask 1.1.1
Preparation steps:
1. `mkdir checkpoints data embeddings history lda_stuff` to store trained models, the training data set, GloVe & LDA embeddings, training logs, and Gensim models, respectively.
2. Convert the original XML data files to TSV format (see [this guide](https://github.com/GateNLP/semeval2019-hyperpartisan-bertha-von-suttner/tree/4b1d74b73247a06ed79e8e7af30923ce6828574a#training) for how it works).
3. `python ./utils/lda_gen.py -train True -H True -T 425` to generate the LDA topic embeddings with 425 topics.
4. `python main.py -train True` to train the model.
5. `python visulization.py` to launch the web application and visualize the attention distributions and predictions.
## Example 1:
![alt text](https://github.com/yjiang123/LDA_HAN-for-News-Biased-Detection/blob/master/images/Discursive.png "Discursive News")
## Example 2:
![alt text](https://github.com/yjiang123/LDA_HAN-for-News-Biased-Detection/blob/master/images/Tendentious.png "Tendentious News")