LDA-HAN

Category: Feature extraction
Development tool: Python
File size: 487KB
Downloads: 0
Upload date: 2019-11-09 09:54:40
Uploader: sh-1993
Description: Topic-Aware Hierarchical Document Representation for News Biased Detection

File list:
app (0, 2019-11-09)
app\__init__.py (0, 2019-11-09)
app\__pycache__ (0, 2019-11-09)
app\__pycache__\__init__.cpython-36.pyc (141, 2019-11-09)
app\__pycache__\app.cpython-36.pyc (1529, 2019-11-09)
app\app.py (1779, 2019-11-09)
app\templates (0, 2019-11-09)
app\templates\index.html (5806, 2019-11-09)
images (0, 2019-11-09)
images\Discursive.png (252255, 2019-11-09)
images\Tendentious.png (236814, 2019-11-09)
lda_han.py (10584, 2019-11-09)
main.py (469, 2019-11-09)
topic_gen.py (2406, 2019-11-09)
utils (0, 2019-11-09)
utils\__pycache__ (0, 2019-11-09)
utils\__pycache__\datahelper.cpython-36.pyc (4039, 2019-11-09)
utils\__pycache__\lda_gen.cpython-36.pyc (2739, 2019-11-09)
utils\__pycache__\load_embedding.cpython-36.pyc (1170, 2019-11-09)
utils\__pycache__\normalize.cpython-36.pyc (769, 2019-11-09)
utils\datahelper.py (4671, 2019-11-09)
utils\lda_gen.py (3704, 2019-11-09)
utils\load_embedding.py (1029, 2019-11-09)
utils\normalize.py (704, 2019-11-09)
visulization.py (127, 2019-11-09)

# Topic-Aware Hierarchical Document Representation for News Biased Detection

This is a Keras implementation of the Hierarchical Attention Network architecture [(Yang et al., 2016)](http://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf). Instead of using standard word embeddings alone, this work applies a topic-aware word embedding that combines word vectors with the transposed topic-word distribution; this distribution serves as a global weighting over the dimensions of the word-topic vector. Once the model has obtained the document representation, it is also combined with the document-topic distribution before being fed to the softmax for the final prediction.

## Experiments

All results are the mean of 10-fold cross-validation run five times on the [Hyperpartisan News Detection](https://pan.webis.de/semeval19/semeval19-web/) by-article training set (***5 news articles in total).

| Model | Accuracy |
| --- | --- |
| Transformer | 72.12% |
| LDA-Transformer | 71.56% |
| Kim-CNN | 72.95% |
| LDA-Kim-CNN | **73.47%** |
| RNN-Attention | 73.63% |
| LDA-RNN-Attention | **73.75%** |
| ESRC | 71.81% |
| LDA-ESRC | **73.69%** |
| HAN | 75.69% |
| LDA-HAN | **76.52%** |

## Preparation / Requirements

* Python 3.6 (Anaconda will work best)
* Tensorflow version 1.13.0
* Keras version 2.2.4
* Gensim 3.8.0
* Spacy version 2.1.16
* flask 1.1.1

Preparation steps:

1. `mkdir checkpoints data embeddings history lda_stuff` to store trained models, the training data set, GloVe & LDA embeddings, training logs, and Gensim models, respectively.
2. Convert the original data XML file to TSV format (see [this](https://github.com/GateNLP/semeval2019-hyperpartisan-bertha-von-suttner/tree/4b1d74b73247a06ed79e8e7af30923ce6828574a#training) for how it works).
3. `python ./utils/lda_gen.py -train True -H True -T 425` to generate the LDA topic embedding with 425 topics.
4. `python main.py -train True` to train the model.
5. `python visulization.py` to build the web application and visualize the attention distributions and predictions.

## Example 1:

![alt text](https://github.com/yjiang123/LDA_HAN-for-News-Biased-Detection/blob/master/images/Discursive.png "Discursive News")

## Example 2:

![alt text](https://github.com/yjiang123/LDA_HAN-for-News-Biased-Detection/blob/master/images/Tendentious.png "Tendentious News")
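The two combination steps described above (topic-aware word embeddings, and the document-topic distribution joined to the document representation before the softmax) can be sketched with plain NumPy. This is a minimal illustration, not the repo's actual code: the function names `topic_aware_embeddings` and `final_features`, and the toy shapes, are assumptions for demonstration.

```python
import numpy as np

def topic_aware_embeddings(word_vecs, topic_word):
    """Concatenate each word vector with that word's distribution over
    topics, i.e. the transposed LDA topic-word matrix.

    word_vecs:  (vocab_size, embed_dim)  pre-trained word embeddings
    topic_word: (num_topics, vocab_size) LDA topic-word distribution
    returns:    (vocab_size, embed_dim + num_topics)
    """
    word_topic = topic_word.T  # (vocab_size, num_topics)
    return np.concatenate([word_vecs, word_topic], axis=1)

def final_features(doc_repr, doc_topic):
    """Combine the learned document representation with the
    document-topic distribution before the softmax classifier."""
    return np.concatenate([doc_repr, doc_topic], axis=-1)

# Toy example: 4 words, 3-dim embeddings, 2 topics.
rng = np.random.default_rng(0)
E = topic_aware_embeddings(rng.random((4, 3)), rng.random((2, 4)))
print(E.shape)  # (4, 5): each row is [word vector | topic weights]
```

In the actual model, the enlarged matrix would initialize a (frozen or trainable) Keras `Embedding` layer, and `final_features` would feed a `Dense(..., activation="softmax")` output layer.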
