fakenews
Category: Content generation
Development tool: Jupyter Notebook
File size: 4597 KB
Downloads: 0
Upload date: 2019-06-18 16:24:40
Uploader: sh-1993
Description: Fake news detection using Deep Learning
File list:
notebooks (0, 2019-06-19)
notebooks\BayesianOpt.ipynb (476544, 2019-06-19)
notebooks\FakeNewsCorpus.ipynb (126542, 2019-06-19)
notebooks\GettingRealAboutFake.ipynb (59774, 2019-06-19)
notebooks\Processing_test_dataset.ipynb (59595, 2019-06-19)
notebooks\TI-CNN-Dataset.ipynb (61989, 2019-06-19)
notebooks\Test_Colab_Categorical.ipynb (49067, 2019-06-19)
notebooks\Train-Colab-Categorical.ipynb (291582, 2019-06-19)
notebooks\Train_Colab_Binary.ipynb (268786, 2019-06-19)
notebooks\data_analysis (0, 2019-06-19)
notebooks\data_analysis\Data_analysis-FNC.ipynb (2274781, 2019-06-19)
notebooks\data_analysis\Data_analysis-GettingReal.ipynb (1961941, 2019-06-19)
notebooks\data_analysis\Data_analysis-TI-CNN.ipynb (1234855, 2019-06-19)
src (0, 2019-06-19)
src\bert_class (0, 2019-06-19)
src\bert_class\bert (0, 2019-06-19)
src\bert_class\bert_class.py (13201, 2019-06-19)
src\data (0, 2019-06-19)
src\data\extractor.py (1750, 2019-06-19)
src\data\subsample.py (1698, 2019-06-19)
# Fake news detection using deep learning
## Final master thesis project
This repository focuses on detecting fake news using deep learning.
There are multiple methods for achieving this goal, but the objective
of this work is to identify fake news by looking only at the text: no graphs,
no social network analysis, and no images.
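Working from text alone means each article has to be turned into a fixed-length numeric input before a neural network can consume it. The following is a minimal sketch of that step, assuming simple whitespace tokenization; the names `build_vocab`, `encode`, `PAD`, and `UNK` are hypothetical and not taken from the notebooks, which may tokenize differently.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary words


def build_vocab(texts, max_words=20000):
    """Map the most frequent whitespace tokens to integer ids, starting at 2."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len=10):
    """Convert one text into a fixed-length list of ids (truncate, then right-pad)."""
    ids = [vocab.get(tok, UNK) for tok in text.lower().split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Toy corpus, for illustration only.
texts = ["breaking news about the election", "the weather is nice"]
vocab = build_vocab(texts)
encoded = encode("the election is rigged", vocab, max_len=6)
```

Words unseen when the vocabulary was built map to `UNK`, and short texts are padded with `PAD` so that every input has the same length.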
In this work, three deep learning architectures are proposed and then tested on two datasets (Fake News Corpus, abbreviated FNC, and TI-CNN), obtaining state-of-the-art results.
1. **LSTM Based architecture**: $91\%$ accuracy (TI-CNN) || $76\%$ accuracy (FNC)
2. **CNN Based architecture**: $97\%$ accuracy (TI-CNN) || $82\%$ accuracy (FNC)
3. **BERT Based architecture**: $97\%$ accuracy (TI-CNN) || $76\%$ accuracy (FNC)
This repository contains several Python notebooks with the developed code.
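For reference, the accuracy figures reported above are the fraction of articles whose predicted label matches the true label. A minimal sketch of that metric follows; the function name and the sample labels are hypothetical and are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels, for illustration only (not results from this project).
y_true = ["fake", "real", "fake", "real"]
y_pred = ["fake", "real", "real", "real"]
score = accuracy(y_true, y_pred)
```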
### Data sources
* Fake News Corpus: https://github.com/several27/FakeNewsCorpus
* Getting Real About Fake News: https://www.kaggle.com/mrisdal/fake-news
* Fake News Detection: https://www.kaggle.com/jruvika/fake-news-detection
* News Dataset from TI-CNN: https://arxiv.org/abs/1806.00749
### Folder structure
* **data**: This directory must be created and populated with the data the scripts need
(not uploaded to GitHub due to file size restrictions).
  - GoogleNews-vectors-negative300.bin.gz: Word2Vec news trained model weights
  - Other_datasets
  - GettingRealAboutFake/
  - `fake.csv` (*Getting Real About Fake News Dataset*)
  - `all_data.csv` (*TI-CNN dataset*)
  - `real_or_fake.csv`
  - `FakeNewsCorpus.csv` (*Fake News Corpus*)
* **notebooks**: Notebooks for prototyping
* **src**: Code with utils
  * **data**: Code to generate datasets / process data
  * **bert_class**: Fine-tuned classifier built over Google's BERT to detect fake/true news.
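Because the `data` directory is not part of the repository, it can help to check that the expected files are in place before running anything. The sketch below is an assumption-laden helper, not part of the project: `missing_data_files` is a hypothetical name, and since the exact layout inside `data` is not assumed, it searches recursively by file name.

```python
from pathlib import Path

# File names taken from the folder description above; their exact location
# inside data/ is not assumed, so the search is recursive.
EXPECTED_FILES = [
    "GoogleNews-vectors-negative300.bin.gz",
    "fake.csv",
    "all_data.csv",
    "real_or_fake.csv",
    "FakeNewsCorpus.csv",
]


def missing_data_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not found anywhere under data_dir."""
    root = Path(data_dir)
    found = {p.name for p in root.rglob("*") if p.is_file()} if root.is_dir() else set()
    return [name for name in expected if name not in found]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All expected data files are present.")
```

Run it from the repository root; it lists any file that still needs to be downloaded from the data sources above.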
### Notebook explanation
* `FakeNewsCorpus.ipynb`: Cleaning and preprocessing of the 'Fake News Corpus' dataset.
* `GettingRealAboutFake.ipynb`: Cleaning and preprocessing of the 'Getting Real
About Fake News' dataset from Kaggle.
* `TI-CNN-Dataset.ipynb`: Cleaning and preprocessing of the 'TI-CNN' dataset.
* `Processing_test_dataset.ipynb`: Cleaning and preprocessing of the 'True or Fake' dataset from Kaggle.
* `BayesianOpt.ipynb`: Obtaining model hyperparameters using Bayesian optimization.
* `Train-Colab-Categorical.ipynb`: Trains a DNN to classify news into 4 categories.
* `Train_Colab_Binary.ipynb`: Trains a DNN to classify news into only the **True** or **Fake**
classes.
* `Test_Colab_Categorical.ipynb`: Tests the previously trained categorical models on TI-CNN.
* `Test_Colab_Binary.ipynb`: Tests the previously trained binary models on FNC.
* `data_analysis/Data_analysis-FNC.ipynb`: EDA of the Fake News Corpus.
* `data_analysis/Data_analysis-TI-CNN.ipynb`: EDA of the TI-CNN dataset.
* `data_analysis/Data_analysis-GettingReal.ipynb`: EDA of the Getting Real About
Fake News dataset.
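Several of the notebooks above clean and preprocess raw article text. As a rough illustration of what such a step can look like, here is a minimal sketch; `clean_text` and its rules are assumptions for English text and are not copied from the notebooks, whose actual cleaning steps may differ.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Lowercase, unescape HTML entities, drop URLs and punctuation, squeeze spaces."""
    text = html.unescape(text).lower()
    text = URL_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


sample = "BREAKING: Read more at https://example.com/story &amp; share!!"
cleaned = clean_text(sample)
```

Note that this rule set also strips non-ASCII letters, so it would need adjusting for non-English text.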