Fake-News-Detection
所属分类:其他
开发工具:Jupyter Notebook
文件大小:0KB
下载次数:0
上传日期:2023-12-21 20:31:05
上 传 者:
sh-1993
说明: 假新闻检测
(Fake News Detection)
# Fake-News-Detection
Domain-Specific Fake News Detection
## Installation Instructions
1. Install latest version of CUDA if running on GPU (RECOMMENDED):
https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html
2. Install latest version of Python:
https://www.python.org/downloads/
3. Install latest version of Python pip
``curl https://bootst/rap.pypa.io/get-pip.py -o get-pip.py``
``python get-pip.py``
4. Install jupyter notebook
``pip install jupyter``
5. Clone repo locally
``git clone https://github.com/howardt12345/Fake-News-Detection.git``
### To Run Data Pre-Processing ###
WIP
### To Run Classification ###
WIP
### To Run Detection ###
This will train and test models
1. Navigate to ``Fake-News-Detection/detection/run_detection.ipynb``
2. Change parameters according to anticipated results
3. Run ``run_detection.ipynb`` and wait for completion
4. View log in category folder to see results, if I run ``politics`` models, I navigate to ``Fake-News-Detection/results/politics`` and find the log file ``training_log_DATE_TIME.log``
To analyze results
Note: there must be a valid log file to read from
1. Navigate to ``Fake-News-Detection/detection/result_analysis.ipynb``
2. Run the notebook and view cell outputs
## Proposed Architecture
The fake news detection will have the following stages:
1. Categorization stage: Categorize the input and pass the input to the model trained for the specific domain
- https://towardsdatascience.com/topics-per-class-using-bertopic-252314f2640
3. Detection stage: With models trained to detect fake news within a specific domain, detect whether the input is fake news or not
## Potential Datasets
- Politics: (both)
- [x] LIAR Dataset: Contains statements made by political figures, labeled by their truthfulness. (https://huggingface.co/datasets/liar)
- [x] ISOT Dataset (multipurpose dataset) (https://onlineacademiccommunity.uvic.ca/isot/2022/11/27/fake-news-detection-datasets/)
- [x] Fake News Dataset (https://www.kaggle.com/competitions/fake-news/data)
- [x] PolitiFact Dataset: Includes fact-checks of political statements. (https://www.kaggle.com/datasets/rmisra/politifact-fact-check-dataset) (https://github.com/KaiDMML/FakeNewsNet)
- Healthcare and Medicine: (one is enough - choose the larger)
- [ ] HealthMisinfo Dataset: Focuses on COVID-19 misinformation in healthcare. (https://trec-health-misinfo.github.io/)
- [ ] FakeHealth Dataset: Contains fake health-related articles. (https://github.com/EnyanDai/FakeHealth)
- Science and Technology: (one is enough - choose the larger)
- [ ] BuzzFeedNews Dataset: Includes news articles from BuzzFeed News. (https://www.kaggle.com/code/sohamohajeri/buzzfeed-news-analysis-and-classification)
- [ ] FakeTech: Contains technology-related fake news.
- Finance and Stock Markets: (one is enough - choose the larger)
- [ ] Stock Market Fake News Dataset: Focuses on news related to stock markets.
- [ ] Financial News Dataset: Contains financial news articles.
- Entertainment and Celebrity News: (one is enough - choose the larger)
- [ ] CelebA Dataset: Contains celebrity images (for image-based fake news). (https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)
- [ ] Entertainment News Dataset: Focuses on entertainment news.
- Sports: (one is enough - choose the larger)
- [ ] SportFake Dataset: Contains sports-related fake news articles.
- [ ] Sports News Dataset: Contains sports news articles.
- Environment and Climate Change: (one is enough - choose the larger)
- [ ] Climate Change Misinformation Dataset: Focuses on climate change misinformation. (https://huggingface.co/datasets/climate_fever/viewer/default/test?row=0)
- [ ] Environmental News Dataset: Contains news related to the environment.
- Education:
- [ ] Education Misinformation Dataset: Focuses on educational misinformation.
- Crime and Security: (FA-KES is important)
- [x] Crime News Dataset: Contains news articles related to crime.
- [x] FA-KES (https://zenodo.org/record/2607278)
- Travel and Tourism:
- [ ] Travel Misinformation Dataset: Focuses on travel-related misinformation.
- Religion and Spirituality:
- [ ] Religion Misinformation Dataset: Contains religious misinformation. (https://data.mendeley.com/datasets/5ykks3psks/5)
- Transportation:
- [ ] Transportation Misinformation Dataset: Focuses on transportation-related misinformation.
- Social Media and Online Discussions:
- [x] PHEME (https://figshare.com/articles/dataset/PHEME_rumour_scheme_dataset_journalism_use_case/2068650)
- [x] GossipCop (https://github.com/KaiDMML/FakeNewsNet)
- [x] Fake News Detection (https://www.kaggle.com/jruvika/fake-news-detection/version/1)
近期下载者:
相关文件:
收藏者: