gpt2-fake-news

Category: GPT/ChatGPT
Development tool: Jupyter Notebook
File size: 872KB
Downloads: 0
Upload date: 2021-01-25 14:11:16
Uploader: sh-1993
Description: Detecting AI-generated fake news

File list:
Big_Data_Project.ipynb (1381534, 2021-01-25)

# gpt2-fake-news

This was the final project for my master's course, Big Data Analytics. I completed this project with a classmate, Mrigank Saksena. I recommend opening the .ipynb file with Colab for better formatting.

**Our aim for this project was to better understand how well current machine learning tools can detect misinformation generated by sophisticated AI, and to compare that with their performance on human-authored fake news.** In other words, how well does AI stack up against humans in creating misinformation?

Our data consists of 3 groups: real news, human-authored fake news, and AI-generated fake news. The 2 human-authored groups (real and human-authored fake) came from a publicly available Kaggle dataset (https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset/). The AI text consisted of outputs generated by GPT-2, a sophisticated AI developed by one of the world's leading AI labs, OpenAI (https://github.com/openai/gpt-2-output-dataset/).

To make our groups as difficult to distinguish from one another as possible, we employed a number of pre-processing methods to control for unwanted differences between groups (such as article length, news content, and average words per sentence). After pre-processing, we used a variety of ML models to classify our text data into predicted groups.

Our models achieved 90-93% accuracy on every comparison we tested. AI-generated fake news proved to be as difficult to detect as its human-authored counterpart.

**See notebook for more details and further analysis.**

# **Introduction**

The spread of online misinformation poses a serious threat to democracies in the 21st century. It erodes trust in public institutions and increases political polarization, weakening the foundation that democratic systems are built upon. Previous research has shown that machine-learning classifiers can be useful for identifying misinformation, but this work is largely based on fake news written by humans.
Although it's safe to assume that the vast majority of the web's fake news has been authored by humans, this will likely change. As machine-learning technologies advance, particularly in the sub-field of *Natural Language Generation* (NLG), it will become increasingly cheap and easy to generate and spread AI-authored misinformation at scale. We suspect this will result in an influx of AI-generated fake news on the web. With ML algorithms becoming a predominant tool for mitigating the spread of misinformation, it's critical that we understand how well they perform against AI-authored text.

**This is the aim of our project: to better understand how well current ML tools detect misinformation generated by sophisticated AI, and to compare that with their performance on human-authored fake news.** In other words, how well does AI stack up against humans in creating misinformation?

In February 2019, one of the world's leading AI labs, OpenAI, released a generative language model known as GPT-2. When primed with a few words or phrases, such as an article headline, GPT-2 can generate coherent paragraphs of text that follow from the input. At the time of release, GPT-2 was the most sophisticated model of its kind (it has since been surpassed by its successor, GPT-3). To test how well ML classifiers can detect AI-generated fake news compared to human-authored articles, we used a dataset of GPT-2's text outputs published by OpenAI on GitHub (Note: we would have used GPT-3, but there are restrictions on accessing the API). For samples of human-authored real and fake news, we used articles from a dataset on Kaggle. The human-authored articles were individually labeled as Fake or Real based on fact-checking conducted by PolitiFact.com.
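
The pre-processing controls mentioned in the overview (article length, average words per sentence) can be illustrated with a small sketch. This is not the notebook's actual code: the function names, the naive sentence splitter, the `max_words` cutoff, and the group labels are all assumptions made for illustration, and the notebook may control for these differences in other ways.

```python
import re
from statistics import mean
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def avg_words_per_sentence(text: str) -> float:
    """Mean number of whitespace-separated words per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return mean(len(s.split()) for s in sentences)


def truncate_words(text: str, max_words: int) -> str:
    """Keep only the first max_words words so no group has longer articles."""
    return " ".join(text.split()[:max_words])


def balance_groups(groups: Dict[str, List[str]], max_words: int) -> Dict[str, List[str]]:
    """Truncate every article and cut each group down to the smallest group's size."""
    smallest = min(len(articles) for articles in groups.values())
    return {
        label: [truncate_words(a, max_words) for a in articles[:smallest]]
        for label, articles in groups.items()
    }


# Hypothetical toy data standing in for the three groups of articles.
groups = {
    "real": ["One two three. Four five six seven.", "A b c d e f g h."],
    "human_fake": ["Red orange yellow green blue.", "Up down left right."],
    "ai_fake": ["Alpha beta gamma delta epsilon zeta."],
}
balanced = balance_groups(groups, max_words=4)
```

After balancing, each group holds the same number of articles and no article exceeds `max_words` words, so a classifier cannot separate the groups by length or group size alone. A statistic such as `avg_words_per_sentence` could then be compared across groups to check whether a stylistic difference remains.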
