Final-Project

Category: NumPy
Development tool: HTML
File size: 156615KB
Downloads: 0
Upload date: 2021-04-03 19:03:00
Uploader: sh-1993
Description: Fake News! Detector aims to find a way to weed out the real news from the fake news. We created this machine learning model to identify authentic news from the swarm of fake news on social media.

File list:
Code (0, 2021-04-04)
Code\Final-Visualization-Data-Cleaning-Alicia .ipynb (3628963, 2021-04-04)
Code\Medha (0, 2021-04-04)
Code\Medha\JupyterNotebook (0, 2021-04-04)
Code\Medha\JupyterNotebook\Categorizing.ipynb (34575, 2021-04-04)
Code\Medha\JupyterNotebook\combined_news_data.csv (711070, 2021-04-04)
Code\Medha\JupyterNotebook\combined_news_data2.csv (738723, 2021-04-04)
Code\Medha\JupyterNotebook\fakedata.csv (1073330, 2021-04-04)
Code\Medha\JupyterNotebook\fb1_test_data.csv (721294, 2021-04-04)
Code\Medha\JupyterNotebook\fb2_test_data.csv (750866, 2021-04-04)
Code\Medha\JupyterNotebook\final_categorized_data.csv (1412369, 2021-04-04)
Code\Medha\JupyterNotebook\new_test_data.csv (30696131, 2021-04-04)
Code\Medha\JupyterNotebook\realdata.csv (354709, 2021-04-04)
Code\Medha\JupyterNotebook\wordclouds.ipynb (311241, 2021-04-04)
Code\Medha\TSNE (0, 2021-04-04)
Code\Medha\TSNE\TSNE.ipynb (202409, 2021-04-04)
Code\Medha\TSNE\TSNE_fake.jpg (83069, 2021-04-04)
Code\Medha\TSNE\TSNE_true.jpg (79768, 2021-04-04)
Code\Medha\checking.jpg (51545, 2021-04-04)
Code\Medha\fakedata.csv (1073330, 2021-04-04)
Code\Medha\realdata.csv (354709, 2021-04-04)
Code\Medha\true_words.jpg (51417, 2021-04-04)
Code\Medha\wc_fake.json (68597, 2021-04-04)
Code\Medha\wc_true.json (31499, 2021-04-04)
Code\Medha\worldcloud_trial.py (4846, 2021-04-04)
Code\all_cleaned_news.html (1870585, 2021-04-04)
Data (0, 2021-04-04)
Data\Alicia (0, 2021-04-04)
Data\Alicia\fb1_test_data.csv (721294, 2021-04-04)
Data\Alicia\fb2_test_data.csv (750866, 2021-04-04)
Data\Alicia\final_categorized_data.csv (1414351, 2021-04-04)
Data\Alicia\final_cleaned_data (1414849, 2021-04-04)
Data\Alicia\final_cleaned_data.csv (1414849, 2021-04-04)
Data\Alicia\~$Combined-NewsData.xlsx (165, 2021-04-04)
Data\Alicia\~$Facebook-FoxNews-Final-Revision.xlsx (165, 2021-04-04)
Data\Medha (0, 2021-04-04)
Data\Medha\final_categorized_data.csv (1414351, 2021-04-04)
Data\Medha\news.csv (30696129, 2021-04-04)
... ...

# “Fake News!” Detector

[Try out our Fake News Detector!](https://fake-news-detector-project.herokuapp.com/)

![Logo](Image/new_fake_logo.gif)

## Background and Motivation:

In the age of social media, it has become more important than ever to be wary of what we read on all platforms. Social media has made it easier to connect with the world, and it has great influence on the masses because anyone can share anything. We wanted to find a way to weed out the real news from the fake news. More importantly, we wanted to find a way to measure the impact that fake news has on society as a whole. To do so, we created a machine learning model that predicts whether an article is "real" or "fake". With the model created, we ran it on thousands of articles posted on Facebook by top news sources (BBC, Buzzfeed Politics, Conservative Post, CNN, Daily Mail, and Fox News). Read our [analysis](https://docs.google.com/document/d/1i3ppTAVp6_3ZUh6Dt7e_Zl4LL0Ynhm-JyUPqZeueA9c/edit?usp=sharing) here!

## Questions to Answer:

* Which of these news sites contribute the most "Fake" news?
* Which of these news sites contribute the most "Real" news?
* What are the top 5 most liked "Real"/"Fake" news articles?
* What are the top 5 most commented "Real"/"Fake" news articles?
* What are the top 5 most shared "Real"/"Fake" news articles?
* What do the most engaged "Real"/"Fake" news articles have in common?
* What are the most-used words in "Fake" news vs. "Real" news?

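
Once each post carries a predicted label, the per-source and top-5 questions above reduce to simple counting and sorting. The following is only a minimal sketch using the Python standard library: the records, the field names (`source`, `label`, `likes`, `title`), the `FAKE`/`REAL` label values, and the helper functions are all hypothetical and are not taken from the project's data files or notebooks.

```python
from collections import Counter

# Hypothetical, made-up records; not taken from the project's CSV files.
posts = [
    {"source": "Source A", "label": "FAKE", "likes": 120, "title": "a1"},
    {"source": "Source A", "label": "FAKE", "likes": 45, "title": "a2"},
    {"source": "Source A", "label": "REAL", "likes": 300, "title": "a3"},
    {"source": "Source B", "label": "FAKE", "likes": 80, "title": "b1"},
    {"source": "Source B", "label": "REAL", "likes": 10, "title": "b2"},
]


def count_by_source(posts, label):
    """Count how many posts with the given predicted label each source has."""
    return Counter(p["source"] for p in posts if p["label"] == label)


def top_n(posts, label, metric, n=5):
    """Return the n posts with the given label, ranked by an engagement metric."""
    matching = [p for p in posts if p["label"] == label]
    return sorted(matching, key=lambda p: p[metric], reverse=True)[:n]


fake_counts = count_by_source(posts, "FAKE")
top_liked_fake = top_n(posts, "FAKE", "likes", n=2)

print(fake_counts.most_common())
print([p["title"] for p in top_liked_fake])
```

The same pattern would apply to comments or shares by passing a different `metric` key, assuming the data has such columns.
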
## Technology

* Machine Learning: sklearn (train_test_split, TfidfVectorizer, PassiveAggressiveClassifier, accuracy_score, confusion_matrix, Pipeline, MultinomialNB)
* Python/Pandas/NumPy
* Flask (request/jsonify)
* CORS
* Newspaper (Article)
* urllib
* NLTK (punkt)
* HTML/CSS/Bootstrap
* Octoparse (web scraping tool)
* Heroku

## Outline

* Create machine learning model
* Transform, Train & Test the Data
* Analyze findings from Facebook Data
* Visualizations
* Website Creation
* Create Flask app
* Deployment to Heroku

## Data

* [Training Data](https://www.kaggle.com/hassanamin/textdb3)
* [BBC](https://www.bbc.com/)
* [BuzzFeed Politics](https://www.buzzfeednews.com/section/politics)
* [Conservative Post]()
* [CNN](https://www.cnn.com/)
* [Daily Mail](https://www.dailymail.co.uk/ushome/index.html)
* [Facebook](https://www.facebook.com/)
* [Fox News](https://www.foxnews.com/)

## Resources

* https://newspaper.readthedocs.io/en/latest/
* https://pypi.org/project/Flask-Cors/
* https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveClassifier.html
* https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
* https://web.dev/cross-origin-resource-sharing/
* https://www.kite.com/python/docs/nltk.punkt
* https://getbootstrap.com/docs/4.0/components/forms/
* https://getbootstrap.com/docs/4.0/components/buttons/
* https://medium.datadriveninvestor.com/using-python-to-detect-fake-news-7895101aebb8
* https://levelup.gitconnected.com/tweet-analysis-for-covid-19-fake-news-detection-bf7cbea5c12d
* https://www.octoparse.com/signup?ref=homepage

### Data Analytics Team:

* [Miguel](https://github.com/52Godfrey) - Data collection via web scraping, data cleaning, Flask creation and Heroku deployment
* [Paola](https://github.com/paola1395) - Data manipulation, machine learning model creation, Flask creation, logo creation
* [Alicia](https://github.com/aliciasply) - Data cleaning, data manipulation, website creation and visualizations
* [Medha](https://github.com/medha795) - Data cleaning, data manipulation, visualizations
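
As a rough illustration of the "Create machine learning model" and "Transform, Train & Test the Data" steps in the outline, the sklearn pieces listed under Technology could be combined along the following lines. This is a sketch under stated assumptions, not the team's actual notebook code: the toy sentences, the `FAKE`/`REAL` label values, the split ratio, and the hyperparameters are all made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Toy corpus invented for this sketch; a real run would load article text
# and labels from the training data instead.
texts = [
    "shocking miracle cure doctors hate revealed",
    "aliens secretly control the election results",
    "celebrity clone spotted in secret bunker",
    "miracle pill melts fat overnight say insiders",
    "city council approves new budget for schools",
    "central bank holds interest rates steady",
    "local team wins regional championship final",
    "parliament passes transport funding bill",
]
labels = ["FAKE", "FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL"]

# Hold out a stratified test set so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# TF-IDF features feeding a passive-aggressive linear classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", PassiveAggressiveClassifier(max_iter=50, random_state=42)),
])
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
matrix = confusion_matrix(y_test, predictions, labels=["FAKE", "REAL"])

print(f"accuracy: {accuracy:.2f}")
print(matrix)
```

Wrapping the vectorizer and classifier in a single `Pipeline` means raw article text can be passed straight to `model.predict`, which is convenient when a web app needs to score one article at a time. With a corpus this small the reported accuracy is not meaningful.
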
