India-WhatsAppFakeNews-Dataset

Category: Video/voice chat
Development tool: Python
Upload date: 2019-04-23 16:01:41
Uploader: sh-1993
Description: WhatsApp-related deaths news articles, along with other articles across India during that period

File list:
LICENSE (18092, 2019-04-23)
archivelist_finder.py (2875, 2019-04-23)
cite.bib (278, 2019-04-23)
data.csv (63458582, 2019-04-23)
extract_csv_data.py (2758, 2019-04-23)
webscrapper/ (0, 2019-04-23)
webscrapper/__init__.py (0, 2019-04-23)
webscrapper/items.py (291, 2019-04-23)
webscrapper/middlewares.py (3607, 2019-04-23)
webscrapper/pipelines.py (292, 2019-04-23)
webscrapper/scrapy.cfg (265, 2019-04-23)
webscrapper/settings.py (3127, 2019-04-23)
webscrapper/spiders/ (0, 2019-04-23)
webscrapper/spiders/TOI.py (459, 2019-04-23)
webscrapper/spiders/__init__.py (161, 2019-04-23)
webscrapper/spiders/extract.py (1351, 2019-04-23)

# India WhatsApp Fake News Dataset

[![DOI](https://zenodo.org/badge/157810232.svg)](https://zenodo.org/badge/latestdoi/157810232)

This repository contains about 1 million+ news articles scraped from the Times of India website, covering late 2017 to June 2018. The data was then checked for keywords that could point to news articles covering stories about WhatsApp fake news cases in India, which was a growing concern at that time. To be clear, this is **not a dataset of fake articles**.

### Details

- The file `data.csv` contains the data, with the date, place, and the keywords mentioned.
- The `webscrapper` directory contains a `scrapy` spider which can be used to extract articles from the news site.
- The files `archivelist_finder.py` and `extract_csv_data.py` can be used for reference in the process.

### Labelling Data and Insights

After the text files were preprocessed, they were searched for keywords selected to find news articles with a good probability of being articles **about** fake news. The matches were then cross-checked to see whether the stories really did correspond to such cases.

The data was used by the BBC to help generate insights about fake news trends in India: what types of fake news were being spread and how they were being spread.

The complete set of articles, in the form of `.txt` files, can be found at the following link.

https://drive.google.com/file/d/19IbOlTO18BAXYRQoVkWfQ6paad4v9sfB/view?usp=sharing

### Interesting things which could be done with this dataset

A few ideas, in case you were wondering how this dataset can be useful.

1. Understanding fake news trends (the initial idea of this project). This could probably be extended to identifying trends and interesting facts, and to understanding other articles contemporary to them.
2. Understanding news headlines: whether the headlines of these articles accurately portray the content they carry.
3. Is there a particular trend for newspaper articles? Newspaper websites have been known to publish clickbait articles to increase the CTR of their website, even though the content is questionable.
4. Understanding the quality of news articles. What makes a good news article?

These are a few good ideas which I believe have large potential in the field of data journalism.

Do feel free to open an issue regarding any query related to this.

https://zenodo.org/badge/latestdoi/157810232

Please do cite this repository if you happen to use this dataset in your research. :smiley:
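
### Example: keyword check (sketch)

The keyword step described under "Labelling Data and Insights" could look roughly like the sketch below. This is not the code used to build the dataset. The keyword list, the `text` column name, and the sample rows are all made up for illustration, since this README does not list the actual keywords or the exact columns of `data.csv`.

```python
import csv
import io
import re

# Hypothetical keywords; the real list used for this dataset is not given here.
KEYWORDS = ["whatsapp", "rumour", "lynching"]


def find_keywords(text, keywords=KEYWORDS):
    """Return the keywords that occur in `text` as whole words (case-insensitive)."""
    found = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found


def filter_rows(csv_file, text_column="text", keywords=KEYWORDS):
    """Yield (row, matched_keywords) for every row that mentions a keyword.

    `text_column` is an assumed column name, not one confirmed for data.csv.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        matched = find_keywords(row.get(text_column) or "", keywords)
        if matched:
            yield row, matched


# Made-up rows standing in for the real file; replace with open("data.csv").
SAMPLE = io.StringIO(
    "date,place,text\n"
    "2018-06-01,Mumbai,Police warn about a WhatsApp rumour\n"
    "2018-06-02,Delhi,Monsoon arrives early this year\n"
)

matches = list(filter_rows(SAMPLE))
for row, matched in matches:
    print(row["date"], row["place"], matched)
```

A keyword hit only flags a candidate. As noted above, the flagged articles still need to be cross-checked by hand to confirm that they really are about fake news.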
