fake_news_detection_system

Category: Cloud Computing
Development tool: Python
File size: 70KB
Downloads: 0
Upload date: 2022-06-22 02:11:58
Uploader: sh-1993
Description: A system that uses Hadoop, Spark, and NLP to crawl, store, and extract news, and to build a knowledge graph.

File list:
.idea (0, 2020-06-03)
.idea\inspectionProfiles (0, 2020-06-03)
.idea\inspectionProfiles\Project_Default.xml (1105, 2020-06-03)
.idea\inspectionProfiles\profiles_settings.xml (174, 2020-06-03)
.idea\misc.xml (192, 2020-06-03)
.idea\modules.xml (290, 2020-06-03)
.idea\nlp_data_processing.iml (395, 2020-06-03)
.idea\vcs.xml (180, 2020-06-03)
api (0, 2020-06-03)
api\descriptions.txt (106, 2020-06-03)
api_v2 (0, 2020-06-03)
api_v2\descriptions.txt (112, 2020-06-03)
chromedriver (0, 2020-06-03)
chromedriver\descriptions.txt (103, 2020-06-03)
cnn (0, 2020-06-03)
cnn\definition.txt (0, 2020-06-03)
dailymail (0, 2020-06-03)
dailymail\definition.txt (0, 2020-06-03)
kg (0, 2020-06-03)
kg\data (0, 2020-06-03)
kg\data\processor.py (4042, 2020-06-03)
kg\graph (0, 2020-06-03)
kg\graph\__pycache__ (0, 2020-06-03)
kg\graph\__pycache__\graph.cpython-36.pyc (4795, 2020-06-03)
kg\graph\graph.py (4835, 2020-06-03)
kg\model (0, 2020-06-03)
kg\model\__pycache__ (0, 2020-06-03)
kg\model\__pycache__\model.cpython-36.pyc (1041, 2020-06-03)
kg\model\model.py (390, 2020-06-03)
kg\run.py (1101, 2020-06-03)
kg\test_run.txt (1295, 2020-06-03)
local_runner (0, 2020-06-03)
local_runner\crawl_cnn.py (3794, 2020-06-03)
local_runner\crawl_dailymail.py (3024, 2020-06-03)
local_runner\crawl_foxnews.py (3297, 2020-06-03)
local_runner\nltk_wordnetLemmatizer.py (149, 2020-06-03)
... ...

# Fake_news_detection_using_knowledge_graph

Python implementation of fake news detection using a knowledge graph.

## Data Preparation from local

### Installation

1. Clone this repository.
2. Ensure the required packages are installed using `pip install -r requirements.txt`.
3. Put "ChromeDriver" into the "chromedriver" folder (you can download ChromeDriver from [ChromeDriver - WebDriver for Chrome](https://chromedriver.chromium.org/downloads)).
4. Extract "stanford-corenlp.zip" into the "api_v2" folder (you can download it from [Stanford CoreNLP - Natural language software](https://stanfordnlp.github.io/CoreNLP/index.html)).

### Dataset

We use a dataset crawled from CNN, Dailymail, and Foxnews (you can download our triples data from [our drive](https://drive.google.com/drive/folders/19YcTQUnNUyMUMxpQEXM-mIpITGr5Uf0x?fbclid=IwAR3A***hFoTVKBitTgwjpHDY1hYcNi8KD7mJCnEf2Lqr3w_y7RD8diChQ3Ck)).

### To crawl data

```shell
# Crawl from CNN (raw data is saved to /Fake_news_detection_using_knowledge_graph/cnn):
local_runner/crawl_cnn.py

# Crawl from Dailymail (raw data is saved to /Fake_news_detection_using_knowledge_graph/dailymail):
local_runner/crawl_dailymail.py

# Crawl from Foxnews (raw data is saved to /Fake_news_detection_using_knowledge_graph/news_fox):
local_runner/crawl_foxnews.py
```

Notice: processed data is saved in path_to/Fake_news_detection_using_knowledge_graph/processed_data/.

### Extract triples

Please replace "test1" in the "path" variable with the folder name that leads to your data ("processed_data/cnn" | "processed_data/dailymail" | "processed_data/news_fox").

```shell
# Running directly from the repository
local_runner/triples_extraction.py
```

Notice: the triples are saved in path_to/Fake_news_detection_using_knowledge_graph/test1.txt.

In addition, Stanford CoreNLP ships with a built-in server, which requires only the CoreNLP dependencies. To run this server, simply run:

```shell
# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
```
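Once the server is running, triples can also be fetched from it over HTTP via CoreNLP's OpenIE annotator. The following is a minimal sketch of such a client, not the repository's own `triples_extraction.py`; the example sentence and the `localhost:9000` endpoint (matching the server command above) are assumptions.

```python
import json

import requests


def extract_triples(text, url="http://localhost:9000"):
    """Send text to a running CoreNLP server and collect (subject, relation, object) triples."""
    # The server accepts annotation properties as a JSON-encoded URL parameter.
    properties = {"annotators": "openie", "outputFormat": "json"}
    response = requests.post(
        url,
        params={"properties": json.dumps(properties)},
        data=text.encode("utf-8"),
        timeout=15,
    )
    response.raise_for_status()
    annotation = response.json()
    triples = []
    # Each annotated sentence carries an "openie" list of extracted relations.
    for sentence in annotation.get("sentences", []):
        for triple in sentence.get("openie", []):
            triples.append((triple["subject"], triple["relation"], triple["object"]))
    return triples


if __name__ == "__main__":
    for subj, rel, obj in extract_triples("Barack Obama was born in Hawaii."):
        print(subj, "|", rel, "|", obj)
```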
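To illustrate the knowledge-graph construction step, here is a hedged sketch of loading a triples file into a directed graph. It assumes one tab-separated "subject, relation, object" triple per line (the actual format of test1.txt may differ) and uses networkx purely for illustration; the repository's own graph code lives in kg/graph/graph.py.

```python
import networkx as nx


def build_graph(triples_path):
    """Read a triples file and return a directed graph with relations as edge labels."""
    graph = nx.DiGraph()
    with open(triples_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip malformed lines
            subject, relation, obj = parts
            graph.add_edge(subject, obj, relation=relation)
    return graph


if __name__ == "__main__":
    g = build_graph("test1.txt")
    print(g.number_of_nodes(), "entities,", g.number_of_edges(), "relations")
```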
