S745-DataLakeEngineering-FakeNewDetection_Project

Category: Cloud computing
Development tool: Jupyter Notebook
File size: 5363KB
Downloads: 0
Upload date: 2023-05-01 03:32:54
Uploader: sh-1993
Description: The Fake News Detection model provides steps to train and deploy a deep learning project on the AWS cloud environment. The code is in Python. Execution is on AWS SageMaker and QuickSight. The web application is deployed with Flask. Tweets are collected using NiFi.

File list:
Nifi_Kafka (0, 2023-05-01)
Nifi_Kafka\NiFi_and_Kafka_steps.docx (31189, 2023-05-01)
Nifi_Kafka\docker_nifi.txt (706, 2023-05-01)
Nifi_Kafka\emr_start_commands.txt (1596, 2023-05-01)
Nifi_Kafka\nifi-files (0, 2023-05-01)
Nifi_Kafka\nifi-files\nifi-social-media-nar-1.9.2.nar (3071000, 2023-05-01)
Nifi_Kafka\nifi-files\start-nifi.sh (281, 2023-05-01)
Nifi_Kafka\template.json (5614, 2023-05-01)
SEIS-745_DLE_Project_ImageClassification.ipynb (7003, 2023-05-01)
SEIS-745_DLE_Project_SemanticAnalysis.ipynb (1242158, 2023-05-01)
SEIS-745_DLE_Project_TextClassification.ipynb (5743, 2023-05-01)
flask_server (0, 2023-05-01)
flask_server\main-app.py (1091, 2023-05-01)
flask_server\requirements.txt (206, 2023-05-01)
flask_server\rnn_model_fake_real_text_classification.hdf5 (818848, 2023-05-01)
flask_server\templates (0, 2023-05-01)
flask_server\templates\index.html (1056, 2023-05-01)
rnn_model_fake_real_text_classification.hdf5 (818848, 2023-05-01)

# Fake News Detection model deployment over AWS Cloud

This fake news detection project was undertaken as part of the course Data Lake Engineering. It provides step-by-step documentation and code to train and deploy the machine learning model in the AWS Cloud environment. The code is in Python. Model training runs in an AWS SageMaker notebook, visualization is done in Tableau, and the web application for the user interface is deployed with Flask. All the required code files are provided.

Datasets for initial training:
Fake images dataset – CASIA 2.0 https://www.kaggle.com/datasets/divg07/casia-20-image-tampering-detection-dataset
Fake news/images dataset – MediaEval 2016 http://www.multimediaeval.org/datasets/
Sarcasm text – SARC https://metatext.io/datasets/self-annotated-reddit-corpus-(sarc)
### STEPS:
1. Create an EC2 instance on AWS. Provide proper inbound rules for the SSH, HTTP, and HTTPS ports.
2. Create an S3 bucket.
3. Upload all data files (tweet files, image files) to this S3 bucket.
4. Open AWS SageMaker and create a Notebook Instance (set properties: connect to any S3 bucket).
5. Click 'Open Jupyter'.
6. Copy, paste, and execute the commands from the file: SEIS-745_DLE_Project_SemanticAnalysis.ipynb
[This will generate the semantic analysis file analysis_sheet.csv. This file is used for further visualization.]
*You might need to run !pip3 install library-name if you get the error "library-name module not found".
7. In a new notebook, copy, paste, and execute the commands from the file: SEIS-745_DLE_Project_TextClassification.ipynb
[This is for the text classification model. It will generate the model file: rnn_model_fake_real_text_classification.hdf5]
8. In a new notebook, copy, paste, and execute the commands from the file: SEIS-745_DLE_Project_ImageClassification.ipynb
[This is for the image classification model. It will generate the model file: vgg16_model_fake_real_image.hdf5]
9. For the Flask server installation and web interface, follow the steps in the flask_server folder.
10. For the NiFi and Kafka pipeline, refer to the Nifi_Kafka folder.

**Note:** The hdf5 file of the image model (vgg16_model_fake_real_image.hdf5) is larger than 25MB, which causes an issue on GitHub, so the file is shared over Google Drive: https://drive.google.com/drive/folders/19cwJBIqxtdfzeiBqV80flSlSKyu6BJ91?usp=sharing
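
Steps 2 and 3 can also be scripted rather than done through the AWS console. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are already configured. The helper names `plan_uploads` and `upload_folder`, the local folder, the bucket name, and the key prefix are all hypothetical and not taken from this project.

```python
from pathlib import Path


def plan_uploads(local_dir, prefix=""):
    """Return sorted (local_path, s3_key) pairs for every file under local_dir."""
    root = Path(local_dir)
    clean_prefix = prefix.strip("/")
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keys always use forward slashes, regardless of the local OS.
            rel = path.relative_to(root).as_posix()
            key = f"{clean_prefix}/{rel}" if clean_prefix else rel
            pairs.append((str(path), key))
    return pairs


def upload_folder(local_dir, bucket, prefix=""):
    """Upload every file under local_dir to the given S3 bucket (needs boto3)."""
    import boto3  # imported lazily so plan_uploads works without boto3

    s3 = boto3.client("s3")
    for local_path, key in plan_uploads(local_dir, prefix):
        s3.upload_file(local_path, bucket, key)  # Filename, Bucket, Key
```

Calling `upload_folder("data", "my-fake-news-bucket", prefix="raw")` would then copy a local `data` folder into the bucket under `raw/`. Both names are placeholders to replace with your own. Keeping the key planning separate from the upload makes it possible to inspect what would be uploaded before touching S3.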
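
The note under step 6 about "module not found" errors can be handled from a notebook cell. This is a minimal sketch using only the Python standard library. `ensure_module` is a hypothetical helper name, and the pip package name is assumed to match the import name unless it is passed separately.

```python
import importlib
import subprocess
import sys


def ensure_module(import_name, package_name=None):
    """Import a module, installing it with pip first if it is missing."""
    try:
        return importlib.import_module(import_name)
    except ModuleNotFoundError:
        # Use the current interpreter's pip so the package is installed
        # into the same environment the notebook kernel runs in.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", package_name or import_name]
        )
        importlib.invalidate_caches()
        return importlib.import_module(import_name)
```

For a library whose package name differs from its import name, pass both, for example `ensure_module("sklearn", "scikit-learn")`.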
