Fishing-The-Phish

Category: Natural language processing
Development tool: Jupyter Notebook
File size: 5956KB
Downloads: 1
Upload date: 2020-12-07 00:20:40
Uploader: sh-1993
Description: ShellHacks 2019 "Akamai Security Challenge" 1st place winning project - Data Science, Natural Language Processing and Machine Learning techniques for detection of phishing links.

File list:
.ipynb_checkpoints (0, 2020-12-07)
.ipynb_checkpoints\Introduction - FIU-checkpoint.ipynb (6243, 2020-12-07)
.ipynb_checkpoints\Is_it_a_phish-checkpoint.ipynb (22195, 2020-12-07)
.ipynb_checkpoints\Is_it_a_phish_final-checkpoint.ipynb (21116, 2020-12-07)
.ipynb_checkpoints\Is_it_a_phish_shellhacks-checkpoint.ipynb (18397, 2020-12-07)
.ipynb_checkpoints\fishing_the_phish-checkpoint.ipynb (17133, 2020-12-07)
.ipynb_checkpoints\nlp_ref-checkpoint.ipynb (67181, 2020-12-07)
.ipynb_checkpoints\nlp_ref_2-checkpoint.ipynb (49032, 2020-12-07)
.ipynb_checkpoints\phishing_data_visualization-checkpoint.ipynb (483922, 2020-12-07)
.ipynb_checkpoints\phishing_detector-checkpoint.ipynb (459494, 2020-12-07)
.ipynb_checkpoints\phishing_detector_for_functions-checkpoint.ipynb (140279, 2020-12-07)
.ipynb_checkpoints\trials_FIU-checkpoint.ipynb (735776, 2020-12-07)
.ipynb_checkpoints\trials_with_nlp-checkpoint.ipynb (526800, 2020-12-07)
FIU_Phishing_Mitre_Dataset.csv (309086, 2020-12-07)
FIU_Phishing_Mitre_Dataset_split_1.csv (276726, 2020-12-07)
FIU_Phishing_Mitre_Dataset_split_2.csv (32431, 2020-12-07)
Introduction - FIU.ipynb (6243, 2020-12-07)
best_model_1.sav (218710, 2020-12-07)
best_model_2.sav (420614, 2020-12-07)
fishing.jpeg (28137, 2020-12-07)
fishing_the_phish.ipynb (17133, 2020-12-07)
histograms.png (585231, 2020-12-07)
images (0, 2020-12-07)
images\.DS_Store (6148, 2020-12-07)
lstm_url.h5 (330792, 2020-12-07)
phishing_data_visualization.ipynb (483922, 2020-12-07)
scatter_plots.png (3209815, 2020-12-07)

# Fishing The Phish: Data science and machine learning driven phishing link detection model

Winner of "Akamai Security Challenge" and "MITRE Cyber Phishing Challenge" at ShellHacks 2019.

https://devpost.com/software/fish-the-phish

The world has become increasingly dependent on the internet for most of its tasks, and the internet has become an indispensable part of our daily lives. It is therefore no surprise that many institutions implement security measures to mitigate and prevent cyber-security attacks. Phishing is one such attack: an attacker tries to obtain sensitive customer information by masquerading as a legitimate website.

We completed this project as part of the MITRE cyber phishing challenge at ShellHacks 2019. We were given a small dataset of fewer than 5000 samples, containing URLs and lifetime details (3 features) of certain websites. Our goal was to perform exploratory data analysis to explain feature relevance, extract more features through feature engineering, and train machine learning models to detect phishing links in an unseen test dataset.

We used an ensemble of machine learning models (Random Forests, K-nearest neighbors, Decision trees, Logistic regression, and Adaptive Boosting with decision trees). We developed an algorithm for dynamic optimal model selection over two different sets of features, and combined the results from the selected models to build the final phishing detection model. Our approach achieved an F1-score of more than 92% on an unseen test dataset that MITRE used to judge the submissions in their track.
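
The feature engineering and model selection steps described above might look roughly like the following. This is a minimal sketch using only the Python standard library, not the code from the project notebooks: the feature names, the helper names `extract_url_features`, `f1_score` and `select_best_model`, and the label convention (1 = phishing, 0 = legitimate) are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Matches a leading scheme such as "http://" or "https://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def extract_url_features(url):
    """Derive simple lexical features from a URL (hypothetical feature set)."""
    # urlparse only fills in the host when a scheme or a leading "//" is present.
    parsed = urlparse(url if _SCHEME_RE.match(url) else "//" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


def f1_score(y_true, y_pred):
    """F1 for the positive class (assumed here to be 1 = phishing)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best_model(predictions_by_model, y_true):
    """Return the name of the model whose validation predictions score the highest F1."""
    return max(
        predictions_by_model,
        key=lambda name: f1_score(y_true, predictions_by_model[name]),
    )


features = extract_url_features("http://paypal-secure.example.com/login/verify?id=123")
best = select_best_model({"a": [1, 0, 0, 0], "b": [1, 1, 0, 1]}, [1, 1, 0, 0])
```

In practice the feature dictionaries would be turned into a numeric matrix and passed to the classifiers named above, with the selection step run once per feature set on held-out validation data.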
