News-Recommendation-system-using-Bert4Rec-model

Category: Recommender systems
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2022-02-17 06:07:46
Uploader: sh-1993
Description: News Recommendation system using the BERT4Rec model
(News-Recommendation-system-using-Bert4Rec-model)

File list (size in bytes, date):
final_submission.py (1209, 2022-02-16)
main.py (8006, 2022-02-16)
pretrain_Bert4Rec_Model.py (5489, 2022-02-16)
test.py (3330, 2022-02-16)
utils/ (0, 2022-02-16)
utils/Early_stopping.py (2095, 2022-02-16)
utils/Transformer_Blocks.py (6339, 2022-02-16)
utils/model_trainer.py (962, 2022-02-16)
utils/news_rep.py (2886, 2022-02-16)
utils/options.py (958, 2022-02-16)
utils/w2v_rep.py (2204, 2022-02-16)

# News-Recommendation-system-using-Bert4Rec-model

BERT4Rec for news recommendation.

## Dataset used

The Microsoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It was collected from anonymized behavior logs of the Microsoft News website. MIND is intended to serve as a benchmark dataset for news recommendation and to facilitate research on news recommendation and recommender systems. MIND contains about 160k English news articles and more than 15 million impression logs generated by 1 million users, randomly sampled from users who had at least 5 news click records during the 6 weeks from October 12 to November 22, 2019. Every news article contains textual content including title, abstract, body, category, and entities. Each impression log contains the click events, the non-clicked events, and the user's historical news click behavior before that impression. There are 2,186,683 samples in the training set, 365,200 samples in the validation set, and 2,341,619 samples in the test set, which can empower the training of data-intensive news recommendation models.

[MIND Dataset] https://msnews.github.io/assets/doc/ACL2020_MIND.pdf

## Model Description

BERT4Rec is a model originally used for product recommendation; in this project we use the same model on sequences of news articles. BERT4Rec uses a transformer to learn a sequential representation of the elements in a sequence. We assume that the news articles in a user's history are arranged in chronological order. In pretrain_Bert4Rec_Model.py we mask elements of each sequence and train the model so that it can predict the masked elements. We then obtain a user representation by summing the outputs of the pretrained BERT4Rec model over the sequence, and later use this user representation to rank the candidate news.

[BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer] https://arxiv.org/pdf/1904.06690.pdf

## Implementation

Taking the news titles in the user's history, arranged in chronological order, we randomly mask some news IDs in the sequence and train the BERT4Rec model to recover the representations of the masked positions. After changing the paths so the scripts can access the dataset, run:

`python pretrain_Bert4Rec_Model.py`

Next we fine-tune a CNN model for news representation. The CNN representation of each candidate news item and the mean of the BERT4Rec output are combined with a dot product and passed to a sigmoid layer. This is done with:

`python main.py`

A minimal sketch of these two steps is shown at the end of this README.

## Testing

`python test.py`

Before submission, convert the result.txt file into a properly formatted prediction.txt:

`python final_submission.py`

```python
cleaner(".../MIND_dataset/result.txt", ".../MINDlarge_test/behaviors.tsv", "..../MIND_dataset/prediction.txt")
```

Reference: [BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer] https://github.com/FeiSun/BERT4Rec
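## Sketch of the masking and scoring steps

The following is a minimal sketch (not the repository's code) of the two ideas described above: cloze-style masking of news IDs in a chronological click history for BERT4Rec pretraining, and scoring a candidate by a dot product between the mean of the sequence outputs (the user representation) and the candidate's news vector, followed by a sigmoid. The module names, dimensions, and mask ratio below are illustrative assumptions; the repository's actual implementation lives in pretrain_Bert4Rec_Model.py, utils/Transformer_Blocks.py, and utils/news_rep.py.

```python
# Minimal sketch, assuming PyTorch; all sizes and names are illustrative.
import random
import torch
import torch.nn as nn

MASK_ID = 0          # assumed ID reserved for the [MASK] token
VOCAB_SIZE = 1000    # assumed number of distinct news IDs
EMB_DIM = 64         # assumed embedding size

def mask_history(news_ids, mask_ratio=0.2):
    """Randomly replace a fraction of news IDs with MASK_ID (cloze-style)."""
    masked, labels = [], []
    for nid in news_ids:
        if random.random() < mask_ratio:
            masked.append(MASK_ID)
            labels.append(nid)      # the model must recover the original ID
        else:
            masked.append(nid)
            labels.append(-100)     # position ignored by a cross-entropy loss
    return masked, labels

class TinyBert4Rec(nn.Module):
    """Embedding + one Transformer encoder layer standing in for BERT4Rec."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        layer = nn.TransformerEncoderLayer(d_model=EMB_DIM, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, ids):                      # ids: (batch, seq_len)
        return self.encoder(self.emb(ids))       # (batch, seq_len, EMB_DIM)

def score_candidate(model, history_ids, candidate_vec):
    """User vector = mean of sequence outputs; score = sigmoid(dot product)."""
    out = model(history_ids)                     # (batch, seq_len, EMB_DIM)
    user_vec = out.mean(dim=1)                   # (batch, EMB_DIM)
    logits = (user_vec * candidate_vec).sum(-1)  # dot product per sample
    return torch.sigmoid(logits)

history = [12, 57, 803, 441, 9]                  # toy chronological click history
masked, labels = mask_history(history)
model = TinyBert4Rec()
probs = score_candidate(model,
                        torch.tensor([masked]),
                        torch.randn(1, EMB_DIM)) # stands in for a CNN news vector
print(masked, labels, probs.item())
```

In the actual pipeline, the candidate vector would come from the CNN news encoder fine-tuned in main.py rather than a random tensor, and pretraining would apply a cross-entropy loss over the masked positions only.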
