TLDR_News_Summarization

Category: Feature extraction
Development tool: Jupyter Notebook
File size: 12485KB
Downloads: 0
Upload date: 2021-02-13 00:58:14
Uploader: sh-1993
Description: TLDR_News_Summarization

File list:
code (0, 2021-02-13)
code\ROUGE_evaluation.ipynb (29325, 2021-02-13)
code\data (0, 2021-02-13)
code\data\sample_cnn_dataset.pkl (1906012, 2021-02-13)
code\finetune_bart-base_transformers.ipynb (44960, 2021-02-13)
code\many_transformer.ipynb (157101, 2021-02-13)
code\old_code (0, 2021-02-13)
code\old_code\best_cnn_seq2seq.ipynb (144899, 2021-02-13)
code\old_code\cnn_seq2seq.ipynb (171660, 2021-02-13)
code\old_code\frequency_and_bert.ipynb (83639, 2021-02-13)
code\old_code\test.py (439, 2021-02-13)
code\old_code\text_rank.ipynb (12965, 2021-02-13)
code\old_code\transformer.ipynb (145167, 2021-02-13)
code\old_code\transformer.py (3477, 2021-02-13)
code\process_data.py (7058, 2021-02-13)
code\project.ipynb (221653, 2021-02-13)
code\seq2seq_decoding.ipynb (25674, 2021-02-13)
code\seq2seq_training.ipynb (160187, 2021-02-13)
code\trained_seq2seq_model (0, 2021-02-13)
code\trained_seq2seq_model\decoder_model.json (3127, 2021-02-13)
code\trained_seq2seq_model\encoder_model.json (3589, 2021-02-13)
code\transformer.ipynb (67611, 2021-02-13)
code\zip (0, 2021-02-13)
code\zip\output.zip (4723127, 2021-02-13)
code\zip\source (0, 2021-02-13)
code\zip\source\ROUGE_evaluation.ipynb (29325, 2021-02-13)
code\zip\source\finetune_bart-base_transformers.ipynb (44960, 2021-02-13)
code\zip\source\many_transformer.ipynb (157101, 2021-02-13)
code\zip\source\process_data.py (7058, 2021-02-13)
code\zip\source\project.ipynb (221653, 2021-02-13)
code\zip\source\seq2seq_decoding.ipynb (25674, 2021-02-13)
code\zip\source\seq2seq_training.ipynb (160187, 2021-02-13)
code\zip\source\source.zip (164450, 2021-02-13)
code\zip\source\transformer.ipynb (67611, 2021-02-13)
milestone.pdf (338030, 2021-02-13)
proposal.pdf (185284, 2021-02-13)
... ...

**Final Project: TL;DR Text Summarization from basic to advanced approaches**

**Group name**: Oasis
**Group members**: Siyu Wu, Nattapat Juthaprachakul
**Email**: swa246@sfu.ca, njuthapr@sfu.ca

**Assignment of work**:
1. Processing dataset: Nattapat
2. Implementing vanilla sequence-to-sequence LSTM model: Nattapat
3. Implementing models with transformers: Nattapat, Siyu
4. Evaluation: Nattapat, Siyu
5. Making video, PPT, and final report: Nattapat, Siyu

**Text Summarization from basic to advanced approaches**
- The goal of this project is to implement different text summarization techniques on the same dataset and compare them using the same evaluation metrics, such as ROUGE.
- The program is written in Python.
- This project is implemented in Google Colab with CPU, GPU, and TPU.
- Data preprocessing is implemented in the process_data.py file and the results are saved as pickle files.
- Predicted summaries and evaluation results are saved to CSV files for each model.

**Video Presentation:** https://vault.sfu.ca/index.php/s/3CbZSqq0abUshGv

**Requirements:**
1. TensorFlow 2.0, Keras (LSTM model)
2. Huggingface Transformers API (Transformer models)
3. PyTorch (finetuned Transformer model)
4. Python ML libraries, e.g. Pandas, NumPy
5. Python 3.6 or above

**In this project, we use the following models:**
1. Vanilla LSTM sequence-to-sequence model (trained from scratch)
2. BART-base model (finetuned)
3. BART-large model
4. T5-small model
5. T5-base model
6. T5-large model
- The code is in project.ipynb (or in separate files).

**Datasets:**
- CNN and Daily Mail datasets (one online source: https://cs.nyu.edu/~kcho/DMQA/)
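The preprocessing step (process_data.py saving cleaned article/summary pairs to pickle) could be sketched roughly as follows. Function names, the cleaning rules, and the pair layout here are illustrative assumptions, not the project's actual code:

```python
import pickle
import re

def clean_text(text):
    # Lowercase, replace stray punctuation with spaces, collapse whitespace
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s.,']", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def preprocess(pairs, out_path):
    # pairs: list of (article, summary) tuples, as in the CNN/Daily Mail data
    cleaned = [(clean_text(article), clean_text(summary)) for article, summary in pairs]
    with open(out_path, "wb") as f:
        pickle.dump(cleaned, f)
    return cleaned

sample = [("The  Cat sat on the MAT!", "Cat sat on mat.")]
print(preprocess(sample, "sample.pkl"))
# → [('the cat sat on the mat', 'cat sat on mat.')]
```

Pickling the cleaned pairs once (as in sample_cnn_dataset.pkl) avoids re-running the text cleaning in every notebook.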
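ROUGE, the evaluation metric used to compare the models, scores n-gram overlap between a generated summary and a reference. A minimal ROUGE-1 computation (unigram precision/recall/F1) is sketched below; this is a simplification for illustration, and a real evaluation such as ROUGE_evaluation.ipynb would typically rely on a dedicated library:

```python
from collections import Counter

def rouge1(candidate, reference):
    # Count overlapping unigrams between candidate and reference summaries
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1("the cat sat on the mat", "a cat sat on a mat")
print(scores)  # precision = recall = f1 = 4/6 ≈ 0.667
```

ROUGE-2 and ROUGE-L follow the same pattern but compare bigrams and longest common subsequences, respectively.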
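Saving each model's predicted summaries and scores to a per-model CSV, as described above, can be done with the standard csv module. The column names and file name here are illustrative assumptions:

```python
import csv

def save_results(rows, path):
    # rows: list of dicts holding an example id, the predicted summary,
    # and its ROUGE scores for one model
    fields = ["id", "predicted_summary", "rouge1_f1", "rougeL_f1"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

save_results(
    [{"id": 1, "predicted_summary": "cat sat on mat",
      "rouge1_f1": 0.67, "rougeL_f1": 0.67}],
    "bart_base_results.csv",
)
```

One CSV per model keeps the comparison step simple: the files share a schema, so they can be loaded side by side (e.g. with Pandas) to build the final results table.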
