TLDR_News_Summarization
Category: Feature extraction
Development tool: Jupyter Notebook
File size: 12485KB
Downloads: 0
Upload date: 2021-02-13 00:58:14
Uploader: sh-1993
Description: TLDR News Summarization
File list:
code (0, 2021-02-13)
code\ROUGE_evaluation.ipynb (29325, 2021-02-13)
code\data (0, 2021-02-13)
code\data\sample_cnn_dataset.pkl (1906012, 2021-02-13)
code\finetune_bart-base_transformers.ipynb (44960, 2021-02-13)
code\many_transformer.ipynb (157101, 2021-02-13)
code\old_code (0, 2021-02-13)
code\old_code\best_cnn_seq2seq.ipynb (144899, 2021-02-13)
code\old_code\cnn_seq2seq.ipynb (171660, 2021-02-13)
code\old_code\frequency_and_bert.ipynb (83639, 2021-02-13)
code\old_code\test.py (439, 2021-02-13)
code\old_code\text_rank.ipynb (12965, 2021-02-13)
code\old_code\transformer.ipynb (145167, 2021-02-13)
code\old_code\transformer.py (3477, 2021-02-13)
code\process_data.py (7058, 2021-02-13)
code\project.ipynb (221653, 2021-02-13)
code\seq2seq_decoding.ipynb (25674, 2021-02-13)
code\seq2seq_training.ipynb (160187, 2021-02-13)
code\trained_seq2seq_model (0, 2021-02-13)
code\trained_seq2seq_model\decoder_model.json (3127, 2021-02-13)
code\trained_seq2seq_model\encoder_model.json (3589, 2021-02-13)
code\transformer.ipynb (67611, 2021-02-13)
code\zip (0, 2021-02-13)
code\zip\output.zip (4723127, 2021-02-13)
code\zip\source (0, 2021-02-13)
code\zip\source\ROUGE_evaluation.ipynb (29325, 2021-02-13)
code\zip\source\finetune_bart-base_transformers.ipynb (44960, 2021-02-13)
code\zip\source\many_transformer.ipynb (157101, 2021-02-13)
code\zip\source\process_data.py (7058, 2021-02-13)
code\zip\source\project.ipynb (221653, 2021-02-13)
code\zip\source\seq2seq_decoding.ipynb (25674, 2021-02-13)
code\zip\source\seq2seq_training.ipynb (160187, 2021-02-13)
code\zip\source\source.zip (164450, 2021-02-13)
code\zip\source\transformer.ipynb (67611, 2021-02-13)
milestone.pdf (338030, 2021-02-13)
proposal.pdf (185284, 2021-02-13)
... ...
**Final Project: TL;DR Text Summarization from basic to advanced approaches**
**Group name**: Oasis
**Group members**: Siyu Wu, Nattapat Juthaprachakul
**Email**: swa246@sfu.ca, njuthapr@sfu.ca
**Division of work**:
1. Processing the dataset: Nattapat
2. Implementing vanilla sequence-to-sequence LSTM model: Nattapat
3. Implementing models with transformers: Nattapat, Siyu
4. Evaluation: Nattapat, Siyu
5. Making video, PPT, and final report: Nattapat, Siyu
**Text Summarization from basic to advanced approaches**
- The goal of this project is to implement different text summarization techniques on the same dataset and compare them using the same evaluation metric, ROUGE.
- The program is written in Python.
- This project is implemented in Google Colab on CPU, GPU, and TPU runtimes.
- Data preprocessing is implemented in the process_data.py file, and the results are saved as a pickle file.
- Predicted summaries and evaluation results are saved to a CSV file for each model.
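Since all models are compared on ROUGE, the following is a minimal illustrative sketch of the ROUGE-1 metric (unigram overlap). This is not the evaluation code used in ROUGE_evaluation.ipynb; a real evaluation should use a maintained package.

```python
# Minimal ROUGE-1 sketch: unigram-overlap precision, recall, and F1
# between a candidate summary and a reference summary.
from collections import Counter

def rouge_1(candidate: str, reference: str):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each unigram counted at most as often as in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1
```

ROUGE-2 and ROUGE-L follow the same pattern over bigrams and longest common subsequences, respectively.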
**Video Presentation:**
https://vault.sfu.ca/index.php/s/3CbZSqq0abUshGv
**Requirements:**
1. TensorFlow 2.0, Keras (LSTM model)
2. Hugging Face Transformers API (Transformer models)
3. PyTorch (fine-tuned Transformer model)
4. Python ML libraries, e.g., Pandas, NumPy
5. Python 3.6 or above
**In this project, we are using the following models:**
1. Vanilla LSTM sequence-to-sequence model (trained from scratch)
2. BART-base model (finetuned)
3. BART-large model
4. T5-small model
5. T5-base model
6. T5-large model
- The code is in project.ipynb (or in separate files).
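The vanilla seq2seq model (model 1) generates summaries token by token with the trained decoder. The sketch below illustrates greedy decoding only; `decoder_step` is a hypothetical toy stand-in for one step of a trained LSTM decoder, not the project's actual model.

```python
# Greedy decoding sketch for a seq2seq summarizer. At each step the
# highest-scoring vocabulary token is chosen and fed back into the decoder
# until the end token is produced or a length limit is reached.
START, END = "<start>", "<end>"
VOCAB = [START, END, "the", "cat", "sat"]

def decoder_step(prev_token, state):
    # Hypothetical toy decoder: a fixed transition table standing in for
    # the softmax outputs of a trained LSTM decoder.
    table = {START: "the", "the": "cat", "cat": "sat", "sat": END}
    next_token = table.get(prev_token, END)
    scores = [1.0 if tok == next_token else 0.0 for tok in VOCAB]
    return scores, state

def greedy_decode(max_len=10):
    token, state, summary = START, None, []
    for _ in range(max_len):
        scores, state = decoder_step(token, state)
        token = VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)]
        if token == END:
            break
        summary.append(token)
    return " ".join(summary)
```

Beam search (tracking the k best partial summaries instead of one) is a common alternative that usually improves summary quality at the cost of extra computation.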
**Datasets:**
- CNN and Daily Mail datasets (online source: https://cs.nyu.edu/~kcho/DMQA/)
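The preprocessing step stores processed article/summary pairs in a pickle file (e.g. sample_cnn_dataset.pkl in the file list). A hedged sketch of that round trip is shown below; the field names are assumptions for illustration, not the project's actual schema.

```python
# Sketch of saving and reloading preprocessed CNN/Daily Mail samples with
# the standard-library pickle module, as process_data.py does.
import os
import pickle
import tempfile

# Toy stand-in for the preprocessed dataset (hypothetical field names).
sample = [
    {"article": "CNN story text ...", "summary": "short highlight ..."},
]

path = os.path.join(tempfile.mkdtemp(), "sample_cnn_dataset.pkl")
with open(path, "wb") as f:
    pickle.dump(sample, f)          # serialize the processed pairs

with open(path, "rb") as f:
    loaded = pickle.load(f)         # training notebooks reload from here
```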