Translation-with-attention-GRU

Category: Natural Language Processing
Development tool: Jupyter Notebook
File size: 0KB
Downloads: 2
Upload date: 2023-08-30 10:33:34
Uploader: sh-1993
Description: Machine Translation with attention-based GRU

File list (size, date):
Images/ (0, 2023-09-15)
Images/attn1.png (27651, 2023-09-15)
Images/attn2.png (50299, 2023-09-15)
Images/table1.png (20683, 2023-09-15)
Images/table2-1.png (22391, 2023-09-15)
Images/table2-2.png (22122, 2023-09-15)
Images/table2-3.png (21864, 2023-09-15)
Images/table2-4.png (23222, 2023-09-15)
Images/table2.png (21138, 2023-09-15)
Images/table3-1.png (26166, 2023-09-15)
Images/table3-2.png (26727, 2023-09-15)
Images/table3-3.png (26690, 2023-09-15)
Images/table3-4.png (26630, 2023-09-15)
Images/table3.png (21468, 2023-09-15)
Images/table4.png (21228, 2023-09-15)
Model.ipynb (5733600, 2023-09-15)
project report and result analysis.pdf (348524, 2023-09-15)

# Translation-with-attention-GRU

Machine Translation with an attention-based GRU. This project conducts a comprehensive study and tuning of the different parts of the attention-based GRU architecture.

## Dataset

A multilingual parallel dataset from the ACL 2014 Ninth Workshop on Statistical Machine Translation (WMT14) is used in this project; in particular, the News Commentary category. The dataset contains four language pairs in total, all bidirectional:

CS-EN, DE-EN, FR-EN, RU-EN

All four language pairs are examined in this project, with English always fixed as the target language to predict. More information about the dataset can be found here: http://www.statmt.org/wmt14/translation-task.html#download

## Experiment Setup

All experiments below are evaluated with BLEU scores computed by the corpus_bleu() function from nltk.translate.bleu_score. Scores from this function are decimal values between 0 and 1. Each training session runs for 75,000 iterations.
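For reference, a minimal sketch of this evaluation setup is shown below; the example sentences are purely illustrative and are not taken from the dataset:

```python
# Minimal sketch of the evaluation setup: corpus_bleu() from NLTK, applied to
# tokenized hypotheses and (possibly multiple) tokenized references.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical example sentences; the actual test set comes from News Commentary.
references = [
    [["the", "cat", "sits", "on", "the", "mat"]],          # one reference per source sentence
    [["there", "is", "a", "book", "on", "the", "desk"]],
]
hypotheses = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["there", "is", "a", "book", "on", "the", "table"],
]

# Smoothing avoids zero scores when a higher-order n-gram has no match,
# which is common for short sentences.
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"Corpus BLEU: {score:.4f}")  # a decimal value in [0, 1]
```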

### Experiment 1: Compare the model performance between the greedy decoding and beam search decoding

In this section, greedy decoding is used as the comparison baseline, which makes it clearer whether switching to beam search decoding actually improves performance. For the beam search setup, the stopping criterion adopted here is the number of completed hypotheses collected. On top of that, there are a few tweaks to the detailed settings, listed below:

- **Default setting**: apply length normalization when calculating each hypothesis score; no limit on the maximum sequence length.
- **N MAX L**: apply length normalization when calculating each hypothesis score, but limit the maximum sequence length.
- **NO N NO MAX L**: do not apply length normalization when calculating each hypothesis score; limit the maximum sequence length.
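A minimal sketch of the scoring and stopping logic these settings toggle is given below; the function and parameter names (hypothesis_score, prune_by_max_length, num_completed_target) are illustrative assumptions and are not taken from Model.ipynb:

```python
# Illustrative sketch of length-normalized hypothesis scoring, the max-length
# cutoff, and the stopping criterion (enough completed hypotheses collected).
def hypothesis_score(token_log_probs, length_normalize=True):
    """Score a hypothesis from its per-token log-probabilities."""
    total = sum(token_log_probs)
    if length_normalize:
        return total / max(len(token_log_probs), 1)  # average log-prob per token
    return total

def prune_by_max_length(beam, max_len=None):
    """Optionally drop hypotheses that exceed the maximum sequence length."""
    if max_len is None:
        return beam
    return [hyp for hyp in beam if len(hyp["tokens"]) <= max_len]

def should_stop(completed_hypotheses, num_completed_target=5):
    """Stop once enough completed hypotheses have been collected."""
    return len(completed_hypotheses) >= num_completed_target

# Length normalization keeps beam search from favoring short hypotheses:
print(hypothesis_score([-0.2, -1.1, -0.4]))         # normalized: about -0.57
print(hypothesis_score([-0.2, -1.1, -0.4], False))  # unnormalized: -1.7
```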

The results for the four language pairs are shown below:

![table1](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table1.png)
![table2](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table2.png)
![table3](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table3.png)
![table4](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table4.png)

### Experiment 2: Compare the model performance between a randomly initialized embedding and a pretrained embedding layer

The loaded embedding layer is pretrained with Word2Vec from the gensim.models package. The pretraining data is the monolingual training data, which can be downloaded from: http://www.statmt.org/wmt14/translation-task.html#download.
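A minimal sketch of this pretraining step (assuming gensim 4.x) is shown below; the file name, vector size, and stand-in vocabulary are illustrative assumptions, not taken from the notebook:

```python
# Pretrain word vectors on monolingual text with gensim's Word2Vec and copy them
# into an embedding matrix for the translation model.
import numpy as np
from gensim.models import Word2Vec

# Monolingual corpus: one tokenized sentence per line (hypothetical file name).
with open("news-commentary.en", encoding="utf-8") as f:
    sentences = [line.split() for line in f]

w2v = Word2Vec(sentences, vector_size=256, window=5, min_count=2, workers=4)

# Copy pretrained vectors into the embedding matrix; words unseen by Word2Vec
# keep their random initialization.
vocab = ["<pad>", "<sos>", "<eos>", "the", "cat"]   # stand-in vocabulary
emb = np.random.normal(scale=0.1, size=(len(vocab), 256)).astype("float32")
for i, word in enumerate(vocab):
    if word in w2v.wv:
        emb[i] = w2v.wv[word]
# emb can now be used to initialize the model's embedding layer.
```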

The comparison results are:

![table21](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table2-1.png)
![table22](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table2-2.png)
![table23](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table2-3.png)
![table24](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table2-4.png)

### Experiment 3: Compare the model performance between different attention mechanisms

In detail, besides the default dot-product attention mechanism, two other attention variants are also examined (a scoring sketch follows the result tables below). The comparison results are:

![table31](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table3-1.png)
![table32](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table3-2.png)
![table33](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table3-3.png)
![table34](https://github.com/zhangziyi1996/Translation-with-attention-GRU/blob/master/Images/table3-4.png)
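The two alternative variants are not named in this README. The sketch below shows the default dot-product score next to two common alternatives (Luong-style "general" and additive/Bahdanau scoring) purely to illustrate how the scoring function can be swapped; it is not taken from Model.ipynb:

```python
# Illustrative NumPy sketch of attention score functions that can be swapped
# inside the decoder; "general" and "additive" are common choices and are NOT
# confirmed to be the two variants used in this project.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_score(dec_h, enc_hs):
    """Default dot-product attention: score_i = dec_h . enc_h_i."""
    return enc_hs @ dec_h

def general_score(dec_h, enc_hs, W):
    """Luong 'general' attention: score_i = enc_h_i . (W dec_h)."""
    return enc_hs @ (W @ dec_h)

def additive_score(dec_h, enc_hs, W1, W2, v):
    """Bahdanau-style additive attention: score_i = v . tanh(W1 enc_h_i + W2 dec_h)."""
    return np.tanh(enc_hs @ W1.T + W2 @ dec_h) @ v

# Attention weights and context vector from any of the scorers (dot shown here).
d = 4
rng = np.random.default_rng(0)
dec_h, enc_hs = rng.normal(size=d), rng.normal(size=(6, d))  # 6 source positions
weights = softmax(dot_score(dec_h, enc_hs))
context = weights @ enc_hs
print(weights.round(3), context.round(3))
```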
