attention-is-all-you-need-pytorch-master

Category: Other
Development tool: Python
File size: 27KB
Downloads: 2
Upload date: 2021-03-11 10:03:31
Uploader: 沫啊啊啊啊
Description: attention is all you need

File list:
LICENSE (1069, 2021-02-17)
apply_bpe.py (9134, 2021-02-17)
learn_bpe.py (9239, 2021-02-17)
preprocess.py (12646, 2021-02-17)
requirements.txt (169, 2021-02-17)
train.py (13173, 2021-02-17)
train_multi30k_de_en.sh (468, 2021-02-17)
transformer (0, 2021-02-17)
transformer\Constants.py (75, 2021-02-17)
transformer\Layers.py (1684, 2021-02-17)
transformer\Models.py (7678, 2021-02-17)
transformer\Modules.py (674, 2021-02-17)
transformer\Optim.py (1135, 2021-02-17)
transformer\SubLayers.py (2606, 2021-02-17)
transformer\Translator.py (4562, 2021-02-17)
transformer\__init__.py (367, 2021-02-17)
translate.py (4077, 2021-02-17)

# Attention is all you need: A Pytorch Implementation

This is a PyTorch implementation of the Transformer model in "[Attention is All You Need](https://arxiv.org/abs/1706.03762)" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017). This novel sequence-to-sequence framework relies on the **self-attention mechanism** instead of convolution operations or recurrent structures, and achieves state-of-the-art performance on the **WMT 2014 English-to-German translation task**. (2017/06/12)

> The official TensorFlow implementation can be found at [tensorflow/tensor2tensor](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py).

> To learn more about the self-attention mechanism, you could read "[A Structured Self-attentive Sentence Embedding](https://arxiv.org/abs/1703.03130)".
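The core of the model is scaled dot-product attention (implemented in `transformer/Modules.py`). A minimal sketch of the operation as defined in the paper, not the repository's exact code:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    # q, k, v: (batch, n_head, seq_len, d_k); mask broadcasts over the score matrix
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)  # block attention to masked positions
    attn = F.softmax(scores, dim=-1)    # attention weights over the key positions
    return torch.matmul(attn, v), attn  # weighted sum of values, plus the weights
```

Multi-head attention applies this operation in parallel over several linearly projected subspaces and concatenates the results.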

The project now supports training and translation with a trained model. Note that this project is still a work in progress. **BPE-related parts are not yet fully tested.** If you have any suggestion or find an error, feel free to open an issue to let me know. :)

# Usage

## WMT'16 Multimodal Translation: de-en

An example of training for the WMT'16 Multimodal Translation task (http://www.statmt.org/wmt16/multimodal-task.html).

### 0) Download the spacy language model.
```bash
# conda install -c conda-forge spacy
python -m spacy download en
python -m spacy download de
```

### 1) Preprocess the data with torchtext and spacy.
```bash
python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl
```

### 2) Train the model
```bash
python train.py -data_pkl m30k_deen_shr.pkl -log m30k_deen_shr -embs_share_weight -proj_share_weight -label_smoothing -output_dir output -b 256 -warmup 128000 -epoch 400
```

### 3) Test the model
```bash
python translate.py -data_pkl m30k_deen_shr.pkl -model trained.chkpt -output prediction.txt
```

## [(WIP)] WMT'17 Multimodal Translation: de-en w/ BPE

### 1) Download and preprocess the data with bpe:

> Since the interfaces are not unified, you need to switch the main function call from `main_wo_bpe` to `main`.

```bash
python preprocess.py -raw_dir /tmp/raw_deen -data_dir ./bpe_deen -save_data bpe_vocab.pkl -codes codes.txt -prefix deen
```

### 2) Train the model
```bash
python train.py -data_pkl ./bpe_deen/bpe_vocab.pkl -train_path ./bpe_deen/deen-train -val_path ./bpe_deen/deen-val -log deen_bpe -embs_share_weight -proj_share_weight -label_smoothing -output_dir output -b 256 -warmup 128000 -epoch 400
```

### 3) Test the model (not ready)
- TODO:
  - Load vocabulary.
  - Perform decoding after the translation.
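The `-warmup` and `-label_smoothing` flags in the train commands above correspond to the warmup learning-rate schedule and the label-smoothed cross-entropy loss described in the paper. A minimal sketch of both, following the paper's standard formulations rather than the repository's exact code (`transformer/Optim.py` and `train.py`):

```python
import torch.nn.functional as F

def warmup_lr(step, d_model=512, n_warmup=4000, lr_mul=1.0):
    """Inverse-square-root schedule:
    lr = lr_mul * d_model^-0.5 * min(step^-0.5, step * n_warmup^-1.5)."""
    step = max(step, 1)  # avoid dividing by zero on the first update
    return lr_mul * d_model ** -0.5 * min(step ** -0.5, step * n_warmup ** -1.5)

def label_smoothed_loss(logits, target, eps=0.1, pad_idx=0):
    """Cross-entropy where the gold class keeps 1 - eps of the probability mass
    and eps is spread uniformly over the remaining classes; padding is ignored."""
    n_class = logits.size(-1)
    log_prob = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(target, n_class).float()
    smoothed = one_hot * (1 - eps) + (1 - one_hot) * eps / (n_class - 1)
    loss = -(smoothed * log_prob).sum(dim=-1)
    return loss.masked_fill(target == pad_idx, 0.0).sum()
```

The `-embs_share_weight` and `-proj_share_weight` flags enable the weight sharing listed under Training settings below; in PyTorch this amounts to reusing one weight tensor for both the target embedding and the pre-softmax projection (and, with a shared vocabulary, for the source embedding as well).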

---

# Performance

## Training

- Parameter settings:
  - batch size 256
  - warmup step 4000
  - epoch 200
  - lr_mul 0.5
  - label smoothing
  - do not apply BPE and shared vocabulary
  - target embedding / pre-softmax linear layer weight sharing

## Testing

- coming soon.

---

# TODO

- Evaluation on the generated text.
- Attention weight plot.

---

# Acknowledgement

- The byte pair encoding parts are borrowed from [subword-nmt](https://github.com/rsennrich/subword-nmt/).
- The project structure, some scripts and the dataset preprocessing steps are heavily borrowed from [OpenNMT/OpenNMT-py](https://github.com/OpenNMT/OpenNMT-py).
- Thanks for the suggestions from @srush, @iamalbert, @Zessay, @JulesGM, @ZiJianZhao, and @huanghoujing.
