# CLIP-guided text GAN
Code for our paper: [CgT-GAN: CLIP-guided Text GAN for Image Captioning](https://arxiv.org/abs/2308.12045)
All pre-processed data and pretrained models are available on [BaiduPan](https://pan.baidu.com/s/1Og1PPOOdDFw7jMnG0W07Jw?pwd=s5wk).
The [paper](https://arxiv.org/abs/2308.12045) has been accepted by ACM MM 2023.
## Dataset
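As an example, all data is placed in ~/data. The commands below refer to this directory as ./data, so adjust the paths (or link the directory) to match your setup.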
### MSCOCO Dataset
1. Download the COCO images: [train](http://images.cocodataset.org/zips/train2014.zip) & [val/test](http://images.cocodataset.org/zips/val2014.zip), and put the train2014 and val2014 folders in ~/data/coco (the COCO root directory).
2. Download COCO annotations: [annotations](https://pan.baidu.com/s/1FKm4Lud6ihIgR3Kenba5lg?pwd=k6a2), put all json files in ~/data/coco/annotations.
Example of the COCO root directory folder:
```
./data/coco
--train2014
--val2014
--annotations
```
### Flickr30K Dataset
1. Download images: [train/val/test](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset), put flickr30k_images folder in ~/data/Flickr30k.
2. Download annotations: [annotations](https://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip), put all json files in ~/data/Flickr30k/annotations.
### Text Corpus
* ShutterStock: download from [UIC](https://github.com/fengyang0317/unsupervised_captioning/issues/42)
* Google Conceptual Captions: download from [Train_GCC-training.tsv](https://ai.google.com/research/ConceptualCaptions/download)

Place all external data in ~/data/external for subsequent processing.
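
Before preprocessing, it can help to confirm that the folders described above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the list `EXPECTED_DIRS` are hypothetical, and the directory name `external` follows the path used by the preprocessing commands.

```python
from pathlib import Path

# Sub-directories named in the dataset instructions above.
# ASSUMPTION: external corpora live in "external", as in the preprocessing commands.
EXPECTED_DIRS = [
    "coco/train2014",
    "coco/val2014",
    "coco/annotations",
    "Flickr30k/flickr30k_images",
    "Flickr30k/annotations",
    "external",
]


def missing_dirs(data_root):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root).expanduser()
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Report every folder that still needs to be created or downloaded.
    for d in missing_dirs("~/data"):
        print("missing:", d)
```

If only one image dataset is used, the entries for the other one can simply be dropped from the list.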
## Data Preprocessing
We take the MSCOCO dataset and the GCC external corpus as an example.
* Extract CLIP embeddings for COCO images and captions. All extracted embedding pkl files will be saved in ~/data/coco.
```
python preprocess/coco/coco_train_images.py
python preprocess/coco/coco_train_captions.py
python preprocess/coco/coco_val-test.py
```
* Extract CLIP embeddings for GCC captions. The pkl files will be saved in ~/data/external.
```
python preprocess/external/gcc_external_captions.py
```
* Then generate aggregated textual embeddings.
```
python preprocess/generate_embeddings.py --image_pkl ./data/coco/coco_ViT-L_14_train_images.pkl --caption_pkl ./data/coco/coco_ViT-L_14_train_captions.pkl --image_dataset coco --caption_corpus coco --t 100
python preprocess/generate_embeddings.py --image_pkl ./data/coco/coco_ViT-L_14_train_images.pkl --caption_pkl ./data/external/gcc_ViT-L_14_external_captions.pkl --image_dataset coco --caption_corpus gcc --t 175
```
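
The README does not spell out how the aggregated textual embeddings are computed. One plausible reading, and only an assumption here, is that each image embedding is paired with a softmax-weighted average of the caption embeddings, with `--t` acting as a scale on the cosine similarity. The sketch below illustrates that idea in plain Python; the function names are hypothetical and the actual `generate_embeddings.py` may differ.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def aggregate_text_embedding(image_emb, caption_embs, t):
    """Softmax-weighted average of caption embeddings for one image.

    ASSUMPTION: weights are softmax(t * cosine_similarity); this is an
    illustration, not the repository's confirmed implementation.
    """
    image_emb = normalize(image_emb)
    caption_embs = [normalize(c) for c in caption_embs]
    logits = [t * dot(image_emb, c) for c in caption_embs]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(image_emb)
    agg = [sum(w * c[i] for w, c in zip(weights, caption_embs)) for i in range(dim)]
    return normalize(agg)
```

Under this reading, a larger `t` concentrates the weight on the captions most similar to the image, while a small `t` approaches a plain average over the corpus.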
## Initialization
* Initialize generator using COCO Captions:
```
python initialization.py --output_dir path/to/save/folder --data ./data/coco/coco_ViT-L_14_train_captions.pkl
```
* Initialize generator using GCC Captions:
```
python initialization.py --output_dir path/to/save/folder --data ./data/external/gcc_ViT-L_14_external_captions.pkl
```
## Training
* Train the model under the MSCOCO images <-> MSCOCO captions setting:
```
gpus=0,1
CUDA_VISIBLE_DEVICES=$gpus nohup python -m torch.distributed.launch \
--master_port 17527 \
--nproc_per_node 2 cgtgan.py \
--output_dir path/to/save/folder \
--generator_init path/to/init/model.pt \
--data_train ./data/coco/coco_images_coco_captions_ViT-L_14_100.pkl \
--data_val ./data/coco/coco_ViT-L_14_val.pkl \
--data_test ./data/coco/coco_ViT-L_14_test.pkl \
--text_corpus ./data/coco/coco_train_sentences.pkl \
--gt_val ./data/coco/annotations/val_caption_coco_format.json \
--gt_test ./data/coco/annotations/test_caption_coco_format.json \
--do_train \
--epochs 50 \
> coco.out &
```
* Train the model under the MSCOCO images <-> GCC captions setting:
```
gpus=0,1
mkdir -p ./output/gcc
CUDA_VISIBLE_DEVICES=$gpus nohup python -m torch.distributed.launch \
--master_port 17528 \
--nproc_per_node 2 cgtgan.py \
--output_dir path/to/save/folder \
--generator_init path/to/init/model.pt \
--data_train ./data/external/coco_images_gcc_captions_ViT-L_14_175.pkl \
--data_val ./data/coco/coco_ViT-L_14_val.pkl \
--data_test ./data/coco/coco_ViT-L_14_test.pkl \
--text_corpus ./data/external/gcc_external_sentences.pkl \
--gt_val ./data/coco/annotations/val_caption_coco_format.json \
--gt_test ./data/coco/annotations/test_caption_coco_format.json \
--do_train \
--epochs 80 \
> gcc.out &
```
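
The two training commands differ only in the training data, text corpus, port, and epoch count. As a convenience, the shared part can be assembled in Python. This is a sketch under stated assumptions: `build_train_command` and its `coco_root` parameter are hypothetical names, and only the flags shown in the commands above are used.

```python
import shlex


def build_train_command(output_dir, generator_init, data_train, text_corpus,
                        epochs, master_port, nproc_per_node=2,
                        coco_root="./data/coco"):
    """Return the launch command as an argument list (hypothetical helper)."""
    return [
        "python", "-m", "torch.distributed.launch",
        "--master_port", str(master_port),
        "--nproc_per_node", str(nproc_per_node),
        "cgtgan.py",
        "--output_dir", output_dir,
        "--generator_init", generator_init,
        "--data_train", data_train,
        # Validation and test files are shared by both settings above.
        "--data_val", f"{coco_root}/coco_ViT-L_14_val.pkl",
        "--data_test", f"{coco_root}/coco_ViT-L_14_test.pkl",
        "--text_corpus", text_corpus,
        "--gt_val", f"{coco_root}/annotations/val_caption_coco_format.json",
        "--gt_test", f"{coco_root}/annotations/test_caption_coco_format.json",
        "--do_train",
        "--epochs", str(epochs),
    ]


# Values copied from the MSCOCO images <-> GCC captions command above.
gcc_cmd = build_train_command(
    output_dir="path/to/save/folder",
    generator_init="path/to/init/model.pt",
    data_train="./data/external/coco_images_gcc_captions_ViT-L_14_175.pkl",
    text_corpus="./data/external/gcc_external_sentences.pkl",
    epochs=80,
    master_port=17528,
)
print(shlex.join(gcc_cmd))
```

The printed string can be pasted into a shell, or the list can be passed to `subprocess.run` with `CUDA_VISIBLE_DEVICES` set in the environment.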
## Evaluation
* Test a model checkpoint on the MSCOCO test split:
```
python -u cgtgan.py \
--output_dir path/to/save/folder \
--generator_init path/to/checkpoint/model.pt \
--data_test ./data/coco/coco_ViT-L_14_test.pkl \
--gt_test ./data/coco/annotations/test_caption_coco_format.json \
--do_eval
```
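
The ground-truth file names (`*_caption_coco_format.json`) suggest the standard COCO caption layout: an `images` list with `id` fields and an `annotations` list whose entries carry `image_id` and `caption`. Assuming that layout, which the README does not confirm, a quick sanity check before evaluation could look like this; `check_coco_caption_format` is a hypothetical helper.

```python
import json


def check_coco_caption_format(gt):
    """Return a list of problems found in a COCO-style caption dict.

    ASSUMPTION: the file follows the standard COCO caption layout
    ("images" with "id", "annotations" with "image_id" and "caption").
    """
    problems = []
    for key in ("images", "annotations"):
        if not isinstance(gt.get(key), list):
            problems.append(f"missing or non-list field: {key}")
    if problems:
        return problems
    image_ids = {img.get("id") for img in gt["images"]}
    for i, ann in enumerate(gt["annotations"]):
        if "caption" not in ann:
            problems.append(f"annotation {i} has no caption")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {i} refers to unknown image_id")
    return problems


def check_file(path):
    """Load a ground-truth JSON file and run the check on it."""
    with open(path, encoding="utf-8") as f:
        return check_coco_caption_format(json.load(f))
```

For example, `check_file("./data/coco/annotations/test_caption_coco_format.json")` returns an empty list when the file matches the assumed layout.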
## Cite
```
@article{2308.12045,
Author = {Jiarui Yu and Haoran Li and Yanbin Hao and Bin Zhu and Tong Xu and Xiangnan He},
Title = {CgT-GAN: CLIP-guided Text GAN for Image Captioning},
Year = {2023},
Eprint = {arXiv:2308.12045},
Doi = {10.1145/3581783.3611891},
}
```