CgtGAN

Category: Content generation
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2023-08-27 08:50:59
Uploader: sh-1993
Description: CgtGAN

File list:
.DS_Store (8196, 2023-09-25)
LICENSE (1064, 2023-09-25)
cgtgan.py (30779, 2023-09-25)
cog.yaml (876, 2023-09-25)
inference.py (40686, 2023-09-25)
initialization.py (5466, 2023-09-25)
models.py (37066, 2023-09-25)
predict.py (10728, 2023-09-25)
preprocess/ (0, 2023-09-25)
preprocess/coco/ (0, 2023-09-25)
preprocess/coco/coco_train_captions.py (3492, 2023-09-25)
preprocess/coco/coco_train_images.py (2386, 2023-09-25)
preprocess/coco/coco_val-test.py (2709, 2023-09-25)
preprocess/external/ (0, 2023-09-25)
preprocess/external/gcc_external_captions.py (4213, 2023-09-25)
preprocess/external/ss_external_captions.py (4961, 2023-09-25)
preprocess/f30k/ (0, 2023-09-25)
preprocess/f30k/f30k_train_captions.py (3286, 2023-09-25)
preprocess/f30k/f30k_train_images.py (2287, 2023-09-25)
preprocess/f30k/f30k_val-test.py (2380, 2023-09-25)
preprocess/generate_embeddings.py (3345, 2023-09-25)
requirements.txt (239, 2023-09-25)
run_inference.py (15371, 2023-09-25)
utils/ (0, 2023-09-25)
utils/__init__.py (22, 2023-09-25)
utils/caption_evaluate.py (13279, 2023-09-25)
utils/cbs.py (39643, 2023-09-25)
utils/cider/ (0, 2023-09-25)
utils/cider/pyciderevalcap/ (0, 2023-09-25)
utils/cider/pyciderevalcap/__init__.py (21, 2023-09-25)
utils/cider/pyciderevalcap/cider/ (0, 2023-09-25)
utils/cider/pyciderevalcap/cider/__init__.py (21, 2023-09-25)
utils/cider/pyciderevalcap/cider/cider.py (1890, 2023-09-25)
utils/cider/pyciderevalcap/cider/cider_scorer.py (8234, 2023-09-25)
utils/cider/pyciderevalcap/ciderD/ (0, 2023-09-25)
utils/cider/pyciderevalcap/ciderD/__init__.py (21, 2023-09-25)
utils/cider/pyciderevalcap/ciderD/ciderD.py (1968, 2023-09-25)
utils/cider/pyciderevalcap/ciderD/ciderD_scorer.py (8860, 2023-09-25)
... ...

# CLIP-guided text GAN

Code for our paper: [CgT-GAN: CLIP-guided Text GAN for Image Captioning](https://arxiv.org/abs/2308.12045). The [paper](https://arxiv.org/abs/2308.12045) has been accepted by ACM MM 2023. All pre-processed data and pretrained models are released on [BaiduPan](https://pan.baidu.com/s/1Og1PPOOdDFw7jMnG0W07Jw?pwd=s5wk).

## Dataset

All data is placed under ~/data in the examples below.

### MSCOCO Dataset

1. Download the COCO images ([train](http://images.cocodataset.org/zips/train2014.zip) & [val/test](http://images.cocodataset.org/zips/val2014.zip)) and put the train2014 and val2014 folders in ~/data/coco (the COCO root directory).
2. Download the COCO [annotations](https://pan.baidu.com/s/1FKm4Lud6ihIgR3Kenba5lg?pwd=k6a2) and put all JSON files in ~/data/coco/annotations.

Example layout of the COCO root directory:

```
./data/coco
--train2014
--val2014
--annotations
```

### Flickr30K Dataset

1. Download the images ([train/val/test](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset)) and put the flickr30k_images folder in ~/data/Flickr30k.
2. Download the [annotations](https://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip) and put all JSON files in ~/data/Flickr30k/annotations.

### Text Corpus

* ShutterStock: download from [UIC](https://github.com/fengyang0317/unsupervised_captioning/issues/42).
* Google Conceptual Captions: download [Train_GCC-training.tsv](https://ai.google.com/research/ConceptualCaptions/download).

Place all external data in ~/data/external for subsequent processing.

## Data_Preprocess

We take the MSCOCO dataset and the GCC external corpus as an example; rough sketches of what these steps compute follow the list.

* Extract CLIP embeddings for COCO images and captions. All extracted embedding pkl files are saved in ~/data/coco:

```
python preprocess/coco/coco_train_images.py
python preprocess/coco/coco_train_captions.py
python preprocess/coco/coco_val-test.py
```

* Extract CLIP embeddings for GCC captions. The pkl files are saved in ~/data/external:

```
python preprocess/external/gcc_external_captions.py
```

* Then generate the aggregated textual embeddings:

```
python preprocess/generate_embeddings.py --image_pkl ./data/coco/coco_ViT-L_14_train_images.pkl --caption_pkl ./data/coco/coco_ViT-L_14_train_captions.pkl --image_dataset coco --caption_corpus coco --t 100
python preprocess/generate_embeddings.py --image_pkl ./data/coco/coco_ViT-L_14_train_images.pkl --caption_pkl ./data/external/gcc_ViT-L_14_external_captions.pkl --image_dataset coco --caption_corpus gcc --t 175
```
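For reference, the extraction scripts above boil down to encoding images and captions with CLIP ViT-L/14 and saving the L2-normalized features. Below is a minimal sketch of that step, assuming the openai `clip` package; `embed_image` and `embed_captions` are illustrative helpers, not the repository's actual functions.

```python
# Minimal sketch of CLIP ViT-L/14 embedding extraction (illustrative only;
# the preprocess scripts additionally handle batching, dataset I/O, and pickling).
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)

@torch.no_grad()
def embed_image(path):
    # Preprocess one image and return its L2-normalized CLIP embedding.
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    feat = model.encode_image(image)
    return feat / feat.norm(dim=-1, keepdim=True)

@torch.no_grad()
def embed_captions(captions):
    # Tokenize a list of caption strings and return normalized text embeddings.
    tokens = clip.tokenize(captions, truncate=True).to(device)
    feat = model.encode_text(tokens)
    return feat / feat.norm(dim=-1, keepdim=True)
```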
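The `--t` flag of `generate_embeddings.py` is a softmax temperature (100 for the COCO corpus, 175 for GCC). One plausible reading of the aggregation it controls, pairing each image embedding with a similarity-weighted average of corpus caption embeddings, is sketched below; the actual logic lives in `preprocess/generate_embeddings.py`.

```python
import torch
import torch.nn.functional as F

def aggregate_text_embeddings(image_emb: torch.Tensor,
                              caption_emb: torch.Tensor,
                              t: float = 100.0) -> torch.Tensor:
    """Similarity-weighted aggregation of caption embeddings per image.

    image_emb:   (N, D) CLIP image embeddings
    caption_emb: (M, D) CLIP caption embeddings
    t:           softmax temperature (cf. --t 100 for COCO, --t 175 for GCC)
    """
    image_emb = F.normalize(image_emb.float(), dim=-1)
    caption_emb = F.normalize(caption_emb.float(), dim=-1)
    sim = image_emb @ caption_emb.T          # (N, M) cosine similarities
    weights = (t * sim).softmax(dim=-1)      # larger t gives sharper weighting
    agg = weights @ caption_emb              # (N, D) aggregated text embeddings
    return F.normalize(agg, dim=-1)
```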
## Initialization

* Initialize the generator with COCO captions:

```
python initialization.py --output_dir path/to/save/folder --data ./data/coco/coco_ViT-L_14_train_captions.pkl
```

* Initialize the generator with GCC captions:

```
python initialization.py --output_dir path/to/save/folder --data ./data/external/gcc_ViT-L_14_external_captions.pkl
```

## Training

* Train the model under the MSCOCO images <-> MSCOCO captions setting:

```
gpus=0,1
CUDA_VISIBLE_DEVICES=$gpus nohup python -m torch.distributed.launch \
    --master_port 17527 \
    --nproc_per_node 2 cgtgan.py \
    --output_dir path/to/save/folder \
    --generator_init path/to/init/model.pt \
    --data_train ./data/coco/coco_images_coco_captions_ViT-L_14_100.pkl \
    --data_val ./data/coco/coco_ViT-L_14_val.pkl \
    --data_test ./data/coco/coco_ViT-L_14_test.pkl \
    --text_corpus ./data/coco/coco_train_sentences.pkl \
    --gt_val ./data/coco/annotations/val_caption_coco_format.json \
    --gt_test ./data/coco/annotations/test_caption_coco_format.json \
    --do_train \
    --epochs 50 \
    > coco.out &
```

* Train the model under the MSCOCO images <-> GCC captions setting:

```
gpus=0,1
mkdir ./output/gcc
CUDA_VISIBLE_DEVICES=$gpus nohup python -m torch.distributed.launch \
    --master_port 17528 \
    --nproc_per_node 2 cgtgan.py \
    --output_dir path/to/save/folder \
    --generator_init path/to/init/model.pt \
    --data_train ./data/external/coco_images_gcc_captions_ViT-L_14_175.pkl \
    --data_val ./data/coco/coco_ViT-L_14_val.pkl \
    --data_test ./data/coco/coco_ViT-L_14_test.pkl \
    --text_corpus ./data/external/gcc_external_sentences.pkl \
    --gt_val ./data/coco/annotations/val_caption_coco_format.json \
    --gt_test ./data/coco/annotations/test_caption_coco_format.json \
    --do_train \
    --epochs 80 \
    > gcc.out &
```

## Evaluation

* Test a model checkpoint on the MSCOCO test split:

```
python -u cgtgan.py \
    --output_dir path/to/save/folder \
    --generator_init path/to/checkpoint/model.pt \
    --data_test ./data/coco/coco_ViT-L_14_test.pkl \
    --gt_test ./data/coco/annotations/test_caption_coco_format.json \
    --do_eval
```

## Cite

```
@article{2308.12045,
  Author = {Jiarui Yu and Haoran Li and Yanbin Hao and Bin Zhu and Tong Xu and Xiangnan He},
  Title = {CgT-GAN: CLIP-guided Text GAN for Image Captioning},
  Year = {2023},
  Eprint = {arXiv:2308.12045},
  Doi = {10.1145/3581783.3611891},
}
```
