ImageToText_Django_ai

Category: Pattern recognition (vision/speech, etc.)
Development tool: Python
File size: 0KB
Downloads: 2
Upload date: 2023-09-04 04:05:44
Uploader: sh-1993
Description: ImageToText_Django_ai

File list:
Scripts/ (0, 2023-09-30)
Scripts/Activate.ps1 (23703, 2023-09-30)
Scripts/activate (2029, 2023-09-30)
Scripts/activate.bat (993, 2023-09-30)
Scripts/convert-caffe2-to-onnx.exe (107949, 2023-09-30)
Scripts/convert-onnx-to-caffe2.exe (107949, 2023-09-30)
Scripts/deactivate.bat (371, 2023-09-30)
Scripts/django-admin.exe (107960, 2023-09-30)
Scripts/f2py.exe (107913, 2023-09-30)
Scripts/huggingface-cli.exe (107936, 2023-09-30)
Scripts/isympy.exe (107902, 2023-09-30)
Scripts/normalizer.exe (107941, 2023-09-30)
Scripts/pip.exe (107918, 2023-09-30)
Scripts/pip3.10.exe (107918, 2023-09-30)
Scripts/pip3.exe (107918, 2023-09-30)
Scripts/python.exe (266624, 2023-09-30)
Scripts/pythonw.exe (254848, 2023-09-30)
Scripts/sqlformat.exe (107913, 2023-09-30)
Scripts/torchrun.exe (107917, 2023-09-30)
Scripts/tqdm.exe (107904, 2023-09-30)
Scripts/transformers-cli.exe (107934, 2023-09-30)
imgCaptionGenerator/ (0, 2023-09-30)
imgCaptionGenerator/db.sqlite3 (131072, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/ (0, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/__init__.py (0, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/__pycache__/ (0, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/__pycache__/__init__.cpython-310.pyc (191, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/__pycache__/admin.cpython-310.pyc (232, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/__pycache__/apps.cpython-310.pyc (484, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/__pycache__/models.cpython-310.pyc (229, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/__pycache__/urls.cpython-310.pyc (329, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/__pycache__/views.cpython-310.pyc (1912, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/admin.py (63, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/apps.py (158, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/migrations/ (0, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/migrations/__init__.py (0, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/migrations/__pycache__/ (0, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/migrations/__pycache__/__init__.cpython-310.pyc (202, 2023-09-30)
imgCaptionGenerator/imgCaptionApp/models.py (31, 2023-09-30)
... ...

# ImageToText_Django_ai

# image_caption_generator

A graphical interface for selecting an image file and generating captions for the selected image using the pre-trained Vision Encoder-Decoder model with a ViT backbone.

# Import libraries

os: for interacting with the operating system.
PIL.Image: the Python Imaging Library module for opening and manipulating images.
torch: the PyTorch library for tensor computations.
transformers: the Hugging Face Transformers library for natural language processing tasks.
tkinter: the standard Python interface for creating GUI applications.

# Load pre-trained models and tokenizer

The VisionEncoderDecoderModel is loaded from the "nlpconnect/vit-gpt2-image-captioning" pre-trained model. This model combines a Vision Transformer (ViT) encoder with a GPT-2 language model decoder for image captioning.

The ViTFeatureExtractor is loaded from the same pre-trained model. It provides the image preprocessing that the Vision Encoder-Decoder model needs, turning images into pixel values.

The AutoTokenizer is loaded from the same pre-trained model. It is used to decode the generated caption token IDs back into text.

# Set device

The code checks whether a CUDA-compatible GPU is available. If so, the model is loaded on the GPU; otherwise, it is loaded on the CPU.

# Set generation parameters

max_length sets the maximum length, in tokens, of each generated caption. num_beams sets the number of beams used during caption generation. Beam search keeps that many candidate captions at each decoding step, so it explores multiple possible captions instead of a single one.

# Define the predict_step function

This function takes a list of image paths and an optional parameter num_captions that specifies the number of captions to generate for each image. It opens the images with PIL and converts them to RGB. The ViTFeatureExtractor preprocesses and encodes the images into pixel values, which are then passed to the VisionEncoderDecoderModel for caption generation. The generated caption IDs are decoded with the tokenizer, and the special tokens are removed. The function returns a list of generated captions.

# Define the get_image_caption function

This function takes a single image path and calls predict_step with that path to generate captions.
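
The steps described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the helper names build_gen_kwargs and load_captioner, the captioner tuple, and the default values max_length=16 and num_beams=4 are hypothetical and chosen only for illustration. Only the checkpoint name, the three transformers classes, and the function names predict_step and get_image_caption come from the description. The heavy imports are deferred into the functions so that the settings helper can be used without loading PyTorch. Note that recent transformers releases deprecate ViTFeatureExtractor in favour of ViTImageProcessor.

```python
from typing import List, Sequence, Tuple

# Checkpoint named in the README.
MODEL_NAME = "nlpconnect/vit-gpt2-image-captioning"


def build_gen_kwargs(num_captions: int = 1, max_length: int = 16,
                     num_beams: int = 4) -> dict:
    """Build generation settings (the defaults 16 and 4 are assumptions)."""
    return {
        "max_length": max_length,
        # Beam search cannot return more sequences than it has beams,
        # so widen the beam if more captions are requested.
        "num_beams": max(num_beams, num_captions),
        "num_return_sequences": num_captions,
    }


def load_captioner() -> Tuple:
    """Hypothetical helper: load model, feature extractor, tokenizer, device."""
    import torch
    from transformers import (AutoTokenizer, VisionEncoderDecoderModel,
                              ViTFeatureExtractor)

    # Use the GPU when CUDA is available, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_NAME).to(device)
    model.eval()
    feature_extractor = ViTFeatureExtractor.from_pretrained(MODEL_NAME)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    return model, feature_extractor, tokenizer, device


def predict_step(image_paths: Sequence[str], captioner: Tuple,
                 num_captions: int = 1) -> List[str]:
    """Generate num_captions captions per image and return them as a flat list."""
    import torch
    from PIL import Image

    model, feature_extractor, tokenizer, device = captioner

    # Open every image and force RGB, since the encoder expects 3 channels.
    images = [Image.open(path).convert("RGB") for path in image_paths]

    # Preprocess the images into a batch of pixel values on the chosen device.
    pixel_values = feature_extractor(
        images=images, return_tensors="pt"
    ).pixel_values.to(device)

    # Run beam search without tracking gradients.
    with torch.no_grad():
        output_ids = model.generate(pixel_values,
                                    **build_gen_kwargs(num_captions))

    # Decode token IDs to text and drop special tokens.
    captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return [caption.strip() for caption in captions]


def get_image_caption(image_path: str, captioner: Tuple) -> List[str]:
    """Caption a single image by delegating to predict_step."""
    return predict_step([image_path], captioner)
```

Assuming a local file named example.jpg (a hypothetical path), a caller would first run `captioner = load_captioner()` and then `get_image_caption("example.jpg", captioner)`. The first call to load_captioner downloads the checkpoint, so loading it once and reusing the tuple across requests avoids repeating that cost.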
