I put together a small example, but it throws an error. I can't figure out what's wrong, because as far as I can tell it should work.
Also, do you think there is a better way to compute image similarity? I want to find similar clothing images: for example, given a picture of a coat, I want to find similar coats.
Can this code also handle images of all sizes and all types?
Here is the code:
import torch
import torchvision.transforms as transforms
import urllib.request
from transformers import CLIPProcessor, CLIPModel, CLIPTokenizer
from PIL import Image

# Load the CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
model_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_ID).to(device)
preprocess = CLIPProcessor.from_pretrained(model_ID)

# Define a function to load an image and preprocess it for CLIP
def load_and_preprocess_image(image_path):
    # Load the image from the specified path
    image = Image.open(image_path)
    # Apply the CLIP preprocessing to the image
    image = preprocess(image).unsqueeze(0).to(device)
    # Return the preprocessed image
    return image

# Load the two images and preprocess them for CLIP
image_a = load_and_preprocess_image('/content/a.png')
image_b = load_and_preprocess_image('/content/b.png')

# Calculate the embeddings for the images using the CLIP model
with torch.no_grad():
    embedding_a = model.encode_image(image_a)
    embedding_b = model.encode_image(image_b)

# Calculate the cosine similarity between the embeddings
similarity_score = torch.nn.functional.cosine_similarity(embedding_a, embedding_b)

# Print the similarity score
print('Similarity score:', similarity_score.item())
Here is the error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-e95a926e1bc8> in <module>
25
26 # Load the two images and preprocess them for CLIP
---> 27 image_a = load_and_preprocess_image('/content/a.png')
28 image_b = load_and_preprocess_image('/content/b.png')
29
3 frames
/usr/local/lib/python3.9/dist-packages/transformers/tokenization_utils_base.py in _call_one(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
2579
2580 if not _is_valid_text_input(text):
-> 2581 raise ValueError(
2582 "text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) "
2583 "or `List[List[str]]` (batch of pretokenized examples)."
ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples)
I'm not sure why this code should work, because it contains several errors: `CLIPModel` has no `encode_image` method, and the first argument of `CLIPProcessor.__call__` expects text, while the image goes in the second. Please find the corrected code below:
import torch
from transformers import CLIPImageProcessor, CLIPModel
from PIL import Image

# Load the CLIP model
model_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_ID)
preprocess = CLIPImageProcessor.from_pretrained(model_ID)

# Define a function to load an image and preprocess it for CLIP
def load_and_preprocess_image(image_path):
    # Load the image from the specified path
    image = Image.open(image_path)
    # Apply the CLIP preprocessing to the image
    image = preprocess(image, return_tensors="pt")
    # Return the preprocessed image
    return image

# Load the two images and preprocess them for CLIP
image_a = load_and_preprocess_image('/content/bla.png')["pixel_values"]
image_b = load_and_preprocess_image('/content/bla.png')["pixel_values"]

# Calculate the embeddings for the images using the CLIP model
with torch.no_grad():
    embedding_a = model.get_image_features(image_a)
    embedding_b = model.get_image_features(image_b)

# Calculate the cosine similarity between the embeddings
similarity_score = torch.nn.functional.cosine_similarity(embedding_a, embedding_b)

# Print the similarity score
print('Similarity score:', similarity_score.item())
Output:
Similarity score: 1.0000001192092896
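Note that the score is essentially 1.0 only because both `load_and_preprocess_image` calls above point at the same file, `/content/bla.png`; replace the second path with a different image to get a meaningful comparison.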
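If you would rather keep the combined `CLIPProcessor` from your original code, a minimal sketch of the fix (reusing the `/content/a.png` path from the question): the image has to go through the `images` keyword, because the first positional argument is `text`, which is exactly what triggered the `ValueError` above.

from PIL import Image
from transformers import CLIPProcessor

preprocess = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# Pass the image via the `images` keyword; the first positional argument is text.
inputs = preprocess(images=Image.open('/content/a.png'), return_tensors="pt")
image_a = inputs["pixel_values"]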
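As for your other questions: `CLIPImageProcessor` resizes and center-crops every input to 224x224 and converts it to RGB by default, so images of arbitrary size and most common modes should work; if a particular file causes trouble, calling `image.convert("RGB")` right after `Image.open` is a safe extra step. And comparing CLIP embeddings with cosine similarity is a reasonable way to find similar coats. Below is a minimal retrieval sketch, assuming a hypothetical folder `/content/catalog` of candidate images and a hypothetical query image `/content/query_coat.png`: embed every candidate once, then rank them against the query.

import glob

import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPModel

model_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_ID)
preprocess = CLIPImageProcessor.from_pretrained(model_ID)

def embed(paths):
    # Preprocess a batch of images and return L2-normalized CLIP embeddings.
    images = [Image.open(p).convert("RGB") for p in paths]
    pixel_values = preprocess(images=images, return_tensors="pt")["pixel_values"]
    with torch.no_grad():
        features = model.get_image_features(pixel_values)
    return features / features.norm(dim=-1, keepdim=True)

catalog_paths = sorted(glob.glob("/content/catalog/*.png"))  # hypothetical folder
catalog = embed(catalog_paths)
query = embed(["/content/query_coat.png"])  # hypothetical query image

# After L2 normalization, cosine similarity is just a dot product.
scores = (query @ catalog.T).squeeze(0)
top = scores.topk(min(5, len(catalog_paths)))
for score, idx in zip(top.values, top.indices):
    print(catalog_paths[int(idx)], float(score))

For a large catalog you would cache the catalog embeddings on disk and, past a few hundred thousand images, switch to an approximate-nearest-neighbor index instead of the brute-force dot product.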