Why does local inference differ from the API when computing Jina embeddings?


I am computing Jina v2 embeddings (see https://jina.ai/embeddings/) both through the transformers Python library and through the API.

Using transformers, I can run something like

from transformers import AutoModel

sentences = ['How is the weather today?']

model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
embeddings_1 = model.encode(sentences)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('jinaai/jina-embeddings-v2-base-en')
embeddings_2 = model.encode(sentences)

and the resulting embeddings_1 and embeddings_2 match.

However, if I use the Jina API, for example via

import requests

url = 'https://api.jina.ai/v1/embeddings'

headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer jina_123456...' # visit https://jina.ai/embeddings/ for an API key
}

data = {
  'input': sentences,
  'model': 'jina-embeddings-v2-base-en' # note that the model name matches
}

response = requests.post(url, headers=headers, json=data)
embeddings_3 = response.json()["data"][0]["embedding"]  # parse the JSON response (safer than eval)

then embeddings_3 differs slightly from the other two arrays, with a mean absolute difference of about 2e-4. I see this difference on both CPU and GPU. What am I doing wrong?
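For concreteness, here is a minimal sketch of how the comparison above can be computed; the short placeholder vectors stand in for the real 768-dimensional embeddings:

```python
import numpy as np

# Placeholder vectors standing in for the local and API embeddings.
emb_local = np.array([0.1, 0.2, 0.3])
emb_api = np.array([0.1002, 0.1998, 0.3001])

# Mean absolute difference, the metric quoted above.
mad = np.abs(emb_local - emb_api).mean()

# Cosine similarity is often the more meaningful comparison for embeddings,
# since most downstream uses only care about angles between vectors.
cos = emb_local @ emb_api / (np.linalg.norm(emb_local) * np.linalg.norm(emb_api))
```

On the real embeddings, a mean absolute difference near 2e-4 still leaves the cosine similarity extremely close to 1.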

python nlp huggingface-transformers sentence-transformers jina
1 Answer

The API side typically applies some optimizations (such as half-precision fp16) for better performance. You can try the following example:

import torch
from transformers import AutoModel

sentences = ['How is the weather today?']

# Load the model in half precision to mirror the API-side optimization.
model = AutoModel.from_pretrained(
    'jinaai/jina-embeddings-v2-base-en',
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
embeddings_1 = model.encode(sentences)
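To see why fp16 alone can plausibly explain a difference of this order, here is a rough illustration (not the actual Jina pipeline): round-tripping a unit-norm 768-dimensional vector through half precision introduces per-element rounding noise, and such errors compound across the many layers of a transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random unit-norm vector with the embedding's dimensionality.
v = rng.standard_normal(768).astype(np.float32)
v /= np.linalg.norm(v)

# Round-trip through half precision and measure the rounding error.
v16 = v.astype(np.float16).astype(np.float32)
err = np.abs(v - v16).mean()

# A single cast introduces a mean absolute error well below 1e-4;
# errors accumulated across many fp16 operations inside the model
# can reach the ~2e-4 level observed in the question.
```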