I am computing Jina v2 embeddings (see https://jina.ai/embeddings/) both through the transformers
Python library and through the API. Using transformers, I can run something like
from transformers import AutoModel
sentences = ['How is the weather today?']
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
embeddings_1 = model.encode(sentences)
or
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('jinaai/jina-embeddings-v2-base-en')
embeddings_2 = model.encode(sentences)
and the resulting embeddings_1 and embeddings_2 match.
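For reference, "match" here can be checked numerically. A small helper (my own sketch, using only numpy; `compare` is a hypothetical name, and the embedding variables come from the snippets above):

```python
import numpy as np

def compare(a, b):
    """Report the mean absolute difference and cosine similarity of two vectors."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    mean_abs = float(np.abs(a - b).mean())
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return mean_abs, cos

# e.g. compare(embeddings_1[0], embeddings_2[0])
```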
However, if I use the Jina API instead, e.g. via
import requests
url = 'https://api.jina.ai/v1/embeddings'
headers = {
'Content-Type': 'application/json',
'Authorization': 'Bearer jina_123456...' # visit https://jina.ai/embeddings/ for an API key
}
data = {
'input': sentences,
'model': 'jina-embeddings-v2-base-en' # note that the model name matches
}
response = requests.post(url, headers=headers, json=data)
embeddings_3 = response.json()["data"][0]["embedding"]  # parse the JSON response; avoid eval on untrusted content
then embeddings_3 differs slightly from the other two arrays, with a mean absolute difference of about 2e-4. I see this discrepancy on both CPU and GPU runtimes. What am I doing wrong?
The API side typically applies optimizations, such as half precision (fp16), for higher throughput. You can try the following to reproduce the API's behavior locally:
import torch
from transformers import AutoModel
sentences = ['How is the weather today?']
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en',
                                  trust_remote_code=True,
                                  torch_dtype=torch.float16)
embeddings_1 = model.encode(sentences)
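For intuition on the size of the discrepancy: fp16 keeps only ~10 mantissa bits, so merely rounding float32 values through fp16 already introduces errors of roughly this order. A minimal numpy sketch (an illustration under the assumption that the ~2e-4 gap comes from half-precision arithmetic, as suggested above):

```python
import numpy as np

# Machine epsilon of fp16: the relative spacing between adjacent values,
# which bounds the per-value rounding error.
print(np.finfo(np.float16).eps)  # ≈ 9.77e-4

# Round a float32 vector through fp16 and measure the drift.
rng = np.random.default_rng(0)
x = rng.standard_normal(768).astype(np.float32)  # 768 = jina-v2-base embedding dim
x16 = x.astype(np.float16).astype(np.float32)
mean_abs_diff = float(np.abs(x - x16).mean())
print(mean_abs_diff)  # on the order of 1e-4
```

Actual inference in fp16 accumulates rounding across many matrix multiplications, so a mean absolute difference of ~2e-4 in the final embeddings is entirely consistent with the API running the model in half precision.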