Efficiently querying similarity scores in Solr 9 with a filter query

Question

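I am using Solr 9 for optimal query-document similarity calculation. I have a use case where I must first filter on specific field values and then compute the document similarity for all of the documents found.

My problem is as follows: every document has an "embedding" field and an "id" field. I want to retrieve only the documents with id=1,2,3 and, given a query embedding, return each document's similarity score against that query embedding.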

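Option 1: use fq to filter on id, and run a knn query against the embedding field. Because of the limitation below, not all of the documents I want are returned.

The main issue is documented here: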

When using knn in re-ranking pay attention to the topK parameter.
The second pass score(deriving from knn) is calculated only if the document d from the first pass is within the k-nearest neighbors(in the whole index) of the target vector to search.
This means the second pass knn is executed on the whole index anyway, which is a current limitation.
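Option 1 can be sketched as a set of request parameters. This is a minimal illustration, assuming the field names "embedding" and "id" from the question and the knn query parser's f and topK local parameters; the helper name build_knn_filter_params and the sample vector are hypothetical. How topK interacts with the filter depends on your Solr version, so treat this as a starting point rather than a guaranteed fix for the missing documents.

```python
from urllib.parse import urlencode


def build_knn_filter_params(ids, query_vector, top_k):
    """Build Solr select parameters: filter by id, score with the knn parser.

    Hypothetical helper; field names come from the question.
    """
    vector_literal = "[" + ", ".join(str(x) for x in query_vector) + "]"
    return {
        # f is the dense vector field, topK the number of neighbors to return
        "q": "{!knn f=embedding topK=%d}%s" % (top_k, vector_literal),
        # restrict the candidate documents to the requested ids
        "fq": "id:(" + " ".join(str(i) for i in ids) + ")",
        "fl": "id,score",
    }


params = build_knn_filter_params([1, 2, 3], [0.1, 0.2, 0.3], top_k=3)
# URL-encoded query string to append to the collection's /select endpoint
query_string = urlencode(params)
```

Setting topK to at least the number of ids in the filter is the obvious first thing to try, since a smaller value caps how many documents can come back.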

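Option 2: use fq to filter on id, include embedding in the field list, and compute the similarity in memory. The problem is network latency, because the response from Solr is large when the embeddings are retrieved.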
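The in-memory step of Option 2 might look like the sketch below. It assumes cosine similarity is the metric of interest (the question does not say which one is used) and that each returned document is a dict with "id" and "embedding" keys; the function names and sample data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_documents(docs, query_vector):
    """Map each document id to its similarity against the query vector."""
    return {
        doc["id"]: cosine_similarity(doc["embedding"], query_vector)
        for doc in docs
    }


# Sample documents standing in for the Solr response (made-up values)
docs = [
    {"id": "1", "embedding": [1.0, 0.0]},
    {"id": "2", "embedding": [0.0, 1.0]},
]
scores = score_documents(docs, [1.0, 0.0])
```

The computation itself is cheap; the cost in this option comes from transferring every embedding over the network, which is the latency problem described above.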

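That leaves two questions:

When, if ever, will the limitation described in the documentation above be resolved?
Is there a way to compress the response from Solr so that I can retrieve it faster?

Thanks!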

Answer 1

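You can try the vector similarity function, which returns the similarity between two KNN vectors in n-dimensional space. See the Solr 9.4 documentation for the vector similarity function.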
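One way this suggestion could be applied is to filter by id and ask Solr to return the similarity as a computed field, so the embeddings never leave the server. The sketch below assumes the function is named vectorSimilarity and takes the arguments (encoding, similarity function, vector 1, vector 2) as in the Solr 9.4 function query documentation, and that it can be used as an aliased pseudo-field in fl; verify both against your Solr version. The helper name, the alias "similarity", and the sample vector are hypothetical.

```python
from urllib.parse import urlencode


def build_vector_similarity_params(ids, query_vector, field="embedding"):
    """Build Solr select parameters that return a per-document similarity.

    Hypothetical helper; assumes FLOAT32 vectors and COSINE similarity.
    """
    vector_literal = "[" + ",".join(str(x) for x in query_vector) + "]"
    similarity = "vectorSimilarity(FLOAT32, COSINE, %s, %s)" % (
        field,
        vector_literal,
    )
    return {
        "q": "*:*",
        # only the requested documents are matched and scored
        "fq": "id:(" + " ".join(str(i) for i in ids) + ")",
        # return the id plus the function value under the alias "similarity"
        "fl": "id,similarity:" + similarity,
    }


similarity_params = build_vector_similarity_params([1, 2, 3], [0.1, 0.2, 0.3])
similarity_query_string = urlencode(similarity_params)
```

Because only ids and one score per document are returned, this also keeps the response small, which addresses the latency concern raised for Option 2.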
