How do I apply a filter to LangChain's MongoDBAtlasVectorSearch "similarity_search_with_score"?


I'm using MongoDBAtlasVectorSearch and I want to find the most similar documents, so I'm calling similarity_search_with_score.

However, I can't seem to add a filter to similarity_search_with_score.

Here is my code:

vector_search = MongoDBAtlasVectorSearch(
    collection=client[os.getenv("MONGODB_DB")]["files"],
    embedding=embeddings,
    index_name=os.getenv("ATLAS_VECTOR_SEARCH_INDEX_NAME"),
)

results = vector_search.similarity_search_with_score(
    query="What are the engagements of the company",
    k=5,
    pre_filter={
        "compound": {
            "filter": [
                {"equals": {"path": "uploaded_by", "value": chat_owner}},
                {"in": {"path": "file_name", "values": file_names}},
            ]
        }
    },
)

Here is my index:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      },
      "file_name": {
        "normalizer": "lowercase",
        "type": "token"
      },
      "uploaded_by": {
        "normalizer": "lowercase",
        "type": "token"
      }
    }
  }
}

However, this gives me the following error:

pymongo.errors.OperationFailure: "knnBeta.filter.compound.filter[1].in.value" is required, full error: {'ok': 0.0, 'errmsg': '"knnBeta.filter.compound.filter[1].in.value" is required', 'code': 8, 'codeName': 'UnknownError', '$clusterTime': {'clusterTime': Timestamp(1704804627, 1), 'signature': {'hash': b'\xfa\x15s+Q\x1d\xa86]R\xb2!\x9d\xc5b-G\xce\xa6S', 'keyId': 7283272637088792583}}, 'operationTime': Timestamp(1704804627, 1)}

I also tried it this way:

pre_filter={
    "$and": [
        {"uploaded_by": {"$eq": chat_owner}},
        {"file_name": {"$in": file_names}},
    ]
},

But then I get this error:

pymongo.errors.OperationFailure: "knnBeta.filter" one of [autocomplete, compound, embeddedDocument, equals, exists, geoShape, geoWithin, in, knnBeta, moreLikeThis, near, phrase, queryString, range, regex, search, span, term, text, wildcard] must be present, full error: {'ok': 0.0, 'errmsg': '"knnBeta.filter" one of [autocomplete, compound, embeddedDocument, equals, exists, geoShape, geoWithin, in, knnBeta, moreLikeThis, near, phrase, queryString, range, regex, search, span, term, text, wildcard] must be present', 'code': 8, 'codeName': 'UnknownError', '$clusterTime': {'clusterTime': Timestamp(1704802325, 9), 'signature': {'hash': b'`\xd27-\x81+\x16\xd0a\x14\xc7\x99\xa8\x05|Sx?\x0e:', 'keyId': 7283272637088792583}}, 'operationTime': Timestamp(1704802325, 9)}
WARNING:  StatReload detected changes in 'src/routes/chats/chats.py'. Reloading...
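The second error lists the operators that knnBeta.filter accepts at its top level. MQL operators such as $and, $eq and $in are not on that list, which suggests that in the knnBeta-based version of the integration used here, pre_filter is handed to Atlas Search as-is and has to be written with Atlas Search operators (compound, equals, in, ...). Below is a minimal sketch with a hypothetical helper, check_knnbeta_filter (not a LangChain or PyMongo function), that compares a filter's top-level keys against the operator list copied from the error message; "alice" and "a.pdf" are placeholder values.

```python
from typing import Any, Dict, List

# Top-level operators that knnBeta.filter accepts, copied from the error message above.
KNNBETA_FILTER_OPERATORS = {
    "autocomplete", "compound", "embeddedDocument", "equals", "exists",
    "geoShape", "geoWithin", "in", "knnBeta", "moreLikeThis", "near",
    "phrase", "queryString", "range", "regex", "search", "span", "term",
    "text", "wildcard",
}


def check_knnbeta_filter(pre_filter: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return the top-level keys knnBeta.filter would not recognise."""
    return [key for key in pre_filter if key not in KNNBETA_FILTER_OPERATORS]


# MQL-style filter from the second attempt (placeholder values).
mql_style = {"$and": [{"uploaded_by": {"$eq": "alice"}}, {"file_name": {"$in": ["a.pdf"]}}]}
print(check_knnbeta_filter(mql_style))  # ['$and']

# Atlas Search compound filter, as in the first attempt.
search_style = {"compound": {"filter": [{"equals": {"path": "uploaded_by", "value": "alice"}}]}}
print(check_knnbeta_filter(search_style))  # []
```

The check only looks at the top level, so it would not catch the problem behind the first error, which sits inside the compound clause.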

How do I correctly use a filter with similarity_search_with_score?

python mongodb langchain vector-search
1 Answer

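Looking at your error message

'"knnBeta.filter.compound.filter[1].in.value" is required'

(filter[1] is the second entry in your compound filter list, i.e. the "in" clause), and based on this answer in the MongoDB forums, it looks like your "in" clause is using "values" instead of "value". For example: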

"in": { "path": "fileName", "value": model_documents, }
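Applied to the fields in the question, the corrected filter could look like the minimal sketch below, assuming the same knnBeta-based version of MongoDBAtlasVectorSearch. build_pre_filter is a hypothetical helper and the sample values are placeholders; the only change from the question's filter is that the "in" clause uses "value" instead of "values".

```python
from typing import Any, Dict, List


def build_pre_filter(chat_owner: str, file_names: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: build the Atlas Search compound filter for pre_filter."""
    return {
        "compound": {
            # Every clause under "filter" must match for a document to be returned.
            "filter": [
                # equals: exact match on a single value
                {"equals": {"path": "uploaded_by", "value": chat_owner}},
                # in: the key is "value" (it may hold an array), not "values"
                {"in": {"path": "file_name", "value": file_names}},
            ]
        }
    }


pre_filter = build_pre_filter("alice", ["report.pdf", "notes.pdf"])
print(pre_filter["compound"]["filter"][1])
# {'in': {'path': 'file_name', 'value': ['report.pdf', 'notes.pdf']}}

# With the vector_search object from the question, the call would become:
# results = vector_search.similarity_search_with_score(
#     query="What are the engagements of the company", k=5, pre_filter=pre_filter
# )
```

Because the helper only builds a dict, it can be checked without a database connection before it is passed to similarity_search_with_score.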
    