Getting the source of information with LangChain

Question

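I am using the langchain library to store my company's information in a vector database. When I query it, the results are great, but I also need a way to retrieve the source of the information, for example source: "www.site.com/about" or at least "document 156". Does anyone know how to do this?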

Edit: at the moment I am using docsearch.similarity_search(query), which returns only page_content; the metadata comes back empty.

I am ingesting the documents with this code, but I am completely open to changing it.

db = ElasticVectorSearch.from_documents(
    documents,
    embeddings,
    elasticsearch_url="http://localhost:9200",
    index_name="elastic-index",
)

Answer 1

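You can add metadata to each document by setting document.metadata on each document to a dictionary. For example, the dictionary might look like {"source": "www.site.com/about"} or {"id": "456"}. Then pass these documents to from_documents().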

Later, when you get a Document object back from one of the query methods, you can read the metadata from document.metadata.
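To illustrate why the metadata survives the round trip, here is a minimal standard-library sketch. Doc and ToyStore are hypothetical stand-ins, not LangChain classes: Doc only mirrors the two fields named above (page_content and metadata), and ToyStore ranks by shared words instead of embeddings. The sample texts are made up. The point is only that the store hands back the same objects it was given, so whatever was set in metadata at ingestion time is still there at query time.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (same two fields)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def words(text):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


class ToyStore:
    """Hypothetical in-memory store; ranks by shared words, not embeddings."""

    def __init__(self, docs):
        self.docs = list(docs)

    @classmethod
    def from_documents(cls, docs):
        return cls(docs)

    def similarity_search(self, query, k=1):
        query_words = words(query)
        # Whole Doc objects are returned, so metadata travels with the text.
        ranked = sorted(
            self.docs,
            key=lambda doc: len(query_words & words(doc.page_content)),
            reverse=True,
        )
        return ranked[:k]


# Attach the metadata before handing the documents to the store.
docs = [
    Doc("We were founded in 2010.", {"source": "www.site.com/about"}),
    Doc("Contact us by email.", {"id": "456"}),
]
store = ToyStore.from_documents(docs)

hits = store.similarity_search("When was the company founded?")
for hit in hits:
    print(hit.page_content, "| source:", hit.metadata.get("source", "unknown"))
```

If the metadata comes back empty in a real setup, the first thing to check under this model is whether the documents had metadata set before they were passed to from_documents().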

Answer 2

As a complement to Nick ODell's answer, here is a piece of code that shows how metadata is used:

import pprint

from langchain.docstore.document import Document
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

model = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
embeddings = HuggingFaceEmbeddings(model_name=model)

def main():
    # Uncomment the following lines if you need to initialize FAISS with no AVX2 optimization
    # import os
    # os.environ['FAISS_NO_AVX2'] = '1'

    # Each Document carries its own metadata dictionary.
    doc1 = Document(page_content="The sky is blue.", metadata={"document_id": "10"})
    doc2 = Document(page_content="The forest is green", metadata={"document_id": "62"})
    docs = [doc1, doc2]

    # Metadata can also be added or changed after construction.
    for doc in docs:
        doc.metadata['summary'] = 'hello'

    pprint.pprint(docs)
    db = FAISS.from_documents(docs, embeddings)
    db.save_local("faiss_index")
    # Note: recent langchain_community releases also require passing
    # allow_dangerous_deserialization=True to load_local.
    new_db = FAISS.load_local("faiss_index", embeddings)

    query = "Which color is the sky?"
    # Returns (Document, score) pairs; each Document keeps its metadata.
    docs = new_db.similarity_search_with_score(query)
    print('Retrieved docs:', docs)

if __name__ == '__main__':
    main()

Tested with Python 3.10.
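Since similarity_search_with_score returns (Document, score) pairs, the source the question asks for can be pulled out of each pair. The sketch below is one possible way to do that and is not part of the answer's code: format_sources is a hypothetical helper, the SimpleNamespace objects are stand-ins for real Document objects so the snippet runs without LangChain installed, and the scores are made-up values.

```python
from types import SimpleNamespace


def format_sources(results):
    """Turn (document, score) pairs into readable lines that name the source."""
    lines = []
    for doc, score in results:
        meta = doc.metadata
        # Try the metadata keys used in this thread, then fall back.
        label = meta.get("source") or meta.get("document_id") or "unknown"
        lines.append(f"{doc.page_content} (source: {label}, score: {score:.2f})")
    return lines


# Stand-in results shaped like (Document, score) pairs; scores are invented.
results = [
    (SimpleNamespace(page_content="The sky is blue.",
                     metadata={"document_id": "10"}), 0.41),
    (SimpleNamespace(page_content="The forest is green",
                     metadata={"document_id": "62"}), 1.37),
]

for line in format_sources(results):
    print(line)
```

With a real store, the list returned by new_db.similarity_search_with_score(query) could be passed to the same helper, assuming the metadata keys match the ones set at ingestion time.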
