I'm using Python and LangChain's RetrievalQA for RAG. Here is a sample of the code I wrote:
loader = UnstructuredPDFLoader(filename, mode="elements")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 300, chunk_overlap = 0)
splits = text_splitter.split_documents(data)
splits = chromautils.filter_complex_metadata(splits)
vectorstore = Chroma.from_documents(documents=splits, embedding=embedding_function, persist_directory=PERSIST_DIRECTORY)
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, openai_api_key=OPENAI_KEY)
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,
)
result = qa_chain({"query": question})
It only returns the answer to the question. How can I also return the documents from splits that were used as context to answer the question?
Returning sources
In Q&A applications, it is often important to show users the sources that were used to generate the answer. The simplest way to do this is to have the chain return the documents that were retrieved in each generation. (Note that since your RetrievalQA chain already sets return_source_documents=True, the retrieved documents should be available as result["source_documents"]; the LCEL pattern that follows is the equivalent approach for composed chains.)
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Join the retrieved chunks into a single context string for the prompt.
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain_from_docs = (
RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
| prompt
| llm
| StrOutputParser()
)
rag_chain_with_source = RunnableParallel(
{"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)
rag_chain_with_source.invoke("What is Task Decomposition")
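The invoke call returns a dict with "question", "context" (the retrieved documents), and "answer" keys. To make the data flow concrete without depending on a live vector store or LLM, here is a minimal pure-Python sketch of the same idea; `fake_retriever`, `fake_llm`, and `rag_with_sources` are hypothetical stand-ins, not LangChain APIs:

```python
# Minimal sketch of "return sources alongside the answer", no LangChain needed.
# fake_retriever and fake_llm are placeholders for the real retriever and LLM.

def fake_retriever(question):
    # A real retriever would run a similarity search in the vector store.
    return [
        {"page_content": "Task decomposition splits a task into subtasks.",
         "metadata": {"source": "doc1.pdf"}},
        {"page_content": "Chain of thought is one decomposition technique.",
         "metadata": {"source": "doc2.pdf"}},
    ]

def format_docs(docs):
    # Join the retrieved chunks into one context string for the prompt.
    return "\n\n".join(d["page_content"] for d in docs)

def fake_llm(prompt):
    # A real LLM call would go here; we return a canned answer.
    return "Task decomposition means breaking a task into smaller steps."

def rag_with_sources(question):
    docs = fake_retriever(question)    # the "context" branch
    prompt = f"Context:\n{format_docs(docs)}\n\nQuestion: {question}"
    answer = fake_llm(prompt)          # the "answer" branch
    # Return the sources together with the answer, like RunnableParallel
    # + .assign(answer=...) does in the LCEL chain above.
    return {"question": question, "context": docs, "answer": answer}

result = rag_with_sources("What is Task Decomposition?")
print(result["answer"])
for doc in result["context"]:
    print(doc["metadata"]["source"])
```

The key design point is the same as in the LCEL chain: the retriever output is kept as a separate "context" entry in the result instead of being consumed only inside the prompt, so the caller can display both the answer and its sources.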