Question answering with retrieval-augmented generation using LangChain

Question  Votes: 0  Answers: 1

I have been doing a POC to implement a RAG-driven model for my AI/ML use case.

The use case is: "Find similar and duplicate controls by comparing each ID with every other ID, generate similarity scores, and list the pairs whose score falls within 80-87 (similar controls) or exceeds 95 (duplicate controls)."

The code snippet is:

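# Imports assume the split LangChain packages (langchain-community,
# langchain-text-splitters, langchain-openai); adjust to your installed version.
from langchain_community.document_loaders import CSVLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = CSVLoader(file_path="control.csv")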

data = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)

chunks = text_splitter.split_documents(data)

vectorstore = Chroma.from_documents(documents=chunks, embedding=OpenAIEmbeddings())

retriever = vectorstore.as_retriever()

template = """You are an assistant for question-answering tasks.

Use the following pieces of retrieved context to answer the question.

If you don't know the answer, just say that you don't know.

Use three sentences maximum and keep the answer concise.

Question: {question}

Context: {context}

Answer:

"""

prompt = ChatPromptTemplate.from_template(template)

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo", verbose=True)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

query = "FInd Similar controls by comparing each ID with every other ID in the document, combining their Name and Description. Calculate similarity scores between them and list all the pairs that is exceeding a threshold of 80-87for similar controls and above 95 for duplicate controls."

rag_chain.invoke(query)

The output I got was:

1. There are a total of 6 controls formed by comparing each ID with every other ID in the document. The similarity scores between them can be calculated and pairs exceeding a threshold of 80 can be listed in the output.

2. I don't Know

My expected outcome is a printed list of the similar and duplicate pairs from the data, which has around 3,500+ records.

But I don't see the expected output here. I'm not sure where I went wrong. I would also like to know whether I have written the right prompt for this scenario.

Also, I tried the same prompt without RAG, using only a plain LangChain connection to OpenAI, and I got proper results.

I would like to know where I went wrong and what needs to be corrected to get the expected outcome.
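For reference, the comparison described in the use case can be sketched directly in code. This is only an illustration under stated assumptions: the score is taken to be cosine similarity scaled to 0-100, the thresholds are read literally (80-87 inclusive for "similar", above 95 for "duplicate"), and `toy_vectors` is hypothetical data standing in for real embeddings of each control's Name and Description.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_pairs(vectors, similar_range=(80.0, 87.0), duplicate_above=95.0):
    """Compare every ID with every other ID and label pairs by score (0-100)."""
    results = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors.items(), 2):
        score = cosine_similarity(vec_a, vec_b) * 100
        if score > duplicate_above:
            results.append((id_a, id_b, round(score, 1), "duplicate"))
        elif similar_range[0] <= score <= similar_range[1]:
            results.append((id_a, id_b, round(score, 1), "similar"))
    return results


# Hypothetical toy embeddings; real ones would come from an embedding model.
toy_vectors = {
    "C1": [1.0, 0.0],
    "C2": [2.0, 0.0],  # same direction as C1
    "C3": [5.0, 3.0],  # close to C1 and C2, but not identical
    "C4": [0.0, 1.0],  # unrelated to C1 and C2
}

pairs = classify_pairs(toy_vectors)
for id_a, id_b, score, label in pairs:
    print(f"{id_a},{id_b},{score},{label}")
```

With roughly 3,500 controls, a loop like this visits about 6.1 million pairs, so every record has to take part in the comparison, not just a handful of retrieved chunks.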

langchain retrieval-augmented-generation
1 answer
0
votes

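When you say:

"My expected outcome is to print the list of similar and duplicate pairs from the data, it has around 3500+ data."

First, the prompt needs to state explicitly how you want the output formatted.

For example:

"Output the results in CSV format, listing only the similar and duplicate pairs."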
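As a rough sketch of that suggestion, the format instruction can be written into the template itself. The wording of `FORMAT_INSTRUCTIONS` and the column names `id_a,id_b,score,label` are hypothetical, and the "three sentences maximum" line from the original template is left out here on the assumption that it conflicts with listing many pairs.

```python
# Hypothetical format instruction; the column names are illustrative only.
FORMAT_INSTRUCTIONS = (
    "Output the result as CSV with the header id_a,id_b,score,label. "
    "List only similar and duplicate pairs, one pair per line, "
    "and output nothing else."
)

# Same structure as the template in the question, with the explicit
# output format added and the three-sentence limit removed.
template = (
    "You are an assistant for question-answering tasks.\n"
    "Use the following pieces of retrieved context to answer the question.\n"
    "If you don't know the answer, just say that you don't know.\n"
    + FORMAT_INSTRUCTIONS + "\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:\n"
)

# Fill the placeholders with made-up values to inspect the final prompt text.
prompt_text = template.format(
    question="List similar and duplicate controls.",
    context="ID: C1, Name: ..., Description: ...",
)
print(prompt_text)
```

The resulting `template` string can be passed to `ChatPromptTemplate.from_template` exactly as in the question.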

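Second, you can try a different output parser, such as the Pydantic structured output parser.

Be careful with the Pydantic parser, though, because it is sensitive to LangChain version changes.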
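The idea behind a structured output parser, validating the model's text against a schema before using it, can be sketched with the standard library alone. This is a stand-in built on `dataclasses` and `json`, not LangChain's `PydanticOutputParser`; the `ControlPair` schema, the `parse_pairs` helper and the sample output are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class ControlPair:
    id_a: str
    id_b: str
    score: float
    label: str  # expected to be "similar" or "duplicate"

    def __post_init__(self):
        # Reject values the downstream code could not interpret.
        self.score = float(self.score)
        if not 0.0 <= self.score <= 100.0:
            raise ValueError(f"score out of range: {self.score}")
        if self.label not in ("similar", "duplicate"):
            raise ValueError(f"unexpected label: {self.label}")


def parse_pairs(llm_output: str) -> list[ControlPair]:
    """Parse a JSON array of pair objects, raising on malformed entries."""
    return [ControlPair(**item) for item in json.loads(llm_output)]


# Hypothetical model output, used only to exercise the parser.
sample_output = (
    '[{"id_a": "C1", "id_b": "C2", "score": 97.5, "label": "duplicate"}]'
)
parsed = parse_pairs(sample_output)
print(parsed)
```

A Pydantic model would replace the hand-written checks in `__post_init__` with declared field types and validators, which is the part that tends to depend on the installed versions.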

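Third, you should add a system prompt that gives the LLM precise instructions. You need to do this because, if you don't provide a system prompt in LangChain, LangChain supplies a basic one that may conflict with your question.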
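A minimal sketch of separating the system instructions from the user's question is shown below. The wording of `SYSTEM_PROMPT` is hypothetical, and `render` is a small helper written for this example, not a LangChain function. To my knowledge, a list of `(role, template)` tuples in this shape is also what `ChatPromptTemplate.from_messages` accepts, but check that against your installed version.

```python
# Hypothetical system instructions; adjust the wording to your data.
SYSTEM_PROMPT = (
    "You compare security controls. For each pair of controls in the context, "
    "report a similarity score from 0 to 100. Label a pair 'similar' when the "
    "score is between 80 and 87 and 'duplicate' when it is above 95."
)

# Role-tagged message templates: the system message carries the instructions,
# the human message carries the question and the retrieved context.
messages = [
    ("system", SYSTEM_PROMPT),
    ("human", "Question: {question}\nContext: {context}"),
]


def render(message_templates, **values):
    """Fill the placeholders in each message template (illustrative helper)."""
    return [(role, text.format(**values)) for role, text in message_templates]


rendered = render(
    messages,
    question="List similar and duplicate controls.",
    context="ID: C1, Name: ..., Description: ...",
)
for role, text in rendered:
    print(f"[{role}] {text}")
```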

Hope this helps!
