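I have a requirement: I want to create a knowledge retriever that calls an API to fetch the closest-matching information. I know LangChain has integrations with many vector stores, but our requirement is that we must call an API that finds the closest-matching documents. How can we create a custom retriever in LangChain that calls this API to fetch the closest matches?
I'm trying to build a custom retriever in LangChain but still can't figure it out.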
Custom retriever:
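```python
from typing import Any, List

from langchain.callbacks.manager import (
    AsyncCallbackManagerForRetrieverRun,
    CallbackManagerForRetrieverRun,
)
from langchain.schema import BaseRetriever, Document


class URRetrival(BaseRetriever):
    # No custom __init__ is needed: BaseRetriever is a pydantic model, so any
    # configuration (API URL, token, ...) can be declared as class fields.

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # response = URAPI(request)
        # convert response (json or xml) into langchain Documents, like doc = Document(page_content="response docs")
        # collect all of those results in a list of docs and return it below
        result_docs: List[Document] = []
        return result_docs

    async def _aget_relevant_documents(
        self,
        query: str,
        *,
        run_manager: AsyncCallbackManagerForRetrieverRun,
        **kwargs: Any,
    ) -> List[Document]:
        raise NotImplementedError()
```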
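`URRetrival` is the name of your retriever, and `_get_relevant_documents` is the method that gets called when your chain runs and looks for relevant documents.
You can put any kind of implementation inside `_get_relevant_documents` (for example, a call to your search API) and return whatever documents are relevant to you.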
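As a minimal sketch of the "convert the response into Documents" step, assuming (hypothetically) that your API returns a JSON list of objects with the text in a `content` field, you can keep the parsing in a plain function and build the `Document`s from its output. The name `parse_api_response` and the `content` key are illustrative assumptions, not part of LangChain:

```python
import json
from typing import Any, Dict, List, Tuple


def parse_api_response(
    raw_json: str, text_key: str = "content"
) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn a JSON search response into (page_content, metadata) pairs.

    Hypothetical response shape: a JSON list of objects, each holding the
    chunk text under `text_key`; every other field is kept as metadata.
    """
    hits = json.loads(raw_json)
    pairs = []
    for hit in hits:
        text = hit[text_key]
        # Everything except the text itself (score, id, source, ...) becomes metadata
        metadata = {k: v for k, v in hit.items() if k != text_key}
        pairs.append((text, metadata))
    return pairs
```

Inside `_get_relevant_documents` you would then return something like `[Document(page_content=t, metadata=m) for t, m in parse_api_response(response.text)]`. Keeping the parsing separate lets you test it without calling the API.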
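On top of your API's search results, you can also use the self-ask-with-search agent, which may help you find exact matches. The best option, though, is vector similarity search, which gives you many ways to control the results.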
```python
from langchain import OpenAI, SerpAPIWrapper
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

# you can define a different llm
# (OpenAI reads OPENAI_API_KEY and SerpAPIWrapper reads SERPAPI_API_KEY from the environment)
llm = OpenAI(temperature=0)
search = SerpAPIWrapper()
tools = [
    Tool(
        name="Intermediate Answer",
        func=search.run,
        description="useful for when you need to ask with search",
    )
]

self_ask_with_search = initialize_agent(
    tools, llm, agent=AgentType.SELF_ASK_WITH_SEARCH, verbose=True
)
self_ask_with_search.run(
    "You can apply more filters with this query here ?"
)
```
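If you want the agent to query your own API instead of SerpAPI, a `Tool`'s `func` only needs to map a query string to a result string. Here is a sketch under stated assumptions: `fetch_hits` is a hypothetical callable wrapping your API that returns a list of `{"content": ..., "score": ...}` dicts, where a lower score means a closer match; `make_search_func` is an illustrative name.

```python
from typing import Callable, Dict, List


def make_search_func(
    fetch_hits: Callable[[str], List[Dict]], top_k: int = 3
) -> Callable[[str], str]:
    """Build a str -> str function that can be passed as a Tool's `func`.

    `fetch_hits` is a hypothetical wrapper around your own search API.
    """

    def search(query: str) -> str:
        # Lower score = closer match, so sort ascending and keep the best top_k
        hits = sorted(fetch_hits(query), key=lambda h: float(h["score"]))[:top_k]
        if not hits:
            return "No results found."
        # Agents consume tool output as plain text, so join the chunks
        return "\n\n".join(h["content"] for h in hits)

    return search
```

You would then pass `func=make_search_func(your_fetch_function)` to the `Tool`. The self-ask agent expects exactly one tool named "Intermediate Answer", so keep that name as in the snippet above.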
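You can call any API with the following implementation and use that API as a retriever. In my case, I don't have access to the Redis vector store, but an API is exposed to me that returns n records (text chunks) based on a threshold.
Implementation in Python: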
```python
import json
from typing import List

import requests
from langchain.callbacks.manager import CallbackManagerForRetrieverRun
from langchain.schema import BaseRetriever, Document


class CustomRetriever(BaseRetriever):
    api_token_key: str
    api_end_point: str
    api_redis_index_name: str
    threshold_for_search_api: str

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        """
        _get_relevant_documents is the BaseRetriever method implemented here
        :param query: String value of the query
        """
        headers = {
            "apiToken": self.api_token_key,
            "Content-Type": "application/json",
        }
        # Query syntax expected by this particular search API, kept exactly as
        # in the original payload; json.dumps escapes quotes in the query.
        data = json.dumps({
            "output": "*",
            "query": "@similarity:" + query + "}",
            "index": self.api_redis_index_name,
        })
        response = requests.post(self.api_end_point, headers=headers, data=data)
        response.raise_for_status()
        list_of_val = response.json()

        threshold_val = float(self.threshold_for_search_api)
        result_docs = []
        least_score = float("inf")
        least_score_index = None
        for c, i in enumerate(list_of_val):
            fetched_score = float(i["score"])
            # Lower score = closer match, so track the minimum score
            if fetched_score < least_score:
                least_score = fetched_score
                least_score_index = c
            # Keep every chunk within the threshold
            if fetched_score <= threshold_val:
                result_docs.append(Document(page_content=i["content"]))

        # Nothing passed the threshold: fall back to the single closest chunk
        if not result_docs and least_score_index is not None:
            i = list_of_val[least_score_index]
            result_docs.append(Document(page_content=i["content"]))
        return result_docs
```
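The threshold-and-fallback step can be pulled out into a pure function so it can be tested without hitting the API. This sketch assumes, as the `<=` threshold check implies, that a lower score means a closer match (a distance), and that each hit is a dict with `content` and `score` keys; `select_chunks` is a hypothetical helper name.

```python
from typing import Dict, List


def select_chunks(hits: List[Dict], threshold: float) -> List[str]:
    """Return the contents of hits whose score is <= threshold.

    If no hit passes the threshold, fall back to the single closest
    (lowest-score) hit. An empty input returns an empty list.
    """
    selected = [h["content"] for h in hits if float(h["score"]) <= threshold]
    if not selected and hits:
        best = min(hits, key=lambda h: float(h["score"]))
        selected = [best["content"]]
    return selected
```

Inside the retriever you could then write `return [Document(page_content=c) for c in select_chunks(response.json(), float(self.threshold_for_search_api))]`.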
Then initialize the retriever as:
```python
retriever_r = CustomRetriever(
    api_token_key="<your key>",
    api_end_point="<your api endpoint>",
    api_redis_index_name="<your redis index name>",
    threshold_for_search_api="0.5",
)
```
Then use the retriever in a QA chain:
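```python
from langchain.chains import RetrievalQA

# llm, question_prompt_template and combine_prompt_template are your own
# LLM and the two prompts used by the map_reduce chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="map_reduce",
    retriever=retriever_r,
    chain_type_kwargs={
        "question_prompt": question_prompt_template,
        "combine_prompt": combine_prompt_template,
    },
    verbose=True,
)
```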
Now you have a custom retriever that you can use in a custom QA chain.