I'm trying to use Langchain's MapReduceDocumentsChain together with the OpenAI API to summarize a large document, using the following chain (copied from the Langchain docs):
def _create_document_summary_chain(self) -> MapReduceDocumentsChain:
    """
    Create the summarization chain
    """
    map_chain = LLMChain(
        llm=self._quick_scan_model.llm,
        prompt=SummaryPrompt.get_document_summary_map_prompt(),
    )
    reduce_chain = LLMChain(
        llm=self._quick_scan_model.llm,
        prompt=SummaryPrompt.get_document_summary_reduce_prompt(),
    )
    combine_documents_chain = StuffDocumentsChain(
        llm_chain=reduce_chain, document_variable_name="docs"
    )
    reduce_documents_chain = ReduceDocumentsChain(
        combine_documents_chain=combine_documents_chain,
        collapse_documents_chain=combine_documents_chain,
        token_max=4000,
    )
    return MapReduceDocumentsChain(
        llm_chain=map_chain,
        reduce_documents_chain=reduce_documents_chain,
        document_variable_name="docs",
        return_intermediate_steps=False,
    )
The chain is invoked through the following function, which receives a list of pre-chunked Langchain Document objects:
async def get_document_summary(self, chunks: list[Document]) -> str:
    """
    Get the summary for a given document text. Use the Langchain map reduce summarizer.
    """
    for i in range(self._retries):
        try:
            response = await self._summary_map_reduce_chain.ainvoke(chunks)
            return response["output_text"]
        except Exception as e:
            print(f"Document summarizer attempt {i + 1}/{self._retries} failed...")
            print(f"Error: {e}")
            continue
    return ""
The problem is that when I run the chain on a large document (> 500 chunks), I hit a tokens-per-minute (TPM) rate limit error like this:
An error occurred: RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-xxx on tokens per min. Limit: 90000 / min. Current: 89369 / min. Contact us through our help center at help.openai.com if you continue to have issues.
I've searched around, but nobody seems to have run into the same problem, or at least not while using the same chain as me.
I've also tried adjusting the chunk size and the token_max parameter, but to no avail.
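The only other workaround I can think of is throttling the input myself: pre-batch the chunks so each batch stays under my TPM budget, then invoke the chain once per batch. A rough sketch of the batching part (estimate_tokens is just a crude chars/4 heuristic I wrote, not a Langchain or OpenAI API):

```python
def estimate_tokens(text: str) -> int:
    # Very rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)


def batch_by_token_budget(texts: list[str], budget: int) -> list[list[str]]:
    """Group texts into consecutive batches whose estimated token total stays under budget."""
    batches: list[list[str]] = [[]]
    used = 0
    for text in texts:
        cost = estimate_tokens(text)
        # Start a new batch when adding this text would exceed the budget
        # (a batch always takes at least one text, even an oversized one).
        if batches[-1] and used + cost > budget:
            batches.append([])
            used = 0
        batches[-1].append(text)
        used += cost
    return batches
```

I could then run the chain per batch with a sleep between batches and combine the partial summaries at the end, but that largely defeats the point of letting MapReduceDocumentsChain manage the whole document, so I'm hoping there's a built-in way to rate limit the chain instead.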