Summarization and topic extraction with a private LLM (flan-t5-small) using LangChain or LlamaIndex

Question

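Has anyone used LangChain or LlamaIndex to process a single document of more than 512 tokens? Yes, I know there are other ways to handle this, but it is hard to find documentation online that explains in detail how to use LangChain with a private LLM that is reachable through an API call. Most of the documentation covers commercial LLMs. I would appreciate any strategies or sample code showing how to handle an LLM wrapper with LangChain, specifically for summarization and topic extraction.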

Answer 1

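Here is sample code that uses LangChain to orchestrate an open-source LLM for embeddings and text-to-text generation. It does not matter whether the document has more than 512 tokens: you can use the loader.load_and_split() function to load a large document and split it into smaller chunks (for PDF documents, see https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf).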
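To make the chunking idea concrete, here is a minimal plain-Python sketch of splitting a long text into overlapping pieces. It is not LangChain's splitter. The function name `split_into_chunks` and the parameters `max_words` and `overlap` are made up for illustration, and it counts whitespace-separated words as a rough stand-in for model tokens, so the limit should be set well below 512 in practice.

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 300, overlap: int = 30) -> List[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of the two chunks.
    Assumption: words approximate tokens; a real tokenizer may count more.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    long_text = " ".join(f"word{i}" for i in range(1000))
    pieces = split_into_chunks(long_text, max_words=300, overlap=30)
    print(len(pieces), "chunks;", len(pieces[0].split()), "words in the first chunk")
```

Each chunk can then be embedded or sent to the model on its own, which is what makes the 512-token limit of a single call irrelevant to the document length.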

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import HuggingFaceHub
from langchain.prompts import PromptTemplate
from langchain.chains.retrieval_qa.base import RetrievalQA

# embeddings = HuggingFaceEmbeddings(model_name='bert-base-uncased')
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
# For a real document, build the index from the chunks returned by loader.load_and_split():
# docsearch = FAISS.from_documents(texts, embeddings)
docsearch = FAISS.from_texts(
    ["harry potter's owl is in the castle. The book is about 'To Kill A Mocking Swan'. There is another monkey"], embeddings)

# HuggingFaceHub reads the API token from the HUGGINGFACEHUB_API_TOKEN environment variable.
llm = HuggingFaceHub(
    repo_id="google/flan-t5-base",
    model_kwargs={"temperature": 0.6, "max_length": 500, "max_new_tokens": 200})

prompt_template = """
Compare the book given in question with others in the retriever based on genre and description.
Return a complete sentence with the full title of the book and describe the similarities between the books.

question: {question}
context: {context}
"""

prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
retriever=docsearch.as_retriever()
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, chain_type_kwargs = {"prompt": prompt})
print(qa.run({"query": "Which book except 'To Kill A Mocking Bird' is similar to it?"}))
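The example above covers retrieval QA. For the summarization and topic extraction the question asks about, one common pattern is map-reduce over the chunks: summarize each chunk, then summarize the summaries. The sketch below shows that pattern in plain Python, not through LangChain's own chains. The private model is represented by any callable that takes a prompt string and returns text. `stub_llm`, `map_reduce_summarize`, `extract_topics`, and the prompt wording are all hypothetical; in a real setup the callable would wrap the HTTP call to your private endpoint.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # prompt in, generated text out


def map_reduce_summarize(chunks: List[str], llm: LLM) -> str:
    """Summarize each chunk (map), then merge the partial summaries (reduce)."""
    if not chunks:
        return ""
    partial = [llm(f"Summarize the following text:\n\n{chunk}") for chunk in chunks]
    if len(partial) == 1:
        return partial[0]  # nothing to merge
    combined = "\n".join(partial)
    return llm(f"Combine these partial summaries into one summary:\n\n{combined}")


def extract_topics(chunks: List[str], llm: LLM) -> List[str]:
    """Ask for topics chunk by chunk and keep each topic once (case-insensitive)."""
    topics: List[str] = []
    seen = set()
    for chunk in chunks:
        reply = llm(
            "List the main topics of the following text "
            f"as a comma-separated list:\n\n{chunk}"
        )
        for raw in reply.split(","):
            topic = raw.strip()
            if topic and topic.lower() not in seen:
                seen.add(topic.lower())
                topics.append(topic)
    return topics


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for the private model: echoes the first five
    words of the text that follows the instruction line."""
    body = prompt.split("\n\n", 1)[-1]
    return " ".join(body.split()[:5])


if __name__ == "__main__":
    demo_chunks = ["Harry Potter's owl is in the castle.", "There is another monkey in the tower."]
    print(map_reduce_summarize(demo_chunks, stub_llm))
    print(extract_topics(demo_chunks, stub_llm))
```

If the merged partial summaries are themselves longer than the model's input limit, the reduce step can be applied again to groups of summaries until the result fits.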
