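My application: load CSV files into a knowledge graph (using KnowledgeGraphIndex) and use an LLM (HuggingFaceH4/zephyr-7b-beta) to retrieve answers from the graph store (SimpleGraphStore).
My problem: I want to pass multiple CSV files into the knowledge graph. I am using CSVLoader, and when I run KnowledgeGraphIndex I get this error: `AttributeError: 'Document' object has no attribute 'get_doc_id'`
This is how I load the CSV: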
```python
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import CharacterTextSplitter

csv_loader = CSVLoader("/content/Train-Set.csv")
data = csv_loader.load()
splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=500,
    chunk_overlap=0,
    length_function=len,
)
documents = splitter.split_documents(data)
```
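For the multi-file case, a minimal sketch (not part of the code above) could loop over the paths and collect everything into one list before splitting. The helper name `load_all_csvs` and its `make_loader` parameter are hypothetical; the only assumption about the loader is that it is constructed from a path and has a `load()` method, as `CSVLoader` does.

```python
from typing import Any, Callable, Iterable, List


def load_all_csvs(paths: Iterable[str], make_loader: Callable[[str], Any]) -> List[Any]:
    """Load every file in `paths` and return all resulting documents in one list.

    `make_loader` is any callable that takes a path and returns an object
    with a .load() method (for example, langchain's CSVLoader class).
    """
    documents: List[Any] = []
    for path in paths:
        loader = make_loader(str(path))
        # Each loader returns a list of documents; keep them in file order.
        documents.extend(loader.load())
    return documents


# Example usage (assumes langchain is installed; the folder is illustrative):
# from pathlib import Path
# from langchain.document_loaders import CSVLoader
# data = load_all_csvs(sorted(Path("/content").glob("*.csv")), CSVLoader)
```

The combined list can then be passed to `splitter.split_documents` exactly as a single file's documents would be.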
This is my knowledge graph index:
```python
index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    include_embeddings=True,
    max_triplets_per_chunk=2,
    embed_model=embed_model,
)
```
Try using `Document` from `langchain.docstore`, and assign a unique ID to `doc_id`:
```python
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.docstore.document import Document

csv_loader = CSVLoader("/content/Train-Set.csv")
data = csv_loader.load()
splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=500,
    chunk_overlap=0,
    length_function=len,
)
documents = splitter.split_documents(data)

# Assign a unique identifier to each document
for i, doc in enumerate(documents):
    doc.metadata["doc_id"] = f"doc_{i}"

index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    include_embeddings=True,
    max_triplets_per_chunk=2,
    embed_model=embed_model,
)
```
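One caveat: setting `doc.metadata["doc_id"]` adds a dictionary entry, but it does not give the object a `get_doc_id()` method, so the `AttributeError` may persist. If it does, the documents probably need to be converted into a type that has that method before indexing. Below is a rough, library-free sketch of that idea. `IndexableDocument` and `adapt_documents` are hypothetical names for illustration, not LlamaIndex classes; the sketch only assumes the inputs expose `page_content` and `metadata`, as LangChain documents do.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class IndexableDocument:
    """Hypothetical stand-in for an index-side document; not a LlamaIndex class."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: str = ""

    def get_doc_id(self) -> str:
        # This is the method the index looks up; a metadata entry cannot supply it.
        return self.doc_id


def adapt_documents(lc_docs: Iterable[Any]) -> List[IndexableDocument]:
    """Wrap objects exposing .page_content and .metadata so they gain get_doc_id()."""
    adapted: List[IndexableDocument] = []
    for i, doc in enumerate(lc_docs):
        metadata = dict(doc.metadata or {})
        # Reuse metadata["doc_id"] when present, otherwise fall back to a positional ID.
        doc_id = str(metadata.get("doc_id", f"doc_{i}"))
        adapted.append(IndexableDocument(doc.page_content, metadata, doc_id))
    return adapted
```

In real code you would not write this wrapper yourself. As far as I know, LlamaIndex's own `Document` class offers a `from_langchain_format` classmethod for this conversion, so something like `[Document.from_langchain_format(d) for d in documents]` (with `Document` imported from LlamaIndex, not LangChain) is worth trying; check it against the version you have installed.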