Slow LlamaIndex query performance

Question

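I have a question about LlamaIndex queries that I have not been able to solve. Basically, I am trying to build a classifier with LlamaIndex. I have about 700 documents (not large, just a few paragraphs each). I split them into train and test sets and built the index on the training set. The problem is that each query takes about 1 minute per document, and with more than 100 documents in the test set the run takes over 2 hours. Is there a way to fix this? Below is the code snippet showing how I evaluate.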

vector_query_engine = my_index.as_query_engine(
    similarity_top_k=3,
    text_qa_template=text_qa_template,
)

df['PredictedOutcome'] = df['doc_text'].apply(lambda x: vector_query_engine.query(x))

Answer 1

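Here is an approach that runs faster. This is only starter code; you can experiment with different indexes, retrievers, and queries to speed it up further.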

Steps and code

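I found this dataset on Kaggle, which I think is similar to yours. It contains about 500 text blocks in separate files, from 5 different categories.
I renamed all the files to _.txt, then split them into "train" and "test" parts with 200 files per category. As a result, I have 1000 separate text files for the training and test data.
When loading the documents with SimpleDirectoryReader, I pass the file metadata as a parameter. Remember that in the previous step we renamed the documents and added the category to each file name. As a result, documents from the same category should end up with closer vectors when they are embedded and indexed.
After loading the documents, I build a VectorStoreIndex and use that index as the query engine.
Then I query every file in the test folder.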
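Because the category is encoded in each file name, the same metadata hook could also expose it as an explicit field. The sketch below assumes, purely for illustration, file names of the form `sport_001.txt`; the exact naming pattern is not shown in this answer, and `category_from_filename` and `filename_metadata` are hypothetical helpers, not part of LlamaIndex.

```python
import os

# Categories listed in the prompt used later in this answer.
CATEGORIES = ("business", "entertainment", "politics", "sport", "tech")


def category_from_filename(path):
    """Return the category prefix of a file name, or None if it is unknown.

    Assumes a hypothetical "<category>_<anything>.txt" naming scheme.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    prefix = stem.split("_", 1)[0].lower()
    return prefix if prefix in CATEGORIES else None


def filename_metadata(path):
    """Metadata callable: takes one file path and returns one dict."""
    return {"file_name": path, "category": category_from_filename(path)}
```

A callable like `filename_metadata` has the same shape as `filename_fn` below (one path string in, one dict out), so it could be passed as `file_metadata=` in its place. The same helper can also recover the expected label of each test file when scoring predictions.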

--> Total time for 1000 files: just under 10 minutes.

import os
import time

# Import path for llama-index 0.10+; older releases use `from llama_index import ...`.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Attach each file's path (which contains the category) as document metadata.
filename_fn = lambda filename: {"file_name": filename}

documents = SimpleDirectoryReader(
    "train",
    file_metadata=filename_fn
).load_data()

# Build a vector index over the training documents and use it as the query engine.
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

def read_files_in_folder(folder_path):
    """Return the text of every regular file in folder_path."""
    file_contents = []
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        if os.path.isfile(file_path):
            with open(file_path, 'r') as file:
                file_contents.append(file.read())
    return file_contents

folder_path = 'test'
file_contents = read_files_in_folder(folder_path)

start_time = time.time()

# Classify each test document with one query per file.
content_categories = []
for content in file_contents:
    prompt = f'''
    Take the {content} and tell the category. Possible categories are: [business,entertainment, politics,sport, tech]
    '''
    response = query_engine.query(prompt)
    content_categories.append(response)

end_time = time.time()
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time} seconds")

Elapsed time: 599.567540884018 seconds
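Each call in the loop above waits for the previous one to finish, and most of that time is probably spent waiting on network requests (an assumption; profile to confirm). One possible variation, not part of the original answer, is to issue several queries at once with a thread pool. The sketch below uses a stand-in `classify` function so that it runs on its own; in practice it would wrap a call such as `query_engine.query(prompt)`, and whether your LLM provider's rate limits allow that many parallel requests is something to check.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def classify(text):
    """Stand-in for a slow, I/O-bound call such as str(query_engine.query(prompt))."""
    time.sleep(0.1)  # simulates waiting on a remote service
    return "sport" if "match" in text else "tech"


def classify_all(texts, worker, max_workers=8):
    """Run worker over texts concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, texts))


if __name__ == "__main__":
    docs = ["the match ended 2-1", "a new phone was released"] * 4
    start = time.time()
    labels = classify_all(docs, classify)
    print(labels[:2], f"{time.time() - start:.2f} seconds")
```

Threads suit this case because the work is waiting rather than computing; `max_workers` is a tuning knob, and a smaller value is safer if the provider throttles requests.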

Going further

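There is an open issue about document classification with Llama-Index that you may want to check.
Customizing documents
The keyword table index can also be useful.
LangChain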

If you have any further questions, feel free to reach out. I hope this answer is useful 🍀
