How can I connect the Llama-Index Pandas Query Engine to multiple dataframes?


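According to the Pandas Query Engine documentation, the example code only supports a single df. I want to connect to multiple dfs. This works in PandasAI via SmartDataLake, but I prefer the descriptive answers the Pandas Query Engine gives, because it passes the result through the LLM a second time. Is there any way to make this work?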

Code from the documentation:

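import pandas as pd
from llama_index.core import PromptTemplate  # import path for llama-index >= 0.10

df = pd.read_csv("./titanic_train.csv")  # only one dataframe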
instruction_str = (
    "1. Convert the query to executable Python code using Pandas.\n"
    "2. The final line of code should be a Python expression that can be called with the `eval()` function.\n"
    "3. The code should represent a solution to the query.\n"
    "4. PRINT ONLY THE EXPRESSION.\n"
    "5. Do not quote the expression.\n"
)

pandas_prompt_str = (
    "You are working with a pandas dataframe in Python.\n"
    "The name of the dataframe is `df`.\n"
    "This is the result of `print(df.head())`:\n"
    "{df_str}\n\n"
    "Follow these instructions:\n"
    "{instruction_str}\n"
    "Query: {query_str}\n\n"
    "Expression:"
)
response_synthesis_prompt_str = (
    "Given an input question, synthesize a response from the query results.\n"
    "Query: {query_str}\n\n"
    "Pandas Instructions (optional):\n{pandas_instructions}\n\n"
    "Pandas Output: {pandas_output}\n\n"
    "Response: "
)

pandas_prompt = PromptTemplate(pandas_prompt_str).partial_format(
    instruction_str=instruction_str, df_str=df.head(5)
)

I tried the following code for multiple dataframes:

instruction_str = (
    "1. Convert the query to executable Python code using Pandas.\n"
    "2. The final line of code should be a Python expression that can be called with the `eval()` function.\n"
    "3. The code should represent a solution to the query.\n"
    "4. PRINT ONLY THE EXPRESSION.\n"
    "5. Do not quote the expression.\n"
)

pandas_prompt_str = (
    "You are working with 3 pandas dataframes in Python.\n"
    "The name of the dataframes is `df1`, 'df2' and 'df3'.\n"
    "This is the result of `print(df1.head())`:\n"
    "{df1_str}\n\n"
    "This is the result of `print(df2.head())`:\n"
    "{df2_str}\n\n"
    "This is the result of `print(df3.head())`:\n"
    "{df3_str}\n\n"
    "Follow these instructions:\n"
    "{instruction_str}\n"
    "Query: {query_str}\n\n"
    "Expression:"
)
response_synthesis_prompt_str = (
    "Given an input question, synthesize a response from the query results.\n"
    "Query: {query_str}\n\n"
    "Pandas Instructions (optional):\n{pandas_instructions}\n\n"
    "Pandas Output: {pandas_output}\n\n"
    "Response: "
)

pandas_prompt1 = PromptTemplate(pandas_prompt_str).partial_format(
    instruction_str=instruction_str, df1_str=df1.head(1)
)
pandas_output_parser1 = PandasInstructionParser(df1)

pandas_prompt2 = PromptTemplate(pandas_prompt_str).partial_format(
    instruction_str=instruction_str, df2_str=df2.head(1)
)
pandas_output_parser2 = PandasInstructionParser(df2)

pandas_prompt3 = PromptTemplate(pandas_prompt_str).partial_format(
    instruction_str=instruction_str, df3_str=df3.head(1)
)
pandas_output_parser3 = PandasInstructionParser(df3)
response_synthesis_prompt = PromptTemplate(response_synthesis_prompt_str)

We get the following error:

ValueError: Module input keys must have exactly one key if dest_key is not specified. Remaining keys: in module: {'df2_str', 'query_str', 'df1_str'}
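
One likely reading of this error: the three-dataframe template declares `df1_str`, `df2_str`, `df3_str` and `query_str`, but each `partial_format` call in the attempt fills only one of the dataframe placeholders, so every prompt still has several unfilled keys and the pipeline cannot tell which one should receive the query. The sketch below illustrates that bookkeeping in plain Python. `remaining_keys` is a hypothetical helper written for this illustration, not part of LlamaIndex, and the template is a shortened stand-in for the one in the question.

```python
from string import Formatter


def remaining_keys(template: str, **filled: str) -> set[str]:
    """Return the placeholder names in `template` that `filled` does not cover."""
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return names - set(filled)


# Shortened stand-in for the three-dataframe prompt in the question.
template = (
    "df1.head():\n{df1_str}\n\n"
    "df2.head():\n{df2_str}\n\n"
    "df3.head():\n{df3_str}\n\n"
    "Instructions:\n{instruction_str}\n"
    "Query: {query_str}\n"
)

# Filling a single dataframe placeholder, as each partial_format call in the question does.
one_filled = remaining_keys(template, instruction_str="...", df3_str="...")

# Filling all three dataframe placeholders on one prompt.
all_filled = remaining_keys(
    template, instruction_str="...", df1_str="...", df2_str="...", df3_str="..."
)

print(sorted(one_filled))  # three keys are still open
print(sorted(all_filled))  # only the query key is still open
```

Filling all three `dfN_str` values on a single prompt would leave `query_str` as the only open key. That alone may not be enough, though: the `PandasInstructionParser` in the question is constructed with a single dataframe, so an expression that refers to `df1`, `df2` and `df3` would probably still need custom handling.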
python pandas large-language-model llama-index
1 Answer
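As far as I know, the Pandas Query Engine does not allow connecting two or more dataframes at once. To query several DataFrames and still get the descriptive answers the Pandas Query Engine provides, you will probably need a workaround: merge the dataframes into one and run the engine on that. The code below merges the dataframes.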

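merged_df = pd.merge(df1, df2, on='common_column')  # 'common_column' stands for the key shared by both dataframes
query_engine = PandasQueryEngine(df=merged_df)  # then build the query engine on the merged dataframe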
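
As a minimal sketch of the merge step, the example below joins two small dataframes on a shared key. The frames, the column names (`passenger_id`, `name`, `fare`) and the values are made up for illustration and do not come from the question's data.

```python
import pandas as pd

# Two hypothetical dataframes that share the key column `passenger_id`.
passengers = pd.DataFrame(
    {"passenger_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]}
)
tickets = pd.DataFrame(
    {"passenger_id": [1, 2, 4], "fare": [7.25, 71.28, 8.05]}
)

# An inner join keeps only the keys present in both frames (here 1 and 2).
merged_df = pd.merge(passengers, tickets, on="passenger_id", how="inner")

print(merged_df)
```

With `how="inner"`, rows without a match in both frames are dropped. `how="outer"` would keep them and fill the missing values with NaN, which may matter if the LLM-generated expressions are expected to see every row.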
For more on pandas, see my blog or the official pandas website.
