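According to the Pandas Query Engine documentation, the example code only supports connecting a single df. I want to connect multiple dfs. This works in PandasAI through SmartDataLake, but I prefer the descriptive answers the Pandas Query Engine gives, because it runs the result through the LLM a second time. Is there any way to make this work?
Documentation code: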
import pandas as pd
from llama_index.core import PromptTemplate
df = pd.read_csv("./titanic_train.csv")  # only one dataframe
instruction_str = (
"1. Convert the query to executable Python code using Pandas.\n"
"2. The final line of code should be a Python expression that can be called with the `eval()` function.\n"
"3. The code should represent a solution to the query.\n"
"4. PRINT ONLY THE EXPRESSION.\n"
"5. Do not quote the expression.\n"
)
pandas_prompt_str = (
"You are working with a pandas dataframe in Python.\n"
"The name of the dataframe is `df`.\n"
"This is the result of `print(df.head())`:\n"
"{df_str}\n\n"
"Follow these instructions:\n"
"{instruction_str}\n"
"Query: {query_str}\n\n"
"Expression:"
)
response_synthesis_prompt_str = (
"Given an input question, synthesize a response from the query results.\n"
"Query: {query_str}\n\n"
"Pandas Instructions (optional):\n{pandas_instructions}\n\n"
"Pandas Output: {pandas_output}\n\n"
"Response: "
)
pandas_prompt = PromptTemplate(pandas_prompt_str).partial_format(
instruction_str=instruction_str, df_str=df.head(5)
)
Code I tried for multiple dataframes:
instruction_str = (
"1. Convert the query to executable Python code using Pandas.\n"
"2. The final line of code should be a Python expression that can be called with the `eval()` function.\n"
"3. The code should represent a solution to the query.\n"
"4. PRINT ONLY THE EXPRESSION.\n"
"5. Do not quote the expression.\n"
)
pandas_prompt_str = (
"You are working with 3 pandas dataframes in Python.\n"
"The names of the dataframes are `df1`, `df2` and `df3`.\n"
"This is the result of `print(df1.head())`:\n"
"{df1_str}\n\n"
"This is the result of `print(df2.head())`:\n"
"{df2_str}\n\n"
"This is the result of `print(df3.head())`:\n"
"{df3_str}\n\n"
"Follow these instructions:\n"
"{instruction_str}\n"
"Query: {query_str}\n\n"
"Expression:"
)
response_synthesis_prompt_str = (
"Given an input question, synthesize a response from the query results.\n"
"Query: {query_str}\n\n"
"Pandas Instructions (optional):\n{pandas_instructions}\n\n"
"Pandas Output: {pandas_output}\n\n"
"Response: "
)
pandas_prompt1 = PromptTemplate(pandas_prompt_str).partial_format(
instruction_str=instruction_str, df1_str=df1.head(1)
)
pandas_output_parser1 = PandasInstructionParser(df1)
pandas_prompt2 = PromptTemplate(pandas_prompt_str).partial_format(
instruction_str=instruction_str, df2_str=df2.head(1)
)
pandas_output_parser2 = PandasInstructionParser(df2)
pandas_prompt3 = PromptTemplate(pandas_prompt_str).partial_format(
instruction_str=instruction_str, df3_str=df3.head(1)
)
pandas_output_parser3 = PandasInstructionParser(df3)
response_synthesis_prompt = PromptTemplate(response_synthesis_prompt_str)
We get the following error:
ValueError: Module input keys must have exactly one key if dest_key is not specified. Remaining keys: in module: {'df2_str', 'query_str', 'df1_str'}
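Reading the error: `pandas_prompt3` only fills `df3_str` and `instruction_str`, so `df1_str`, `df2_str` and `query_str` are still open, while the pipeline input can feed only one key. The stdlib sketch below is not LlamaIndex code; it just lists the placeholders a `str.format`-style template still has after a partial fill. It suggests the fix: one `PromptTemplate(pandas_prompt_str).partial_format(instruction_str=instruction_str, df1_str=df1.head(5), df2_str=df2.head(5), df3_str=df3.head(5))`, leaving only `query_str`. Note that `PandasInstructionParser` is still constructed with a single dataframe.

```python
from string import Formatter

# Trimmed copy of the question's multi-dataframe prompt (same placeholders).
PANDAS_PROMPT_STR = (
    "You are working with 3 pandas dataframes in Python.\n"
    "This is the result of `print(df1.head())`:\n{df1_str}\n\n"
    "This is the result of `print(df2.head())`:\n{df2_str}\n\n"
    "This is the result of `print(df3.head())`:\n{df3_str}\n\n"
    "Follow these instructions:\n{instruction_str}\n"
    "Query: {query_str}\n\n"
    "Expression:"
)


def remaining_keys(template: str, **filled: str) -> set[str]:
    """Placeholders in a str.format-style template not covered by `filled`."""
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    return fields - set(filled)


# What the question's pandas_prompt3 leaves open: the same three keys as the error.
print(sorted(remaining_keys(PANDAS_PROMPT_STR, instruction_str="...", df3_str="...")))
# ['df1_str', 'df2_str', 'query_str']

# Filling every dataframe preview in ONE template leaves only the query.
print(sorted(remaining_keys(
    PANDAS_PROMPT_STR,
    instruction_str="...", df1_str="...", df2_str="...", df3_str="...",
)))
# ['query_str']
```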
One option is to merge the dataframes into a single one first:
merged_df = pd.merge(df1, df2, on='common_column')
Or:
query_engine.add_dataframe('merged_df', merged_df)
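A runnable sketch of the merge approach for three frames, assuming they share one key column. The toy frames are made up and `common_column` is the placeholder name from the answer. Chained `DataFrame.merge` calls do the same as `pd.merge` applied pairwise. The merged frame can then stand in for `df` in the documentation code (`df_str=merged_df.head(5)`, `PandasInstructionParser(merged_df)`), so the single-dataframe prompt still fits.

```python
import pandas as pd

# Hypothetical toy frames sharing the placeholder key "common_column".
df1 = pd.DataFrame({"common_column": [1, 2, 3], "a": ["x", "y", "z"]})
df2 = pd.DataFrame({"common_column": [2, 3, 4], "b": [20.0, 30.0, 40.0]})
df3 = pd.DataFrame({"common_column": [1, 3], "c": [True, False]})

# Default how="inner": only keys present in all three frames survive (here: 3).
merged_df = df1.merge(df2, on="common_column").merge(df3, on="common_column")
print(merged_df)
```

With the default inner join, rows whose key is missing from any frame are dropped; pass `how="left"` or `how="outer"` to `merge` if those rows must stay visible to the LLM.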
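If the frames cannot sensibly be merged, another route is to keep one multi-dataframe prompt and evaluate the generated code yourself with every frame in scope. As far as we can tell, `PandasInstructionParser` binds only one dataframe (as `df`). The hypothetical `eval_multi_df` helper below is not a LlamaIndex API; it sketches what a replacement step could do under the prompt's rules: execute every line but the last, then `eval()` the last line. It runs model output with `exec`/`eval`, so use it only in a sandbox.

```python
import ast

import pandas as pd


def eval_multi_df(code: str, frames: dict[str, pd.DataFrame]):
    """Hypothetical helper: run LLM-generated pandas code with several frames in scope.

    All statements except the last are executed; the last must be an
    expression, and its value is returned (as the prompt instructs).
    WARNING: exec/eval on model output is unsafe outside a sandbox.
    """
    tree = ast.parse(code.strip())
    if not tree.body or not isinstance(tree.body[-1], ast.Expr):
        raise ValueError("the last line must be a Python expression")
    # One shared namespace, so names assigned by earlier lines are visible to the last.
    scope = {"pd": pd, **frames}
    setup = ast.Module(body=tree.body[:-1], type_ignores=[])
    exec(compile(setup, "<llm-code>", "exec"), scope)
    final = ast.Expression(body=tree.body[-1].value)
    return eval(compile(final, "<llm-code>", "eval"), scope)


# Usage with made-up frames and code shaped like an LLM answer.
df1 = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
df2 = pd.DataFrame({"id": [1, 2], "y": [3, 4]})
code = "m = df1.merge(df2, on='id')\n(m['x'] * m['y']).sum()"
print(eval_multi_df(code, {"df1": df1, "df2": df2}))  # 10*3 + 20*4 = 110
```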