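I have two different datasets. The first describes level and location (it consists of 4 files). The second describes technology and location (it consists of 3 files).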
import os
import pandas as pd
import glob
technology = glob.glob("C:\\path\\*.xlsx", recursive = True)
level = glob.glob("C:\\path\\*.xlsx", recursive = True)
d = {}
for level, technology in zip (level, technology):
    d[level technology] = pd.merge(technology, level, how= "inner",left_on=["Location"],right_on=["Location"])
    d.to_excel(d[level technology]+ '.xlsx')
Is my approach correct? At the moment I get the following error message:
TypeError: Can only merge Series or DataFrame objects, a was passed
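The problem comes from a misunderstanding of how pandas.merge and file handling work here. Your technology and level variables are lists of file paths (strings), not DataFrame objects, so each call to pd.merge receives two strings. You need to load the files into pandas DataFrames before you can merge them.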
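To see this concretely, here is a minimal sketch that builds a temporary folder containing a few empty placeholder files and checks what glob.glob returns. The file names and the helper `list_xlsx` are made up for illustration; they are not taken from the question.

```python
import glob
import os
import tempfile


def list_xlsx(folder, prefix):
    """Return the sorted .xlsx paths in `folder` whose names start with `prefix`."""
    return sorted(glob.glob(os.path.join(folder, f"{prefix}*.xlsx")))


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical placeholder files; they are empty, not real workbooks.
    for name in ("level_1.xlsx", "level_2.xlsx", "technology_1.xlsx"):
        open(os.path.join(folder, name), "w").close()

    level_paths = list_xlsx(folder, "level")
    # Every element is a str (a file path), not a DataFrame.
    print([type(p).__name__ for p in level_paths])  # ['str', 'str']
```

Each of those strings has to go through pd.read_excel before pd.merge will accept it.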
import os
import pandas as pd
import glob

technology_files = glob.glob("C:\\path\\technology*.xlsx", recursive=True)
level_files = glob.glob("C:\\path\\level*.xlsx", recursive=True)

output_dir = "C:\\path\\merged_files\\"
os.makedirs(output_dir, exist_ok=True)

merged_files = {}

for technology_path in technology_files:
    for level_path in level_files:
        # Load the current technology and level files into DataFrames
        technology_df = pd.read_excel(technology_path)
        level_df = pd.read_excel(level_path)

        # Merge on 'Location'
        merged_df = pd.merge(technology_df, level_df, how="inner", on="Location")

        # Create a unique key/name for the dictionary and the output file
        technology_filename = os.path.splitext(os.path.basename(technology_path))[0]
        level_filename = os.path.splitext(os.path.basename(level_path))[0]
        merged_key = f"{technology_filename}_{level_filename}"

        # Store the merged DataFrame in the dictionary
        merged_files[merged_key] = merged_df

        # Save the merged DataFrame to an Excel file
        output_filepath = os.path.join(output_dir, f"{merged_key}.xlsx")
        merged_df.to_excel(output_filepath, index=False)

print("Merging and saving completed.")
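Note that the nested loop pairs every technology file with every level file, so for the datasets described it writes 3 × 4 = 12 workbooks. If what you actually want is a single combined table, one possible alternative is to stack each dataset first and merge once. The sketch below uses small in-memory DataFrames in place of the Excel files. The column names `Level` and `Technology`, the sample values, and the helper `merge_stacked` are assumptions for illustration, since only `Location` appears in the question.

```python
import pandas as pd


def merge_stacked(level_frames, technology_frames, key="Location"):
    """Stack each list of DataFrames, then inner-merge the two stacks on `key`."""
    levels = pd.concat(level_frames, ignore_index=True)
    technologies = pd.concat(technology_frames, ignore_index=True)
    return pd.merge(technologies, levels, how="inner", on=key)


# Stand-ins for the result of pd.read_excel(path) on each file (hypothetical data).
level_frames = [
    pd.DataFrame({"Location": ["A", "B"], "Level": [1, 2]}),
    pd.DataFrame({"Location": ["C"], "Level": [3]}),
]
technology_frames = [
    pd.DataFrame({"Location": ["A"], "Technology": ["4G"]}),
    pd.DataFrame({"Location": ["C", "D"], "Technology": ["5G", "3G"]}),
]

combined = merge_stacked(level_frames, technology_frames)
print(combined)  # only locations A and C appear in both datasets
```

With real files, the two lists could be built as `[pd.read_excel(p) for p in level_files]` and `[pd.read_excel(p) for p in technology_files]`. Whether one combined table or one file per pair is the right output depends on how the 4 and 3 files relate to each other, which the question does not say.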