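I am trying to work out Python code that operates on two CSV files. The goal is to find all rows in the source file whose threshold is greater than or equal to 50% but which are missing from the destination file, so that they can then be copied into the destination file manually.
Here is the code.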
import pandas as pd
# Define file paths (replace with your actual paths)
source_file = "C:/Users/sharsa07/Desktop/pipeline/gso_orb_ticket_list.csv"
dest_file = "C:/Users/sharsa07/Desktop/pipeline/gso_pipeline.csv"
# Read CSV files into DataFrames
df_source = pd.read_csv(C:/Users/sharsa07/Desktop/pipeline/gso_orb_ticket_list.csv)
df_dest = pd.read_csv(C:/Users/sharsa07/Desktop/pipeline/gso_pipeline.csv)
# Find the common column name (assuming the same column name in both files, case-insensitive)
common_col = "a" # Assuming column names are the same (case-insensitive)
# Merge DataFrames based on the common column (outer join to keep unmatched rows)
merged_df = df_source.merge(df_dest[[common_col]], how="outer", on=common_col.lower())
# Calculate threshold value based on 'T' column mean in the source DataFrame
threshold_value = df_source['T'].mean() * 0.5
# Filter merged DataFrame to rows where 'T' is greater than or equal to the threshold and source column is missing in destination
filtered_df = merged_df[(merged_df['T'] >= threshold_value) & (merged_df[common_col.lower()].isna())]
# Get source column names from the filtered DataFrame (excluding the common column)
source_cols = set(filtered_df.columns) - {common_col.lower()}
# Print the column names that meet the criteria
print("Columns to be checked:", source_cols)
I get this error message. Can someone help me debug this code? Thanks.
Cell In[2], line 8
    df_source = pd.read_csv(C:/Users/sharsa07/Desktop/pipeline/gso_orb_ticket_list.csv)
                             ^
SyntaxError: invalid syntax
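I found that this was caused by incorrect syntax in the pd.read_csv() calls. I was passing the file paths bare, as if they were variable names, but they need to be strings. I fixed it by wrapping each path in quotes so that it becomes a string literal. Passing the source_file and dest_file variables defined earlier would work as well.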
df_source = pd.read_csv("C:/Users/sharsa07/Desktop/pipeline/gso_orb_ticket_list.csv")
df_dest = pd.read_csv("C:/Users/sharsa07/Desktop/pipeline/gso_pipeline.csv")
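With the syntax error out of the way, the stated goal (rows in the source that meet the 50% threshold but are absent from the destination) could be approached with a left join plus pandas' merge indicator. The following is only a rough sketch under several assumptions: the shared key column is "a" and the threshold column is "T" as in the question, "T" holds percentages on a 0-100 scale, and the two small in-memory CSVs are made-up stand-ins for the real files.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two CSV files. With real files, pass the
# path strings (e.g. source_file and dest_file) to pd.read_csv instead.
source_csv = io.StringIO("a,T\nA1,80\nA2,30\nA3,50\nA4,95\n")
dest_csv = io.StringIO("a,owner\nA1,x\nA2,y\n")

df_source = pd.read_csv(source_csv)
df_dest = pd.read_csv(dest_csv)

key_col = "a"        # assumed shared key column
threshold_col = "T"  # assumed to hold percentages on a 0-100 scale

# Left join against the destination keys only. indicator=True adds a
# "_merge" column recording where each row was found.
merged = df_source.merge(
    df_dest[[key_col]].drop_duplicates(),
    on=key_col,
    how="left",
    indicator=True,
)

# Keep rows that exist only in the source and meet the threshold.
missing_rows = merged[
    (merged["_merge"] == "left_only") & (merged[threshold_col] >= 50)
].drop(columns="_merge")

print(missing_rows)
```

The indicator column is used here because an outer merge on the key never leaves that key empty, so checking the key column with isna() would not flag any source row as missing. If "T" is stored as a fraction (0 to 1) rather than a percentage, the comparison value would need to be 0.5 instead of 50.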