Compare the latest CSV with all CSVs in a directory, remove matching rows from the latest one, and write the new rows to a new file using Python

Question

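The code doesn't work correctly when the file name is something else.

For example, when the file is named carre123.csv, it doesn't compare correctly. But when I rename it to test123.csv, it works fine.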

Here is the code:

import os
import pandas as pd

# Set the directory where the CSV files are stored
directory = '/PATH/csv-files'

# Get a list of all the CSV files in the directory
csv_files = [os.path.join(directory, f) for f in os.listdir(directory) if f.endswith('.csv')]
#print(csv_files)

# Sort the CSV files by modification time and select the last file as the latest file
latest_file = sorted(csv_files, key=os.path.getmtime)[-1]
#print(latest_file)

# Read the contents of the latest CSV file into a pandas DataFrame
latest_data = pd.read_csv(latest_file)
#print(latest_data)

# Iterate over all the previous CSV files
for csv_file in csv_files[:-1]:
    # Read the contents of the previous CSV file into a pandas DataFrame
    prev_data = pd.read_csv(csv_file)
    #print(prev_data)

    # Identify the rows in the latest CSV file that match the rows in the previous CSV file
    matches = latest_data.isin(prev_data.to_dict('list')).all(axis=1)
    print(matches)

    # Remove the matching rows from the latest CSV file
    latest_data = latest_data[~matches]

# Write the remaining rows in the latest CSV file to a new file
latest_data.to_csv('/NEWPATH/diff.csv', index=False)

When the file is named carre123.csv, the comparison is wrong. But when I rename it to test123.csv, it works fine.
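Separately from the file naming, note that the matching step in this code checks each column independently rather than whole rows. DataFrame.isin with a dict marks a value as matched if it appears anywhere in the same column of prev_data, so a row can be removed even though no identical row exists in the older file. A minimal sketch with made-up data (prev and latest are hypothetical frames) shows the difference, plus a row-wise alternative using merge(..., indicator=True). It assumes both files have the same column names and compatible dtypes:

```python
import pandas as pd

# Made-up data: only the row ('a', 1) appears verbatim in both frames
prev = pd.DataFrame({'name': ['a', 'b'], 'value': [1, 2]})
latest = pd.DataFrame({'name': ['a', 'a', 'b', 'c'], 'value': [1, 2, 1, 3]})

# The question's approach: each value is looked up in its whole column,
# so ('a', 2) and ('b', 1) also count as "matches"
col_matches = latest.isin(prev.to_dict('list')).all(axis=1)
print(col_matches.tolist())  # [True, True, True, False]

# Row-wise alternative: left-merge on all shared columns with an indicator
# column, then keep only the rows that exist in `latest` alone.
# drop_duplicates() stops duplicate rows in `prev` from multiplying matches.
merged = latest.merge(prev.drop_duplicates(), how='left', indicator=True)
new_rows = merged.loc[merged['_merge'] == 'left_only', latest.columns]
print(new_rows['name'].tolist())   # ['a', 'b', 'c']
print(new_rows['value'].tolist())  # [2, 1, 3]
```

With several older files, one option is to combine them first with pd.concat and then merge once against the combined frame.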

Answer 1

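I think there is a bug in your code that is probably causing the problem. The for loop iterates over csv_files[:-1], which is not sorted by modification time. os.listdir returns entries in arbitrary order, so depending on the file names, the loop may include latest_file itself (and skip one of the older files). Try storing the sorted list, sorted(csv_files, key=os.path.getmtime), then take its last element as latest_file and loop over the remaining files. There may be other issues, but based on the example you provided, this is the only one I can see, and it is clearly a problem.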
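The fix suggested above can be sketched as a small helper. split_latest is a hypothetical name, and the demo file names and timestamps are made up for illustration. Like the question's code, it assumes every .csv file in the directory takes part in the comparison and that the directory is not empty:

```python
import os
import tempfile


def split_latest(directory):
    # Collect the .csv files the same way the question's code does
    csv_files = [
        os.path.join(directory, f)
        for f in os.listdir(directory)
        if f.endswith('.csv')
    ]
    # Sort once by modification time and reuse that same ordering, so the
    # result no longer depends on the arbitrary order of os.listdir()
    ordered = sorted(csv_files, key=os.path.getmtime)
    # Newest file last; everything before it is an older file to compare
    return ordered[-1], ordered[:-1]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        # Made-up demo files; carre123.csv gets the newest modification time
        for i, name in enumerate(['test1.csv', 'zzz.csv', 'carre123.csv']):
            path = os.path.join(d, name)
            with open(path, 'w') as fh:
                fh.write('a,b\n1,2\n')
            os.utime(path, (1_000_000 + i, 1_000_000 + i))
        latest, previous = split_latest(d)
        print(os.path.basename(latest))  # carre123.csv
        print(sorted(os.path.basename(p) for p in previous))  # ['test1.csv', 'zzz.csv']
```

In the question's script, latest_file, previous_files = split_latest(directory) would replace the sorted(...)[-1] line, and the for loop would iterate over previous_files instead of csv_files[:-1].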
