Compare two CSV files in Python and write the rows not found in the first file to a third file

Question description Votes: 0 Answers: 1

Question: compare two CSV files and write the rows not found in the first file to a third file

Hi,

I have two CSV files, file1.csv and file2.csv. Both have multiple columns, as shown below:

a. file1.csv

    SreyTey1998,963229606,7854138709318981862,Smaradey Chan,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    Srey_Tey_1,2079816779,6921382059939144796,Srey tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    sreytey123,5316691604,668712126044928206,Phat SreyTey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    Sreytey168,5455045488,-714912998136226691,Vong Soksreytey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    SreyTey99,5653783510,-2575791274366210473,Oun Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    sreytey1919,5819100400,3174041461521242292,Tey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    Sreytey6666,6001252515,1586106578669001327,Srey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    SreyTey7777,6026179841,5596849859821333867,Srey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    Ahh_Nak86,5637888996,-1267155033181296023,Yìì Ng,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461

b. file2.csv

    SreyTey1998,963229606,7854138709318981862,Smaradey Chan,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    Srey_Tey_1,2079816779,6921382059939144796,Srey tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    sreytey123,5316691604,668712126044928206,Phat SreyTey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    Sreytey168,5455045488,-714912998136226691,Vong Soksreytey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    SreyTey99,5653783510,-2575791274366210473,Oun Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    sreytey1919,5819100400,3174041461521242292,Tey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    AhhLyn1213,808888756,2482753619838480608,Ly-លី🌈â¤ï¸,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    ahhly09,938983724,-8302570306911018211,方塔莉,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    ahh_vong,873218908,1743989214734522713,Mek Sreyvong,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    ahhnitaccd,5420585351,-6331445989210603589,NITA CCD,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461

c. The output file, file2-nodups.csv, should be:

    AhhLyn1213,808888756,2482753619838480608,Ly-លី🌈â¤ï¸,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    ahhly09,938983724,-8302570306911018211,方塔莉,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    ahh_vong,873218908,1743989214734522713,Mek Sreyvong,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
    ahhnitaccd,5420585351,-6331445989210603589,NITA CCD,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461

I tried the following code:

    with open('file1.csv', 'r', encoding="utf8") as t1:
        fileone = t1.readlines()
    with open('file2.csv', 'r', encoding="utf8") as t2:
        filetwo = t2.readlines()

    # scans through the two files and writes differences to new csv
    with open('file2-nodups.csv', 'w', encoding="utf8") as outFile:
        for line in filetwo:
            if line not in fileone:
                outFile.write(line)
                

The above does not work: the output file (file2-nodups.csv) has the same content as file2.csv.

Any suggestions would be greatly appreciated.

python csv
1 answer
0
votes

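To make sure trailing whitespace or newline characters don't interfere with the comparison, you can strip them from each line before comparing. Here is an updated version of the code with that adjustment: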

    with open('file1.csv', 'r', encoding="utf8") as t1:
        # strip() drops surrounding whitespace, including the trailing newline
        fileone = [line.strip() for line in t1.readlines()]

    with open('file2.csv', 'r', encoding="utf8") as t2:
        filetwo = [line.strip() for line in t2.readlines()]

    # write every line of file2.csv that does not appear in file1.csv
    with open('file2-nodups.csv', 'w', encoding="utf8") as outFile:
        for line in filetwo:
            if line not in fileone:
                outFile.write(line + '\n')

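Before comparing, this code strips leading and trailing whitespace from every line of both files. When it writes the lines unique to the second file to the output file, it appends a newline to each one so the output keeps one row per line.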
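
If the files are large, or if the stray whitespace sits around individual fields rather than at the ends of the line, a variant built on the standard `csv` module and a set may be worth trying. This is only a sketch under a few assumptions: rows count as equal when every field matches after stripping, the files are UTF-8 (possibly with a byte-order mark, hence `utf-8-sig`), and the helper names `normalize` and `write_rows_missing_from_first` are made up for this example.

```python
import csv


def normalize(row):
    """Strip surrounding whitespace from every field so padding does not affect matching."""
    return tuple(field.strip() for field in row)


def write_rows_missing_from_first(first_path, second_path, out_path):
    """Write rows of second_path that do not appear in first_path.

    Hypothetical helper: returns the number of rows written.
    """
    # utf-8-sig also reads plain UTF-8 and drops a leading byte-order mark if present.
    with open(first_path, newline="", encoding="utf-8-sig") as f1:
        # A set gives constant-time membership checks; blank lines are skipped.
        seen = {normalize(row) for row in csv.reader(f1) if row}

    written = 0
    with open(second_path, newline="", encoding="utf-8-sig") as f2, \
         open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for row in csv.reader(f2):
            if row and normalize(row) not in seen:
                writer.writerow(row)  # the row is written with its original field values
                written += 1
    return written
```

Calling `write_rows_missing_from_first('file1.csv', 'file2.csv', 'file2-nodups.csv')` would then produce the output file. Note that `csv.writer` ends rows with `\r\n` by default; pass `lineterminator='\n'` if that matters for your data.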
