Question: Compare two CSV files and write the rows not found in the first file to a third file
Hi,
I have two CSV files, file1.csv and file2.csv. Both have multiple columns, as shown below:
a. file1.csv
SreyTey1998,963229606,7854138709318981862,Smaradey Chan,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Srey_Tey_1,2079816779,6921382059939144796,Srey tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
sreytey123,5316691604,668712126044928206,Phat SreyTey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Sreytey168,5455045488,-714912998136226691,Vong Soksreytey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
SreyTey99,5653783510,-2575791274366210473,Oun Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
sreytey1919,5819100400,3174041461521242292,Tey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Sreytey6666,6001252515,1586106578669001327,Srey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
SreyTey7777,6026179841,5596849859821333867,Srey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Ahh_Nak86,5637888996,-1267155033181296023,Yìì Ng,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
b. file2.csv
SreyTey1998,963229606,7854138709318981862,Smaradey Chan,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Srey_Tey_1,2079816779,6921382059939144796,Srey tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
sreytey123,5316691604,668712126044928206,Phat SreyTey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Sreytey168,5455045488,-714912998136226691,Vong Soksreytey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
SreyTey99,5653783510,-2575791274366210473,Oun Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
sreytey1919,5819100400,3174041461521242292,Tey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
AhhLyn1213,808888756,2482753619838480608,Ly-លី🌈â¤ï¸,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahhly09,938983724,-8302570306911018211,方塔莉,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahh_vong,873218908,1743989214734522713,Mek Sreyvong,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahhnitaccd,5420585351,-6331445989210603589,NITA CCD,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
c. The output file, file2-nodups.csv, should be:
AhhLyn1213,808888756,2482753619838480608,Ly-លី🌈â¤ï¸,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahhly09,938983724,-8302570306911018211,方塔莉,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahh_vong,873218908,1743989214734522713,Mek Sreyvong,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahhnitaccd,5420585351,-6331445989210603589,NITA CCD,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
I tried the following code:
with open('file1.csv', 'r', encoding="utf8") as t1:
    fileone = t1.readlines()
with open('file2.csv', 'r', encoding="utf8") as t2:
    filetwo = t2.readlines()
# scans through the two files and writes differences to new csv
with open('file2-nodups.csv', 'w', encoding="utf8") as outFile:
    for line in filetwo:
        if line not in fileone:
            outFile.write(line)
This does not work: the output file (file2-nodups.csv) has the same content as file2.csv.
Any suggestions would be greatly appreciated.
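Trailing whitespace or newline characters can make otherwise identical rows compare as different. To keep them from interfering with the comparison, you can strip these characters from each line before comparing. Here is an updated version of the code with that adjustment: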
with open('file1.csv', 'r', encoding="utf8") as t1:
    # strip leading/trailing whitespace, including the newline, from each line
    fileone = [line.strip() for line in t1.readlines()]
with open('file2.csv', 'r', encoding="utf8") as t2:
    filetwo = [line.strip() for line in t2.readlines()]
# write only the lines of file2.csv that do not appear in file1.csv
with open('file2-nodups.csv', 'w', encoding="utf8") as outFile:
    for line in filetwo:
        if line not in fileone:
            outFile.write(line + '\n')
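Before comparing, this code strips all leading and trailing whitespace from every line in both files. When it writes the lines unique to the second file to the output file, it appends a newline so that each line is properly terminated.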
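If stripping whole lines is not enough, comparing parsed rows instead of raw text may help. The following is only a sketch under a few assumptions: the mismatch might come from a byte-order mark at the start of a file, different line endings, or stray spaces around individual fields, none of which is confirmed by the question. The function names `normalize`, `rows_missing_from_first`, and `write_missing_rows` are made up for illustration. It uses the standard `csv` module and a set, so each lookup does not rescan the whole first file.

```python
import csv


def normalize(row):
    """Return a hashable key for a row, ignoring whitespace around each field."""
    return tuple(cell.strip() for cell in row)


def rows_missing_from_first(first_lines, second_lines):
    """Return the rows of second_lines whose normalized form is not in first_lines.

    Both arguments can be open text files or any iterable of CSV lines.
    Blank lines are skipped.
    """
    seen = {normalize(row) for row in csv.reader(first_lines) if row}
    return [
        row
        for row in csv.reader(second_lines)
        if row and normalize(row) not in seen
    ]


def write_missing_rows(first_path, second_path, out_path):
    """Write rows found only in second_path to out_path and return their count."""
    # "utf-8-sig" drops a leading BOM if one is present (assumption: the files
    # may have been saved by a tool that adds one); plain UTF-8 still reads fine.
    with open(first_path, newline="", encoding="utf-8-sig") as f1, \
         open(second_path, newline="", encoding="utf-8-sig") as f2:
        missing = rows_missing_from_first(f1, f2)
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(missing)
    return len(missing)
```

With the file names from the question, the call would be `write_missing_rows('file1.csv', 'file2.csv', 'file2-nodups.csv')`. Rows are written back through `csv.writer`, so fields that contain commas or quotes are quoted again on output.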