Python 2.7 CSV file read/write with \xef\xbb\xbf code

Question (votes: 1, answers: 2)

I have a question about reading and writing a CSV file encoded with 'utf-8-sig' in Python 2.7. The header of my CSV is:

['\xef\xbb\xbfID;timestamp;CustomerID;Email']

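The header starts with the bytes "\xef\xbb\xbf" in front of "ID". I read them from file A.csv, and I want to write the same bytes and header to file B.csv.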

My print log shows:

['\xef\xbb\xbfID;timestamp;CustomerID;Email']

But in the actual output file the header looks like this:

ÔªøID;timestamp


Here is the code:

def remove_gdpr_info_from_csv(file_path, file_name, temp_folder, original_header):
    new_temp_folder = tempfile.mkdtemp()
    new_temp_file = new_temp_folder + "/" + file_name
    # Blanked new file
    with open(new_temp_file, 'wb') as outfile:
        writer = csv.writer(outfile, delimiter=";")
        print original_header
        writer.writerow(original_header)
        # File from SFTP
        with open(file_path, 'r') as infile:
            reader = csv.reader(infile, delimiter=";")
            first_row = next(reader)
            email = first_row.index('Email')
            contract_detractor1 = first_row.index('Contact Detractor (Q21)')
            contract_detractor2 = first_row.index('Contact Detractor (Q20)')
            contract_detractor3 = first_row.index('Contact Detractor (Q43)')
            contract_detractor4 = first_row.index('Contact Detractor(Q26)')
            contract_detractor5 = first_row.index('Contact Detractor(Q27)')
            contract_detractor6 = first_row.index('Contact Detractor(Q44)')
            indexes = []
            for column_name in header_list:
                ind = first_row.index(column_name)
                indexes.append(ind)

            for row in reader:
                output_row = []
                for ind in indexes:
                    data = row[ind]
                    if ind == email:
                        data = ''
                    elif ind == contract_detractor1:
                        data = ''
                    elif ind == contract_detractor2:
                        data = ''
                    elif ind == contract_detractor3:
                        data = ''
                    elif ind == contract_detractor4:
                        data = ''
                    elif ind == contract_detractor5:
                        data = ''
                    elif ind == contract_detractor6:
                        data = ''
                    output_row.append(data)
                writer.writerow(output_row)
    s3core.upload_files(SPARKY_S3, DESTINATION_PATH, new_temp_file)
    shutil.rmtree(temp_folder)
    shutil.rmtree(new_temp_folder)
python python-2.7 csv file-writing file-read
2 Answers

6 votes

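'\xef\xbb\xbf' is the UTF-8 encoded form of the Unicode character ZERO WIDTH NO-BREAK SPACE (U+FEFF). It is commonly used as a byte order mark (BOM) at the start of Unicode text files:

  • when you see the 3 bytes '\xef\xbb\xbf', the file is UTF-8 encoded
  • when you see the 2 bytes '\xff\xfe', the file is UTF-16 little endian
  • when you see the 2 bytes '\xfe\xff', the file is UTF-16 big endian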
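
The byte patterns listed above can be checked directly. What follows is only a minimal sketch: detect_bom is a hypothetical helper name, not something from the original answer, and it relies on the BOM constants that the standard codecs module provides. It deliberately ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16 one.

```python
import codecs

# Hypothetical lookup table for illustration: BOM bytes -> encoding name.
_BOMS = [
    (codecs.BOM_UTF8, 'utf-8-sig'),      # '\xef\xbb\xbf'
    (codecs.BOM_UTF16_LE, 'utf-16-le'),  # '\xff\xfe'
    (codecs.BOM_UTF16_BE, 'utf-16-be'),  # '\xfe\xff'
]


def detect_bom(raw):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


# The header bytes from the question start with the UTF-8 BOM.
print(detect_bom(b'\xef\xbb\xbfID;timestamp;CustomerID;Email'))
```

Opening the file in binary mode and passing its first few bytes to such a helper is one way to decide which encoding to use before handing the file to the csv module.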

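The 'utf-8-sig' encoding explicitly asks for this BOM to be written at the start of the file.

To handle it automatically when reading a csv file in Python 2, you can use the codecs module:

with open(file_path, 'r') as infile:
    reader = csv.reader(codecs.EncodedFile(infile, 'utf8-sig', 'utf8'), delimiter=";")

EncodedFile wraps the original file object, decoding it as utf8-sig (which skips the BOM) and re-encoding it as utf8 with no BOM.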
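
To make the effect of the BOM concrete, here is a small sketch of how the 'utf-8-sig' codec behaves on the header from the question. The variable names are illustrative only. The last step assumes the program that displayed the output file interpreted the bytes as Mac OS Roman; that is a guess based on the characters shown in the question, not something the question states.

```python
import codecs

header = u'ID;timestamp;CustomerID;Email'

# Encoding with 'utf-8-sig' prepends the three BOM bytes.
with_bom = header.encode('utf-8-sig')

# Decoding with 'utf-8-sig' drops the BOM again ...
stripped = with_bom.decode('utf-8-sig')

# ... while plain 'utf-8' keeps it as a leading U+FEFF character.
kept = with_bom.decode('utf-8')

# Assumption: a viewer using Mac OS Roman would turn the BOM bytes
# into three visible characters instead of hiding them.
misread = codecs.BOM_UTF8.decode('mac_roman')

print(repr(with_bom))
print(repr(stripped))
print(repr(kept))
print(repr(misread))
```

If that assumption holds, the file itself may already be correct: the BOM is present as intended, and only the viewer is showing it with the wrong encoding.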


1 vote

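You'll want to use the EncodedFile function from the codecs library, as in Serge Ballesta's answer.

However, in Python 2.7 'utf8-sig' is not a supported alias for the UTF-8-sig encoding; you need to use 'utf_8_sig'. In addition, the arguments must give the output data encoding first and the file encoding second:

codecs.EncodedFile(file, datacodec, filecodec=None, errors='strict')

Here is the full result:

with open(file_path, 'r') as infile:
    reader = csv.reader(codecs.EncodedFile(infile, 'utf8', 'utf_8_sig'), delimiter=";")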
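
The argument order and the encoding name can be tried without touching a real file. The following is a sketch under stated assumptions: an in-memory io.BytesIO object stands in for A.csv, the sample row is made up, and is_known_encoding is a hypothetical helper for checking whether the codec registry accepts a given name.

```python
import codecs
import io


def is_known_encoding(name):
    """Return True if the codec registry can resolve this encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# In-memory stand-in for A.csv; the data row is invented for illustration.
raw = io.BytesIO(b'\xef\xbb\xbfID;timestamp\n1;2019-01-01\n')

# Data encoding first ('utf8'), file encoding second ('utf_8_sig').
recoded = codecs.EncodedFile(raw, 'utf8', 'utf_8_sig')
data = recoded.read()

# The BOM is consumed while decoding, so the recoded bytes start with 'ID'.
print(repr(data))

# Compare how the registry treats the two spellings on your interpreter.
print(is_known_encoding('utf_8_sig'))
print(is_known_encoding('utf8-sig'))
```

Running the two lookups on the target interpreter is a quick way to confirm which spelling it accepts before relying on it in the csv code.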