Python 3.12 在 Excel CSV 中写入中文 - UTF-8-SIG 不起作用

问题描述 投票:0回答:1

我使用的是 Python 3.12.1 并将其上传到 AWS Lambda。

我正在做的是从 MySQL DB 获取数据(其中有一些中文文本)并导出到 Excel CSV。

这是代码:

# Copied from https://gist.github.com/tobywf/3773a7dc896f780c2216c8f8afbe62fc#file-unicode-csv-excel-py
with open(self.full_csv_path, 'w', encoding='utf-8-sig', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(['Row', 'Emoji'])
    for i, emoji in enumerate(['🎅', '🤔', '😎']):
        writer.writerow([str(i), emoji])

结果(我使用Excel:数据>从文本导入,而不是双击)

这也不起作用:

with open(self.full_csv_path, 'w', encoding='utf-8-sig') as csvfile:
    # Did not work
    csvfile.write("許蓋功")
    # Did not work, also tried 'utf-8'
    csvfile.write("許蓋功".encode('utf-8-sig').decode('utf-8-sig'))

试过了,效果不太好

# Write CSV BOM mark
csvfile.write('\ufeff')  # did not work
csvfile.write(u'\ufeff')  # did not work
csvfile.write(u'\ufeff'.encode('utf8').decode("utf8"))  # did not work

它会将以上文本添加到 Excel 文件中,而不是 BOM 标记

看起来很清楚该字符串被视为UTF-8编码,但由于某些未知且奇怪的原因,它无法转换为正确的UTF-8。

大家可以帮忙吗?

非常感谢。

编辑 我想要做的是将这个包含中文字符的 CSV 文件附加到电子邮件中并在 AWS Lambda 中发送出去。

这里是通过SES发送电子邮件的代码:


        # Create a multipart/alternative child container.
        msg_body = MIMEMultipart('alternative')

        # Encode the text and HTML content and set the character encoding. This step is
        # necessary if you're sending a message with characters outside the ASCII range.
        textpart = MIMEText(BODY_TEXT.encode(CHARSET), 'plain', CHARSET)
        htmlpart = MIMEText(BODY_HTML.encode(CHARSET), 'html', CHARSET)

        # Add the text and HTML parts to the child container.
        msg_body.attach(textpart)
        msg_body.attach(htmlpart)

        # Define the attachment part and encode it using MIMEApplication.
        att = MIMEApplication(open(ATTACHMENT, 'r', encoding='utf-8').read())

        # Add a header to tell the email client to treat this part as an attachment,
        # and to give the attachment a name.
        att.add_header('Content-Disposition','attachment',filename=os.path.basename(ATTACHMENT))

        # Attach the multipart/alternative child container to the multipart/mixed
        # parent container.
        msg.attach(msg_body)

        # Add the attachment to the parent container.
        msg.attach(att)

        # print(msg)

        response = ''

        try:
            #Provide the contents of the email.
            response = client.send_raw_email(
                Source=SENDER,
                # Destinations=[ RECIPIENT ],
                Destinations=RECIPIENT,
                RawMessage={
                    'Data':msg.as_string(),
                }
            )
        # Display an error if something goes wrong.
        except ClientError as e:
            print(e.response['Error']['Message'])
        else:
            print("Email sent! Message ID:"),
            print(response['MessageId'])
            print(f'Attachment: {ATTACHMENT}')

我正在考虑这句话:

RawMessage={
    'Data':msg.as_string(),
}

这可能是造成这一切混乱的原因。但我不知道它是如何工作的。

python excel csv utf-8
1个回答
0
投票

问题解决了。

我的实际用例是

  • 生成 CSV 并
  • 通过 AWS SES 电子邮件将其作为附件发送

关键点是这样的:在AWS的示例代码中,

RawMessage
实际上被转换为
string
,这意味着CSV和UTF-8 BOM标记也被转换为字符串,即使它是一个字符串附件。

因此,CSV已正确生成,Python代码是正确的,但触发此错误的是SES Python代码。

解决这个问题很简单:压缩生成的 CSV 文件并将其作为附件发送。

当我添加它时,我会将我的要点放在这里。

谢谢你。

© www.soinside.com 2019 - 2024. All rights reserved.