Python 3.12: writing Chinese text to a CSV for Excel - UTF-8-SIG not working

Question

I'm using Python 3.12.1 and deploying the code to AWS Lambda.

I'm fetching data from a MySQL database (some of it is Chinese text) and exporting it to a CSV file for Excel.

Here is the code:

# Copied from https://gist.github.com/tobywf/3773a7dc896f780c2216c8f8afbe62fc#file-unicode-csv-excel-py
import csv

with open(self.full_csv_path, 'w', encoding='utf-8-sig', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(['Row', 'Emoji'])
    for i, emoji in enumerate(['🎅', '🤔', '😎']):
        writer.writerow([str(i), emoji])
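Before blaming the file itself, it may help to confirm on disk that `utf-8-sig` really wrote a BOM. A minimal check, assuming a writable local path (in Lambda that would be somewhere under /tmp); the helper names `write_emoji_csv` and `starts_with_utf8_bom` are hypothetical:

```python
import codecs
import csv
import os
import tempfile


def write_emoji_csv(path: str) -> None:
    # Same pattern as the code above: utf-8-sig prepends a BOM on the first write.
    with open(path, 'w', encoding='utf-8-sig', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerow(['Row', 'Emoji'])
        for i, emoji in enumerate(['🎅', '🤔', '😎']):
            writer.writerow([str(i), emoji])


def starts_with_utf8_bom(path: str) -> bool:
    # Read raw bytes so no decoder can strip or add a BOM.
    with open(path, 'rb') as fp:
        return fp.read(3) == codecs.BOM_UTF8


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'emoji.csv')
        write_emoji_csv(path)
        print(starts_with_utf8_bom(path))  # True
```

If this prints True for the file Lambda produces, the CSV writing step is probably not where the characters get mangled.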

Result (I opened the file with Excel's Data > From Text import rather than by double-clicking it):

This didn't work either:

with open(self.full_csv_path, 'w', encoding='utf-8-sig') as csvfile:
    # Did not work
    csvfile.write("許蓋功")
    # Did not work, also tried 'utf-8'
    csvfile.write("許蓋功".encode('utf-8-sig').decode('utf-8-sig'))
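The second `write` call cannot behave differently from the first: encoding to bytes and decoding again with the same codec hands back an equal `str`. A small illustration (the `round_trip` helper is hypothetical):

```python
def round_trip(text: str) -> str:
    # Encode then decode with the same codec; utf-8-sig also strips
    # the BOM it just added, so the result equals the input.
    return text.encode('utf-8-sig').decode('utf-8-sig')


s = "許蓋功"
print(round_trip(s) == s)          # True
print(s.encode('utf-8-sig')[:3])   # b'\xef\xbb\xbf'

# Decoding with plain utf-8 keeps the BOM as a leading '\ufeff' character.
print(s.encode('utf-8-sig').decode('utf-8') == '\ufeff' + s)  # True
```

In other words, the BOM only exists in the bytes; any `str` you pass to a file opened with `encoding='utf-8-sig'` is encoded the same way.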

I also tried the following, without much luck:

# Write CSV BOM mark
csvfile.write('\ufeff')  # did not work
csvfile.write(u'\ufeff')  # did not work
csvfile.write(u'\ufeff'.encode('utf8').decode("utf8"))  # did not work

Each of these puts the text above into the Excel file instead of a BOM.
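For reference, writing `'\ufeff'` by hand is only needed with plain `utf-8`; combined with `utf-8-sig` it produces a second BOM. None of the three variants can write the six literal characters `\ufeff`, because they all pass the same one-character string, so if that text shows up in Excel the conversion probably happens somewhere after the file is written. A quick check of the bytes:

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

text = '\ufeff' + '許蓋功'

# utf-8-sig already prepends a BOM, so an explicit '\ufeff' yields two of them.
double = text.encode('utf-8-sig')
print(double.startswith(BOM + BOM))  # True

# With plain utf-8, the explicit '\ufeff' becomes the only BOM.
single = text.encode('utf-8')
print(single.startswith(BOM), single.startswith(BOM + BOM))  # True False
```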

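It seems clear that the string is being treated as UTF-8, but for some unknown and strange reason it doesn't come out as proper UTF-8.

Can anyone help?

Thanks a lot.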

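Edit: What I actually want to do is attach this CSV file containing Chinese characters to an email and send it from AWS Lambda.

Here is the code that sends the email through SES: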


import os
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

from botocore.exceptions import ClientError

# msg (the multipart/mixed parent), SENDER, RECIPIENT, CHARSET, BODY_TEXT,
# BODY_HTML, ATTACHMENT and the SES `client` are defined elsewhere,
# as in the AWS sample code.

# Create a multipart/alternative child container.
msg_body = MIMEMultipart('alternative')

# Encode the text and HTML content and set the character encoding. This step is
# necessary if you're sending a message with characters outside the ASCII range.
textpart = MIMEText(BODY_TEXT.encode(CHARSET), 'plain', CHARSET)
htmlpart = MIMEText(BODY_HTML.encode(CHARSET), 'html', CHARSET)

# Add the text and HTML parts to the child container.
msg_body.attach(textpart)
msg_body.attach(htmlpart)

# Define the attachment part and encode it using MIMEApplication.
att = MIMEApplication(open(ATTACHMENT, 'r', encoding='utf-8').read())

# Add a header to tell the email client to treat this part as an attachment,
# and to give the attachment a name.
att.add_header('Content-Disposition', 'attachment', filename=os.path.basename(ATTACHMENT))

# Attach the multipart/alternative child container to the multipart/mixed
# parent container.
msg.attach(msg_body)

# Add the attachment to the parent container.
msg.attach(att)

# print(msg)

response = ''

try:
    # Provide the contents of the email.
    response = client.send_raw_email(
        Source=SENDER,
        # Destinations=[ RECIPIENT ],
        Destinations=RECIPIENT,
        RawMessage={
            'Data': msg.as_string(),
        }
    )
# Display an error if something goes wrong.
except ClientError as e:
    print(e.response['Error']['Message'])
else:
    print("Email sent! Message ID:")
    print(response['MessageId'])
    print(f'Attachment: {ATTACHMENT}')

I suspect this part:

RawMessage={
    'Data':msg.as_string(),
}

might be what's causing all this mess, but I don't understand how it works.
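One thing worth checking, based on my reading of the standard library rather than anything confirmed in this thread: the attachment line above reads the file in text mode and hands a `str` to `MIMEApplication`, which is documented to take bytes. When such a `str` contains non-ASCII characters, `email` turns it into bytes with the `raw-unicode-escape` codec, so the BOM becomes the six literal characters `\ufeff`. The sketch compares a `str` payload with a bytes payload (as `open(path, 'rb').read()` would return) and checks that `as_string()` by itself keeps a base64 attachment intact. The CSV content is made up for illustration:

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

# Hypothetical CSV content: a BOM followed by Chinese text.
csv_text = '\ufeffRow,Name\r\n0,許蓋功\r\n'  # like open(path, 'r', encoding='utf-8').read()
csv_bytes = csv_text.encode('utf-8')         # like open(path, 'rb').read()

# Attachment built from a str, as in the SES code above.
from_str = MIMEApplication(csv_text)
# Attachment built from the raw file bytes.
from_bytes = MIMEApplication(csv_bytes)

# get_payload(decode=True) undoes the base64 transfer encoding.
str_payload = from_str.get_payload(decode=True)
print(str_payload.startswith(b'\\ufeff'))               # True: BOM became literal text
print(from_bytes.get_payload(decode=True) == csv_bytes)  # True

# as_string() keeps a base64 attachment intact and pure ASCII.
outer = MIMEMultipart('mixed')
outer.attach(from_bytes)
raw = outer.as_string()
print(raw.isascii())  # True
parsed = message_from_string(raw)
print(parsed.get_payload()[0].get_payload(decode=True) == csv_bytes)  # True
```

If this matches the symptom, opening the attachment with `'rb'` may be enough on its own; zipping the CSV also avoids it, presumably because a zip file has to be read as bytes.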

Answer 1

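Problem solved.

My actual use case was to:

- generate a CSV, and
- send it as an attachment in an email via AWS SES.

The key point is this: in AWS's sample code, the RawMessage is actually converted to a string, which means the CSV and its UTF-8 BOM are also converted to a string, even though the CSV is sent as an attachment.

So the CSV was generated correctly and the Python code that wrote it was fine; what triggered the bug was the SES Python code.

The fix is simple: zip the generated CSV file and send the zip as the attachment.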
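A minimal sketch of this workaround, assuming the CSV bytes are already in memory. `zip_csv`, `build_message` and the file names are hypothetical, and the actual `client.send_raw_email` call is left out. The important detail is that the zip is handed to `MIMEApplication` as bytes:

```python
import io
import zipfile
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def zip_csv(csv_bytes: bytes, arcname: str) -> bytes:
    """Return a zip archive (as bytes) holding one CSV member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(arcname, csv_bytes)
    return buf.getvalue()


def build_message(zip_bytes: bytes, filename: str) -> MIMEMultipart:
    """Wrap the zip bytes in a multipart/mixed message as an attachment."""
    msg = MIMEMultipart('mixed')
    # Bytes go in unchanged; MIMEApplication base64-encodes them.
    att = MIMEApplication(zip_bytes, _subtype='zip')
    att.add_header('Content-Disposition', 'attachment', filename=filename)
    msg.attach(att)
    return msg


if __name__ == '__main__':
    csv_bytes = 'Row,Name\r\n0,許蓋功\r\n'.encode('utf-8-sig')
    msg = build_message(zip_csv(csv_bytes, 'report.csv'), 'report.zip')
    raw = msg.as_string()  # the value that would go into RawMessage={'Data': ...}

    # Round trip: parse the message, unzip the attachment, compare bytes.
    att = message_from_string(raw).get_payload()[0]
    with zipfile.ZipFile(io.BytesIO(att.get_payload(decode=True))) as zf:
        print(zf.read('report.csv') == csv_bytes)  # True
```

Building the archive in memory avoids touching the filesystem; in Lambda it could just as well be written under /tmp first.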

I'll post my gist here once I've added it.

Thank you.
