I'm using Python 3.12.1 and deploying the code to AWS Lambda.
I'm fetching data from a MySQL DB (which contains some Chinese text) and exporting it to a CSV file for Excel.
Here is the code:
# Copied from https://gist.github.com/tobywf/3773a7dc896f780c2216c8f8afbe62fc#file-unicode-csv-excel-py
with open(self.full_csv_path, 'w', encoding='utf-8-sig', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(['Row', 'Emoji'])
    for i, emoji in enumerate(['🎅', '🤔', '😎']):
        writer.writerow([str(i), emoji])
Result (I opened it in Excel via Data > From Text, not by double-clicking the file)
This does not work either:
with open(self.full_csv_path, 'w', encoding='utf-8-sig') as csvfile:
    # Did not work
    csvfile.write("許蓋功")
    # Did not work, also tried 'utf-8'
    csvfile.write("許蓋功".encode('utf-8-sig').decode('utf-8-sig'))
I also tried this, and it did not work:
# Write CSV BOM mark
csvfile.write('\ufeff') # did not work
csvfile.write(u'\ufeff') # did not work
csvfile.write(u'\ufeff'.encode('utf8').decode("utf8")) # did not work
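It adds the text above to the Excel file instead of the BOM mark.
It seems clear that the string is being treated as UTF-8 encoded, but for some unknown and strange reason it is not converted to proper UTF-8.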
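One way to narrow this down is to look at the raw bytes of the generated file before it goes anywhere else. The snippet below is a minimal sketch, not the code from my Lambda: the helper names and the temporary path are hypothetical, and it only checks that a file written with 'utf-8-sig' really starts with the three BOM bytes.

```python
import codecs
import csv
import os
import tempfile


def write_csv_with_bom(path, rows):
    """Write rows to a CSV file encoded as UTF-8 with a leading BOM."""
    with open(path, 'w', encoding='utf-8-sig', newline='') as fp:
        csv.writer(fp).writerows(rows)


def starts_with_utf8_bom(path):
    """Return True if the file's first three bytes are the UTF-8 BOM."""
    with open(path, 'rb') as fp:
        return fp.read(3) == codecs.BOM_UTF8


# Hypothetical temporary path, used only for this check.
csv_path = os.path.join(tempfile.mkdtemp(), 'check.csv')
write_csv_with_bom(csv_path, [['Row', 'Text'], ['0', '許蓋功']])
has_bom = starts_with_utf8_bom(csv_path)
print(has_bom)
```

If this check passes, the file on disk is fine and the problem is more likely to be in whatever handles the file afterwards.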
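Can anyone help?
Thanks a lot.
Edit: What I want to do is attach this CSV file, which contains Chinese characters, to an email and send it from AWS Lambda.
Here is the code that sends the email through SES: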
# Create a multipart/alternative child container.
msg_body = MIMEMultipart('alternative')
# Encode the text and HTML content and set the character encoding. This step is
# necessary if you're sending a message with characters outside the ASCII range.
textpart = MIMEText(BODY_TEXT.encode(CHARSET), 'plain', CHARSET)
htmlpart = MIMEText(BODY_HTML.encode(CHARSET), 'html', CHARSET)
# Add the text and HTML parts to the child container.
msg_body.attach(textpart)
msg_body.attach(htmlpart)
# Define the attachment part and encode it using MIMEApplication.
att = MIMEApplication(open(ATTACHMENT, 'r', encoding='utf-8').read())
# Add a header to tell the email client to treat this part as an attachment,
# and to give the attachment a name.
att.add_header('Content-Disposition','attachment',filename=os.path.basename(ATTACHMENT))
# Attach the multipart/alternative child container to the multipart/mixed
# parent container.
msg.attach(msg_body)
# Add the attachment to the parent container.
msg.attach(att)
# print(msg)
response = ''
try:
    # Provide the contents of the email.
    response = client.send_raw_email(
        Source=SENDER,
        # Destinations=[ RECIPIENT ],
        Destinations=RECIPIENT,
        RawMessage={
            'Data': msg.as_string(),
        }
    )
# Display an error if something goes wrong.
except ClientError as e:
    print(e.response['Error']['Message'])
else:
    print("Email sent! Message ID:")
    print(response['MessageId'])
    print(f'Attachment: {ATTACHMENT}')
I'm wondering about this part:
RawMessage={
    'Data': msg.as_string(),
}
This may be what is causing all this mess, but I don't know how it works.
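Problem solved.
My actual use case is
The key point is this: in the AWS sample code, RawMessage is actually converted to a string, which means the CSV and its UTF-8 BOM mark are converted to a string as well, even though the CSV is an attachment.
So the CSV was generated correctly and the Python code that writes it is correct; what triggers the error is the SES Python code.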
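The sketch below is one way to see the attachment being mangled without sending any email. It assumes the standard library email package behaves as I understand it: when MIMEApplication is given a str with non-ASCII characters (which is what reading the file in text mode produces), the payload is escaped before base64 encoding, so the BOM and the Chinese characters end up as literal escape text. The sample content and variable names are made up for illustration.

```python
import codecs
from email.mime.application import MIMEApplication

# Sample CSV content standing in for the generated file (BOM + Chinese text).
csv_bytes = codecs.BOM_UTF8 + 'Row,Text\r\n0,許蓋功\r\n'.encode('utf-8')

# Mirrors the email code in the question: the file is read in text mode,
# so the attachment payload is a str rather than bytes.
att_from_str = MIMEApplication(csv_bytes.decode('utf-8'))

# Alternative: pass the raw bytes, as if the file had been opened with 'rb'.
att_from_bytes = MIMEApplication(csv_bytes)

# get_payload(decode=True) undoes the base64 transfer encoding and returns
# the bytes a mail client would save to disk.
str_result = att_from_str.get_payload(decode=True)
bytes_result = att_from_bytes.get_payload(decode=True)

print(str_result == csv_bytes)
print(bytes_result == csv_bytes)
```

If the first comparison fails and the second succeeds, reading the attachment with open(ATTACHMENT, 'rb') might be enough on its own, although I have only confirmed the zip approach in my own setup.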
The fix is simple: zip the generated CSV file and send the zip file as the attachment.
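Until the gist is up, here is a rough sketch of the zip step. It is not the exact code from my Lambda: build_zip_attachment and the demo file are hypothetical names, and the archive is built in memory so that only bytes are handed to MIMEApplication.

```python
import io
import os
import tempfile
import zipfile
from email.mime.application import MIMEApplication


def build_zip_attachment(csv_path):
    """Zip the CSV in memory and wrap the archive as an email attachment."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as archive:
        # Store the file under its base name so the archive has no directories.
        archive.write(csv_path, arcname=os.path.basename(csv_path))
    zip_name = os.path.splitext(os.path.basename(csv_path))[0] + '.zip'
    att = MIMEApplication(buffer.getvalue(), _subtype='zip')
    att.add_header('Content-Disposition', 'attachment', filename=zip_name)
    return att


# Demo with a hypothetical temporary CSV file containing a BOM and Chinese text.
demo_path = os.path.join(tempfile.mkdtemp(), 'report.csv')
with open(demo_path, 'w', encoding='utf-8-sig', newline='') as fp:
    fp.write('Row,Text\r\n0,許蓋功\r\n')
attachment = build_zip_attachment(demo_path)
print(attachment.get_filename())
```

The returned part can be attached with msg.attach(...) in place of the CSV attachment in the email code from the question.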
I'll put my gist here once I add it.
Thank you.