Python 3.6: garbled string mixing Unicode characters and escaped bytes

Question

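I am using NewsPlease to extract article titles from Common Crawl news data, but the titles I get back are a mix of normally encoded characters and escaped UTF-8 bytes, and I cannot get them to decode correctly. Taking one of the titles: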

x = articles[800].title

If I evaluate x in Spyder, it returns:

'Las 10 canciones m\\xc3\\xa1s populares de la semana'

When I use print(x), I get:

Las 10 canciones m\xc3\xa1s populares de la semana

But if I try to fix the encoding with the following approach (as suggested in other posts):

x.encode('latin1').decode('utf8')

it returns:

'Las 10 canciones m\\xc3\\xa1s populares de la semana'

which is clearly not correct.

Does anyone have any suggestions? I am using Python 3.6, by the way.
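
A minimal sketch of why the latin1/utf8 round trip changes nothing here. It assumes the title really contains literal backslash characters (which the doubled backslashes in the repr suggest), so every character is plain ASCII and there are no raw bytes for the round trip to reinterpret. The variable names `roundtrip`, `is_ascii` and `has_backslash` are only illustrative.

```python
# Reconstruction of the title from the repr shown above: the backslashes
# are literal characters, not escape sequences.
x = 'Las 10 canciones m\\xc3\\xa1s populares de la semana'

# The attempted fix: encode to Latin-1 bytes, then decode those bytes as UTF-8.
roundtrip = x.encode('latin1').decode('utf8')

# Every character is ASCII, so Latin-1 and UTF-8 agree on every byte.
is_ascii = all(ord(ch) < 128 for ch in x)
has_backslash = '\\' in x

print(is_ascii)        # True
print(has_backslash)   # True
print(roundtrip == x)  # True: the round trip is a no-op
```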

Answer 1

Found a solution:

x = 'this is a test of the Spanish word m\\xc3\\xa1s'
x = x.encode('latin1').decode('unicode_escape').encode('latin1').decode('utf8')
print(x)
# prints: this is a test of the Spanish word más
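
The one-line chain above can be unpacked step by step. The sketch below wraps it in a hypothetical helper, `unescape_utf8_bytes` (the name is not from NewsPlease or the standard library), and assumes the input contains only ASCII characters plus literal `\xNN` escapes that spell out UTF-8 bytes.

```python
def unescape_utf8_bytes(text):
    """Hypothetical helper: turn literal '\\xNN' escapes into the UTF-8 text they spell."""
    # Step 1: str -> bytes. The text is ASCII here, so this is lossless.
    raw = text.encode('latin1')
    # Step 2: interpret the backslash escapes. Each '\xNN' becomes the
    # single character with code point U+00NN (still not the right text).
    unescaped = raw.decode('unicode_escape')
    # Step 3: Latin-1 maps code points U+0000..U+00FF one-to-one onto bytes,
    # which recovers the original UTF-8 byte sequence.
    utf8_bytes = unescaped.encode('latin1')
    # Step 4: decode those bytes as the UTF-8 they were meant to be.
    return utf8_bytes.decode('utf8')


title = 'Las 10 canciones m\\xc3\\xa1s populares de la semana'
fixed = unescape_utf8_bytes(title)
print(fixed)  # Las 10 canciones más populares de la semana
```

If a title already holds correctly decoded non-ASCII characters alongside the escapes, this chain may fail: step 1 raises UnicodeEncodeError for characters outside Latin-1, and step 4 raises UnicodeDecodeError when a lone Latin-1 byte such as 0xE9 is not valid UTF-8. Such titles would need separate handling.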
