How do I remove \uXXXX from a list of strings? [duplicate]

Question (votes: 0, answers: 1)

I want to remove every word that starts with \u. I believe these are Unicode '\uXXXX' escapes.

Original string:

"RT  \u2066als \u2066@WBHoekstra\u2069 zijn poot maar stijf houdt in de Italiaanse kwestie. Leest Mattheus 25, 2-13 '"

Desired output:

"RT @WBHoekstra zijn poot maar stijf houdt in de Italiaanse kwestie. Leest Mattheus 25, 2-13 '"

I tried using a regular expression like this:

re.sub('\u\w+','',item )

But I get the following error:

"SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape"

python regex string unicode python-unicode

1 Answer

-1
votes

You can do this with .encode('ascii', 'ignore'), which drops every non-ASCII character and returns a bytes object:

"RT \u2066als \u2066@WBHoekstra\u2069 zijn poot maar stijf houdt in de Italiaanse kwestie. Leest Mattheus 25, 2-13 '".encode('ascii', 'ignore')

Output:

b"RT als @WBHoekstra zijn poot maar stijf houdt in de Italiaanse kwestie. Leest Mattheus 25, 2-13 '"
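The SyntaxError in the question comes from writing '\u\w+' as a normal string literal: Python tries to parse \u as the start of a \uXXXX escape before re ever sees the pattern. What follows is a minimal sketch of two alternatives, assuming Python 3. The helper names strip_literal_escapes and strip_isolate_words are hypothetical, and which one applies depends on whether the strings hold literal backslash-u text or real Unicode characters (here U+2066 to U+2069, the directional isolate controls). Both helpers also collapse runs of whitespace, which is an added assumption made to match the desired output.

```python
import re


def strip_literal_escapes(text):
    """Remove tokens written as literal backslash-u text, e.g. the 11 characters '\\u2066als'."""
    # Raw string: '\\u' matches one backslash followed by 'u'; '\w+' takes the rest of the word.
    no_escapes = re.sub(r'\\u\w+', '', text)
    # Collapse the leftover whitespace runs (assumption: single spaces are wanted).
    return re.sub(r'\s+', ' ', no_escapes).strip()


def strip_isolate_words(text):
    """Remove real U+2066..U+2069 characters plus any word characters glued to them."""
    # The re module itself understands \uXXXX inside a raw-string pattern.
    no_marks = re.sub(r'[\u2066-\u2069]\w*', '', text)
    return re.sub(r'\s+', ' ', no_marks).strip()


# Literal backslashes (raw string), as they might appear in escaped or dumped text.
raw_item = r"RT  \u2066als \u2066@WBHoekstra\u2069 zijn poot maar stijf houdt in de Italiaanse kwestie. Leest Mattheus 25, 2-13 '"
print(strip_literal_escapes(raw_item))

# Real Unicode characters (normal string literal).
uni_item = "RT  \u2066als \u2066@WBHoekstra\u2069 zijn poot maar stijf houdt in de Italiaanse kwestie. Leest Mattheus 25, 2-13 '"
print(strip_isolate_words(uni_item))

# Both calls print:
# RT @WBHoekstra zijn poot maar stijf houdt in de Italiaanse kwestie. Leest Mattheus 25, 2-13 '
```

If only the invisible characters should go and a str is wanted, appending .decode('ascii') to the .encode('ascii', 'ignore') call returns a str instead of bytes. Unlike the sketch above, that keeps the word "als" and also discards any other non-ASCII character in the text.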
