In Python 3, how do I remove all non-UTF-8 characters from a string?

Question. Votes: 0. Answers: 1.

I am using Python 3.7. How do I remove all non-UTF-8 characters from a string? I tried using lambda x: x.decode('utf-8','ignore').encode("utf-8"), as shown here:

coop_types = map(
    lambda x: x.decode('utf-8','ignore').encode("utf-8"),
    filter(None, set(d['type'] for d in input_file))
)

But this results in an error:

Traceback (most recent call last):
  File "scripts/parse_coop_csv.py", line 30, in <module>
    for coop_type in coop_types:
  File "scripts/parse_coop_csv.py", line 25, in <lambda>
    lambda x: x.decode('utf-8','ignore').encode("utf-8"),
AttributeError: 'str' object has no attribute 'decode'

If you have a general-purpose way to remove all non-UTF-8 characters from a string, that is all I am looking for.

python python-3.x utf-8 decode encode

1 Answer

0 votes

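You are starting with a string. You cannot decode a str; it is already decoded text. And since UTF-8 can encode any valid Unicode text (which is what a str stores), every str should be encodable as UTF-8, so your goal does not seem to make sense.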
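
To make the distinction concrete, here is a minimal sketch and not a definitive fix. It assumes the invalid data either arrives as raw bytes, in which case decoding with errors="ignore" drops the bad byte sequences, or is already a str containing lone surrogates, which are the one kind of code point a Python str can hold that UTF-8 cannot encode. The helper names clean_bytes and drop_unencodable and the sample rows are hypothetical; they do not come from the question.

```python
def clean_bytes(raw: bytes) -> str:
    """Decode bytes as UTF-8, silently dropping invalid byte sequences."""
    return raw.decode("utf-8", errors="ignore")


def drop_unencodable(text: str) -> str:
    """Round-trip a str through UTF-8, dropping any code point that cannot
    be encoded (in practice, lone surrogates such as U+DC80)."""
    return text.encode("utf-8", errors="ignore").decode("utf-8")


# Hypothetical rows standing in for the question's input_file.
rows = [{"type": "food"}, {"type": ""}, {"type": "bank\udc80"}]

# Same shape as the question's pipeline: unique, non-empty, cleaned values.
coop_types = sorted(
    drop_unencodable(t) for t in {row["type"] for row in rows} if t
)
```

If the values come from a file, it is usually simpler to handle this at read time, for example by opening the file with encoding="utf-8" and errors="ignore" (or errors="replace" to keep a visible marker), so that every str the program sees is already clean.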
