Text processing in Python - how to handle invalid strings

Question Votes: 1 Answers: 1

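I am working on text classification, and my data contains invalid characters like the ones shown below. Can someone explain how to decode these characters into their actual values? Any pointers would also help.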

"It wouldn\'t take much to do for **Ã\x86sop**,\n\n\n\n\n            would it?**â\x80\x9d** whispered Ivan to Alyosha.\n\n\n\n\n\n\n\n\n\n            **â\x80\x9c**God forbid!**â\x80\x9d** cried Alyosha.\n\n\n\n\n\n\n\n\n\n            **â\x80\x9c**Why should He forbid?**â\x80\x9d** Ivan went on in the\n\n\n\n\n            same whisper, with a malignant grimace. **â\x80\x9c**One reptile will devour the other., And serve them\n\n\n\n\n            both right, too.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha\n\n\n\n\n            shuddered.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cOf course I won\'t let him be murdered as I didn\'t\n\n\n\n\n            just now., Stay here, Alyosha, I\'ll go for a turn in the yard., My\n\n\n\n\n            head\'s begun to ache.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha went\n\n\n\n\n            to his father\'s bedroom and sat by his bedside behind the screen\n\n\n\n\n            for about an hour., The old man suddenly opened his eyes and gazed\n\n\n\n\n            for a long while at Alyosha, evidently remembering and\n\n\n\n\n            meditating., All at once his face betrayed extraordinary\n\n\n\n\n            excitement.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cAlyosha,â\x80\x9d he whispered apprehensively,\n\n\n\n\n            â\x80\x9cwhere\'s Ivan?â\x80\x9d\n\n\n\n\n\n\n\n\n\n            â\x80\x9cIn the yard., He\'s got a headache., He\'s on the\n\n\n\n\n            watch.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            â\x80\x9cGive me that looking-glass., It stands over there.\n\n\n\n\n            Give it me.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha gave\n\n\n\n\n            him a little round folding looking-glass which stood on the chest\n\n\n\n\n            of drawers., The old man looked at himself in it; his nose was\n\n\n\n\n            considerably swollen, and on the left side of his forehead there\n\n\n\n\n            was a rather large crimson bruise.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cWhat does Ivan 
say?"
python dataframe encoding decoding
1 Answer
1
vote

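It looks like the data has been double-encoded (are you using Python 2?): the UTF-8 bytes appear to have been decoded as Latin-1 somewhere along the way. It can be fixed by encoding to latin-1 and then decoding as UTF-8.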

>>> data.encode('latin-1').decode('utf-8')
"It wouldn't take much to do for **Æsop**,\n\n\n\n\n            would it?**”** whispered Ivan to Alyosha.\n\n\n\n\n\n\n\n\n\n            **“**God forbid!**”** cried Alyosha.\n\n\n\n\n\n\n\n\n\n            **“**Why should He forbid?**”** Ivan went on in the\n\n\n\n\n            same whisper, with a malignant grimace. **“**One reptile will devour the other., And serve them\n\n\n\n\n            both right, too.”\n\n\n\n\n\n\n\n\n\n            Alyosha\n\n\n\n\n            shuddered.\n\n\n\n\n\n\n\n\n\n            “Of course I won't let him be murdered as I didn't\n\n\n\n\n            just now., Stay here, Alyosha, I'll go for a turn in the yard., My\n\n\n\n\n            head's begun to ache.”\n\n\n\n\n\n\n\n\n\n            Alyosha went\n\n\n\n\n            to his father's bedroom and sat by his bedside behind the screen\n\n\n\n\n            for about an hour., The old man suddenly opened his eyes and gazed\n\n\n\n\n            for a long while at Alyosha, evidently remembering and\n\n\n\n\n            meditating., All at once his face betrayed extraordinary\n\n\n\n\n            excitement.\n\n\n\n\n\n\n\n\n\n            “Alyosha,” he whispered apprehensively,\n\n\n\n\n            “where's Ivan?”\n\n\n\n\n\n\n\n\n\n            “In the yard., He's got a headache., He's on the\n\n\n\n\n            watch.”\n\n\n\n\n\n\n\n\n\n            “Give me that looking-glass., It stands over there.\n\n\n\n\n            Give it me.”\n\n\n\n\n\n\n\n\n\n            Alyosha gave\n\n\n\n\n            him a little round folding looking-glass which stood on the chest\n\n\n\n\n            of drawers., The old man looked at himself in it; his nose was\n\n\n\n\n            considerably swollen, and on the left side of his forehead there\n\n\n\n\n            was a rather large crimson bruise.\n\n\n\n\n\n\n\n\n\n            “What does Ivan say?"
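If only some of your strings are damaged, applying that round trip blindly can raise an exception on the clean ones. Below is a minimal sketch of a defensive wrapper, assuming the damage really is UTF-8 bytes misread as Latin-1. The name `fix_mojibake` and the sample strings are made up for illustration and are not part of any library.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Latin-1.

    Returns the input unchanged when the round trip does not apply.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside Latin-1 (so it is
        # probably already correct), or its bytes are not valid UTF-8.
        return text


# How the damage arises: a right curly quote (U+201D) is encoded as
# UTF-8 and those three bytes are then read back as Latin-1.
garbled_quote = "\u201d".encode("utf-8").decode("latin-1")

# "\xe2" is the character that shows up as "â" in the question's data.
sample = "\xe2\x80\x9cGod forbid!\xe2\x80\x9d cried Alyosha."
fixed = fix_mojibake(sample)
print(fixed)
```

With pandas, the same function could be applied to a whole column, for example `df["text"].apply(fix_mojibake)`, where `"text"` is a placeholder for your own column name. Note that a string which is valid under both readings will still be converted, so this is best used on data you already know to be garbled.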