Text processing in Python - how to handle invalid character strings - PullRequest
0 votes
/ 18 October 2018

I am working on text classification. I am seeing invalid characters, as shown below. Can anyone help me decode these characters into their actual values? Any pointers would be appreciated.

"It wouldn\'t take much to do for **Ã\x86sop**,\n\n\n\n\n            would it?**â\x80\x9d** whispered Ivan to Alyosha.\n\n\n\n\n\n\n\n\n\n            **â\x80\x9c**God forbid!**â\x80\x9d** cried Alyosha.\n\n\n\n\n\n\n\n\n\n            **â\x80\x9c**Why should He forbid?**â\x80\x9d** Ivan went on in the\n\n\n\n\n            same whisper, with a malignant grimace. **â\x80\x9c**One reptile will devour the other., And serve them\n\n\n\n\n            both right, too.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha\n\n\n\n\n            shuddered.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cOf course I won\'t let him be murdered as I didn\'t\n\n\n\n\n            just now., Stay here, Alyosha, I\'ll go for a turn in the yard., My\n\n\n\n\n            head\'s begun to ache.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha went\n\n\n\n\n            to his father\'s bedroom and sat by his bedside behind the screen\n\n\n\n\n            for about an hour., The old man suddenly opened his eyes and gazed\n\n\n\n\n            for a long while at Alyosha, evidently remembering and\n\n\n\n\n            meditating., All at once his face betrayed extraordinary\n\n\n\n\n            excitement.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cAlyosha,â\x80\x9d he whispered apprehensively,\n\n\n\n\n            â\x80\x9cwhere\'s Ivan?â\x80\x9d\n\n\n\n\n\n\n\n\n\n            â\x80\x9cIn the yard., He\'s got a headache., He\'s on the\n\n\n\n\n            watch.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            â\x80\x9cGive me that looking-glass., It stands over there.\n\n\n\n\n            Give it me.â\x80\x9d\n\n\n\n\n\n\n\n\n\n            Alyosha gave\n\n\n\n\n            him a little round folding looking-glass which stood on the chest\n\n\n\n\n            of drawers., The old man looked at himself in it; his nose was\n\n\n\n\n            considerably swollen, and on the left side of his forehead there\n\n\n\n\n            was a rather large crimson bruise.\n\n\n\n\n\n\n\n\n\n            â\x80\x9cWhat does Ivan 
say?

1 Answer

0 votes
/ 02 March 2019

It looks like the data was encoded twice: its UTF-8 bytes were at some point decoded as Latin-1 (are you using Python 2?). This can be fixed by encoding to Latin-1 and then decoding from UTF-8.

>>> data.encode('latin-1').decode('utf-8')
"It wouldn't take much to do for **Æsop**,\n\n\n\n\n            would it?**”** whispered Ivan to Alyosha.\n\n\n\n\n\n\n\n\n\n            **“**God forbid!**”** cried Alyosha.\n\n\n\n\n\n\n\n\n\n            **“**Why should He forbid?**”** Ivan went on in the\n\n\n\n\n            same whisper, with a malignant grimace. **“**One reptile will devour the other., And serve them\n\n\n\n\n            both right, too.”\n\n\n\n\n\n\n\n\n\n            Alyosha\n\n\n\n\n            shuddered.\n\n\n\n\n\n\n\n\n\n            “Of course I won't let him be murdered as I didn't\n\n\n\n\n            just now., Stay here, Alyosha, I'll go for a turn in the yard., My\n\n\n\n\n            head's begun to ache.”\n\n\n\n\n\n\n\n\n\n            Alyosha went\n\n\n\n\n            to his father's bedroom and sat by his bedside behind the screen\n\n\n\n\n            for about an hour., The old man suddenly opened his eyes and gazed\n\n\n\n\n            for a long while at Alyosha, evidently remembering and\n\n\n\n\n            meditating., All at once his face betrayed extraordinary\n\n\n\n\n            excitement.\n\n\n\n\n\n\n\n\n\n            “Alyosha,” he whispered apprehensively,\n\n\n\n\n            “where's Ivan?”\n\n\n\n\n\n\n\n\n\n            “In the yard., He's got a headache., He's on the\n\n\n\n\n            watch.”\n\n\n\n\n\n\n\n\n\n            “Give me that looking-glass., It stands over there.\n\n\n\n\n            Give it me.”\n\n\n\n\n\n\n\n\n\n            Alyosha gave\n\n\n\n\n            him a little round folding looking-glass which stood on the chest\n\n\n\n\n            of drawers., The old man looked at himself in it; his nose was\n\n\n\n\n            considerably swollen, and on the left side of his forehead there\n\n\n\n\n            was a rather large crimson bruise.\n\n\n\n\n\n\n\n\n\n            “What does Ivan say?"
...
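If some of the strings in the dataset are already clean, the bare round-trip above may raise `UnicodeEncodeError` or `UnicodeDecodeError` on them. A minimal sketch of a guarded version follows; the helper name `fix_mojibake` and the short sample string are assumptions made for illustration, not part of the original answer.

```python
def fix_mojibake(text):
    """Undo the case where UTF-8 bytes were wrongly decoded as Latin-1.

    Hypothetical helper: returns the input unchanged when the
    round-trip does not apply to it.
    """
    try:
        # Latin-1 maps code points 0-255 straight back to single bytes,
        # which recovers the original UTF-8 byte sequence.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside Latin-1, or the
        # recovered bytes are not valid UTF-8, so leave it as it is.
        return text


# Short excerpt in the same damaged form as the question's data
# (\u00e2 is the character "â").
sample = "\u00e2\x80\x9cGod forbid!\u00e2\x80\x9d cried Alyosha."
print(fix_mojibake(sample))
```

The fallback is deliberately conservative: a string that cannot make the round-trip is returned as-is rather than partially repaired. Data damaged in some other way (for example, decoded as Windows-1252 instead of Latin-1) would need a different approach.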