Why doesn't this string.punctuation code work for stripping punctuation?

Question

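I'm confused about why this code doesn't work the way I want. I'm reading a txt file and printing each item (separated by commas) on a new line. Each item is wrapped in “” and also contains punctuation, which I'm trying to remove. I'm familiar with string.punctuation and got it working on a test string in my example, but it fails on the items I'm iterating over. See below: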

def read_word_lists(path):
    import string
    with open(path, encoding='utf-8') as f:
        lines = f.readlines()
        for line in lines[0].split(','):
            line = str(line)
            line = line.strip().lower()
            print(''.join(word.strip(string.punctuation) for word in line))
            print(line)
            print(''.join(word.strip(string.punctuation) for word in '"why, does this work?! and not above?"'))


read_word_lists('file.txt')

The result looks like this:

trying to strip punctuation:  “you never”
originial:  “you never”
test:  why does this work and not above
trying to strip punctuation:  “you always
originial:  “you always"
test:  why does this work and not above
trying to strip punctuation:  ” “your problem is”
originial:  ” “your problem is”
test:  why does this work and not above
trying to strip punctuation:  “the trouble with you is”
originial:  “the trouble with you is”
test:  why does this work and not above

Any ideas why the "trying to strip punctuation" output isn't working?

Edit

The original file looks like this, in case it's useful:

"YOU NEVER”, “YOU ALWAYS", ” “YOUR PROBLEM IS”, “THE TROUBLE WITH YOU IS”

Answer 1

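You are trying to strip Unicode punctuation, but string.punctuation contains only ASCII punctuation. The curly quotes in your file (“ and ”) are not ASCII characters, so they are never matched, while the straight quote (") in your test string is.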
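A quick way to see the difference is to check each quote character against string.punctuation and look up its Unicode category. This is only an illustrative check, not part of the original code; the three characters are the quote styles that appear in the sample file.

```python
import string
import unicodedata

# Straight ASCII quote, then the left and right curly quotes from the file.
for ch in ['"', '\u201c', '\u201d']:
    in_ascii_set = ch in string.punctuation
    category = unicodedata.category(ch)  # categories starting with 'P' are punctuation
    print(repr(ch), in_ascii_set, category)
```

Only the straight quote is a member of string.punctuation, even though all three characters belong to a Unicode punctuation category.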

Instead of string.punctuation, you can use the code below to build a string containing all Unicode punctuation characters:

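import sys
import unicodedata

# Collect every code point whose Unicode general category starts with 'P' (punctuation).
# The + 1 makes the range include sys.maxunicode itself.
punctuation = "".join(chr(i) for i in range(sys.maxunicode + 1) if unicodedata.category(chr(i)).startswith('P'))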
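One possible way to apply this to the items from the question is sketched below. It assumes you want every punctuation character removed (not just leading and trailing ones) and uses str.translate with a deletion table instead of strip. The names PUNCT_TABLE and strip_punctuation are made up for this illustration, and the raw string stands in for the first line of the file.

```python
import sys
import unicodedata

# Map every Unicode punctuation code point to None so str.translate deletes it.
PUNCT_TABLE = {
    i: None
    for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith("P")
}


def strip_punctuation(text):
    """Return text with all Unicode punctuation characters removed."""
    return text.translate(PUNCT_TABLE)


# Stand-in for the first line of the file shown in the question.
raw = '"YOU NEVER”, “YOU ALWAYS", ” “YOUR PROBLEM IS”, “THE TROUBLE WITH YOU IS”'

# Split on commas first, then clean each item.
items = [strip_punctuation(item).strip().lower() for item in raw.split(",")]
for item in items:
    print(item)
```

Splitting before stripping matters here, because the comma separator is itself punctuation and would be deleted along with the quotes.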

Good luck!
