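I'm confused about why this code doesn't work the way I want. I'm reading a txt file and printing each comma-separated item on a new line. Each item is wrapped in quotation marks (“ ”) and also contains punctuation, which I'm trying to remove. I'm familiar with string.punctuation and got it working on a test string, but it fails on the items I'm iterating over; see below: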
def read_word_lists(path):
    import string
    with open(path, encoding='utf-8') as f:
        lines = f.readlines()
    for line in lines[0].split(','):
        line = str(line)
        line = line.strip().lower()
        print(''.join(word.strip(string.punctuation) for word in line))
        print(line)
        print(''.join(word.strip(string.punctuation) for word in '"why, does this work?! and not above?"'))
read_word_lists('file.txt')
The output looks like this:
trying to strip punctuation: “you never”
originial: “you never”
test: why does this work and not above
trying to strip punctuation: “you always
originial: “you always"
test: why does this work and not above
trying to strip punctuation: ” “your problem is”
originial: ” “your problem is”
test: why does this work and not above
trying to strip punctuation: “the trouble with you is”
originial: “the trouble with you is”
test: why does this work and not above
Any idea why the "trying to strip punctuation" output isn't working?
In case it helps, the original file looks like this:
"YOU NEVER”, “YOU ALWAYS", ” “YOUR PROBLEM IS”, “THE TROUBLE WITH YOU IS”
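You are trying to strip Unicode punctuation (the curly quotes “ and ”), but string.punctuation contains only ASCII punctuation.
Instead of string.punctuation, you can use the code below to build a string containing all Unicode punctuation characters: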
import sys
import unicodedata
# Every code point whose Unicode general category starts with 'P' (punctuation)
punctuation = "".join(chr(i) for i in range(sys.maxunicode + 1) if unicodedata.category(chr(i)).startswith('P'))
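To show how that string might be applied to the items from the question, here is a minimal sketch. The helper name strip_punctuation and the constant PUNCTUATION_SET are made up for illustration and are not part of the original code. It also merges in string.punctuation, on the assumption that you still want ASCII symbols such as $ and + removed; those belong to the Unicode symbol categories (S*), not the punctuation categories (P*).

```python
import string
import sys
import unicodedata

# Unicode punctuation (categories starting with 'P') plus the ASCII set.
# Stored as a set so each membership test is fast.
PUNCTUATION_SET = set(string.punctuation) | {
    chr(i)
    for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith("P")
}


def strip_punctuation(text):
    """Hypothetical helper: drop every punctuation character from text."""
    return "".join(ch for ch in text if ch not in PUNCTUATION_SET)


# The curly quotes are not in string.punctuation, which is why the
# original loop left them untouched.
print("“" in string.punctuation)

for item in ["“you never”", "” “your problem is”"]:
    # .strip() removes the space left over between the two quote marks.
    print(strip_punctuation(item).strip())
```

Building the set walks the whole Unicode range once, so it is worth doing a single time at import rather than inside the loop.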
Good luck!