I have the following test string:
test_str = "`It isn't directed at all,' said the White Rabbit;"
I currently use re.sub to strip out punctuation so that I can do my own processing on the words. My current call is:
re.sub(r"[^A-Za-z0-9'\s]", '', test_str)
Splitting the result on whitespace gives:
['It', "isn't", 'directed', 'at', "all'", 'said', 'the', 'White', 'Rabbit']
The error is visible at all', which should be stored as just all.
How can I keep words that contain 's while ignoring a ' that comes right after punctuation, as in all,' here?
Try the following:
import re
test_str = "`It isn't directed at all,' said the White Rabbit;"
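# Remove everything except letters, digits, apostrophes and whitespace
a = re.sub(r"[^A-Za-z0-9'\s]", '', test_str)
# Replace an apostrophe that is followed by a space with just the space
a = re.sub(r"'[ ]", ' ', a)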
print(a)
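Note that the second substitution above only catches an apostrophe followed by a space, so a trailing apostrophe at the end of the string or before a newline would survive. A minimal sketch of one possible generalization, assuming "inside a word" means "between two ASCII letters or digits"; the helper name strip_punctuation is hypothetical and not part of the answer above:

```python
import re

def strip_punctuation(text):
    """Hypothetical helper: drop punctuation, keep apostrophes inside words."""
    # Step 1 (same as above): keep only letters, digits, apostrophes, whitespace.
    cleaned = re.sub(r"[^A-Za-z0-9'\s]", "", text)
    # Step 2 (assumed generalization): remove any apostrophe that is not
    # both preceded and followed by a letter or digit.
    return re.sub(r"(?<![A-Za-z0-9])'|'(?![A-Za-z0-9])", "", cleaned)

test_str = "`It isn't directed at all,' said the White Rabbit;"
words = strip_punctuation(test_str).split()
print(words)
# ['It', "isn't", 'directed', 'at', 'all', 'said', 'the', 'White', 'Rabbit']
```

One trade-off of this sketch: a leading apostrophe, as in 'tis, is stripped as well, which may or may not be what you want.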
Alternatively, try this regex:
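punct = r"""[!"#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]"""  # ASCII punctuation as a raw-string character class
print(re.sub(rf"(?<!\w){punct}|{punct}(?!\w)", '', test_str))  # drop punctuation unless it sits between two word characters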
Output:
It isn't directed at all said the White Rabbit
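Since the goal is a list of words, another option is to match the words directly instead of deleting punctuation. The following is only a sketch under the assumption that a word is a run of ASCII letters or digits, optionally joined by single internal apostrophes; the name WORD_RE is made up for illustration:

```python
import re

test_str = "`It isn't directed at all,' said the White Rabbit;"

# A word: one or more alphanumerics, optionally followed by groups of
# "apostrophe + alphanumerics" (e.g. isn't, Rabbit's). An apostrophe that
# is not followed by a letter or digit is never part of a match.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")

words = WORD_RE.findall(test_str)
print(words)
# ['It', "isn't", 'directed', 'at', 'all', 'said', 'the', 'White', 'Rabbit']
```

Because the group is non-capturing, re.findall returns the full matches, so no separate split step is needed.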