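I'm trying to remove all punctuation and special characters, including digits, from a string, but I get an error: error: bad escape \p at position 2
Does this mean Python's regular expressions don't recognize \p{S} and \p{P}?
The code is: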
name = "URL-dsds diasa:dksdjsk dskdjs_dskjdks 23232 dsds32 dskdjskds&dsjdsjdhs fddjfd%djshdhjs kdjs¤dskjds öfdfdjfkdj"
re.findall(r'[^\p{P}\p{S}\s\d]+', name.lower())
I expect the output to be the same as what regex101 highlights: https://regex101.com/r/HJZAUU/1
Any help?
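On regex101.com, switch the flavor to Python and then paste the regular expression into the field at the top.
The explanation panel on the right then shows this: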
[^\p{P}\p{S}\s\d]+
gm <Python>
Match a single character not present in the list below [^\p{P}\p{S}\s\d]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\p matches the character p literally (case sensitive) <<<<<<<<<<<<<<<<<<<<<<<<<<<<
{P} matches a single character in the list {P} (case sensitive)<<<<<<<<<<<<<<<<<<
\p matches the character p literally (case sensitive)
{S} matches a single character in the list {S} (case sensitive)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\d matches a digit (equal to [0-9])
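To check this locally rather than on regex101, a minimal sketch like the following compiles the same pattern with the standard-library re module and captures the error. The variable names pattern and error_message are only illustrative. On recent Python 3 versions this should report the same "bad escape" error as in the question, because re has no \p{...} Unicode property classes.

```python
import re

# The pattern from the question, unchanged.
pattern = r'[^\p{P}\p{S}\s\d]+'

try:
    re.compile(pattern)
    error_message = None  # only reached if the pattern compiled
except re.error as exc:
    # The standard-library re module rejects the unknown escape \p.
    error_message = str(exc)

print(error_message)
```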
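Install the third-party regex module, which supports \p{...} Unicode property classes, and import it in place of re:
pip install regex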
import regex as re
name = "URL-dsds diasa:dksdjsk dskdjs_dskjdks 23232 dsds32 dskdjskds&dsjdsjdhs fddjfd%djshdhjs kdjs¤dskjds öfdfdjfkdj"
re.findall(r'[^\p{P}\p{S}\s\d]+', name.lower())
I get the output: ['url', 'dsds', 'diasa', 'dksdjsk', 'dskdjs', 'dskjdks', 'dsds', 'dskdjskds', 'dsjdsjdhs', 'fddjfd', 'djshdhjs', 'kdjs', 'dskjds', 'öfdfdjfkdj']
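If installing a third-party package is not an option, roughly the same result can be sketched with the standard library alone by checking each character's Unicode general category. This is an alternative technique, not what the answer above uses. The helper names is_word_char and tokens are hypothetical, and the sketch assumes that "punctuation" and "symbol" mean the categories starting with P and S, as \p{P} and \p{S} do.

```python
import unicodedata
from itertools import groupby

def is_word_char(ch):
    """True if ch is not punctuation (P*), a symbol (S*), whitespace, or a decimal digit (Nd)."""
    cat = unicodedata.category(ch)
    return not (cat.startswith(("P", "S")) or ch.isspace() or cat == "Nd")

def tokens(text):
    """Return the maximal runs of characters for which is_word_char is true."""
    return ["".join(run) for keep, run in groupby(text, key=is_word_char) if keep]

name = "URL-dsds diasa:dksdjsk dskdjs_dskjdks 23232 dsds32 dskdjskds&dsjdsjdhs fddjfd%djshdhjs kdjs¤dskjds öfdfdjfkdj"
result = tokens(name.lower())
print(result)
```

For the sample string this should give the same list as the regex-based version. The regex module is still the more direct choice when the pattern needs to be combined with other regular expression features.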