Removing words that contain digits with a regular expression does not work as expected

import re
text = """Why is this $[...] when the same product is available for $[...] here?<br />
http://www.amazon.com/VICTOR-FLY-MAGNET-BAIT-REFILL/dp/B00004RBDY<br /><br />
The Victor M380 and M502 traps are unreal, of course -- total fly genocide. 
Pretty stinky, but only right nearby. won't, can't iamwordwith4number 234f  ther was a word withnumber before me"""

sentense1 = re.sub(r"\S*\d+\S*", "", text)  # removes words that contain digits.
sentense1 = re.sub('[^A-Za-z0-9]+', " ", text)  # removes punctuations.
print(sentense1)

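I am trying to remove the words that contain digits. The sample text above includes tokens such as iamwordwith4number and 234f, and I want to drop them. The removal works if I comment out the second regex line, but I am not sure whether the two lines are related. Can you advise me on this?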

python regex preprocessor
1 Answer

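Your second re.sub call is applied to the original text, so its result overwrites the output of the first call and the words containing digits come back. The second call should take sentense1 as its input, like this: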

sentense1 = re.sub('[^A-Za-z0-9]+', " ", sentense1)  # removes punctuations.

instead of this:

sentense1 = re.sub('[^A-Za-z0-9]+', " ", text)  # removes punctuations.
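
Put together, the corrected two-step cleanup might look like the minimal sketch below. It uses a shortened sample string rather than the full text from the question, and the names without_digits and cleaned are hypothetical, chosen only to make clear that each step consumes the result of the previous one.

```python
import re

# Shortened sample input (an assumption; the question uses a longer review text).
text = "won't, can't iamwordwith4number 234f there was a word"

# Step 1: drop every whitespace-delimited token that contains a digit.
without_digits = re.sub(r"\S*\d+\S*", "", text)

# Step 2: replace runs of non-alphanumeric characters with a single space.
# The input here is the result of step 1, not the original text.
cleaned = re.sub(r"[^A-Za-z0-9]+", " ", without_digits).strip()

print(cleaned)  # won t can t there was a word
```

Note that the second substitution also splits contractions such as won't into "won t", because the apostrophe is not in the allowed character class.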