I have the following function, with print statements added for testing:
import re

def parse_tag_id(id_string):
    if not isinstance(id_string, str):
        id_string = str(id_string)
    if re.search(f'[0-9]{5}', id_string):
        print(f'MATCH: #{id_string}#')  # I put the '#' around each to make sure there are no hidden whitespaces.
    else:
        print(f'NO MATCH: #{id_string}#')
    return None
I then applied it to a column of a pandas DataFrame and got the following output:
MATCH: #73844 / 73845#
MATCH: #73844 / 73845#
MATCH: #83793 / 84758#
MATCH: #73844 / 73845 / 84122 / 84136#
MATCH: #73844 / 73845 / 84136#
NO MATCH: #Not live yet#
NO MATCH: #83046# INCORRECT
MATCH: #84120 / 82795#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #84264# INCORRECT
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
MATCH: #73844 / 73845#
NO MATCH: #78787 / 78788# INCORRECT
MATCH: #84856#
MATCH: #82795#
MATCH: #84857 / 82795#
MATCH: #82795#
MATCH: #82795#
NO MATCH: #Not live yet#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #84845#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
MATCH: #75891 / 75892#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #83759#
NO MATCH: #Not live yet#
NO MATCH: #Not live yet#
NO MATCH: #84814# INCORRECT
MATCH: #84815#
NO MATCH: #Not live yet#
NO MATCH: #nan#
NO MATCH: #84118# INCORRECT
NO MATCH: #Not live yet#
NO MATCH: #84640# INCORRECT
MATCH: #84591#
NO MATCH: #84660# INCORRECT
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #82795#
MATCH: #75891 / 75892#
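I expected every string containing a 5-digit number, or several 5-digit numbers separated by '/', to match, but the strings I marked 'INCORRECT' above were reported as not matching.
Why doesn't this work as expected?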
Because of this:
>>> f'[0-9]{5}'
'[0-9]5'
>>> r'[0-9]{5}'
'[0-9]{5}'
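F-strings are for formatting only: inside one, {5} is a replacement field that evaluates to 5, so the pattern becomes '[0-9]5' (any digit followed by a literal 5). Always use raw strings (r'...') for regular expressions, which also avoids double-escaping backslashes.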
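As a minimal sketch of the fix, the check can be written with a raw-string pattern. The names FIVE_DIGITS and has_tag_id are made up for illustration and are not from the question. The sketch returns a boolean instead of printing, and it assumes that any run of five digits anywhere in the value should count as a match.

```python
import re

# Raw string: the quantifier {5} reaches the regex engine untouched.
# FIVE_DIGITS and has_tag_id are hypothetical names used for illustration.
FIVE_DIGITS = re.compile(r'[0-9]{5}')

def has_tag_id(id_string):
    """Return True if the value contains a run of five digits."""
    # str() handles non-string cells such as NaN or integers.
    return FIVE_DIGITS.search(str(id_string)) is not None

# If interpolation really is needed, double the braces that belong to the
# regex so the f-string leaves them in place.
n = 5
pattern_from_fstring = rf'[0-9]{{{n}}}'
```

With this version, the values marked INCORRECT in the question (for example '83046' and '78787 / 78788') match, while 'Not live yet' and 'nan' still do not. Note that search also accepts longer digit runs, so a stricter pattern would be needed if exactly five digits are required.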
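I just realized I accidentally passed the pattern to re.search as an f-string instead of a raw string, so it was matching every string that contains '[0-9]5', that is, a digit followed by a 5.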
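That diagnosis can be checked against a few values taken from the output above. This is only an illustrative sketch: first_hit is a hypothetical helper, and the sample list is a hand-picked subset of the printed values.

```python
import re

# The f-string evaluates the replacement field {5}, leaving '[0-9]5'.
buggy = f'[0-9]{5}'

def first_hit(pattern, text):
    """Return the first substring matched by pattern, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group() if m else None

# A hand-picked subset of the values printed in the question.
samples = ['82795', '73844 / 73845', '83046', '84264', '78787 / 78788']

for s in samples:
    print(repr(s), '->', first_hit(buggy, s))
```

The first two samples contain a digit followed by 5 ('95' and '45'), so they were reported as MATCH. The last three contain no such pair, which is why they were reported as NO MATCH even though each holds a 5-digit number.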