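I have the following function, which is meant to split a string into words. The split needs to separate letters, digits, and some special characters such as / or -.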
import pandas as pd
import re

def split_string_with_letters_and_non_letters(input_string):
    split_strings = re.split(r"([a-zA-Z] )+", input_string)
    result = []
    count = 0
    for x in split_strings:
        if x not in result and x != ' ' and x != '':
            result.append(x.lower().strip())
            count=count+1
    delim = "|"
    result_string = delim.join([str(ele) for ele in result])
    return result_string

teststring= "SPRINTER2500 2WD C E-150"
print(split_string_with_letters_and_non_letters(teststring))
My expected result is:
"SPRINTER|2500|2|WD|C|E|-|150"
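The problem is in your regular expression. The pattern ([a-zA-Z] )+ only matches a single letter followed by a space (repeated), so re.split never separates letters from digits. Here is the revised code, which uses re.findall to collect the tokens directly: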
import re

def split_string_with_letters_and_non_letters(input_string):
    # New regex, one alternative per token type:
    # a run of letters, a run of digits, or a single / or -
    split_strings = re.findall(r'[A-Za-z]+|\d+|[/-]', input_string)
    # Remove empty elements
    result = [x for x in split_strings if x]
    # Join with a delimiter
    result_string = "|".join(result)
    return result_string

teststring = "SPRINTER2500 2WD C E-150"
print(split_string_with_letters_and_non_letters(teststring))
Result:
SPRINTER|2500|2|WD|C|E|-|150
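To see why the original pattern falls short, it may help to print what re.split actually returns and compare it with the re.findall approach. The following is only a minimal sketch: the names TOKEN_PATTERN and tokenize are illustrative and not part of the original code, and the second input string is a made-up example.

```python
import re

teststring = "SPRINTER2500 2WD C E-150"

# Original pattern: a single letter followed by a space, repeated.
# Because the pattern has a capturing group, re.split also returns
# the text of the last captured repetition ("C " here).
original_pieces = re.split(r"([a-zA-Z] )+", teststring)
print(original_pieces)
# ['SPRINTER2500 2W', 'C ', 'E-150']

# Hypothetical helper: compile the token pattern once and reuse it.
# Tokens are a run of letters, a run of digits, or a single / or -.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+|[/-]")

def tokenize(text):
    """Return the list of tokens found in text; whitespace is skipped."""
    return TOKEN_PATTERN.findall(text)

print("|".join(tokenize(teststring)))
# SPRINTER|2500|2|WD|C|E|-|150

print("|".join(tokenize("F-250/350 4x4")))
# F|-|250|/|350|4|x|4
```

Any character that is not covered by one of the alternatives (spaces, for instance) is silently dropped, so additional special characters would need to be added to the last character class.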