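I want to split a string on commas, but only where the pieces on both sides of the comma are at least three words long.
For example, the text: "I like playing basketball, football, tennis, which are sports."
should become: ["I like playing basketball, football, tennis", "which are sports."]
I tried splitting the string on commas and then looping over the list: if a piece has fewer than three words, join it to the piece before or after it and delete it from the list, so that there are no duplicates.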
Use str.split().
For example:
s = ["I like playing basketball, which is a sport.", "Furthermore, I travel a lot."]
result = []
for i in s:
    val = i.split(",")
    # Split only if every piece has at least 3 words
    if all(len(n.split()) >= 3 for n in val):
        result.extend(val)
    else:
        result.append(i)
print(result)
Output:
['I like playing basketball',
' which is a sport.',
'Furthermore, I travel a lot.']
Just use a simple list comprehension:
data = ["I like playing basketball, which is a sport.", "Furthermore, I travel a lot."]
result = [sentence.split(',')
          if all(len(chunk.split()) >= 3
                 for chunk in sentence.split(','))
          else sentence
          for sentence in data]
print(result)
Output:
[['I like playing basketball', ' which is a sport.'],
'Furthermore, I travel a lot.']
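
For the question's own example, where only some of the comma-separated pieces are short, one possible approach is to merge each short piece into its neighbour, as the question describes. This is a minimal sketch, assuming that words are separated by whitespace; the function name split_on_commas and the parameter min_words are hypothetical names chosen for illustration.

```python
def split_on_commas(text, min_words=3):
    """Split text on commas, keeping a comma unsplit when either side is too short.

    Hypothetical helper: a piece with fewer than min_words words is glued back
    onto the piece before it, and a short leading piece absorbs the next one.
    """
    pieces = []
    for chunk in text.split(","):
        too_short = len(chunk.split()) < min_words
        prev_too_short = bool(pieces) and len(pieces[-1].split()) < min_words
        if pieces and (too_short or prev_too_short):
            # Restore the comma that split() removed and merge into the previous piece
            pieces[-1] = pieces[-1] + "," + chunk
        else:
            pieces.append(chunk)
    # Drop the leading space left behind after each comma
    return [piece.strip() for piece in pieces]


print(split_on_commas("I like playing basketball, football, tennis, which are sports."))
print(split_on_commas("Furthermore, I travel a lot."))
```

Unlike the two snippets above, this sketch strips the leading space from each piece and always returns a flat list for a single sentence.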