I have a file whose lines look like this:
"[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
"[37.715399429999998, -89.21166221] 6 2011-08-28 19:45:41 Ate more veggie and fruit than meat for the first time in my life"
I tried stripping and splitting each line, and then stripping punctuation from every substring in the resulting list.
with open('aabb.txt') as t:
    for Line in t:
        splitline = Line.strip()
        splitline2 = splitline.split()
        for words in splitline2:
            words = words.strip("!#$%&'()*+,-./:;?@[\]^_`{|}~")
            words = words.lower()
What I want is to turn these lines into two lists, like this:
'["36.147315849999998","-86.7978174","6","2011-08-28","19:45:11","maryreynolds85","that","is","my","life","lol"]'
'["37.715399429999998","-89.21166221","6","2011-08-28","19:45:41","ate","more","veggie","and","fruit","than","meat","for","the","time","in","my","life"]'
Is all of your data in the same format? If so, use regular expressions from the re library.
import re
your_str="[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
reg_data= re.compile(r"\[(.*),(.*)\] (.*)")
your_reg_grp=re.match(reg_data,your_str)
if your_reg_grp:
    print(your_reg_grp.groups())
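# groups() returns a tuple: the two values from inside the square brackets, then the rest of the line as one string.
# Split that last element with split(" ") and build a new list from the pieces.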
grp1=your_reg_grp.groups()
grp2=grp1[-1].split(" ")
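Then combine grp1[:-1] and grp2 into a single list, for example list(grp1[:-1]) + grp2, since grp1 is a tuple and grp2 is a list.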
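Putting the pieces together, here is a minimal sketch of the whole approach. It assumes every line follows the "[lat, lon] rest" layout of the two samples. The helper name tokenize is hypothetical, and string.punctuation is used as a stand-in for the hand-written punctuation string in the question (it covers a few more characters). Punctuation is stripped only from the ends of each word, so the date and time stay intact, and the coordinates are only stripped of whitespace so the minus sign survives.

```python
import re
import string

# Assumed layout: "[lat, lon] <rest of line>", as in the two sample lines.
LINE_RE = re.compile(r"\[(.*),(.*)\] (.*)")

def tokenize(line):
    """Return the fields of one line as a flat list of lowercased tokens."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return []  # the line does not follow the assumed layout
    lat, lon, rest = match.groups()
    # Only whitespace is removed here, so "-86.7978174" keeps its sign.
    tokens = [lat.strip(), lon.strip()]
    for word in rest.split():
        # Strip punctuation from both ends only; "2011-08-28" is untouched.
        cleaned = word.strip(string.punctuation).lower()
        if cleaned:
            tokens.append(cleaned)
    return tokens

sample = ("[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 "
          "@maryreynolds85 That is my life, lol.")
print(tokenize(sample))
```

To process the file, call tokenize on each line inside the with open('aabb.txt') loop and append the result to an outer list.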