I have an input txt file in the following format:
27/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
28/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
29/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
The output should look like this:
27/04/2023 00:00 0.1
27/04/2023 06:00 0.5
27/04/2023 23:00 0.9
28/04/2023 00:00 0.1
28/04/2023 06:00 0.5
28/04/2023 23:00 0.9
29/04/2023 00:00 0.1
29/04/2023 06:00 0.5
29/04/2023 23:00 0.9
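What is the most straightforward and Pythonic way to reformat the file?
What I am doing at the moment:
The code is a bit messy. It also fails to read and reformat the last day in the file...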
from datetime import datetime

data_file = 'data.txt'
dates = []
dates_line_number = []
with open(data_file) as input_file:
    for i, line in enumerate(input_file):
        # read only the lines with dates, store their line number to list
        # store the date to another list
        try:
            date_object = datetime.strptime(line.strip(), '%d/%m/%Y')
            dates.append(date_object)
            dates_line_number.append(i)
            del date_object
        except:
            pass

file = open(data_file)
content = file.readlines()
i = 0
f = open("outfile.txt", "w")
for index in range(len(dates_line_number)):
    # get pairs of consecutive date line numbers
    ls_index = dates_line_number[index:index+2]
    if len(ls_index) == 2:
        start = ls_index[0]+1
        end = ls_index[1]-1
        # slice the file content between consecutive date line numbers
        ls_out = (content[start:end+1])
        # insert corresponding date string
        str_date = f"{dates[i].strftime('%d/%m/%Y')} "
        ls_out.insert(0, '')
        str_out = str_date.join(ls_out)
        f.write(str_out)
        i = i+1
f.close()
This checks each line for a date (dd/mm/yyyy). If one is found, it is used as the prefix for the lines that follow, until another date is found.
import re
data = """27/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
28/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
29/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
"""
date = ""
for l in data.splitlines():
    # a line that is exactly dd/mm/yyyy becomes the current prefix
    if re.match(r'^\d{2}/\d{2}/\d{4}$', l):
        date = l
        continue
    print(date.strip(), l.strip())
Output:
27/04/2023 00:00 0.1
27/04/2023 06:00 0.5
27/04/2023 23:00 0.9
28/04/2023 00:00 0.1
28/04/2023 06:00 0.5
28/04/2023 23:00 0.9
29/04/2023 00:00 0.1
29/04/2023 06:00 0.5
29/04/2023 23:00 0.9
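The same idea can be applied to files instead of an in-memory string. The following is only a minimal sketch under a few assumptions: the helper names prefix_dates and reformat_file are hypothetical (they are not part of the answer above), the file names data.txt and outfile.txt are taken from the question, and blank lines are skipped.

```python
import re

# A line consisting only of dd/mm/yyyy is treated as a date header.
DATE_RE = re.compile(r'\d{2}/\d{2}/\d{4}')


def prefix_dates(lines):
    """Yield each data line prefixed with the most recent date line."""
    date = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        if DATE_RE.fullmatch(line):
            date = line
            continue
        yield f"{date} {line}"


def reformat_file(src="data.txt", dst="outfile.txt"):
    """Stream src line by line and write the prefixed lines to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for out_line in prefix_dates(fin):
            fout.write(out_line + "\n")
```

Because the file is processed one line at a time, the last day needs no special handling: there is no pairing of consecutive date positions that could drop it.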
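First of all, always open files with a with statement, so that you do not need to close them explicitly.
Your goal can be achieved with the following code: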
with open('data.txt', 'r') as f:
    all_lines = f.read().splitlines()

with open('outfile.txt', 'w') as f:
    for i, line in enumerate(all_lines):
        # every fourth line (0, 4, 8, ...) is a date
        if i % 4 == 0:
            date = line
        else:
            f.write(f'{date} {line}\n')
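I assumed that each date is followed by exactly three more lines. If there can be more than three, replace the if i % 4 == 0: condition with one that checks whether the line is a valid date, for example with a regex or a helper function.
The code above produces the output you want.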
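A helper function for the valid-date check could look like the sketch below. The names is_date and reformat are hypothetical, and the format string '%d/%m/%Y' is taken from the question. It relies on datetime.strptime raising ValueError for anything that is not a real date in that format.

```python
from datetime import datetime


def is_date(text, fmt="%d/%m/%Y"):
    """Return True if text parses as a date in the given format."""
    try:
        datetime.strptime(text.strip(), fmt)
        return True
    except ValueError:
        return False


def reformat(lines):
    """Return the data lines, each prefixed with the most recent date line."""
    out = []
    date = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if is_date(line):
            date = line
        elif date is not None:
            out.append(f"{date} {line}")
    return out
```

Unlike the i % 4 check, this works for any number of lines per day. Unlike a purely pattern-based regex, it also rejects impossible dates such as 31/02/2023.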