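I have a dataframe in which every other column holds time data, formatted as `year:month:day hour:minute:second.fractions_of_second`.
My goal is to convert every record in the time columns to total seconds and then subtract the overall minimum from every time column (this is a test with 16 channels, i.e. 16 × (1 signal + 1 timeline) = 32 columns, where the timelines are not synchronized). It works, but the only way I have found to convert a datetime to total seconds is to call value.timestamp() record by record, which is very slow.
Do you know of any way to speed this up? Code to reproduce it: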
import numpy as np

def pzt_time_convert(df_pzt):
    min_array = []
    new_col = np.zeros(df_pzt.shape[0])
    for i in range(0, 16):
        # df_pzt[f'time{i}'] = df_pzt[f'time{i}'].dt.seconds()  # DOESN'T WORK
        col_num = df_pzt.columns.get_loc(f'time{i}')
        for j in range(0, int(df_pzt.shape[0])):
            new_col[j] = df_pzt.iloc[j, col_num].timestamp()  # WORKS, BUT VERY SLOW
        # The first record of each timeline is taken as that column's minimum
        min_array.append(new_col[0])
        df_pzt[f'time{i}'] = new_col
    # Shift every timeline so the earliest start time becomes zero
    min_time_df = min(min_array)
    for i in range(0, 16):
        df_pzt[f'time{i}'] = df_pzt[f'time{i}'] - min_time_df
Example dataframe:
import pandas as pd
from pandas import Timestamp

values = [[Timestamp('2024-04-12 11:14:24.358056'), -0.0006875504968152438,
           Timestamp('2024-04-12 11:14:24.358056'), -0.00041178275313580315,
           Timestamp('2024-04-12 11:14:24.358056'), 7.364900769430663e-05,
           Timestamp('2024-04-12 11:14:24.358056'), -0.0004661741754045164,
           Timestamp('2024-04-12 11:14:24.358056'), -0.0015942548166320398,
           Timestamp('2024-04-12 11:14:24.358056'), -0.0021384553975074957,
           Timestamp('2024-04-12 11:14:24.358056'), -0.00039135882617280515,
           Timestamp('2024-04-12 11:14:24.358056'), 0.0007415602000625972,
           Timestamp('2024-04-12 11:14:24.357797'), 0.0009018262284073795,
           Timestamp('2024-04-12 11:14:24.357797'), -0.0010126974839954682,
           Timestamp('2024-04-12 11:14:24.357797'), -0.0005472086561974499,
           Timestamp('2024-04-12 11:14:24.357797'), -0.0038311580789279387,
           Timestamp('2024-04-12 11:14:24.357797'), -0.00020165785427062911,
           Timestamp('2024-04-12 11:14:24.357797'), 0.00101211173219427,
           Timestamp('2024-04-12 11:14:24.357797'), -0.00036739739321287653,
           Timestamp('2024-04-12 11:14:24.357797'), -0.0014596976004591774],
          [Timestamp('2024-04-12 11:14:24.358061'), -0.001393217874484563,
           Timestamp('2024-04-12 11:14:24.358061'), 0.000632806043004822,
           Timestamp('2024-04-12 11:14:24.358061'), -0.0009482395579276977,
           Timestamp('2024-04-12 11:14:24.358061'), 0.0009302471556326194,
           Timestamp('2024-04-12 11:14:24.358061'), -0.000901158768599161,
           Timestamp('2024-04-12 11:14:24.358061'), -0.0031716724511709534,
           Timestamp('2024-04-12 11:14:24.358061'), -0.0014356856355145864,
           Timestamp('2024-04-12 11:14:24.358061'), 0.00039756264859636733,
           Timestamp('2024-04-12 11:14:24.357802'), -0.00016250123298466672,
           Timestamp('2024-04-12 11:14:24.357802'), -0.0017167618679096996,
           Timestamp('2024-04-12 11:14:24.357802'), 0.0005116228811787183,
           Timestamp('2024-04-12 11:14:24.357802'), -0.004182242412716671,
           Timestamp('2024-04-12 11:14:24.357802'), -0.0012577073144788766,
           Timestamp('2024-04-12 11:14:24.357802'), 0.00030706346666221993,
           Timestamp('2024-04-12 11:14:24.357802'), -0.00036739739321287653,
           Timestamp('2024-04-12 11:14:24.357802'), -0.0011094467495909785],
          [Timestamp('2024-04-12 11:14:24.358066'), -0.0017460515633192228,
           Timestamp('2024-04-12 11:14:24.358066'), 0.000632806043004822,
           Timestamp('2024-04-12 11:14:24.358066'), -0.001288869079801699,
           Timestamp('2024-04-12 11:14:24.358066'), -0.0008152795081638005,
           Timestamp('2024-04-12 11:14:24.358066'), -0.002633898888681358,
           Timestamp('2024-04-12 11:14:24.358066'), -0.0028272667666164675,
           Timestamp('2024-04-12 11:14:24.358066'), -0.00039135882617280515,
           Timestamp('2024-04-12 11:14:24.358066'), 0.0007415602000625972,
           Timestamp('2024-04-12 11:14:24.357807'), -0.0005172770534486821,
           Timestamp('2024-04-12 11:14:24.357807'), -0.0006606652920383522,
           Timestamp('2024-04-12 11:14:24.357807'), 0.00015867903538666246,
           Timestamp('2024-04-12 11:14:24.357807'), -0.001724652076195543,
           Timestamp('2024-04-12 11:14:24.357807'), -0.0012577073144788766,
           Timestamp('2024-04-12 11:14:24.357807'), 0.00101211173219427,
           Timestamp('2024-04-12 11:14:24.357807'), -0.00036739739321287653,
           Timestamp('2024-04-12 11:14:24.357807'), -0.0021601993021955757]]
headers = ['time0', 'channel0', 'time1', 'channel1', 'time2', 'channel2', 'time3',
'channel3', 'time4', 'channel4', 'time5', 'channel5', 'time6',
'channel6', 'time7', 'channel7', 'time8', 'channel8', 'time9',
'channel9', 'time10', 'channel10', 'time11', 'channel11', 'time12',
'channel12', 'time13', 'channel13', 'time14', 'channel14', 'time15',
'channel15']
df = pd.DataFrame(values, columns = headers)
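There is a better way. First select the time columns, then compute the minimum across all of them, subtract it from the time columns, and finally use the `.dt.total_seconds()` method to get the differences in seconds: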
cols = df.filter(like='time')
cols = cols.sub(cols.min().min()).apply(lambda s: s.dt.total_seconds())
result = df.assign(**cols)
print(result)
time0 channel0 time1 channel1 time2 channel2 time3 channel3 time4 channel4 time5 channel5 time6 channel6 time7 channel7 time8 channel8 time9 channel9 time10 channel10 time11 channel11 time12 channel12 time13 channel13 time14 channel14 time15 channel15
0 0.000259 -0.000688 0.000259 -0.000412 0.000259 0.000074 0.000259 -0.000466 0.000259 -0.001594 0.000259 -0.002138 0.000259 -0.000391 0.000259 0.000742 0.000000 0.000902 0.000000 -0.001013 0.000000 -0.000547 0.000000 -0.003831 0.000000 -0.000202 0.000000 0.001012 0.000000 -0.000367 0.000000 -0.001460
1 0.000264 -0.001393 0.000264 0.000633 0.000264 -0.000948 0.000264 0.000930 0.000264 -0.000901 0.000264 -0.003172 0.000264 -0.001436 0.000264 0.000398 0.000005 -0.000163 0.000005 -0.001717 0.000005 0.000512 0.000005 -0.004182 0.000005 -0.001258 0.000005 0.000307 0.000005 -0.000367 0.000005 -0.001109
2 0.000269 -0.001746 0.000269 0.000633 0.000269 -0.001289 0.000269 -0.000815 0.000269 -0.002634 0.000269 -0.002827 0.000269 -0.000391 0.000269 0.000742 0.000010 -0.000517 0.000010 -0.000661 0.000010 0.000159 0.000010 -0.001725 0.000010 -0.001258 0.000010 0.001012 0.000010 -0.000367 0.000010 -0.002160
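As a sanity check, here is a minimal sketch comparing the vectorized approach with the record-by-record `.timestamp()` loop from the question. It assumes a hypothetical two-channel frame (the real data has 16 channels), and the names `fast` and `slow` are only illustrative. The two results should agree to within floating-point error. Subtracting the minimum before converting to seconds also avoids holding large epoch values (around 1.7e9 s) in floats, where sub-microsecond precision is lost.

```python
import numpy as np
import pandas as pd

# Hypothetical reduced example: two timelines instead of sixteen.
df = pd.DataFrame({
    "time0": pd.to_datetime(["2024-04-12 11:14:24.358056",
                             "2024-04-12 11:14:24.358061"]),
    "channel0": [-0.00069, -0.00139],
    "time1": pd.to_datetime(["2024-04-12 11:14:24.357797",
                             "2024-04-12 11:14:24.357802"]),
    "channel1": [0.00090, -0.00016],
})

# Select only the columns whose label contains "time".
time_cols = df.filter(like="time")

# Vectorized: subtract the global minimum (giving Timedelta columns),
# then convert each column to float seconds in one call per column.
fast = time_cols.sub(time_cols.min().min()).apply(lambda s: s.dt.total_seconds())

# Reference: element-wise .timestamp(), as in the question (slow on large frames).
t0 = min(ts.timestamp() for col in time_cols for ts in time_cols[col])
slow = time_cols.apply(lambda s: s.map(lambda ts: ts.timestamp() - t0))

# The two agree up to float rounding of the epoch values.
print(np.allclose(fast.to_numpy(), slow.to_numpy(), atol=1e-6))

result = df.assign(**fast)
print(result)
```

On a frame with many rows, the vectorized version makes one call per time column instead of one Python-level call per cell, which is where the speed-up comes from.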