Convert data from wide to long, condensed to 1-minute intervals


I'm looking for Python help converting data from wide to long (?)

My data looks like this:

channelId,utc,scet,val1,val2
A-0001,2024-061T22:00:05.02064,0.03,3,
A-0002,2024-061T22:00:06.02064,0.07,2,
A-0001,2024-061T22:00:11.02064,0.02,2,
A-0002,2024-061T22:00:12.02064,0.05,7,
A-0001,2024-061T22:01:12.365611,0.01,1.5,
A-0002,2024-061T22:01:14.365611,.07,16

I want to produce a table with the following columns:

Time,A-0001_val1,A-0001_val2,A-0002_val1,A-0002_val2....

Because not all values share the same timestamp, I want to collapse the timestamps to 1-minute intervals.

This is what I have so far:

    import pandas as pd

    # Read the input table into a DataFrame
    df = pd.read_csv('~/Desktop/test_file_1.csv')

    # Convert timestamp columns to datetime format with explicit format specification
    df['utc'] = pd.to_datetime(df['utc'], format='%Y-%jT%H:%M:%S.%f')

    # Round timestamps to the nearest minute
    df['utc'] = df['utc'].dt.round('min')

    # Pivot the DataFrame
    df_pivot = df.pivot_table(index=['utc'], columns='channelId', values=['val1', 'val2'])

    df_reset = df_pivot.reset_index()
    df_reset['utc'] = pd.to_datetime(df_reset['utc'])
    df_reset.set_index('utc', inplace=True)

    # Resample the DataFrame to get values for every minute
    df_resampled = df_reset.resample('min').last().ffill()

    # Flatten multi-level column index
    df_resampled.columns = [f'{col[1]}_{col[0]}' for col in df_resampled.columns.values]

    # Reset index
    df_resampled.reset_index(inplace=True)

    # Rename columns
    df_resampled.rename(columns={'utc': 'Time'}, inplace=True)

    df_final = df_resampled[['Time', *sorted(df_resampled.columns[1:])]]

    # Write the output table to a CSV file
    df_final.to_csv('output_table_3.csv', index=False)

My output looks like this:

Time, A-0001_val1, A-0001_val2, A-0002_val1, A-0002_val2
2024-03-01 22:00:00,0.02,2,0.05,7
2024-03-01 22:00:01,0.01,1.5,0.06,16

I think it's fine, but I'm curious whether anyone has a better approach.

pandas pivot-table pandas-resample resample pyresample
1 Answer

You are already very close to solving this.

The input data I used is (semicolon-separated, with the val2 column filled in):

channelId;utc;scet;val1;val2
A-0001;2024-061T22:00:05.02064;0.03;3;5
A-0002;2024-061T22:00:06.02064;0.07;2;4
A-0001;2024-061T22:00:11.02064;0.02;2;3
A-0002;2024-061T22:00:12.02064;0.05;7;3
A-0001;2024-061T22:01:12.365611;0.01;1.5;2
A-0002;2024-061T22:01:14.365611;.07;16;3

The code should be:

    import pandas as pd

    df = pd.read_csv(r"C:\Users\s-degossondevarennes\OneDrive - Pricer AB\testfile.csv", sep = ";")

    # Parse the year + day-of-year timestamps and round them to the nearest minute
    df['utc'] = pd.to_datetime(df['utc'], format='%Y-%jT%H:%M:%S.%f').dt.round('min')
    # One row per minute, one column per (value, channel) pair; duplicates are averaged
    df_pivot = df.pivot_table(index='utc', columns='channelId', values=['val1', 'val2'], aggfunc='mean')
    # Flatten the two-level column index into names like 'A-0001_val1'
    df_pivot.columns = ['{}_{}'.format(col[1], col[0]) for col in df_pivot.columns]
    df_pivot.reset_index(inplace=True)
    df_pivot.rename(columns={'utc': 'Time'}, inplace=True)
    print(df_pivot)

This gives:


                 Time  A-0001_val1  A-0002_val1  A-0001_val2  A-0002_val2
0 2024-03-01 22:00:00          2.5          4.5          4.0          3.5
1 2024-03-01 22:01:00          1.5         16.0          2.0          3.0
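
If you also want the columns grouped per channel, as in the header you asked for, and a row for every minute in the range, as your `resample(...).ffill()` step did, the following is a minimal sketch of one way to do it. It rests on a few assumptions of mine: the sample rows are inlined through `io.StringIO` so the snippet runs without a file, `to_minute_table` is a hypothetical helper name, and I swapped `round` for `floor` so that a reading taken at, say, 22:00:45 stays in the 22:00 bin instead of moving to 22:01. Switch back to `round` if nearest-minute binning is what you need.

```python
import io

import pandas as pd

# Sample rows copied from the semicolon-separated input above.
RAW = """channelId;utc;scet;val1;val2
A-0001;2024-061T22:00:05.02064;0.03;3;5
A-0002;2024-061T22:00:06.02064;0.07;2;4
A-0001;2024-061T22:00:11.02064;0.02;2;3
A-0002;2024-061T22:00:12.02064;0.05;7;3
A-0001;2024-061T22:01:12.365611;0.01;1.5;2
A-0002;2024-061T22:01:14.365611;.07;16;3
"""


def to_minute_table(csv_text: str) -> pd.DataFrame:
    """Hypothetical helper: one row per minute, one column per channel/value pair."""
    df = pd.read_csv(io.StringIO(csv_text), sep=";")
    # Parse year + day-of-year timestamps, then floor them so each reading
    # lands in the minute it was recorded in (an assumption; see above).
    df["utc"] = pd.to_datetime(df["utc"], format="%Y-%jT%H:%M:%S.%f").dt.floor("min")
    # Average all readings that fall into the same minute and channel.
    wide = df.pivot_table(index="utc", columns="channelId",
                          values=["val1", "val2"], aggfunc="mean")
    # Flatten ('val1', 'A-0001') into 'A-0001_val1', then sort so that
    # each channel's columns sit next to each other.
    wide.columns = [f"{channel}_{value}" for value, channel in wide.columns]
    wide = wide[sorted(wide.columns)]
    # Add a row for every minute in the range and carry the last values forward.
    wide = wide.asfreq("min").ffill()
    return wide.rename_axis("Time").reset_index()


result = to_minute_table(RAW)
print(result)
```

With the six sample rows there are no missing minutes, so `asfreq("min").ffill()` changes nothing here; it only matters when a whole minute has no readings.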