Reshape, join and aggregate multiple pandas DataFrames


I have five different pandas DataFrames, each holding the result of a different calculation on the same data. All of them have the same shape (5×10).

DataFrame shape for each data set:

     (recording channels)
     0 1 2 3 4 5 6 7 8 9
(t)
0    x x x x x x x x x x
1    x x x x x x x x x x
2    x x x x x x x x x x
3    x x x x x x x x x x
4    x x x x x x x x x x


df 1 : calculation 1
df 2 : calculation 2
.
.
.
df 5 : calculation 5

I want to merge all of these DataFrames into a single DataFrame, like this:

recording_channel-----time-----cal_1----cal_2----cal_3....cal_5
       0                0        x        x        x        x
       0                1        x        x        x        x
       0                2        x        x        x        x
       0                3        x        x        x        x
       0                4        x        x        x        x
       1                0        x        x        x        x
       1                1        x        x        x        x
       1                2        x        x        x        x
       1                3        x        x        x        x
       1                4        x        x        x        x
       .                .        .        .        .        .
       .                .        .        .        .        .
       9                4        x        x        x        x           

Code used to generate the data:

import numpy as np 
import pandas as pd

list_df = []

for i in range(5):
    a = np.array(np.random.randint(0,1000+i, 50))
    a = a.reshape(5,10)
    df = pd.DataFrame(a)
    list_df.append(df)

for i in list_df:
    print(len(i))

df_joined = pd.concat(list_df, axis=1)

print(df_joined)
pandas dataframe python-3.5 dask dask-distributed
1 Answer

Using your code to generate the data, we can use melt to convert each DataFrame from wide to long format:

import numpy as np
import pandas as pd

list_df = []
df_all = pd.DataFrame()
for i in range(5):
    a = np.random.randint(0, 1000 + i, 50).reshape(5, 10)
    df = pd.DataFrame(a)
    list_df.append(df)
    # melt each frame from wide to long: one row per (time, recording_channel)
    df_long = pd.melt(df.reset_index().rename(columns={'index': 'time'}),
                      id_vars='time', value_name='col',
                      var_name='recording_channel')
    df_all['col' + str(i + 1)] = df_long['col']

# the key columns are the same for every frame, so take them from the last melt
df_all['recording_channel'] = df_long.recording_channel
df_all['time'] = df_long.time
df_all.head()
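
If the five frames already exist in a list, as in the question, a possible alternative is to reshape each one with unstack and then align them with a single concat. This is only a sketch: it assumes every frame uses rows for time and columns for recording channel, and the column names cal_1 ... cal_5 are taken from the desired output in the question rather than from the answer above.

```python
import numpy as np
import pandas as pd

# Same kind of data as in the question: five 5x10 frames
# (rows = time, columns = recording channel). The seed is only for repeatability.
rng = np.random.default_rng(0)
list_df = [pd.DataFrame(rng.integers(0, 1000, size=(5, 10))) for _ in range(5)]

# unstack() on a frame with a plain index returns a Series indexed by
# (column label, row label), i.e. (recording_channel, time) once the axes are named.
long_series = [
    df.rename_axis(index="time", columns="recording_channel").unstack()
    for df in list_df
]

# concat(axis=1) aligns the five Series on their shared MultiIndex;
# keys= supplies the resulting column names.
result = pd.concat(
    long_series,
    axis=1,
    keys=[f"cal_{i + 1}" for i in range(len(long_series))],
).reset_index()

print(result.head())
```

Because concat aligns on the (recording_channel, time) index instead of relying on row position, each calculation column stays matched to the right channel and time step even if one frame's rows were in a different order.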