Pandas: mean of the values in each row of a df, but only over a selected range of columns filtered with a regex

Question

Given my df

id  weight  Project   Exp_type   researcher events_d1 events_d2 events_d3 events_d4  events_d5   
0   50        p1        Acute      alex         0         0         0         4       2
1   52        p2        chronic    mat          0         1         1         5       1
2   75        p1        Acute      alex         1                   2                 1
3   53        p2        chronic    mat          0                             0       0

I want to get the mean of the values in each row, but only over a selected range of columns (events_d2 to events_d4), so that df_output looks like this:

  weight  Project Exp_type   researcher events_d1 events_d2 events_d3 events_d4  events_d5  meand2_d4 
0 50      p1      Acute      alex         0         0         0         4          2          1.33
1 52      p2      chronic    mat          0         1         1         5          1          2.33
2 75      p1      Acute      alex         1                   2                    1          0.66
3 53      p2      chronic    mat          0                             0          0          0

I tried the following

df['meand2_d4'] = df.filter(regex="events_d[2-4]").agg(np.mean, axis=1)

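But the output I get looks like the mean of the values in the cells of each entire row, without regard to the column range I am interested in. I also noticed that the result is averaged over the number of cells that actually contain a value (zeros included), which differs from row to row and depends on the number of NaN/empty cells.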
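The varying divisor described above is consistent with how pandas computes `mean`: missing values are skipped by default (`skipna=True`), so each row is divided by its own count of non-missing cells. The sketch below is only an illustration; the `events` frame is a hypothetical reconstruction of the three columns from the question, assuming the blank cells are NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of events_d2..events_d4, with blanks as NaN.
events = pd.DataFrame(
    {
        "events_d2": [0, 1, np.nan, np.nan],
        "events_d3": [0, 1, 2, np.nan],
        "events_d4": [4, 5, np.nan, 0],
    }
)

# mean() skips NaN by default (skipna=True), so each row is divided
# by the number of non-missing cells in that row rather than by 3.
row_means = events.mean(axis=1)
divisors = events.notna().sum(axis=1)

print(pd.DataFrame({"mean": row_means, "divisor": divisors}))
```

Under this assumption, the row holding only the value 2 gets a mean of 2.0 instead of the 0.66 the question expects, because its divisor is 1.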

Answer 1

IIUC, and your blank cells are actually NaN, then you need to filter on axis=1 and fillna with 0 before taking the mean:

df['meand2_d4'] = df.filter(regex=r'^events_d[2-4]$', axis=1).fillna(0).mean(axis=1)

Output:

   id  weight Project Exp_type researcher  events_d1  events_d2  events_d3  events_d4  events_d5  meand2_d4
0   0      50      p1    Acute       alex          0        0.0        0.0        4.0        2.0   1.333333
1   1      52      p2  chronic        mat          0        1.0        1.0        5.0        1.0   2.333333
2   2      75      p1    Acute       alex          1        NaN        2.0        NaN        1.0   0.666667
3   3      53      p2  chronic        mat          0        NaN        NaN        0.0        0.0   0.000000
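For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the question's numeric columns (the text columns are left out for brevity), and it assumes the blank cells are NaN; the intermediate name `selected` is introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's numeric columns; blanks assumed to be NaN.
df = pd.DataFrame(
    {
        "id": [0, 1, 2, 3],
        "weight": [50, 52, 75, 53],
        "events_d1": [0, 0, 1, 0],
        "events_d2": [0, 1, np.nan, np.nan],
        "events_d3": [0, 1, 2, np.nan],
        "events_d4": [4, 5, np.nan, 0],
        "events_d5": [2, 1, 1, 0],
    }
)

# The anchored regex keeps exactly events_d2, events_d3 and events_d4.
selected = df.filter(regex=r"^events_d[2-4]$", axis=1)

# Treat missing cells as 0 so that every row is divided by 3.
df["meand2_d4"] = selected.fillna(0).mean(axis=1)

print(df)
```

The resulting `meand2_d4` column should match the values in the output above. Note that `filter` on a DataFrame already works on columns by default, so `axis=1` mostly makes the intent explicit; the step that changes the numbers is `fillna(0)`.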
