Groupby.mean() if condition is true

Question (votes: 0, answers: 1)

I have the following DataFrame:

   index  user  default_shipping_cost     category  shipping_cost  shipping_coalesce  estimated_shipping_cost
0      0     1                      1      clothes            NaN                1.0                      6.0
1      1     1                      1  electronics            2.0                2.0                      6.0
2      2     1                     15    furniture            NaN               15.0                      6.0
3      3     2                     15    furniture            NaN               15.0                     15.0
4      4     2                     15    furniture            NaN               15.0                     15.0

For each user, I want to combine shipping_cost with default_shipping_cost and compute the mean of the combined shipping costs, but only if at least one shipping_cost is given for that user.

Explanation:

  • For user_1, shipping_cost is given (at least once), so we can compute the mean.
  • For user_2, there is no shipping_cost, so I want NaN instead.
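
The rule in the two bullets above can be checked by hand on the sample data. This is only an illustrative sketch in plain Python; the helper name conditional_mean is hypothetical and not part of pandas.

```python
import math

def conditional_mean(shipping_costs, default_costs):
    """Mean of the coalesced costs, or NaN if no shipping cost was given.

    Hypothetical helper, written only to spell out the rule.
    """
    if all(cost is None for cost in shipping_costs):
        return math.nan
    # Take the given shipping cost where present, otherwise the default.
    combined = [
        default if cost is None else cost
        for cost, default in zip(shipping_costs, default_costs)
    ]
    return sum(combined) / len(combined)

user_1 = conditional_mean([None, 2, None], [1, 1, 15])  # (1 + 2 + 15) / 3
user_2 = conditional_mean([None, None], [15, 15])       # no cost given
print(user_1, user_2)
```

User 1 averages the three coalesced values, while user 2 falls through to NaN because none of its rows carries a shipping_cost.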

Code:

import pandas as pd

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option('display.width', 1000)

df = pd.DataFrame(
    {
        'user': [1, 1, 1, 2, 2],
        'default_shipping_cost': [1, 1, 15, 15, 15],
        'category': ['clothes', 'electronics', 'furniture', 'furniture', 'furniture'],
        'shipping_cost': [None, 2, None, None, None]
    }
)
df.reset_index(inplace=True)
df['shipping_coalesce'] = df.shipping_cost.combine_first(df.default_shipping_cost)

dfg_user = df.groupby(['user'])
df['estimated_shipping_cost'] = dfg_user['shipping_coalesce'].transform("mean")
print(df)
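
For reference, the coalescing step in the code above can be looked at in isolation. The sketch below uses only the three rows of user 1 and assumes both Series share the same index, in which case combine_first and fillna should give the same values.

```python
import pandas as pd

# Rows of user 1 only; None becomes NaN in a float column.
shipping_cost = pd.Series([None, 2, None], dtype='float64')
default_shipping_cost = pd.Series([1, 1, 15])

# Keep shipping_cost where present, otherwise fall back to the default.
coalesced = shipping_cost.combine_first(default_shipping_cost)

# With identical indexes, fillna with a Series fills the same gaps.
filled = shipping_cost.fillna(default_shipping_cost)

print(coalesced.tolist())
print(filled.tolist())
```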

Expected output:

   index  user  default_shipping_cost     category  shipping_cost  shipping_coalesce  estimated_shipping_cost
0      0     1                      1      clothes            NaN                1.0                      6.0
1      1     1                      1  electronics            2.0                2.0                      6.0
2      2     1                     15    furniture            NaN               15.0                      6.0
3      3     2                     15    furniture            NaN               15.0                      NaN
4      4     2                     15    furniture            NaN               15.0                      NaN

Tags: python, pandas

1 Answer (votes: 1)

Use transform('any') together with where to add the extra condition:

df['estimated_shipping_cost'] = (dfg_user['shipping_coalesce'].transform('mean')
                                .where(dfg_user['shipping_cost'].transform('any'))
                                )

Output:

   index  user  default_shipping_cost     category  shipping_cost  shipping_coalesce  estimated_shipping_cost
0      0     1                      1      clothes            NaN                1.0                      6.0
1      1     1                      1  electronics            2.0                2.0                      6.0
2      2     1                     15    furniture            NaN               15.0                      6.0
3      3     2                     15    furniture            NaN               15.0                      NaN
4      4     2                     15    furniture            NaN               15.0                      NaN

Intermediate result:

dfg_user['shipping_cost'].transform('any')

0     True
1     True
2     True
3    False
4    False
Name: shipping_cost, dtype: bool
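
One caveat that may be worth checking on real data: 'any' tests truthiness rather than presence, so as far as I can tell a user whose only given shipping_cost is 0 would be treated like a user with none. A minimal sketch of a variant that tests for presence with notna() instead is shown here. User 3 is a hypothetical addition to the question's sample data, included only to exercise the zero-cost case.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical user 3 whose only
# given shipping cost is 0 (an assumption added for illustration).
df = pd.DataFrame(
    {
        'user': [1, 1, 1, 2, 2, 3, 3],
        'default_shipping_cost': [1, 1, 15, 15, 15, 5, 7],
        'shipping_cost': [None, 2, None, None, None, 0, None],
    }
)
df['shipping_coalesce'] = df['shipping_cost'].combine_first(df['default_shipping_cost'])

# True for every row of a user that has at least one non-missing shipping_cost.
has_cost = df['shipping_cost'].notna().groupby(df['user']).transform('any')

# Per-user mean of the coalesced cost, kept only where has_cost is True.
group_mean = df.groupby('user')['shipping_coalesce'].transform('mean')
df['estimated_shipping_cost'] = group_mean.where(has_cost)
print(df)
```

With this variant, users 1 and 3 keep their group mean and user 2 gets NaN. If a cost of 0 can never occur in the data, the shorter transform('any') form above behaves the same.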