I have the following DataFrame:
index user default_shipping_cost category shipping_cost shipping_coalesce estimated_shipping_cost
0 0 1 1 clothes NaN 1.0 6.0
1 1 1 1 electronics 2.0 2.0 6.0
2 2 1 15 furniture NaN 15.0 6.0
3 3 2 15 furniture NaN 15.0 15.0
4 4 2 15 furniture NaN 15.0 15.0
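For each user, I want to combine shipping_cost with default_shipping_cost and compute the mean of the combined shipping costs, but only if at least one shipping_cost is given for that user.
Explanation:
For user 1, shipping_cost is given at least once, so the mean can be computed. For user 2, no shipping_cost is given at all, so I would like NaN instead.
Code: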
import pandas as pd
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option('display.width', 1000)
df = pd.DataFrame(
{
'user': [1, 1, 1, 2, 2],
'default_shipping_cost': [1, 1, 15, 15, 15],
'category': ['clothes', 'electronics', 'furniture', 'furniture', 'furniture'],
'shipping_cost': [None, 2, None, None, None]
}
)
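# Keep the original row labels as an explicit "index" column
df.reset_index(inplace=True)
# Use shipping_cost where present, otherwise fall back to default_shipping_cost
df['shipping_coalesce'] = df.shipping_cost.combine_first(df.default_shipping_cost)
dfg_user = df.groupby(['user'])
# Broadcast each user's mean back onto every row of that user
df['estimated_shipping_cost'] = dfg_user['shipping_coalesce'].transform("mean")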
print(df)
Expected output:
index user default_shipping_cost category shipping_cost shipping_coalesce estimated_shipping_cost
0 0 1 1 clothes NaN 1.0 6.0
1 1 1 1 electronics 2.0 2.0 6.0
2 2 1 15 furniture NaN 15.0 6.0
3 3 2 15 furniture NaN 15.0 NaN
4 4 2 15 furniture NaN 15.0 NaN
Use transform('any') together with where to add the extra condition:
df['estimated_shipping_cost'] = (dfg_user['shipping_coalesce'].transform('mean')
.where(dfg_user['shipping_cost'].transform('any'))
)
Output:
index user default_shipping_cost category shipping_cost shipping_coalesce estimated_shipping_cost
0 0 1 1 clothes NaN 1.0 6.0
1 1 1 1 electronics 2.0 2.0 6.0
2 2 1 15 furniture NaN 15.0 6.0
3 3 2 15 furniture NaN 15.0 NaN
4 4 2 15 furniture NaN 15.0 NaN
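To see what where contributes on its own, here is a minimal sketch with hand-written values. The names means and mask are made up for illustration; they stand in for the per-user mean and the per-user "any shipping_cost given" flag.

```python
import pandas as pd

# Hypothetical stand-ins for the two transformed Series
means = pd.Series([6.0, 6.0, 6.0, 15.0, 15.0])
mask = pd.Series([True, True, True, False, False])

# where keeps a value when the mask is True and replaces it with NaN otherwise
result = means.where(mask)
print(result)
```

Rows whose mask is False come back as NaN, which is why user 2 ends up without an estimate.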
Intermediate result:
dfg_user['shipping_cost'].transform('any')
0 True
1 True
2 True
3 False
4 False
Name: shipping_cost, dtype: bool
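One caveat, as far as I can tell: transform('any') tests truthiness, so a user whose only given shipping_cost is 0 would be treated as having none. If zero is a valid cost in your data, a variant that tests for missing values explicitly may be safer. This is a sketch, not part of the answer above, and has_cost is a name I introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'user': [1, 1, 1, 2, 2],
        'default_shipping_cost': [1, 1, 15, 15, 15],
        'shipping_cost': [None, 2, None, None, None],
    }
)
df['shipping_coalesce'] = df['shipping_cost'].combine_first(df['default_shipping_cost'])

# True on every row of a user with at least one non-missing shipping_cost
# (has_cost is an illustrative name, not from the original post)
has_cost = df['shipping_cost'].notna().groupby(df['user']).transform('any')

# Per-user mean, blanked out for users without any given shipping_cost
df['estimated_shipping_cost'] = (
    df.groupby('user')['shipping_coalesce'].transform('mean').where(has_cost)
)
print(df)
```

On the sample data this gives the same result as the version above; the two would only differ for a user whose given costs are all 0.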