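I want to calculate the maximum difference in product_mrp for each date. To do this I tried grouping by date, but I could not get any further from there.
Input: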
+-------------+--------------------+
| product_mrp | order_date |
+-------------+--------------------+
| 142 | 01-12-2019 |
| 20 | 01-12-2019 |
| 20 | 01-12-2019 |
| 120 | 01-12-2019 |
| 30 | 03-12-2019 |
| 20 | 03-12-2019 |
| 45 | 03-12-2019 |
| 215 | 03-12-2019 |
| 15 | 03-12-2019 |
| 25 | 07-12-2019 |
| 5 | 07-12-2019 |
+-------------+--------------------+
Expected output:
+-------------+--------------------+
| product_mrp | order_date |
+-------------+--------------------+
| 122 | 01-12-2019 |
| 200 | 03-12-2019 |
| 20 | 07-12-2019 |
+-------------+--------------------+
Use groupby with aggregate functions such as max and min, subtract the results, and then call reset_index:
gr = df.groupby('order_date')['product_mrp']
df_ = (gr.max()-gr.min()).reset_index()
print(df_)
   order_date  product_mrp
0  01-12-2019          122
1  03-12-2019          200
2  07-12-2019           20
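To run this end to end, here is a minimal self-contained sketch that first builds the sample frame from the question. The helper name max_diff_per_date is hypothetical, and the agg(['max', 'min']) call is just another way of writing the same max-minus-min idea; it is not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'product_mrp': [142, 20, 20, 120, 30, 20, 45, 215, 15, 25, 5],
    'order_date': ['01-12-2019'] * 4 + ['03-12-2019'] * 5 + ['07-12-2019'] * 2,
})

def max_diff_per_date(frame):
    """Return max(product_mrp) - min(product_mrp) for each order_date."""
    # Compute both aggregates in one pass over the groups.
    stats = frame.groupby('order_date')['product_mrp'].agg(['max', 'min'])
    # Subtract, restore the column name, and turn the index back into a column.
    return (stats['max'] - stats['min']).rename('product_mrp').reset_index()

result = max_diff_per_date(df)
print(result)
```

Here order_date stays a plain string column, so the groups are sorted as text rather than as dates.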
Load the data with pandas, then use groupby to group on the shared index:
import pandas as pd
dates = ['01-12-2019']*4 + ['03-12-2019']*5 + ['07-12-2019']*2
data = [142,20,20,120,30,20,45,215,15,25,5]
df = pd.DataFrame(data, columns=['product_mrp'])
df.index = pd.DatetimeIndex(dates)
grouped = df.groupby(df.index).apply(lambda x: x.max() - x.min())
print(grouped)
Output:
            product_mrp
2019-01-12          122
2019-03-12          200
2019-07-12           20
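Note that the output above shows the date strings parsed month-first (2019-01-12, 2019-03-12, 2019-07-12). If the dates in the question are actually day-month-year, which is an assumption since the question does not say, a sketch with an explicit format might look like this. The variable name spread is made up for illustration.

```python
import pandas as pd

dates = ['01-12-2019'] * 4 + ['03-12-2019'] * 5 + ['07-12-2019'] * 2
data = [142, 20, 20, 120, 30, 20, 45, 215, 15, 25, 5]

# Assumption: the strings are day-month-year, so state the format explicitly
# instead of relying on the default month-first parsing.
df = pd.DataFrame({'product_mrp': data},
                  index=pd.to_datetime(dates, format='%d-%m-%Y'))

# Group on the datetime index and take max - min within each date.
spread = df.groupby(level=0)['product_mrp'].agg(lambda s: s.max() - s.min())
print(spread)
```

The differences are the same either way; only the dates attached to them change (1, 3 and 7 December 2019 instead of 12 January, 12 March and 12 July).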