如何将每个组city
和district
的未来两个月的行插入以下数据帧?
city district date price
0 a c 2019-08-01 00:00:00.000 12
1 a c 2019-09-01 00:00:00.000 13
2 a c 2019-10-01 00:00:00.000 11
3 a c 2019-11-01 00:00:00.000 15
4 b d 2019-08-01 00:00:00.000 8
5 b d 2019-09-01 00:00:00.000 6
6 b d 2019-10-01 00:00:00.000 9
7 b d 2019-11-01 00:00:00.000 15
所需的输出将是这样。
city district date price
0 a c 2019-08-01 00:00:00.000 12
1 a c 2019-09-01 00:00:00.000 13
2 a c 2019-10-01 00:00:00.000 11
3 a c 2019-11-01 00:00:00.000 15
4 a c 2019-12-01 00:00:00.000
5 a c 2020-01-01 00:00:00.000
6 b d 2019-08-01 00:00:00.000 8
7 b d 2019-09-01 00:00:00.000 6
8 b d 2019-10-01 00:00:00.000 9
9 b d 2019-11-01 00:00:00.000 15
10 b d 2019-12-01 00:00:00.000
11 b d 2020-01-01 00:00:00.000
[set_index
至date
,然后按频率reindex
,按MS
:
print (df.set_index("date").groupby(["city","district"])
.apply(lambda d: d[["price"]].reindex(pd.date_range(min(df["date"]),max(df["date"])+pd.DateOffset(months=2),freq="MS")))
.reset_index())
或通过MultiIndex
,city
和district
的组合创建date
:
month_range = pd.date_range(min(df["date"]),max(df["date"])+pd.DateOffset(months=2),freq="MS")
combos = [(*k,d) for k in df.groupby(["city","district"]).groups.keys() for d in month_range ]
m_index = pd.MultiIndex.from_tuples(combos,names=["city","district","date"])
print (df.set_index(["city","district","date"]).reindex(m_index).reset_index())
两者都产生相同的结果:
city district level_2 price
0 a c 2019-08-01 12.0
1 a c 2019-09-01 13.0
2 a c 2019-10-01 11.0
3 a c 2019-11-01 15.0
4 a c 2019-12-01 NaN
5 a c 2020-01-01 NaN
6 b d 2019-08-01 8.0
7 b d 2019-09-01 6.0
8 b d 2019-10-01 9.0
9 b d 2019-11-01 15.0
10 b d 2019-12-01 NaN
11 b d 2020-01-01 NaN