keytable
Out[66]:
                        datahora   pp    pres  ...  WeekDay   Power_kW  Power_kW18
Month Day Hour                                 ...
1     3   0     2019-01-03 00:00  0.0  1027.6  ...        3  77.303046  117.774419
          1     2019-01-03 01:00  0.0  1027.0  ...        3  72.319602  110.710928
          2     2019-01-03 02:00  0.0  1027.0  ...        3  71.831852  106.067667
          3     2019-01-03 03:00  0.0  1027.0  ...        3  69.555751  106.325955
          4     2019-01-03 04:00  0.0  1027.0  ...        3  69.525780  102.855393
...                          ...  ...     ...  ...      ...        ...         ...
12    30  19    2019-12-30 19:00  0.0  1031.5  ...        0  72.590489   89.749535
          20    2019-12-30 20:00  0.0  1032.0  ...        0  71.444516   87.691824
          21    2019-12-30 21:00  0.0  1032.0  ...        0  68.940099   87.242445
          22    2019-12-30 22:00  0.0  1032.0  ...        0  67.244716   83.618018
          23    2019-12-30 23:00  0.0  1032.0  ...        0  68.531573   81.288847
[8637 rows x 12 columns]
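I have this DataFrame, and I would like to create a column called 'rainday' that looks at the 'pp' (precipitation) values over each day to tell whether it rained within those 24 hours. If 'pp' exceeds a given threshold at any point during the day, the column should be 1 for that day. How can I do this?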
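Use groupby with transform("max"), so that each day's maximum is broadcast back to every hourly row of that day, and compare it with your threshold:
threshold = 1
# Daily maximum of pp, repeated on each hourly row of the same (Month, Day)
daily_max = df.groupby(level=["Month", "Day"])["pp"].transform("max")
# 1 if the daily maximum exceeds the threshold, otherwise 0
df["rainday"] = daily_max.gt(threshold).astype(int)
print(df)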
                        datahora   pp    pres  WeekDay   Power_kW  Power_kW18  rainday
Month Day Hour
1     3   0     2019-01-03 00:00  0.0  1027.6        3  77.303046  117.774419        0
          1     2019-01-03 01:00  0.0  1027.0        3  72.319602  110.710928        0
          2     2019-01-03 02:00  0.0  1027.0        3  71.831852  106.067667        0
          3     2019-01-03 03:00  0.0  1027.0        3  69.555751  106.325955        0
          4     2019-01-03 04:00  1.0  1027.0        3  69.525780  102.855393        0
12    30  19    2019-12-30 19:00  0.0  1031.5        0  72.590489   89.749535        1
          20    2019-12-30 20:00  0.0  1032.0        0  71.444516   87.691824        1
          21    2019-12-30 21:00  0.0  1032.0        0  68.940099   87.242445        1
          22    2019-12-30 22:00  1.0  1032.0        0  67.244716   83.618018        1
          23    2019-12-30 23:00  2.0  1032.0        0  68.531573   81.288847        1
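To try the per-day flag without the full dataset, here is a minimal, self-contained sketch. The frame named sample and its pp values are made up for illustration; only the Month/Day/Hour index names, the pp column and threshold = 1 come from the post. It assumes one reasonable way to do this: take the daily maximum with groupby(level=...).transform("max"), which keeps the original hourly index, and compare it with the threshold.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question: a (Month, Day, Hour)
# MultiIndex and a pp column. The values below are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [(1, 3, 0), (1, 3, 1), (1, 3, 2), (12, 30, 21), (12, 30, 22), (12, 30, 23)],
    names=["Month", "Day", "Hour"],
)
sample = pd.DataFrame({"pp": [0.0, 0.0, 1.0, 0.0, 1.0, 2.0]}, index=index)

threshold = 1

# transform("max") returns one value per original row, so the result lines up
# with the hourly index and can be assigned directly as a new column.
daily_max = sample.groupby(level=["Month", "Day"])["pp"].transform("max")

# gt() is a strict comparison: a day whose maximum equals the threshold gets 0.
sample["rainday"] = daily_max.gt(threshold).astype(int)

print(sample)
```

With these made-up values, January 3 peaks at 1.0, which is not strictly greater than the threshold, so its rows get 0, while December 30 peaks at 2.0 and its rows get 1. If a day that merely reaches the threshold should also count as a rain day, replacing gt with ge would be the natural change.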