Given a dataframe as follows:
    id                address   sell_price  market_price    status  start_date    end_date
     1     7552 Atlantic Lane   1170787.30    1463484.12  finished    2019/8/2   2019/10/1
     1     7552 Atlantic Lane   1137782.02    1422227.52  finished    2019/8/2   2019/10/1
     2      888 Foster Street   1066708.28    1333385.35  finished    2019/8/2   2019/10/1
     2      888 Foster Street   1871757.05    1416757.05  finished  2019/10/14  2019/10/15
     2      888 Foster Street          NaN     763744.52   current  2019/10/12  2019/10/13
     3        5 Pawnee Avenue          NaN     928366.20   current  2019/10/10  2019/10/11
     3        5 Pawnee Avenue          NaN    2025924.16   current  2019/10/10  2019/10/11
     3        5 Pawnee Avenue          NaN    4000000.00   forward   2019/10/9  2019/10/10
     3        5 Pawnee Avenue   2236138.90    1788938.90  finished   2019/10/8   2019/10/9
     4  916 W. Mill Pond St.   2811026.73    1992026.73  finished   2019/9/30   2019/10/1
     4  916 W. Mill Pond St.  13664803.02   10914803.02  finished   2019/9/30   2019/10/1
     4  916 W. Mill Pond St.   3234636.64    1956636.64  finished   2019/9/30   2019/10/1
     5         68 Henry Drive          NaN    2699959.92    failed   2019/10/8   2019/10/9
     5         68 Henry Drive          NaN    5830725.66    failed   2019/10/8   2019/10/9
     5         68 Henry Drive   2668401.36    1903401.36  finished   2019/12/8   2019/12/9
I want to group by id and address and calculate mean_ratio and result_count based on the following conditions:
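mean_ratio: group by id and address and take the mean over the rows that meet these conditions: status is finished, and start_date falls within 2019-09 and 2019-10.
result_count: group by id and address and count the rows that meet these conditions: status is finished or failed, and start_date falls within 2019-09 and 2019-10.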
The desired output looks like this:
       id                address  mean_ratio  result_count
    0   1     7552 Atlantic Lane         NaN             0
    1   2      888 Foster Street        1.32             1
    2   3        5 Pawnee Avenue        1.25             1
    3   4  916 W. Mill Pond St.        1.44             3
    4   5         68 Henry Drive         NaN             2
What I have tried so far:
    import pandas as pd

    # convert date
    df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(lambda x: pd.to_datetime(x, format='%Y/%m/%d'))
    # calculate ratio
    df['ratio'] = round(df['sell_price'] / df['market_price'], 2)
To keep only the rows whose start_date falls within 2019-09 and 2019-10:
    import numpy as np

    L = [pd.Period('2019-09'), pd.Period('2019-10')]
    c = ['start_date']
    df = df[np.logical_or.reduce([df[x].dt.to_period('m').isin(L) for x in c])]
To keep only the rows whose status is finished or failed, I use:
    mask = df['status'].str.contains('finished|failed')
    df[mask]
But I don't know how to combine these to get the final result. Thanks for your help.
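I think you need GroupBy.agg, but because the filters exclude some rows entirely (for example, every row with id=1), add those groups back with DataFrame.join against all unique id and address pairs, and finally replace the missing values in the count column (result_count should be 0 for id=1 in the desired output).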
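The steps described above can be sketched roughly as follows. This is a minimal sketch rather than the answerer's exact code: the sample rows are copied from the question (end_date is left out because neither condition uses it), and the variable names in_range, finished, counted, keys and out are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (end_date omitted, it is not needed here).
rows = [
    (1, "7552 Atlantic Lane", 1170787.30, 1463484.12, "finished", "2019/8/2"),
    (1, "7552 Atlantic Lane", 1137782.02, 1422227.52, "finished", "2019/8/2"),
    (2, "888 Foster Street", 1066708.28, 1333385.35, "finished", "2019/8/2"),
    (2, "888 Foster Street", 1871757.05, 1416757.05, "finished", "2019/10/14"),
    (2, "888 Foster Street", None, 763744.52, "current", "2019/10/12"),
    (3, "5 Pawnee Avenue", None, 928366.20, "current", "2019/10/10"),
    (3, "5 Pawnee Avenue", None, 2025924.16, "current", "2019/10/10"),
    (3, "5 Pawnee Avenue", None, 4000000.00, "forward", "2019/10/9"),
    (3, "5 Pawnee Avenue", 2236138.90, 1788938.90, "finished", "2019/10/8"),
    (4, "916 W. Mill Pond St.", 2811026.73, 1992026.73, "finished", "2019/9/30"),
    (4, "916 W. Mill Pond St.", 13664803.02, 10914803.02, "finished", "2019/9/30"),
    (4, "916 W. Mill Pond St.", 3234636.64, 1956636.64, "finished", "2019/9/30"),
    (5, "68 Henry Drive", None, 2699959.92, "failed", "2019/10/8"),
    (5, "68 Henry Drive", None, 5830725.66, "failed", "2019/10/8"),
    (5, "68 Henry Drive", 2668401.36, 1903401.36, "finished", "2019/12/8"),
]
cols = ["id", "address", "sell_price", "market_price", "status", "start_date"]
df = pd.DataFrame(rows, columns=cols)
df["start_date"] = pd.to_datetime(df["start_date"], format="%Y/%m/%d")
df["ratio"] = (df["sell_price"] / df["market_price"]).round(2)

# Boolean masks for the two conditions in the question.
months = [pd.Period("2019-09"), pd.Period("2019-10")]
in_range = df["start_date"].dt.to_period("M").isin(months)
finished = df["status"].eq("finished")
counted = df["status"].isin(["finished", "failed"])

keys = ["id", "address"]
# Aggregate only the rows that pass each filter; groups with no such rows drop out here.
mean_ratio = df[in_range & finished].groupby(keys)["ratio"].mean().round(2).rename("mean_ratio")
result_count = df[in_range & counted].groupby(keys).size().rename("result_count")

# Join the aggregates back onto every unique (id, address) pair,
# then fill the missing counts with 0 so excluded groups still appear.
out = (
    df[keys].drop_duplicates()
    .join(mean_ratio, on=keys)
    .join(result_count, on=keys)
    .fillna({"result_count": 0})
    .astype({"result_count": int})
    .reset_index(drop=True)
)
print(out)
```

On the sample data this should give result_count values of 0, 1, 1, 3, 2 and leave mean_ratio as NaN for ids 1 and 5, which matches the desired output in the question. Using isin on the status column instead of str.contains avoids accidental substring matches.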
Some helpers:
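As one possible helper-style variant (an assumption on my part, not taken from the original answer), the join can be avoided by masking values instead of dropping rows, so that every group survives the groupby. The function names in_months and summarize are hypothetical, and the sample uses only a subset of the question's rows.

```python
import pandas as pd

def in_months(dates, months):
    """Return a boolean mask: True where the date falls in one of the given 'YYYY-MM' months."""
    return dates.dt.to_period("M").isin([pd.Period(m) for m in months])

def summarize(df, months=("2019-09", "2019-10")):
    """Per (id, address): mean ratio of in-range finished rows, count of in-range finished/failed rows."""
    in_range = in_months(df["start_date"], months)
    ratio = (df["sell_price"] / df["market_price"]).round(2)
    work = df.assign(
        # Keep the ratio only where the mean_ratio conditions hold; NaN elsewhere.
        ratio_ok=ratio.where(in_range & df["status"].eq("finished")),
        # True for rows that should be counted; summing booleans counts them.
        counted=in_range & df["status"].isin(["finished", "failed"]),
    )
    out = work.groupby(["id", "address"], as_index=False, sort=False).agg(
        mean_ratio=("ratio_ok", "mean"),
        result_count=("counted", "sum"),
    )
    out["mean_ratio"] = out["mean_ratio"].round(2)
    return out

# A subset of the question's rows (ids 1, 2 and 5), end_date omitted.
rows = [
    (1, "7552 Atlantic Lane", 1170787.30, 1463484.12, "finished", "2019/8/2"),
    (1, "7552 Atlantic Lane", 1137782.02, 1422227.52, "finished", "2019/8/2"),
    (2, "888 Foster Street", 1066708.28, 1333385.35, "finished", "2019/8/2"),
    (2, "888 Foster Street", 1871757.05, 1416757.05, "finished", "2019/10/14"),
    (2, "888 Foster Street", None, 763744.52, "current", "2019/10/12"),
    (5, "68 Henry Drive", None, 2699959.92, "failed", "2019/10/8"),
    (5, "68 Henry Drive", None, 5830725.66, "failed", "2019/10/8"),
    (5, "68 Henry Drive", 2668401.36, 1903401.36, "finished", "2019/12/8"),
]
cols = ["id", "address", "sell_price", "market_price", "status", "start_date"]
df = pd.DataFrame(rows, columns=cols)
df["start_date"] = pd.to_datetime(df["start_date"], format="%Y/%m/%d")

summary = summarize(df)
print(summary)
```

Because no rows are removed before grouping, groups such as id=1 keep a row with a NaN mean and a count of 0 without any extra join or fill step.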