Group by multiple conditions in pandas and compute a count and a mean

Question (votes: 0, answers: 1)

Given the following DataFrame:

    id               address   sell_price  market_price    status  start_date  \
     1    7552 Atlantic Lane   1170787.30    1463484.12  finished    2019/8/2   
     1    7552 Atlantic Lane   1137782.02    1422227.52  finished    2019/8/2   
     2     888 Foster Street   1066708.28    1333385.35  finished    2019/8/2   
     2     888 Foster Street   1871757.05    1416757.05  finished  2019/10/14   
     2     888 Foster Street          NaN     763744.52   current  2019/10/12   
     3       5 Pawnee Avenue          NaN     928366.20   current  2019/10/10   
     3       5 Pawnee Avenue          NaN    2025924.16   current  2019/10/10   
     3       5 Pawnee Avenue          NaN    4000000.00   forward   2019/10/9   
     3       5 Pawnee Avenue   2236138.90    1788938.90  finished   2019/10/8   
     4  916 W. Mill Pond St.   2811026.73    1992026.73  finished   2019/9/30   
     4  916 W. Mill Pond St.  13664803.02   10914803.02  finished   2019/9/30   
     4  916 W. Mill Pond St.   3234636.64    1956636.64  finished   2019/9/30   
     5        68 Henry Drive          NaN    2699959.92    failed   2019/10/8   
     5        68 Henry Drive          NaN    5830725.66    failed   2019/10/8   
     5        68 Henry Drive   2668401.36    1903401.36  finished   2019/12/8   

      end_date  
     2019/10/1  
     2019/10/1  
     2019/10/1  
    2019/10/15  
    2019/10/13  
    2019/10/11  
    2019/10/11  
    2019/10/10  
     2019/10/9  
     2019/10/1  
     2019/10/1  
     2019/10/1  
     2019/10/9  
     2019/10/9  
     2019/12/9  

I want to group by id and address and compute mean_ratio and result_count based on the following conditions:

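  1. mean_ratio: group by id and address, then take the mean of ratio over the rows where status is finished and start_date falls in 2019-09 or 2019-10.
  2. result_count: group by id and address, then count the rows where status is either finished or failed and start_date falls in 2019-09 or 2019-10.

The desired output looks like this: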

   id               address  mean_ratio  result_count
0   1    7552 Atlantic Lane         NaN             0
1   2     888 Foster Street        1.32             1
2   3       5 Pawnee Avenue        1.25             1
3   4  916 W. Mill Pond St.        1.44             3
4   5        68 Henry Drive         NaN             2

What I have tried so far:

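import numpy as np
import pandas as pd

# convert the date columns to datetime
df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(lambda x: pd.to_datetime(x, format='%Y/%m/%d'))
# calculate the sell/market price ratio, rounded to 2 decimals
df['ratio'] = (df['sell_price'] / df['market_price']).round(2)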

To keep only the rows whose start_date falls in 2019-09 or 2019-10:

L = [pd.Period('2019-09'), pd.Period('2019-10')]
c = ['start_date']
# 'M' is the monthly period alias; note that this overwrites df with the filtered rows
df = df[np.logical_or.reduce([df[x].dt.to_period('M').isin(L) for x in c])]

To keep only the rows whose status is finished or failed, I use:

mask = df['status'].str.contains('finished|failed')
df[mask]
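
The two filters above can also be kept as boolean masks and combined, which avoids overwriting df. The following is a minimal sketch, not part of the original attempt: the `demo` frame and the `unfinished` status value are made up for illustration, and the explicit date bounds assume that "2019-09 to 2019-10" means the two full calendar months.

```python
import pandas as pd

# Hypothetical rows for illustration only (not the question's data)
demo = pd.DataFrame({
    'status': ['finished', 'failed', 'current', 'finished', 'unfinished'],
    'start_date': pd.to_datetime(['2019-08-02', '2019-10-08', '2019-10-12',
                                  '2019-09-30', '2019-10-01']),
})

# Exact match on the status values; str.contains('finished|failed')
# would also match a substring such as 'unfinished'
status_ok = demo['status'].isin(['finished', 'failed'])

# Half-open interval covering September and October 2019
date_ok = (demo['start_date'] >= '2019-09-01') & (demo['start_date'] < '2019-11-01')

# Combine both conditions without modifying demo itself
both = status_ok & date_ok
print(demo[both])
```

Keeping the masks separate makes it possible to reuse them with different status conditions for the mean and for the count.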

But I don't know how to combine them to get the final result. Thanks for your help.


python-3.x pandas dataframe
1 answer
1 vote

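I think you need GroupBy.agg, but because the filtering excludes some groups entirely (for example id=1), add them back with DataFrame.join against all unique id and address pairs, and finally replace the missing values in the result_count column.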
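
The code that accompanied this answer did not survive on this page. The steps it describes can be sketched as follows; this is a reconstruction under assumptions, not the answerer's original code. It assumes pandas 0.25 or later (for named aggregation), and the names `in_range`, `counted`, `finished`, `filtered`, `agg` and `out` are introduced here for illustration. The data is copied from the question, without the end_date column, which the result does not need.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (end_date omitted)
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    'address': (['7552 Atlantic Lane'] * 2 + ['888 Foster Street'] * 3
                + ['5 Pawnee Avenue'] * 4 + ['916 W. Mill Pond St.'] * 3
                + ['68 Henry Drive'] * 3),
    'sell_price': [1170787.30, 1137782.02, 1066708.28, 1871757.05, np.nan,
                   np.nan, np.nan, np.nan, 2236138.90, 2811026.73,
                   13664803.02, 3234636.64, np.nan, np.nan, 2668401.36],
    'market_price': [1463484.12, 1422227.52, 1333385.35, 1416757.05, 763744.52,
                     928366.20, 2025924.16, 4000000.00, 1788938.90, 1992026.73,
                     10914803.02, 1956636.64, 2699959.92, 5830725.66, 1903401.36],
    'status': ['finished', 'finished', 'finished', 'finished', 'current',
               'current', 'current', 'forward', 'finished', 'finished',
               'finished', 'finished', 'failed', 'failed', 'finished'],
    'start_date': ['2019/8/2', '2019/8/2', '2019/8/2', '2019/10/14', '2019/10/12',
                   '2019/10/10', '2019/10/10', '2019/10/9', '2019/10/8', '2019/9/30',
                   '2019/9/30', '2019/9/30', '2019/10/8', '2019/10/8', '2019/12/8'],
})
df['start_date'] = pd.to_datetime(df['start_date'], format='%Y/%m/%d')
df['ratio'] = (df['sell_price'] / df['market_price']).round(2)

# Rows whose start_date falls in September or October 2019
in_range = df['start_date'].dt.to_period('M').isin(
    [pd.Period('2019-09'), pd.Period('2019-10')])
# Rows that enter result_count, and the subset that enters mean_ratio
counted = in_range & df['status'].isin(['finished', 'failed'])
finished = in_range & df['status'].eq('finished')

# Blank out the ratio on non-finished rows; mean() skips NaN
filtered = df[counted].assign(ratio=lambda d: d['ratio'].where(finished))

agg = (filtered.groupby(['id', 'address'])
               .agg(mean_ratio=('ratio', 'mean'),
                    result_count=('status', 'size')))

# Re-attach groups that the filter removed entirely (id=1),
# then replace their missing count with 0
out = (df[['id', 'address']].drop_duplicates()
         .join(agg, on=['id', 'address'])
         .fillna({'result_count': 0})
         .astype({'result_count': int})
         .round({'mean_ratio': 2})
         .reset_index(drop=True))
print(out)
```

Only result_count is filled with 0; mean_ratio is left as NaN for groups with no qualifying finished rows, which matches the desired output in the question.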


1 vote

Some helpers

© www.soinside.com 2019 - 2024. All rights reserved.