Pandas: set a value based on a date range in another DataFrame

Question

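I have a table of discount rates that depend on the agent and on a time period, and I want to apply it to another table to get the rate that was in effect on each sale date.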

This is the rate table (df_r):

Agentname   ProductType     OldRate NewRate StartDate   EndDate
0   VSFAAL      SPORTS       0.0    10.0    2020-11-05  2021-01-18
1   VSFAAL      APPAREL      0.0    35.0    2020-11-05  2022-05-03
2   VSFAAL      SPORTS      10.0    15.0    2021-01-18  2022-05-03
3   VSFAALJS    SPORTS       0.0    10.0    2020-11-07  2022-05-03
4   VSFAALJS    APPAREL      0.0    15.0    2020-11-07  2021-11-09
5   VSFAALJS    APPAREL     15.0     5.0    2021-11-09  2022-05-03

This is the transaction table (df):

                  Date      Sales   Agentname   ProductType     
0 2020-12-01 08:00:02        100.0  VSFAAL      SPORTS       
1 2022-03-01 08:00:09         99.0  VSFAAL      APPAREL      
2 2022-03-01 08:00:14         75.0  VSFAAL      SPORTS       
3 2021-05-01 08:00:39         67.0  VSFAALJS    SPORTS       
4 2020-05-01 08:00:51        160.0  VSFAALJS    APPAREL      
5 2021-05-01 08:00:56         65.0  VSFAALJS    APPAREL

I would like to get a result like this:

                  Date      Sales   Agentname   ProductType     Agentname_rates
0 2020-12-01 08:00:02        100.0  VSFAAL      SPORTS             10.0
1 2022-03-01 08:00:09         99.0  VSFAAL      APPAREL            35.0
2 2022-03-01 08:00:14         75.0  VSFAAL      SPORTS             15.0
3 2021-05-01 08:00:39         67.0  VSFAALJS    SPORTS             10.0
4 2020-05-01 08:00:51        160.0  VSFAALJS    APPAREL               0
5 2021-05-01 08:00:56         65.0  VSFAALJS    APPAREL            15.0

What I am currently doing is looping over the product types, then the agents, and then each index of the matching rate rows:

col = 'Agentname'
for product in df['ProductType'].unique():
    for uname in df[col].unique():
        # Rate rows for this agent and product that end on or after the earliest sale
        a = df_r.loc[(df_r['Agentname'] == uname)
                     & (df_r['ProductType'] == product.upper())
                     & (df_r['EndDate'] >= df['Date'].min())]

        for i in a.index:
            # Assign this rate to every matching sale inside its date range
            df.loc[(df['ProductType'].str.upper() == product.upper())
                   & (df[col] == uname)
                   & (df['Date'] >= a.loc[i, 'StartDate'])
                   & (df['Date'] <= a.loc[i, 'EndDate']),
                   [f"{col}_rates"]] = a.loc[i, 'NewRate']

Is there a more efficient way to do this?

Answer 1

Here is one way to do it.

Merge the two DataFrames on agent name and product type, then filter the merged rows by date:

import pandas as pd

# Here df2 is the transaction table and df is the rate table
# (named df and df_r respectively in the question).
df3 = df2.merge(df[['Agentname', 'ProductType', 'StartDate', 'EndDate', 'NewRate']],
                on=['Agentname', 'ProductType'],
                how='left')

# Compare calendar dates only: normalize() drops the time of day from Date.
df3[pd.to_datetime(df3['Date']).dt.normalize().between(
        pd.to_datetime(df3['StartDate']),
        pd.to_datetime(df3['EndDate']))
   ]

                 Date  Sales Agentname ProductType   StartDate     EndDate  NewRate
0 2020-12-01 08:00:02  100.0    VSFAAL      SPORTS  2020-11-05  2021-01-18     10.0
2 2022-03-01 08:00:09   99.0    VSFAAL     APPAREL  2020-11-05  2022-05-03     35.0
4 2022-03-01 08:00:14   75.0    VSFAAL      SPORTS  2021-01-18  2022-05-03     15.0
5 2021-05-01 08:00:39   67.0  VSFAALJS      SPORTS  2020-11-07  2022-05-03     10.0
8 2021-05-01 08:00:56   65.0  VSFAALJS     APPAREL  2020-11-07  2021-11-09     15.0
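
The filtered frame above drops the sale that has no applicable rate and carries the StartDate and EndDate columns along, whereas the question asks for an Agentname_rates column on the original transaction table, with 0 where no rate applies. A minimal sketch of one way to get there with the same merge-then-filter idea follows. It uses the question's names (df, df_r) and sample data. Keeping the later period when a sale falls on a boundary day shared by two periods is an assumption, not something the answer states, and it relies on df_r listing each agent's periods in chronological order.

```python
import pandas as pd

# Rate table from the question (df_r); OldRate is omitted because it is not needed.
df_r = pd.DataFrame({
    "Agentname": ["VSFAAL", "VSFAAL", "VSFAAL", "VSFAALJS", "VSFAALJS", "VSFAALJS"],
    "ProductType": ["SPORTS", "APPAREL", "SPORTS", "SPORTS", "APPAREL", "APPAREL"],
    "NewRate": [10.0, 35.0, 15.0, 10.0, 15.0, 5.0],
    "StartDate": pd.to_datetime(["2020-11-05", "2020-11-05", "2021-01-18",
                                 "2020-11-07", "2020-11-07", "2021-11-09"]),
    "EndDate": pd.to_datetime(["2021-01-18", "2022-05-03", "2022-05-03",
                               "2022-05-03", "2021-11-09", "2022-05-03"]),
})

# Transaction table from the question (df).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-12-01 08:00:02", "2022-03-01 08:00:09",
                            "2022-03-01 08:00:14", "2021-05-01 08:00:39",
                            "2020-05-01 08:00:51", "2021-05-01 08:00:56"]),
    "Sales": [100.0, 99.0, 75.0, 67.0, 160.0, 65.0],
    "Agentname": ["VSFAAL", "VSFAAL", "VSFAAL", "VSFAALJS", "VSFAALJS", "VSFAALJS"],
    "ProductType": ["SPORTS", "APPAREL", "SPORTS", "SPORTS", "APPAREL", "APPAREL"],
})

# reset_index() stores the original row label in an "index" column
# so that it survives the one-to-many merge.
merged = df.reset_index().merge(df_r, on=["Agentname", "ProductType"], how="left")

# Keep only the rate rows whose period contains the sale's calendar date.
in_range = merged["Date"].dt.normalize().between(merged["StartDate"], merged["EndDate"])

# Assumption: if two periods share a boundary day, keep the later one.
matched = merged.loc[in_range].drop_duplicates("index", keep="last")

# Map the matched rate back onto df; sales with no applicable rate get 0.
rate_by_row = matched.set_index("index")["NewRate"]
df["Agentname_rates"] = df.index.to_series().map(rate_by_row).fillna(0.0)

print(df)
```

On large tables the intermediate merge can grow to (sales rows) x (rate periods per agent and product), so this trades memory for the removal of the Python-level loops.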
