Conditional statement between 2 dataframes


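I'm having trouble with a conditional statement across two dataframes. I'm trying to find the rows where the supplier number matches between the two dataframes (df and df2) and the purchase order date (df) falls between the start date (df2) and end date (df2). Then, if the matching row in df2 has an "X" in that cell, I want to change SMALL_AMT (df) to TOTAL_AMT (df).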

import pandas as pd
import numpy as np

df = pd.read_csv('ERP2 ISR Data.csv', dtype=str)
df2 = pd.read_excel('Supplier size changes.xlsx', dtype=str)

df['DETL_PUOR_PLACED_DT'] = pd.to_datetime(raw['DETL_PUOR_PLACED_DT'])
df2['Start Date'] = pd.to_datetime(df2['Start Date'])
df2['End Date'] = pd.to_datetime(df2['End Date'])

df['DETL_SUPL_NO'].astype(str)
df2['SUPL_NO'].astype(str)
df['DIRECT_SMALL_AMT'].astype(float)


for i, row in enumerate(df.itertuples()):
    if (row[0]['DETL_SUPL_NO'] == df2['SUPL_NO'].iloc[i]) & (row[0]['DETL_PUOR_PLACED_DT'] >= df2['Start Date'].iloc[i]) & (row[0]['DETL_PUOR_PLACED_DT'] <= df2['End Date'].iloc[i]) & (df2['SM'] == 'X'):
        row[0]['DIRECT_SMALL_AMT'] = row[0].DIRECT_TOTAL
    elif df2['SM'] == "":
        row[0]['DIRECT_SMALL_AMT'] == "0"

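I keep getting errors such as the one below. I have tried many variations of the code above, but nothing works.

TypeError: 'int' object is not subscriptable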

Sample data:

df

DETL_SUPL_NO DETL_PUOR_PLACED_DT DIRECT_SMALL_AMT DIRECT_TOTAL
1234 2011-02-13 $450.12
1222 2022-05-12 $50.11
1123 2019-04-21 $200.15
1233 2016-09-09 $5.12
1233 2017-12-29 $3000.56
1222 2023-01-12 $423.56

df2

SUPL_NO Start Date End Date SM
1234 2015-05-01 2100-01-01 X
1222 2015-01-01 2018-01-05
1123 2019-04-21 2020-05-12 X
1111 2016-09-09 2018-01-20
1112 2017-12-29 2018-01-05 X
1113 2023-01-12 2024-01-05

Update: I was able to get it to run using the suggestions above. However, the logic does not change the dollar amounts in df['DIRECT_SMALL_AMT'].

for (i, row1), (j, row2) in zip(df.iterrows(), df2.iterrows()): 
    if (row1.DETL_SUPL_NO == row2.SUPL_NO) & (row1.DETL_PUOR_PLACED_DT >= row2.Start_Date) & (row1.DETL_PUOR_PLACED_DT <= row2.End_Date): 
       if row2.SM == "X": 
             row1.DIRECT_SMALL_AMT = row1.DIRECT_TOTAL 
       elif row2.SM == "": 
            row1.DIRECT_DISADVANTAGED_AMT == "0" 
pandas numpy if-statement tuples conditional-formatting
1 Answer

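You should rarely need explicit loops in pandas. Merge the two dataframes on the supplier number, then filter on the date range: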

# Merge the two dataframes on the supplier number
df = df.merge(df2, left_on='DETL_SUPL_NO', right_on='SUPL_NO', how='left')
# Keep only rows whose order date lies between the start and end dates (inclusive, as in the question)
df = df[(df['DETL_PUOR_PLACED_DT'] >= df['Start Date']) & (df['DETL_PUOR_PLACED_DT'] <= df['End Date'])]
df
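
The merge-and-filter step keeps only the matching rows; it does not write anything back to DIRECT_SMALL_AMT. A minimal sketch of one way to finish the job without dropping rows follows. It rebuilds the sample data from the question, and several things in it are assumptions rather than facts from the post: the dollar figures are placed in DIRECT_TOTAL, DIRECT_SMALL_AMT starts at 0.0, blank SM cells are empty strings, and the helper column `row_id` is a hypothetical name used only to map merged rows back to df.

```python
import pandas as pd

# Sample data modelled on the question. ASSUMPTION: the amounts belong to
# DIRECT_TOTAL and DIRECT_SMALL_AMT starts at 0.0 (the post does not say).
df = pd.DataFrame({
    "DETL_SUPL_NO": ["1234", "1222", "1123", "1233", "1233", "1222"],
    "DETL_PUOR_PLACED_DT": pd.to_datetime(
        ["2011-02-13", "2022-05-12", "2019-04-21",
         "2016-09-09", "2017-12-29", "2023-01-12"]
    ),
    "DIRECT_SMALL_AMT": [0.0] * 6,
    "DIRECT_TOTAL": ["$450.12", "$50.11", "$200.15", "$5.12", "$3000.56", "$423.56"],
})
df2 = pd.DataFrame({
    "SUPL_NO": ["1234", "1222", "1123", "1111", "1112", "1113"],
    "Start Date": pd.to_datetime(
        ["2015-05-01", "2015-01-01", "2019-04-21",
         "2016-09-09", "2017-12-29", "2023-01-12"]
    ),
    "End Date": pd.to_datetime(
        ["2100-01-01", "2018-01-05", "2020-05-12",
         "2018-01-20", "2018-01-05", "2024-01-05"]
    ),
    "SM": ["X", "", "X", "", "X", ""],  # ASSUMPTION: blank cells are ""
})

# Strip the dollar sign and convert to float; astype returns a new Series,
# so the result has to be assigned back.
df["DIRECT_TOTAL"] = df["DIRECT_TOTAL"].str.replace("$", "", regex=False).astype(float)

# Carry the original row label through the merge ("row_id" is a hypothetical
# helper name) so matches can be mapped back to df.
merged = df.rename_axis("row_id").reset_index().merge(
    df2, left_on="DETL_SUPL_NO", right_on="SUPL_NO", how="left"
)

# Inclusive date-range test; rows without a supplier match compare as False.
in_range = merged["DETL_PUOR_PLACED_DT"].between(
    merged["Start Date"], merged["End Date"]
)
is_small = in_range & merged["SM"].eq("X")
is_blank = in_range & merged["SM"].fillna("").eq("")

small_ids = merged.loc[is_small, "row_id"].unique()
blank_ids = merged.loc[is_blank, "row_id"].unique()

# Write back to the original dataframe with label-based assignment.
df.loc[blank_ids, "DIRECT_SMALL_AMT"] = 0.0
df.loc[small_ids, "DIRECT_SMALL_AMT"] = df.loc[small_ids, "DIRECT_TOTAL"]

print(df)
```

With this sample data only the 1123 row falls inside its supplier's date range, so it is the only row whose DIRECT_SMALL_AMT changes. As for the loops in the question: `itertuples()` yields namedtuples whose position 0 is the index label, so `row[0]['DETL_SUPL_NO']` subscripts an int, which is where the TypeError comes from. `iterrows()` hands back a copy of each row, so assigning to `row1` never reaches df, and `zip` only pairs rows by position rather than by supplier number.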