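I'm having trouble with a conditional statement involving two dataframes. I'm trying to find the rows where the supplier number matches between the two dataframes (`df` and `df2`) and the purchase order date in `df` falls between the start and end dates in `df2`. Then, if the matching row in `df2` has an "X" in its `SM` column, I want to change `DIRECT_SMALL_AMT` in `df` to the value of `DIRECT_TOTAL` in `df`.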

    import pandas as pd
    import numpy as np
    df = pd.read_csv('ERP2 ISR Data.csv', dtype=str)
    df2 = pd.read_excel('Supplier size changes.xlsx', dtype=str)
    df['DETL_PUOR_PLACED_DT'] = pd.to_datetime(df['DETL_PUOR_PLACED_DT'])
    df2['Start Date'] = pd.to_datetime(df2['Start Date'])
    df2['End Date'] = pd.to_datetime(df2['End Date'])
    df['DETL_SUPL_NO'].astype(str)
    df2['SUPL_NO'].astype(str)
    df['DIRECT_SMALL_AMT'].astype(float)
    for i, row in enumerate(df.itertuples()):
        if (row[0]['DETL_SUPL_NO'] == df2['SUPL_NO'].iloc[i]) & (row[0]['DETL_PUOR_PLACED_DT'] >= df2['Start Date'].iloc[i]) & (row[0]['DETL_PUOR_PLACED_DT'] <= df2['End Date'].iloc[i]) & (df2['SM'] == 'X'):
            row[0]['DIRECT_SMALL_AMT'] = row[0].DIRECT_TOTAL
        elif df2['SM'] == "":
            row[0]['DIRECT_SMALL_AMT'] == "0"
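
I keep getting errors like the following (among others). I've tried many different variations of the above, but nothing works: `TypeError: 'int' object is not subscriptable`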
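
Inside the loop, `row` is a namedtuple produced by `itertuples()`, and its element 0 is the row's index label (an `int` with the default `RangeIndex`), not the row's data, so `row[0]['DETL_SUPL_NO']` tries to subscript an integer. A minimal sketch on a two-row frame taken from the first two sample rows reproduces the error and shows that column values are reached as attributes instead:

```python
import pandas as pd

# Two rows taken from the sample data; only the structure matters here
df = pd.DataFrame({"DETL_SUPL_NO": ["1234", "1222"],
                   "DIRECT_SMALL_AMT": ["$450.12", "$50.11"]})

row = next(df.itertuples())
# Element 0 of the namedtuple is the index label, not the row data
print(row[0], row.Index)  # 0 0
# Column values are available as attributes (or at positions 1, 2, ...)
print(row.DETL_SUPL_NO, row.DIRECT_SMALL_AMT)  # 1234 $450.12

# Subscripting the index label reproduces the question's error
try:
    row[0]["DETL_SUPL_NO"]
except TypeError as exc:
    message = str(exc)
print(message)  # 'int' object is not subscriptable
```
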

Sample data:

df

DETL_SUPL_NO | DETL_PUOR_PLACED_DT | DIRECT_SMALL_AMT | DIRECT_TOTAL |
---|---|---|---|
1234 | 2011-02-13 | $450.12 | |
1222 | 2022-05-12 | $50.11 | |
1123 | 2019-04-21 | $200.15 | |
1233 | 2016-09-09 | $5.12 | |
1233 | 2017-12-29 | $3000.56 | |
1222 | 2023-01-12 | $423.56 | |

df2

SUPL_NO | Start Date | End Date | SM |
---|---|---|---|
1234 | 2015-05-01 | 2100-01-01 | X |
1222 | 2015-01-01 | 2018-01-05 | |
1123 | 2019-04-21 | 2020-05-12 | X |
1111 | 2016-09-09 | 2018-01-20 | |
1112 | 2017-12-29 | 2018-01-05 | X |
1113 | 2023-01-12 | 2024-01-05 | |

Update: I was able to get it running using the suggestions above. However, the logic doesn't change the dollar amounts in `df['DIRECT_SMALL_AMT']`:

    for (i, row1), (j, row2) in zip(df.iterrows(), df2.iterrows()):
        if (row1.DETL_SUPL_NO == row2.SUPL_NO) & (row1.DETL_PUOR_PLACED_DT >= row2.Start_Date) & (row1.DETL_PUOR_PLACED_DT <= row2.End_Date):
            if row2.SM == "X":
                row1.DIRECT_SMALL_AMT = row1.DIRECT_TOTAL
            elif row2.SM == "":
                row1.DIRECT_DISADVANTAGED_AMT == "0"
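
Two things likely keep this loop from having any effect. First, `iterrows()` yields each row as a new Series; the pandas documentation warns that it may be a copy rather than a view and that writing to it is not guaranteed to change the frame (with mixed dtypes, such as a datetime column next to string columns, it is a copy). Second, `zip(df.iterrows(), df2.iterrows())` only pairs row *i* of `df` with row *i* of `df2`, so each order is checked against one positionally chosen supplier period rather than against all of them. The last line also uses `==`, which compares instead of assigning. A sketch on a made-up frame contrasting assignment to the yielded row with writing back through `df.at[label, column]`:

```python
import pandas as pd

# Made-up frame; the datetime column gives it mixed dtypes like the real data
df = pd.DataFrame({
    "DETL_PUOR_PLACED_DT": pd.to_datetime(["2019-04-21", "2016-09-09"]),
    "DIRECT_SMALL_AMT": ["$1.00", "$2.00"],
    "DIRECT_TOTAL": ["$10.00", "$20.00"],
})

# Assigning to the yielded row only changes a copy; df is untouched
for i, row in df.iterrows():
    row["DIRECT_SMALL_AMT"] = row["DIRECT_TOTAL"]
after_copy_loop = df["DIRECT_SMALL_AMT"].tolist()
print(after_copy_loop)  # ['$1.00', '$2.00']

# Writing through the frame with the row's index label does persist
for i, row in df.iterrows():
    df.at[i, "DIRECT_SMALL_AMT"] = row["DIRECT_TOTAL"]
print(df["DIRECT_SMALL_AMT"].tolist())  # ['$10.00', '$20.00']
```

Writing cell by cell with `df.at` works but gets slow on large frames; a vectorized approach avoids the loop entirely.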

You should rarely need loops in pandas. Take a look at this:

    df = df.merge(df2, left_on='DETL_SUPL_NO', right_on='SUPL_NO', how='left')  # merge the two dataframes
    df = df[(df['DETL_PUOR_PLACED_DT'] >= df['Start Date']) & (df['DETL_PUOR_PLACED_DT'] <= df['End Date'])]  # keep only dates within the start and end dates (inclusive)
    df
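
The merge-and-filter approach stops before the `SM` step, and because it reassigns `df` to the filtered result it also drops every order that has no matching period. The sketch below finishes the job without losing rows. It assumes the column names from the question and uses inclusive `>=`/`<=` bounds as in the question's own loop; on the sample data the only match is supplier 1123's order on 2019-04-21, which falls exactly on that period's start date. The `DIRECT_TOTAL` values are hypothetical, since that column is blank in the sample. Blank `SM` cells are modelled as missing values, which is how `read_excel(..., dtype=str)` typically reads empty cells, so `.fillna("")` is applied before testing for `""`:

```python
import pandas as pd

# Sample data from the question. DIRECT_TOTAL is blank there, so these
# totals are HYPOTHETICAL placeholders chosen only to make the result visible.
df = pd.DataFrame({
    "DETL_SUPL_NO": ["1234", "1222", "1123", "1233", "1233", "1222"],
    "DETL_PUOR_PLACED_DT": pd.to_datetime(["2011-02-13", "2022-05-12", "2019-04-21",
                                           "2016-09-09", "2017-12-29", "2023-01-12"]),
    "DIRECT_SMALL_AMT": ["$450.12", "$50.11", "$200.15", "$5.12", "$3000.56", "$423.56"],
    "DIRECT_TOTAL": ["$900.00", "$100.00", "$400.00", "$10.00", "$6000.00", "$850.00"],
})
# Blank SM cells are modelled as missing values (None)
df2 = pd.DataFrame({
    "SUPL_NO": ["1234", "1222", "1123", "1111", "1112", "1113"],
    "Start Date": pd.to_datetime(["2015-05-01", "2015-01-01", "2019-04-21",
                                  "2016-09-09", "2017-12-29", "2023-01-12"]),
    "End Date": pd.to_datetime(["2100-01-01", "2018-01-05", "2020-05-12",
                                "2018-01-20", "2018-01-05", "2024-01-05"]),
    "SM": ["X", None, "X", None, "X", None],
})

# Keep each df row's original label so results can be written back to df
merged = (df.rename_axis("row_id").reset_index()
            .merge(df2, left_on="DETL_SUPL_NO", right_on="SUPL_NO", how="left"))

# Inclusive date window; rows with no supplier match have NaT and compare False
in_range = ((merged["DETL_PUOR_PLACED_DT"] >= merged["Start Date"])
            & (merged["DETL_PUOR_PLACED_DT"] <= merged["End Date"]))

# One SM flag per matched df row (first period wins if periods overlap);
# blanks become "" so they can be told apart from "no matching period"
flags = (merged.loc[in_range].drop_duplicates("row_id")
               .set_index("row_id")["SM"].fillna(""))
sm_flag = df.index.to_series().map(flags)  # NaN where no period matched

is_x = sm_flag.eq("X")
is_blank = sm_flag.eq("")
df.loc[is_x, "DIRECT_SMALL_AMT"] = df.loc[is_x, "DIRECT_TOTAL"]
df.loc[is_blank, "DIRECT_SMALL_AMT"] = "0"

print(df["DIRECT_SMALL_AMT"].tolist())
# ['$450.12', '$50.11', '$400.00', '$5.12', '$3000.56', '$423.56']
```

If one supplier can have overlapping periods, `drop_duplicates` keeps only the first matching period per order; `merged.loc[in_range, "row_id"].duplicated().any()` is a quick way to check whether that ever happens.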