Best way to implement the SQL "MERGE INTO" command with python-pandas?

Question (votes: 0, answers: 1)

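I want to do a conditional upsert between two pandas DataFrames, similar to the SQL MERGE INTO statement. For each row in the source DataFrame, if its index does not exist in the target DataFrame, insert the row. If the index does exist, check a secondary condition, and if that condition is met, update the existing row.

Here is an example: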

import pandas as pd
df1 = pd.DataFrame([{'index':'1st','checkval':2,'storeval':'elephant'},
                    {'index':'2nd','checkval':7,'storeval':'giraffe'}]).set_index('index')

df2 = pd.DataFrame([{'index':'1st','checkval':3,'storeval':'hippopotamus'},
                    {'index':'3rd','checkval':4,'storeval':'seagull'}]).set_index('index')

This is what df1 looks like:

        checkval    storeval
index       
1st     2           elephant
2nd     7           giraffe

This is what df2 looks like:

     checkval   storeval
index       
1st     3       hippopotamus
3rd     4       seagull

Here is the brute-force way of doing what I described:

for ind2, row2 in df2.iterrows():
    found = False
    for ind1, row1 in df1.iterrows():
        if ind2 == ind1:
            # Index matched
            found = True
            if row2['checkval'] > row1['checkval']:
                # Condition met, update the existing row
                df1.loc[ind1] = row2
    if not found:
        # Row not already in df1, insert it
        # (DataFrame.append was removed in pandas 2.0; .loc enlargement is used here)
        df1.loc[ind2] = row2

The output is:

    checkval    storeval
index       
1st     3   hippopotamus
2nd     7   giraffe
3rd     4   seagull

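However, I would love to find some built-in function, something like

df1.merge(df2, how='left', conditions=lambda df1, df2: df2['checkval'] > df1['checkval'])

or similar. Does anyone have suggestions on how to improve the brute-force approach?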

python pandas upsert
1 Answer
0 votes

Don't write unnecessary loops in pandas; they slow things down and clutter the code.

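I think we can use pd.concat (DataFrame.append in pandas before 2.0) followed by DataFrame.sort_values and groupby.last:

new_df = pd.concat([df1, df2]).sort_values('checkval').groupby(level=0).last()
#new_df = pd.concat([df1, df2]).sort_values('checkval').groupby(level='index').last()

Alternative, with Index.duplicated:

# sort first, so that the mask lines up with the rows it filters
new_df = pd.concat([df1, df2]).sort_values('checkval')
new_df = new_df.loc[~new_df.index.duplicated(keep='last'), :].sort_index()
print(new_df)

Both variants keep, for each index label, the row with the largest checkval, which matches the update condition in the question.

Output

    checkval    storeval
index       
1st     3   hippopotamus
2nd     7   giraffe
3rd     4   seagull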
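
If the exact semantics of the question matter (update only when the source value is strictly greater, so that a tie keeps the existing row), the sort-and-keep-last idea can also be written as an explicit, loop-free helper. The following is only a sketch under stated assumptions: conditional_upsert is a hypothetical name, not a pandas function, and it assumes both frames have unique indexes and the same columns.

```python
import pandas as pd


def conditional_upsert(target: pd.DataFrame, source: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return a copy of `target` upserted with rows from `source`.

    Hypothetical helper (not part of pandas). Assumes unique indexes
    and identical columns in both frames.
    """
    result = target.copy()

    # Labels present in both frames are candidates for an update.
    common = source.index.intersection(result.index)
    # Strict comparison on the shared labels: ties keep the target row.
    wins = source.loc[common, col] > result.loc[common, col]
    to_update = common[wins.to_numpy()]
    if len(to_update):
        result.loc[to_update] = source.loc[to_update]

    # Labels that exist only in `source` are plain inserts.
    new_labels = source.index.difference(result.index)
    if len(new_labels):
        result = pd.concat([result, source.loc[new_labels]])
    return result.sort_index()


# Same data as in the question
df1 = pd.DataFrame({'checkval': [2, 7], 'storeval': ['elephant', 'giraffe']},
                   index=pd.Index(['1st', '2nd'], name='index'))
df2 = pd.DataFrame({'checkval': [3, 4], 'storeval': ['hippopotamus', 'seagull']},
                   index=pd.Index(['1st', '3rd'], name='index'))

merged = conditional_upsert(df1, df2, 'checkval')
print(merged)
```

Unlike the one-liners above, this version leaves df1 untouched and makes the tie rule explicit, at the cost of a few more lines.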