Merge or append 2 data frames row-wise, with a check in separate columns showing which data frame each row came from

Question (votes: 0, answers: 2)

I have the following two data frames. The first, df1, is built like this:

import pandas as pd

data = {
    'commonshortname': ['SNX.US', '002400.CH', 'CDW.US', 'CEC.GR', '300002.CH'],
    'altshortname': ['SNX.US', '002400.SHE', 'CDW.US', 'CEC.XETRA', '300002.SHE'],
    'Code': ['SNX', '002400', 'CDW', 'CEC', '300002'],  # further rows elided in the original
    'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
    'common': [1, 1, 1, 1, 1]
}

df1 = pd.DataFrame(data)

The second, df2, looks like this:

data = {'altshortname': ['SEDG.US', 'MHLD.US', 'CDW.US', 'POLA.US', 'PHASQ.US'],
        'Code': ['SEDG', 'MHLD', 'CDW', 'POLA', 'PHASQ'],
        'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
        'alt': [1, 1, 1, 1, 1]}

df2 = pd.DataFrame(data)

This is how they look as data frames:

     commonshortname  altshortname    Code          Type  common
0             SNX.US        SNX.US     SNX  Common Stock       1
1          002400.CH    002400.SHE  002400  Common Stock       1
2             CDW.US        CDW.US     CDW  Common Stock       1
3             CEC.GR     CEC.XETRA     CEC  Common Stock       1
4          300002.CH    300002.SHE  300002  Common Stock       1
...              ...           ...     ...           ...     ...

   altshortname   Code          Type  alt
0       SEDG.US   SEDG  Common Stock    1
1       MHLD.US   MHLD  Common Stock    1
2        CDW.US    CDW  Common Stock    1
3       POLA.US   POLA  Common Stock    1
4      PHASQ.US  PHASQ  Common Stock    1

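I want to combine the two frames row-wise so that, when a row exists in both, the data from the top data frame (df1) is kept and a 1 is added for it in the alt column.

The final frame should look like this: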

   commonshortname  altshortname    Code          Type  common  alt
0           SNX.US        SNX.US     SNX  Common Stock       1
1        002400.CH    002400.SHE  002400  Common Stock       1
2           CDW.US        CDW.US     CDW  Common Stock       1    1
3           CEC.GR     CEC.XETRA     CEC  Common Stock       1
4        300002.CH    300002.SHE  300002  Common Stock       1
0                        SEDG.US    SEDG  Common Stock            1
1                        MHLD.US    MHLD  Common Stock            1
3                        POLA.US    POLA  Common Stock            1
4                       PHASQ.US   PHASQ  Common Stock            1

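Basically, if a row comes from df1, the common column should hold a 1; if it comes from df2, the alt column should hold a 1; and if it comes from both, both columns should hold a 1.

Can this be done in pandas?

I tried a merge, but it keeps joining by column, and I end up with millions of rows.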

merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
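On the five sample rows shown, a merge on these three key columns should not grow the row count. One common cause of the kind of blow-up described above is repeated key combinations in one or both frames, because an outer merge emits every matching pair of rows. This is only a guess about the full data, which is not shown. The sketch below uses hypothetical toy frames (`left`, `right`) to show the effect, a `duplicated` check, and the `validate` argument of `pd.merge`, which raises an error when the keys are not unique.

```python
import pandas as pd

keys = ["altshortname", "Code", "Type"]

# Hypothetical toy frames: the same key appears 3 times on the left
# and 2 times on the right.
left = pd.DataFrame({
    "altshortname": ["CDW.US"] * 3,
    "Code": ["CDW"] * 3,
    "Type": ["Common Stock"] * 3,
    "common": [1, 1, 1],
})
right = pd.DataFrame({
    "altshortname": ["CDW.US"] * 2,
    "Code": ["CDW"] * 2,
    "Type": ["Common Stock"] * 2,
    "alt": [1, 1],
})

# Every left row pairs with every matching right row: 3 * 2 rows.
blown_up = pd.merge(left, right, on=keys, how="outer")
print(len(blown_up))

# Count rows whose key combination already appeared earlier in the frame.
dup_left = int(left.duplicated(subset=keys).sum())
dup_right = int(right.duplicated(subset=keys).sum())
print(dup_left, dup_right)

# After dropping repeated keys, validate="one_to_one" makes pandas raise
# a MergeError if duplicates ever slip back in.
safe = pd.merge(
    left.drop_duplicates(subset=keys),
    right.drop_duplicates(subset=keys),
    on=keys,
    how="outer",
    validate="one_to_one",
)
print(len(safe))
```

If the duplicate counts on the real frames are non-zero, deduplicating (or aggregating) on the key columns before merging is one way to keep the result at one row per key.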
python pandas dataframe merge concatenation
2 Answers
0
votes

IIUC, what you need is a concat followed by drop_duplicates:

out = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
    ["altshortname", "Code", "Type"], ignore_index=True
)
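For reference, here is a minimal sketch that runs the snippet above on the sample rows from the question. One caveat worth checking: `drop_duplicates` keeps the first occurrence, so the shared CDW row comes from df1 and its alt value stays NaN rather than 1. If both flags are wanted on shared rows, one possible variation (an assumption here, not part of the original answer) is to group on the key columns and take the first non-null value per column with `groupby(...).first()`. The names `keys`, `stacked`, `deduped` and `flagged` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "commonshortname": ["SNX.US", "002400.CH", "CDW.US", "CEC.GR", "300002.CH"],
    "altshortname": ["SNX.US", "002400.SHE", "CDW.US", "CEC.XETRA", "300002.SHE"],
    "Code": ["SNX", "002400", "CDW", "CEC", "300002"],
    "Type": ["Common Stock"] * 5,
    "common": [1] * 5,
})
df2 = pd.DataFrame({
    "altshortname": ["SEDG.US", "MHLD.US", "CDW.US", "POLA.US", "PHASQ.US"],
    "Code": ["SEDG", "MHLD", "CDW", "POLA", "PHASQ"],
    "Type": ["Common Stock"] * 5,
    "alt": [1] * 5,
})

keys = ["altshortname", "Code", "Type"]

# Stack the frames; columns missing from one frame are filled with NaN.
stacked = pd.concat([df1, df2], ignore_index=True)

# As in the answer: keep the first row per key, so the df1 copy of CDW wins
# and its "alt" value remains NaN.
deduped = stacked.drop_duplicates(keys, ignore_index=True)

# Variation: collapse rows sharing a key, taking the first non-null value
# in each column, so the CDW row ends up with both common and alt set.
flagged = stacked.groupby(keys, as_index=False, sort=False).first()
print(flagged)
```

Note that the grouped result puts the key columns first, so the column order differs from the layout requested in the question; reindexing the columns afterwards would restore it.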

0
votes

Here is a possible solution:

merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
merged_df.fillna(0, inplace=True)

merged_df[['common', 'alt']] = merged_df[['common', 'alt']].astype(int)
merged_df.replace(0, '', inplace=True)
print(merged_df)

  commonshortname altshortname    Code          Type common alt
0          SNX.US       SNX.US     SNX  Common Stock      1    
1       002400.CH   002400.SHE  002400  Common Stock      1    
2          CDW.US       CDW.US     CDW  Common Stock      1   1
3          CEC.GR    CEC.XETRA     CEC  Common Stock      1    
4       300002.CH   300002.SHE  300002  Common Stock      1    
5                      SEDG.US    SEDG  Common Stock          1
6                      MHLD.US    MHLD  Common Stock          1
7                      POLA.US    POLA  Common Stock          1
8                     PHASQ.US   PHASQ  Common Stock          1
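A variation on the merge above, offered as a sketch: if the flags will be used in later computations, it may be preferable to keep common and alt as integer 0/1 columns instead of replacing 0 with an empty string, which turns them into mixed-type object columns. Filling only the two flag columns also leaves commonshortname as a missing value rather than 0 for rows that exist only in df2. Row order after an outer merge may differ between pandas versions, so the sketch does not rely on it. The names `keys` and `flags` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "commonshortname": ["SNX.US", "002400.CH", "CDW.US", "CEC.GR", "300002.CH"],
    "altshortname": ["SNX.US", "002400.SHE", "CDW.US", "CEC.XETRA", "300002.SHE"],
    "Code": ["SNX", "002400", "CDW", "CEC", "300002"],
    "Type": ["Common Stock"] * 5,
    "common": [1] * 5,
})
df2 = pd.DataFrame({
    "altshortname": ["SEDG.US", "MHLD.US", "CDW.US", "POLA.US", "PHASQ.US"],
    "Code": ["SEDG", "MHLD", "CDW", "POLA", "PHASQ"],
    "Type": ["Common Stock"] * 5,
    "alt": [1] * 5,
})

keys = ["altshortname", "Code", "Type"]
flags = ["common", "alt"]

# Outer merge keeps rows from both frames; a row present in both is emitted once.
merged = pd.merge(df1, df2, on=keys, how="outer")

# Fill only the flag columns and store them as integers (1 = present, 0 = absent).
merged[flags] = merged[flags].fillna(0).astype(int)

# Rows found in both frames can now be selected with a plain boolean mask.
in_both = merged[(merged["common"] == 1) & (merged["alt"] == 1)]
print(merged)
print(in_both["Code"].tolist())
```

For display purposes the zeros can still be blanked out at the very end, as the answer does, once no further arithmetic or filtering on the flags is needed.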