I have the following two DataFrames. The first is df1:
import pandas as pd
data = {
'commonshortname': ['SNX.US', '002400.CH', 'CDW.US', 'CEC.GR', '300002.CH'],
'altshortname': ['SNX.US', '002400.SHE', 'CDW.US', 'CEC.XETRA', '300002.SHE'],
'Code': ['SNX', '002400', 'CDW', 'CEC', '300002'],  # further rows elided
'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
'common': [1, 1, 1, 1, 1]
}
df1 = pd.DataFrame(data)
and df2, which looks like this:
data = {'altshortname': ['SEDG.US', 'MHLD.US', 'CDW.US', 'POLA.US', 'PHASQ.US'],
'Code': ['SEDG', 'MHLD', 'CDW', 'POLA', 'PHASQ'],
'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
'alt': [1, 1, 1, 1, 1]}
df2 = pd.DataFrame(data)
This is what they look like as DataFrames:
     commonshortname  altshortname    Code          Type  common
0             SNX.US        SNX.US     SNX  Common Stock       1
1          002400.CH    002400.SHE  002400  Common Stock       1
2             CDW.US        CDW.US     CDW  Common Stock       1
3             CEC.GR     CEC.XETRA     CEC  Common Stock       1
4          300002.CH    300002.SHE  300002  Common Stock       1
...              ...           ...     ...           ...     ...
and
   altshortname   Code          Type  alt
0       SEDG.US   SEDG  Common Stock    1
1       MHLD.US   MHLD  Common Stock    1
2        CDW.US    CDW  Common Stock    1
3       POLA.US   POLA  Common Stock    1
4      PHASQ.US  PHASQ  Common Stock    1
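I want to merge these two so that, if a row exists in both, the data is taken from the top DataFrame and a 1 is added for it in the alt column.
The final frame should look like this: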
   commonshortname  altshortname    Code          Type  common  alt
0           SNX.US        SNX.US     SNX  Common Stock       1
1        002400.CH    002400.SHE  002400  Common Stock       1
2           CDW.US        CDW.US     CDW  Common Stock       1    1
3           CEC.GR     CEC.XETRA     CEC  Common Stock       1
4        300002.CH    300002.SHE  300002  Common Stock       1
0                        SEDG.US    SEDG  Common Stock            1
1                        MHLD.US    MHLD  Common Stock            1
3                        POLA.US    POLA  Common Stock            1
4                       PHASQ.US   PHASQ  Common Stock            1
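Basically, if the data came from df1, the common column will have a 1; if it came from df2, the alt column will have a 1; and if it came from both, both columns will have a 1.
Can this be done in pandas?
I tried doing a merge, but it keeps joining by column and I end up with millions of rows.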
merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
Use concat and drop_duplicates:
out = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
["altshortname", "Code", "Type"], ignore_index=True
)
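Note that drop_duplicates keeps only the first of each set of duplicated rows, so a row present in both frames (CDW.US in the sample) keeps df1's values and its alt stays empty. A hedged variation, not part of the answer above: group on the key columns and take the first non-null value per column, so that both the common and alt flags survive. The frames below are trimmed copies of the question's samples, and the name keys is only for illustration.

```python
import pandas as pd

# Trimmed copies of the question's sample frames (illustrative only)
df1 = pd.DataFrame({
    "commonshortname": ["SNX.US", "002400.CH", "CDW.US"],
    "altshortname": ["SNX.US", "002400.SHE", "CDW.US"],
    "Code": ["SNX", "002400", "CDW"],
    "Type": ["Common Stock"] * 3,
    "common": [1, 1, 1],
})
df2 = pd.DataFrame({
    "altshortname": ["SEDG.US", "CDW.US"],
    "Code": ["SEDG", "CDW"],
    "Type": ["Common Stock"] * 2,
    "alt": [1, 1],
})

keys = ["altshortname", "Code", "Type"]

# Stack both frames, then collapse rows that share the same key.
# GroupBy.first() returns the first non-null value in each column,
# so a key found in both frames ends up with common == 1 and alt == 1.
out = (
    pd.concat([df1, df2], ignore_index=True)
    .groupby(keys, sort=False, as_index=False)
    .first()
)
print(out)
```

The key columns come first in the result, and common and alt are floats with NaN where a flag is missing; fillna and astype can tidy that up if needed.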
Here is one possible solution:
merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
merged_df.fillna(0, inplace=True)
merged_df[['common', 'alt']] = merged_df[['common', 'alt']].astype(int)
merged_df.replace(0, '', inplace=True)
print(merged_df)
   commonshortname  altshortname    Code          Type  common  alt
0           SNX.US        SNX.US     SNX  Common Stock       1
1        002400.CH    002400.SHE  002400  Common Stock       1
2           CDW.US        CDW.US     CDW  Common Stock       1    1
3           CEC.GR     CEC.XETRA     CEC  Common Stock       1
4        300002.CH    300002.SHE  300002  Common Stock       1
5                        SEDG.US    SEDG  Common Stock            1
6                        MHLD.US    MHLD  Common Stock            1
7                        POLA.US    POLA  Common Stock            1
8                       PHASQ.US   PHASQ  Common Stock            1
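If the real data contains repeated (altshortname, Code, Type) combinations, an outer merge pairs every duplicate on one side with every duplicate on the other, which may be where the millions of rows in the question come from. A minimal sketch, assuming that is the cause: drop duplicate keys first and let merge check uniqueness through its validate argument. The small frames and the name keys are made up for illustration.

```python
import pandas as pd

keys = ["altshortname", "Code", "Type"]

# Illustrative data with a deliberately repeated key (CDW.US) in each frame
df1 = pd.DataFrame({
    "commonshortname": ["SNX.US", "CDW.US", "CDW.US"],
    "altshortname": ["SNX.US", "CDW.US", "CDW.US"],
    "Code": ["SNX", "CDW", "CDW"],
    "Type": ["Common Stock"] * 3,
    "common": [1, 1, 1],
})
df2 = pd.DataFrame({
    "altshortname": ["SEDG.US", "CDW.US", "CDW.US"],
    "Code": ["SEDG", "CDW", "CDW"],
    "Type": ["Common Stock"] * 3,
    "alt": [1, 1, 1],
})

# Without deduplication the repeated key is matched 2 x 2 = 4 times
naive = pd.merge(df1, df2, on=keys, how="outer")

# Deduplicate on the keys first; validate="one_to_one" makes merge
# raise a MergeError if either side still has repeated keys.
merged = pd.merge(
    df1.drop_duplicates(keys),
    df2.drop_duplicates(keys),
    on=keys,
    how="outer",
    validate="one_to_one",
)
print(len(naive), len(merged))
```

Running the validate check on the original frames, without drop_duplicates, is a quick way to confirm whether duplicate keys are really the problem.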