Pandas: partial string matching and conditional merge

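Question

I have two DataFrames, df1 with shape (1597, 37) and df2 with shape (27293, 115). Both contain company names, postal codes, and other data. The names do not match exactly.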

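The manual steps for merging them would be:

  1. Filter by postal code.
  2. Compare the company names in df1 and df2 to find matches, and remove from df2 the companies that are already in df1.
  3. Add the new companies from df2 to df1.
  4. The final database is df1, with the new companies from df2 added.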

If the names match but the postal codes differ, we assume it is a different company and keep it.
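
The manual steps above can be sketched in pandas roughly as follows. This is only an illustration under assumptions the question does not state. The rule for a "partial match" (two names share at least one word once the generic word "company" is ignored) is assumed. The helper names `name_tokens`, `names_match` and `add_new_companies` and the `GENERIC_WORDS` set are hypothetical. Both frames are assumed to have unique index labels. Column names follow the example frames df1 and df2 given in this question, and copying df2's extra columns onto matched df1 rows is inferred from the desired output.

```python
import re

import pandas as pd

# Hypothetical: generic words that should not count as evidence of a match.
GENERIC_WORDS = {"company"}


def name_tokens(name):
    """Return the lower-cased words of a name, minus the generic ones."""
    words = re.findall(r"[a-z0-9]+", str(name).lower())
    return {w for w in words if w not in GENERIC_WORDS}


def names_match(a, b):
    """Assumed rule: two names match if they share at least one word."""
    return bool(name_tokens(a) & name_tokens(b))


def add_new_companies(df1, df2):
    """Append to df1 the df2 companies that have no match in df1."""
    # Give df2 the same key column names as df1 so rows can be stacked.
    df2 = df2.rename(columns={"Name": "NAME", "CP": "Postal Code"})

    # Step 1: pair every df1 row with the df2 rows sharing its postal code.
    keys = ["NAME", "Postal Code"]
    left = df1[keys].rename_axis("row").reset_index()
    right = df2[keys].rename_axis("row").reset_index()
    pairs = left.merge(right, on="Postal Code", suffixes=("_1", "_2"))

    # Step 2: keep only the pairs whose names also match.
    mask = pd.Series(
        [names_match(a, b) for a, b in zip(pairs["NAME_1"], pairs["NAME_2"])],
        index=pairs.index,
        dtype=bool,
    )
    matched = pairs[mask]

    # Step 3: df2 rows that matched nothing are the new companies.
    new_rows = df2.drop(index=matched["row_2"].unique())

    # Copy df2's extra columns onto matched df1 rows (first match wins).
    first = matched.drop_duplicates("row_1")
    extra_cols = [c for c in df2.columns if c not in df1.columns]
    extra = df2.loc[first["row_2"].to_numpy(), extra_cols]
    extra.index = first["row_1"].to_numpy()
    enriched = df1.join(extra)

    # Step 4: df1 (enriched) followed by the new companies.
    return pd.concat([enriched, new_rows], ignore_index=True)
```

Called as `add_new_companies(df1, df2)` on the example frames in this question, the sketch keeps the four df1 rows, fills `Some other data` for Company B and Company C, and appends the five unmatched df2 rows. A word-overlap rule is crude. On real data a stricter or fuzzier comparison would likely be needed, and it only has to replace `names_match`.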

Example:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'NAME': ['Company A', 'Company B', 'Company C', 'Company D'],
                    'Postal Code': [9001, 9002, 9003, 9004]})
df2 = pd.DataFrame({'Name': ['this is b', 'some company d', 'c is a company',
                             'COMANY f', 'COMANY x', 'Company z', 'w company'],
                    'CP': [9002, 9006, 9003, 9005, 9001, 9007, 9008],
                    'Some other data': np.random.randn(7)})

df1



    NAME        Postal Code  
0   Company A   9001         
1   Company B   9002         
2   Company C   9003         
3   Company D   9004         

df2


    Name            CP      Some other data
0   this is b       9002    1.867558
1   some company d  9006    -0.977278
2   c is a company  9003    0.950088
3   COMANY f        9005    -0.151357
4   COMANY x        9001    -0.103219
5   Company z       9007    0.410599
6   w company       9008    0.144044

Desired output:

df1_merged

    NAME           Postal Code   Some other data
0   Company A       9001         NaN
1   Company B       9002         0.400157
2   Company C       9003         0.978738
3   Company D       9004         NaN
4   some company d  9006         -0.977278
5   COMANY f        9005         -0.151357
6   COMANY x        9001         -0.103219
7   Company z       9007         0.410599
8   w company       9008         0.144044
python pandas merge conditional-statements string-matching
1 Answer

You can rename the df1 columns and then merge:

df1 = df1.rename(columns={'NAME': 'Name', 'Postal Code': 'CP'}) 
df = pd.merge(left=df1, right=df2, how='outer')
print(df)

              Name    CP  Some other data
0        Company A  9001              NaN
1        Company B  9002              NaN
2        Company C  9003              NaN
3        Company D  9004              NaN
4        this is b  9002        -0.881567
5   some company d  9006         0.186404
6   c is a company  9003        -0.331076
7         COMANY f  9005        -1.645201
8         COMANY x  9001        -0.978169
9        Company z  9007         0.860190
10       w company  9008         0.020805
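
When `pd.merge` is called without an `on` argument, it joins on every column name the two frames share (here `Name` and `CP`) and requires exact equality. That is why the output above has 11 rows and no partially matching names are combined. The small sketch below uses made-up rows (not the question's data) and the `indicator=True` option to show which side each merged row came from.

```python
import pandas as pd

# Made-up rows for illustration; only "Company A" / 9001 is identical in both.
left = pd.DataFrame({"Name": ["Company A", "Company B"],
                     "CP": [9001, 9002]})
right = pd.DataFrame({"Name": ["this is b", "Company A"],
                      "CP": [9002, 9001],
                      "Some other data": [0.1, 0.2]})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = pd.merge(left, right, how="outer", indicator=True)

# Map each name to the side(s) it was found on.
status = dict(zip(merged["Name"], merged["_merge"].astype(str)))
print(status)
```

Here "Company B" and "this is b" share a postal code but end up as separate `left_only` and `right_only` rows, because an exact merge cannot see that the names are related.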