Merging rows when the comma-separated lists in a column of two DataFrames match by more than 50%

Question (votes: 2, answers: 1)

I have two DataFrames:

import pandas as pd

df1 = pd.DataFrame({'col1': ['ID1', 'ID2','ID3','ID4','ID5','ID6','ID7'], 'col2': ["S3,S22,S44", "S133,S32,S334", "S13,S24,S45", "S1,S2,S4,S5", "S3,S4,S5", "S3,S2,S5", "S38,S42,S9"],'col3': ['ab', 'ac','ad','ae','af','as','ak'],})
df2 = pd.DataFrame({'name1': ['Ik3', 'Ik1','Ik2','Ik7','Ik5','Ik6','Ik5'], 'col2': ["S3, S44, S22,S54", "S133, S32,S334, S30", "S13, S24,S45", "S11, S21,S4, S5", "S3, S4,S5", "S3, S2,S5", "S3, S4,S9, S10, S13"],'col3': ['ab', 'ac','ad','ae','af','as','ak'],})

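I want to compare the comma-separated lists in col2 of the two DataFrames and merge the rows whose lists match by more than 50%, leaving the rest empty.

Desired output: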

pd.DataFrame({'name1': ['ID1', 'ID2','ID3','ID4','ID5','ID6','ID7'], 'col2': ["S3, S22,S44", "S133, S32,S334", "S13, S2 4,S45", "S1, S2,S4 S5", "S3, S4,S5", "S3, S2,S5", "S3, S4,S9"],'col3': ['ab', 'ac','ad','ae','af','as','ak'],'nma1': ['Ik3', 'Ik1','Ik2','Ik5','Ik5','Ik6','nan'],'percentage': ['75', '50','100','50','100','100','0']})

I tried using isin:

df1[df1.col2.isin(df2.col2)]

But this did not give the desired output. Any suggestions are appreciated.

python python-3.x pandas merge concat
1 Answer
2 votes

Your description and your expected output do not match. However, here is some code that will hopefully help you get started.

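def get_ratios(df1, df2):
  # Split a comma-separated string into a list of whitespace-stripped items.
  clean = lambda s: list(map(str.strip, s.split(',')))

  # Compare the two col2 columns row by row (row i of df1 against row i of df2).
  for a, b in zip(df1.col2, df2.col2):
    vals1, vals2 = clean(a), clean(b)

    # Share of common items, relative to the longer of the two lists.
    inter = set(vals1).intersection(vals2)
    ratio = len(inter) / max(len(vals1), len(vals2))

    yield ratio

# One ratio per row, indexed 0..n-1.
s = pd.Series(get_ratios(df1, df2))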
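To make the ratio concrete, here is a minimal standalone sketch of the same calculation for a single pair of strings. The function name overlap_ratio is made up for illustration and is not part of the code above; it only mirrors the formula "shared items divided by the length of the longer list".

```python
def overlap_ratio(a, b):
    """Shared items divided by the length of the longer list."""
    # Split on commas and strip the stray spaces around each item.
    vals_a = [item.strip() for item in a.split(",")]
    vals_b = [item.strip() for item in b.split(",")]
    shared = set(vals_a) & set(vals_b)
    return len(shared) / max(len(vals_a), len(vals_b))


# 3 shared items (S3, S22, S44), longer list has 4 items
print(overlap_ratio("S3,S22,S44", "S3, S44, S22,S54"))     # 0.75
# 1 shared item (S9), longer list has 5 items
print(overlap_ratio("S38,S42,S9", "S3, S4,S9, S10, S13"))  # 0.2
```

Because the denominator is the longer list, a short list that is fully contained in a much longer one still gets a low score.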

Then merge:

(df1.merge(df2.rename(columns={'name1': 'nma1'}),
          on=['col3'])
    .assign(percentage=s)) #.where(s > 0.5)

  col1          col2_x col3 nma1               col2_y  percentage
0  ID1     S3, S22,S44   ab  Ik3     S3, S44, S22,S54    0.750000
1  ID2  S133, S32,S334   ac  Ik1  S133, S32,S334, S30    0.750000
2  ID3   S13, S2 4,S45   ad  Ik2         S13, S24,S45    0.666667
3  ID4    S1, S2,S4 S5   ae  Ik7      S11, S21,S4, S5    0.000000
4  ID5       S3, S4,S5   af  Ik5            S3, S4,S5    1.000000
5  ID6       S3, S2,S5   as  Ik6            S3, S2,S5    1.000000
6  ID7       S3, S4,S9   ak  Ik5  S3, S4,S9, S10, S13    0.600000
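
The code above only compares row i of df1 with row i of df2. If the goal is instead to find, for each row of df1, the best-matching row anywhere in df2 and to leave matches below 50% empty, a rough sketch could look like the following. This is an assumption about the intent, not something the question confirms. THRESHOLD, split_items and overlap_ratio are hypothetical names introduced for illustration, and >= is used so that exactly 50% still counts (as the desired output suggests); switch to > for a strict "more than 50%".

```python
import pandas as pd

df1 = pd.DataFrame({
    "col1": ["ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7"],
    "col2": ["S3,S22,S44", "S133,S32,S334", "S13,S24,S45", "S1,S2,S4,S5",
             "S3,S4,S5", "S3,S2,S5", "S38,S42,S9"],
    "col3": ["ab", "ac", "ad", "ae", "af", "as", "ak"],
})
df2 = pd.DataFrame({
    "name1": ["Ik3", "Ik1", "Ik2", "Ik7", "Ik5", "Ik6", "Ik5"],
    "col2": ["S3, S44, S22,S54", "S133, S32,S334, S30", "S13, S24,S45",
             "S11, S21,S4, S5", "S3, S4,S5", "S3, S2,S5", "S3, S4,S9, S10, S13"],
    "col3": ["ab", "ac", "ad", "ae", "af", "as", "ak"],
})

THRESHOLD = 0.5  # assumption: a match needs at least 50% overlap


def split_items(s):
    # Comma-separated string -> list of whitespace-stripped items.
    return [item.strip() for item in s.split(",")]


def overlap_ratio(a, b):
    # Shared items divided by the length of the longer list.
    vals_a, vals_b = split_items(a), split_items(b)
    return len(set(vals_a) & set(vals_b)) / max(len(vals_a), len(vals_b))


rows = []
for value in df1["col2"]:
    # Score this df1 row against every row of df2.
    ratios = df2["col2"].map(lambda other: overlap_ratio(value, other))
    best = ratios.idxmax()  # first row with the highest ratio
    matched = ratios[best] >= THRESHOLD
    rows.append({
        "nma1": df2.loc[best, "name1"] if matched else None,
        "percentage": int(round(ratios[best] * 100)) if matched else 0,
    })

result = df1.join(pd.DataFrame(rows, index=df1.index))
print(result)
```

With the sample data, ID7 is left without a match because its best overlap is only 20%. ID4 ties at 50% with three rows of df2 (Ik7, Ik5 and Ik6), and idxmax keeps the first one, so a different tie-breaking rule would be needed to prefer another. The loop does one pass over df2 for each row of df1, which is fine for small frames but scales quadratically.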