Intersection of DataFrame rows based on the value of a column

Question (votes: 0, answers: 2)

I have a df as shown below. I am trying to find the intersection of rows based on the value of the host column.

host    values 
test    ['A','B','C','D']
test    ['D','E','B','F']
prod    ['1','2','A','D','E']
prod    []
prod    ['2']

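The expected output is the intersection of each row with the next row, but only when the host values are the same. For the df above, the output is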

test=['B','D'] - intersection of row 1 and 2
prod=[] - intersection of row 3 and 4
prod=[] - intersection of row 4 and 5

The intersection of rows 2 and 3 is not computed because their host column values do not match. Any help is appreciated.
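For anyone who wants to reproduce this, the frame above can probably be built along these lines. This is only a sketch: it assumes the values column holds plain Python lists of strings, which the question does not state explicitly.

```python
import pandas as pd

# Assumption: each cell in "values" is a Python list of strings.
df = pd.DataFrame({
    "host": ["test", "test", "prod", "prod", "prod"],
    "values": [
        ["A", "B", "C", "D"],
        ["D", "E", "B", "F"],
        ["1", "2", "A", "D", "E"],
        [],
        ["2"],
    ],
})

print(df)
```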

python python-3.x pandas dataframe intersection
2 Answers
0
Votes

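I am not sure what structure you want for the result, but you can use shift within each host group to create a column holding the previous row's list. Then use apply on the rows where this new column is notna and take the intersection of the two sets.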

# Previous row's list within each host group (NaN for the first row of a group)
df['val_shift'] = df.groupby('host')['values'].shift()
# Intersect only where a previous row exists; the other rows stay NaN
df['intersect'] = df[df['val_shift'].notna()]\
                    .apply(lambda x: list(set(x['values'])&set(x['val_shift'])), axis=1)
print(df)
   host           values        val_shift intersect
0  test     [A, B, C, D]              NaN       NaN
1  test     [D, E, B, F]     [A, B, C, D]    [B, D]
2  prod  [1, 2, A, D, E]              NaN       NaN
3  prod               []  [1, 2, A, D, E]        []
4  prod              [2]               []        []
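If the goal is the host=[...] format from the question, the same shift idea can be sketched without adding columns to df. The names prev, has_prev and results are illustrative only, and sorted() is my addition so that the element order does not depend on set iteration order.

```python
import pandas as pd

df = pd.DataFrame({
    "host": ["test", "test", "prod", "prod", "prod"],
    "values": [
        ["A", "B", "C", "D"],
        ["D", "E", "B", "F"],
        ["1", "2", "A", "D", "E"],
        [],
        ["2"],
    ],
})

# Previous row's list within the same host; missing for the first row of each host.
prev = df.groupby("host")["values"].shift()

# Only rows that have a predecessor in their group can be intersected.
has_prev = prev.notna()

# Pair each such row with its predecessor and intersect them as sets.
results = [
    (host, sorted(set(cur) & set(before)))
    for host, cur, before in zip(
        df.loc[has_prev, "host"], df.loc[has_prev, "values"], prev[has_prev]
    )
]

for host, common in results:
    print(f"{host}={common}")
```

This should print one line per pair of neighbouring rows that share a host, in the original row order.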

0
Votes

You can use df.groupby together with SeriesGroupBy.apply and a custom function.

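def f(s):
    # Pair each row with the next row of the same group; the last row has no successor and is dropped
    s = pd.concat([s,s.shift(-1)],axis=1).dropna(how='any')
    # Both columns are named 'values', so access them by position with .iloc
    return s.apply(lambda x:f'{set(x.iloc[0])&set(x.iloc[1])} between row {x.name+1} and {x.name+2}',axis=1)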

df.groupby('host')['values'].apply(f)
host
prod  2         set() between row 3 and 4
      3         set() between row 4 and 5
test  0    {'D', 'B'} between row 1 and 2
Name: values, dtype: object

# If you don't want index
# df.groupby('host')['values'].apply(f).reset_index(drop=True)

# 0         set() between row 3 and 4
# 1         set() between row 4 and 5
# 2    {'D', 'B'} between row 1 and 2
# Name: values, dtype: object

To get output like [] and ['D', 'B'] instead of set() and {'D', 'B'}, try this.

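def f(s):
    # Pair each row with the next row of the same group; the last row has no successor and is dropped
    s = pd.concat([s,s.shift(-1)],axis=1).dropna(how='any')
    # [*...] unpacks the resulting set into a list; .iloc is needed because both columns are named 'values'
    return s.apply(lambda x:f'{[*(set(x.iloc[0])&set(x.iloc[1]))]} between row {x.name+1} and {x.name+2}',axis=1)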

df.groupby('host')['values'].apply(f).reset_index(drop=True)

0            [] between row 3 and 4
1            [] between row 4 and 5
2    ['D', 'B'] between row 1 and 2
Name: values, dtype: object
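One caveat: groupby('host') pairs consecutive rows within a host group, even when other hosts sit between them in the frame. If "the next row" in the question is meant literally, a rough alternative is to compare each row's host with the host of the row physically below it. The names same_host and pairs are illustrative, and the row numbers are 1-based positions, not index labels.

```python
import pandas as pd

df = pd.DataFrame({
    "host": ["test", "test", "prod", "prod", "prod"],
    "values": [
        ["A", "B", "C", "D"],
        ["D", "E", "B", "F"],
        ["1", "2", "A", "D", "E"],
        [],
        ["2"],
    ],
})

# True where the row directly below has the same host (the last row is always False).
same_host = df["host"].eq(df["host"].shift(-1))

# Map (row, next row) as 1-based positions to the sorted intersection of their lists.
pairs = {}
for pos in range(len(df) - 1):
    if same_host.iloc[pos]:
        current = df["values"].iloc[pos]
        following = df["values"].iloc[pos + 1]
        pairs[(pos + 1, pos + 2)] = sorted(set(current) & set(following))

for (first, second), common in pairs.items():
    print(f"{common} between row {first} and {second}")
```

For the sample data, where each host's rows are contiguous, this pairs the same rows as the groupby version.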