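Is it possible, inside a method chain, to count the number of items a row of a list column has in common with the previous row? My code below throws the error `TypeError: unhashable type: 'list'`.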
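```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'list_column': [
        ['apple', 'banana', 'cherry'],
        ['banana', 'cherry'],
        ['cherry', 'date', 'fig'],
        ['orange']
    ]
})

# Works for a single pair of rows: items shared by row 1 and row 0
res = len(set(df.loc[1, 'list_column']) & set(df.loc[0, 'list_column']))
res  # 2

df = (df
      .assign(
          list_length=lambda x: x['list_column'].str.len(),
          # Fails: set(x['list_column']) tries to put each list into one set,
          # which raises TypeError: unhashable type: 'list'
          # (sets also have no .len() method; len(...) would be needed)
          nr_common=lambda x: (set(x['list_column']) & set(x['list_column'].shift(1))).len()
      )
)
df
```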
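For reference, a minimal sketch of a row-wise fix that stays inside the method chain: pair each row's list with the previous row's list (via `shift(1)`) using `zip`, and intersect one pair at a time. The helper name `count_common` is hypothetical, and it assumes the first row (which has no predecessor) should get a missing value.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'list_column': [
        ['apple', 'banana', 'cherry'],
        ['banana', 'cherry'],
        ['cherry', 'date', 'fig'],
        ['orange'],
    ],
})


def count_common(current, previous):
    """Hypothetical helper: number of items shared by two lists."""
    # shift(1) puts NaN (a float) in the first row, so non-lists yield a missing value
    if not isinstance(previous, list):
        return None
    return len(set(current) & set(previous))


df = df.assign(
    list_length=lambda d: d['list_column'].str.len(),
    # pair each row's list with the previous row's list and intersect them row by row
    nr_common=lambda d: [
        count_common(cur, prev)
        for cur, prev in zip(d['list_column'], d['list_column'].shift(1))
    ],
)
print(df)
```

Because the intersection is built per pair, no list is ever hashed, which is what triggered the original `TypeError`.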
I would use:
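```python
df.assign(sets=lambda d: d['list_column'].apply(set),
          # on this object column, diff() gives "current set minus previous set";
          # subtracting that from the current set leaves the items shared with the previous row
          common=lambda d: d['sets'] - d['sets'].diff(),
          n_common=lambda d: d['common'].str.len(),
          )
```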
Output:
```
   x              list_column                     sets            common  n_common
0  1  [apple, banana, cherry]  {apple, cherry, banana}               NaN       NaN
1  2         [banana, cherry]         {cherry, banana}  {banana, cherry}       2.0
2  3      [cherry, date, fig]      {date, cherry, fig}          {cherry}       1.0
3  4                 [orange]                 {orange}                {}       0.0
```
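The trick relies on the set identity A - (A - B) == A & B: `diff()` produces "current minus previous" for each row (as the output above shows for the pandas version used), and subtracting that from the current set keeps only the shared items. A plain-Python sketch of the same computation, assuming nothing beyond built-in sets, makes the identity explicit:

```python
# Plain-Python check of the identity behind d['sets'] - d['sets'].diff()
rows = [
    {'apple', 'banana', 'cherry'},
    {'banana', 'cherry'},
    {'cherry', 'date', 'fig'},
    {'orange'},
]

n_common = [None]  # the first row has no previous row, like the NaN in the output
for previous, current in zip(rows, rows[1:]):
    only_in_current = current - previous     # what diff() yields for this row
    common = current - only_in_current       # what the answer's subtraction yields
    assert common == current & previous      # identical to an explicit intersection
    n_common.append(len(common))

print(n_common)  # [None, 2, 1, 0]
```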