I have this survey dataset, and I want to work out the users' preferences. The dataset looks like this:
User  Men  Women  Non-bi  Asexual
1     Men  Women
2     Men
3          Women  Non-bi
4                         Asexual
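I want to know how many users like Men, Women, Non-bi, or some combination of these. Is there a simple way in Python to count this data and produce summary statistics?
One idea is to convert all the answers into a list or a single column, so that I can then count the instances of each combination.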
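A minimal sketch of that idea, assuming the empty cells in the table are missing values (NaN) and that the data is already in a pandas DataFrame. The sample frame is rebuilt from the question, and the names `answers`, `combo` and `combo_counts` are only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "User": [1, 2, 3, 4],
    "Men": ["Men", "Men", np.nan, np.nan],
    "Women": ["Women", np.nan, "Women", np.nan],
    "Non-bi": [np.nan, np.nan, "Non-bi", np.nan],
    "Asexual": [np.nan, np.nan, np.nan, "Asexual"],
})

# Collapse each user's non-missing answers into a single label,
# e.g. "Men + Women" (answers keep the column order).
answers = df.drop(columns="User")
combo = answers.apply(lambda row: " + ".join(row.dropna()), axis=1)

# Count how often each combination occurs.
combo_counts = combo.value_counts()
print(combo_counts)
```

With the four sample rows, every combination occurs exactly once.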
You could perhaps use:
target = ['Men', 'Women', 'Non-bi']
# get rid of non relevant column
tmp = df.drop(columns='User').notna()
# keep users having at least one target
m1 = tmp[target].any(axis=1)
# drop rows having another match
m2 = ~tmp.drop(columns=target).any(axis=1)
# count
count = (m1&m2).sum()
Output:
3
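For anyone who wants to run this end to end, here is a self-contained sketch of the same masking logic wrapped in a function. The helper name `count_only_targets` and its `id_col` parameter are made up for illustration, and the sample frame again assumes that empty cells are NaN:

```python
import numpy as np
import pandas as pd

def count_only_targets(df, target, id_col="User"):
    """Count rows with at least one target column set and no other column set."""
    present = df.drop(columns=id_col).notna()
    has_target = present[target].any(axis=1)
    has_other = present.drop(columns=target).any(axis=1)
    return int((has_target & ~has_other).sum())

# Sample data rebuilt from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "User": [1, 2, 3, 4],
    "Men": ["Men", "Men", np.nan, np.nan],
    "Women": ["Women", np.nan, "Women", np.nan],
    "Non-bi": [np.nan, np.nan, "Non-bi", np.nan],
    "Asexual": [np.nan, np.nan, np.nan, "Asexual"],
})

print(count_only_targets(df, ["Men", "Women", "Non-bi"]))  # 3 (users 1, 2 and 3)
print(count_only_targets(df, ["Men"]))                     # 1 (only user 2)
```

Note that this gives a single total for the whole target group rather than a breakdown per combination.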
Use:
s = df.set_index('User').apply(lambda x: frozenset(x.dropna()), axis=1).value_counts()
print (s)
(Asexual) 1
(Women, Non-bi) 1
(Men) 1
(Women, Men) 1
dtype: int64
Or:
s = df.set_index('User').stack().groupby('User').agg(frozenset).value_counts()
print (s)
(Asexual) 1
(Women, Non-bi) 1
(Men) 1
(Women, Men) 1
Name: value, dtype: int64
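If per-option totals are also wanted (how many users picked Men, how many picked Women, and so on, regardless of what else they picked), a minimal sketch is to count the non-missing cells in each column. This again assumes that empty cells are NaN; `per_option` and `per_user` are illustrative names:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "User": [1, 2, 3, 4],
    "Men": ["Men", "Men", np.nan, np.nan],
    "Women": ["Women", np.nan, "Women", np.nan],
    "Non-bi": [np.nan, np.nan, "Non-bi", np.nan],
    "Asexual": [np.nan, np.nan, np.nan, "Asexual"],
})

answered = df.set_index("User").notna()

# Number of users who selected each option.
per_option = answered.sum()
print(per_option)

# Number of options each user selected.
per_user = answered.sum(axis=1)
print(per_user)
```

For the sample data this gives Men 2, Women 2, Non-bi 1 and Asexual 1, with users 1 and 3 selecting two options each.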