How do I find the unique combinations in a one-hot encoded DataFrame?

Question (votes: 1, answers: 1)

I have a DataFrame named test that looks like this:

+-------+---------+---------+---------+------------+
|       | Term 1  | Term 2  | Term 3  | Final Exam |
+-------+---------+---------+---------+------------+
| 1288  |      0  |      0  |      1  |          1 |
| 1290  |      1  |      1  |      1  |          1 |
| 1294  |      0  |      0  |      1  |          1 |
| 1296  |      1  |      1  |      1  |          1 |
| 1297  |      1  |      1  |      1  |          1 |
| 1304  |      0  |      1  |      1  |          1 |
| 1308  |      0  |      0  |      1  |          1 |
| 1324  |      1  |      1  |      1  |          1 |
| 1325  |      1  |      1  |      1  |          1 |
| 1332  |      1  |      1  |      1  |          1 |
+-------+---------+---------+---------+------------+

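I want a summary table of every unique combination of columns whose value is 1 in a row, together with the number of rows in which that combination occurs: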

+------------------------------------+-----------+
|            Combination             | Frequency |
+------------------------------------+-----------+
| Term 3, Final Exam                 |         3 |
| Term 2, Term 3, Final Exam         |         1 |
| Term 1, Term 2, Term 3, Final Exam |         6 |
+------------------------------------+-----------+

I have tried mlxtend's apriori, but it returns counts for every subset of columns rather than only the exact row patterns:

from mlxtend.frequent_patterns import apriori
# A tiny min_support keeps every itemset that occurs at least once
results = apriori(test, min_support=0.00001, use_colnames=True)
results['length'] = results['itemsets'].apply(len)
numberofcases = test.shape[0]
# support is a fraction of rows, so scale it back to a row count
results['Frequency'] = results['support'] * numberofcases
# Join each frozenset into a string; stripping its repr with str.replace relied on regex defaults
results['Terms'] = results['itemsets'].apply(', '.join)
results[results['length'] > 1][['Terms', 'Frequency']]

Result:

+-----+-------------------------------------+-----------+
|     |               Terms                 | Frequency |
+-----+-------------------------------------+-----------+
|  4  | Term 2, Term 1                      |       6.0 |
|  5  | Term 3, Term 1                      |       6.0 |
|  6  | Final Exam, Term 1                  |       6.0 |
|  7  | Term 2, Term 3                      |       7.0 |
|  8  | Term 2, Final Exam                  |       7.0 |
|  9  | Term 3, Final Exam                  |      10.0 |
| 10  | Term 2, Term 3, Term 1              |       6.0 |
| 11  | Term 2, Final Exam, Term 1          |       6.0 |
| 12  | Term 3, Final Exam, Term 1          |       6.0 |
| 13  | Term 2, Term 3, Final Exam          |       7.0 |
| 14  | Term 2, Term 3, Final Exam, Term 1  |       6.0 |
+-----+-------------------------------------+-----------+

Is there a parameter in apriori that produces the expected result, or is there another way to do this?

pandas apriori
1 answer
2 votes

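Use dot followed by value_counts. Taking the dot product of the 0/1 DataFrame with the column names (each with a comma appended) concatenates, for every row, the names of the columns that are set to 1; str[:-1] drops the trailing comma and value_counts counts each resulting string. Here df is the question's DataFrame, apparently with the column names written without spaces: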

df.dot(df.columns+',').str[:-1].value_counts()
Out[419]: 
Term1,Term2,Term3,FinalExam    6
Term3,FinalExam                3
Term2,Term3,FinalExam          1
dtype: int64
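
For readers who want to reproduce this on the question's data, here is a minimal sketch of the same idea. It is a variant, not the answer's exact code: it joins the column names row by row instead of using dot, so it does not rely on multiplying integers by strings. The helper name combination_counts and the rebuilt test DataFrame are assumptions made for illustration.

```python
import pandas as pd

# Rebuild the question's `test` DataFrame (index = row id, values are 0/1).
test = pd.DataFrame(
    {
        "Term 1": [0, 1, 0, 1, 1, 0, 0, 1, 1, 1],
        "Term 2": [0, 1, 0, 1, 1, 1, 0, 1, 1, 1],
        "Term 3": [1] * 10,
        "Final Exam": [1] * 10,
    },
    index=[1288, 1290, 1294, 1296, 1297, 1304, 1308, 1324, 1325, 1332],
)


def combination_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count how often each exact set of 1-valued columns occurs as a row.

    Hypothetical helper, not part of pandas or mlxtend.
    """
    # For each row, join the names of the columns whose value is truthy.
    labels = df.apply(
        lambda row: ", ".join(df.columns[row.to_numpy(dtype=bool)]),
        axis=1,
    )
    # value_counts sorts by frequency, most common combination first.
    counts = labels.value_counts()
    return pd.DataFrame(
        {"Combination": counts.index, "Frequency": counts.to_numpy()}
    )


summary = combination_counts(test)
print(summary)
```

On the ten rows above this gives the three combinations from the desired summary table, with frequencies 6, 3 and 1. Unlike apriori, which counts every row that contains an itemset, each row here contributes to exactly one combination, so the frequencies add up to the number of rows.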