I am trying to incorporate the apriori algorithm into a Python program, but the line `te_ary = te.fit(dataset).transform(dataset)` raises a TypeError. I believe this has to do with the fact that I am reading the dataset from a file on my computer rather than typing it into the Jupyter notebook by hand. I thought the problem might be in the line where I declare `frequent_itemsets`, but the error points at line 3?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori  # note: this shadows the apyori import above
filename = '/Users/emitsch/Documents/Database 1.csv'
# load the transaction dataset from a CSV file
dataset = pd.read_csv(filename, header = None)
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
Here is the error:
TypeError Traceback (most recent call last)
<ipython-input-19-ff180148a5c5> in <module>
1 te = TransactionEncoder()
----> 2 te_ary = te.fit(dataset).transform(dataset)
3 df = pd.DataFrame(te_ary, columns=te.columns_)
4 frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
//anaconda3/lib/python3.7/site-packages/mlxtend/preprocessing/transactionencoder.py in fit(self, X)
54 unique_items = set()
55 for transaction in X:
---> 56 for item in transaction:
57 unique_items.add(item)
58 self.columns_ = sorted(unique_items)
TypeError: 'int' object is not iterable
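The traceback shows why this fails: `TransactionEncoder.fit` loops over its input with `for transaction in X`, but iterating over a pandas `DataFrame` yields its *column labels*, not its rows. Because the file was read with `header=None`, those labels are the integers 0, 1, 2, ..., and the inner loop `for item in transaction` then tries to iterate over an int. A minimal sketch of the behavior:

```python
import pandas as pd

# With header=None, pd.read_csv assigns integer column labels 0, 1, ...
# The same thing happens with this hand-built frame:
df = pd.DataFrame([['a', 'b'], ['a', 'c']])

# Iterating a DataFrame yields its column labels, not its rows
print(list(df))  # → [0, 1]
```

So the fix is to convert the DataFrame into a list of lists before passing it to `TransactionEncoder`, as the answer below demonstrates.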
Here is a simple example with a small transaction dataset (5 items with itemids 1 to 5, and 4 transactions):
df = pd.DataFrame([[1, 2, pd.NA, pd.NA],
                   [1, 3, pd.NA, pd.NA],
                   [2, 3, 4, 5],
                   [1, 4, 5, pd.NA]], columns=['item1','item2','item3','item4'])
df.head()
# item1 item2 item3 item4
#0 1 2 <NA> <NA>
#1 1 3 <NA> <NA>
#2 2 3 4 5
#3 1 4 5 <NA>
TransactionEncoder expects the dataset to be a list of lists, so preprocess first:
dataset = [[item for item in row if item is not pd.NA] for row in df.values]
dataset
# [[1, 2], [1, 3], [2, 3, 4, 5], [1, 4, 5]]
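The same cleanup works when the transactions come from a CSV file, as in the question: `pd.read_csv(..., header=None)` pads short rows with NaN, which `pd.notna` filters out just like `pd.NA` above. A self-contained sketch (using an in-memory buffer in place of the asker's file path, and assuming the itemids are integers):

```python
import io
import pandas as pd

# Stand-in for the asker's CSV file; rows shorter than the widest
# row get padded with NaN by pd.read_csv
csv_data = "1,2,,\n1,3,,\n2,3,4,5\n1,4,5,\n"
df = pd.read_csv(io.StringIO(csv_data), header=None)

# NaN padding forces the columns to float, so cast items back to int
dataset = [[int(item) for item in row if pd.notna(item)] for row in df.values]
print(dataset)  # → [[1, 2], [1, 3], [2, 3, 4, 5], [1, 4, 5]]
```

For a real file, replace `io.StringIO(csv_data)` with the filename.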
Finally, fit the TransactionEncoder on the dataset and run the apriori algorithm to compute the frequent itemsets:
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
frequent_itemsets
# support itemsets
#0 0.75 (1)
#1 0.75 (2)
#2 0.50 (3)
#3 0.50 (4)
#4 0.50 (5)
#5 0.50 (1, 2)
#6 0.50 (2, 3)
#7 0.50 (4, 5)