Apriori algorithm in data mining - How to resolve a TypeError with TransactionEncoder() in Python?

Question (votes: 0, answers: 1)

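I am trying to incorporate the apriori algorithm into a Python program, but the line "te_ary = te.fit(dataset).transform(dataset)" raises a TypeError. I believe this is related to reading the dataset from a file on my computer instead of typing it manually into the Jupyter notebook. I thought my variable might be mishandled on the line where I declare "frequent_itemsets", but the error comes from line 3?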

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori


from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

filename = '/Users/emitsch/Documents/Database 1.csv'

#loading the excel spreadsheet file with my database
dataset = pd.read_csv(filename, header = None)

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

Here is the error:

TypeError                                 Traceback (most recent call last)
<ipython-input-19-ff180148a5c5> in <module>
      1 te = TransactionEncoder()
----> 2 te_ary = te.fit(dataset).transform(dataset)
      3 df = pd.DataFrame(te_ary, columns=te.columns_)
      4 frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

    //anaconda3/lib/python3.7/site-packages/mlxtend/preprocessing/transactionencoder.py in fit(self, X)
         54         unique_items = set()
         55         for transaction in X:
    ---> 56             for item in transaction:
         57                 unique_items.add(item)
         58         self.columns_ = sorted(unique_items)

    TypeError: 'int' object is not iterable
python database pandas data-mining apriori
1 Answer
0 votes
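
A likely cause of the TypeError, judging from the traceback: TransactionEncoder.fit loops over its input with "for transaction in X", and iterating over a pandas DataFrame yields its column labels, not its rows. With header=None those labels are the integers 0, 1, 2, ..., so the inner "for item in transaction" tries to iterate over an int. A minimal sketch that reproduces this without mlxtend (the tiny DataFrame here is made up for illustration):

```python
import pandas as pd

# Default integer column labels, as produced by read_csv(..., header=None).
df = pd.DataFrame([[1, 2], [1, 3]])

# Iterating a DataFrame yields column labels, not rows.
labels = list(df)

# Mimic the nested loop inside TransactionEncoder.fit.
message = None
try:
    for transaction in df:
        for item in transaction:
            pass
except TypeError as exc:
    message = str(exc)

print(labels)
print(message)
```

Passing a list of lists instead of the DataFrame avoids this, which is what the rest of this answer does.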

Here is a simple example with a small transaction dataset (5 items with item IDs 1 to 5, and 4 transactions):

df = pd.DataFrame([[1, 2, pd.NA, pd.NA], 
                   [1, 3, pd.NA, pd.NA], 
                   [2, 3, 4, 5],
                   [1, 4, 5, pd.NA]], columns=['item1','item2','item3','item4'])
df.head()
#    item1  item2   item3   item4
#0   1      2       <NA>    <NA>
#1   1      3       <NA>    <NA>
#2   2      3       4       5
#3   1      4       5       <NA>

TransactionEncoder accepts a list of lists as its dataset, so preprocess the DataFrame first, dropping the missing values from each row:

dataset = [[item for item in row if item is not pd.NA] for row in df.values]
dataset
# [[1, 2], [1, 3], [2, 3, 4, 5], [1, 4, 5]]
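
For a DataFrame loaded with pd.read_csv(..., header=None), as in the question, empty cells typically come back as NaN rather than pd.NA, so the "is not pd.NA" check above would not drop them. The following is a rough sketch under assumptions: the inline CSV text stands in for the real file (whose contents are not shown in the question), short rows are padded with trailing commas, and dtype=str is used so item IDs are not turned into floats by the missing values.

```python
import io
import pandas as pd

# Stand-in for the CSV file from the question (contents are hypothetical).
csv_text = "1,2,,\n1,3,,\n2,3,4,5\n1,4,5,\n"

# dtype=str keeps item IDs as strings; empty fields are read as NaN.
raw = pd.read_csv(io.StringIO(csv_text), header=None, dtype=str)

# Build one list per row, skipping missing cells with pd.notna.
dataset = [[item for item in row if pd.notna(item)]
           for row in raw.values.tolist()]

print(dataset)
```

The resulting list of lists can be passed to te.fit(dataset).transform(dataset) in place of the DataFrame.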

Finally, fit TransactionEncoder on the dataset and run the apriori algorithm to compute the frequent itemsets:

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
frequent_itemsets
#   support itemsets
#0  0.75    (1)
#1  0.50    (2)
#2  0.50    (3)
#3  0.50    (4)
#4  0.50    (5)
#5  0.50    (4, 5)
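
To sanity-check support values, they can be recomputed by hand: the support of an itemset is the fraction of transactions that contain every item in it. The sketch below is a brute-force check over single items and pairs, not the Apriori pruning that mlxtend performs, and the helper name "support" is made up for illustration.

```python
from itertools import combinations

dataset = [[1, 2], [1, 3], [2, 3, 4, 5], [1, 4, 5]]

def support(itemset, transactions):
    """Fraction of transactions containing every item of itemset."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted.issubset(t))
    return hits / len(transactions)

items = sorted({item for t in dataset for item in t})

# Keep every 1- and 2-item candidate that meets min_support = 0.5.
frequent = {}
for size in (1, 2):
    for candidate in combinations(items, size):
        s = support(candidate, dataset)
        if s >= 0.5:
            frequent[candidate] = s

print(frequent)
```

No triple can be frequent here, because the only frequent pair is (4, 5) and every subset of a frequent itemset must itself be frequent.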