Split a Python dataframe by column value, then use the pieces in an algorithm


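I am currently using the Apriori algorithm from mlxtend for a simple frequent pattern analysis. At the moment I only look at all transactions together, but I would like to split the analysis by country. My current script looks like this: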

import pandas as pd
import numpy as np
import pyodbc
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

dataset = pd.read_sql_query("""some query""", cnxn)

# Transform/prep dataset into list data
dataset_tx = dataset.groupby(['ReceiptCode'])['ItemCategoryName'].apply(list).values.tolist()

# Define the transaction encoder
te = TransactionEncoder()

# Binary-transform dataset
te_ary = te.fit(dataset_tx).transform(dataset_tx)

# Build a dataframe from the encoded array
# (pd.SparseDataFrame was removed in pandas 1.0; a regular DataFrame works with apriori)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Run algorithm 
frequent_itemsets = apriori(df, min_support=0.10, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.3)

Here is an example of dataset.

+----------------------+--+------------------+--+------------------+
|     ReceiptCode      |  | ItemCategoryName |  | StoreCountryName |
+----------------------+--+------------------+--+------------------+
|  0000P70322000031467 |  |  Food            |  |   Denmark        |
|  0000P70322000031867 |  |  Food            |  |   Denmark        |
|  0000P70322000051467 |  |  Interior        |  |   Germany        |
|  0000P70322000087468 |  |  Kitchen         |  |   Switzerland    |
|  0000P70322000031469 |  |  Leisure         |  |   Germany        |
|  0000P70322000031439 |  |  Food            |  |   Switzerland    |
+----------------------+--+------------------+--+------------------+

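Is it possible to "automatically" create multiple dataframes based on the column StoreCountryName and then use them in the algorithm, i.e. run the analysis on a country-specific dataframe and loop over all countries? I know I could create the dataframes manually and then apply the transformation and analysis to each one.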

python pandas apriori
1 answer

You can use groupby and a list comprehension to store the dataframes in a list and then iterate over them:

g = df.groupby('StoreCountryName')
dfs = [group for _, group in g]

for i in range(len(dfs)):
    dfs[i]['iteration'] = i  # do stuff to each frame
    print(f"{dfs[i]} \n")

           ReceiptCode ItemCategoryName StoreCountryName  iteration
0  0000P70322000031467             Food          Denmark          0
1  0000P70322000031867             Food          Denmark          0

           ReceiptCode ItemCategoryName StoreCountryName  iteration
2  0000P70322000051467         Interior          Germany          1
4  0000P70322000031469          Leisure          Germany          1

           ReceiptCode ItemCategoryName StoreCountryName  iteration
3  0000P70322000087468          Kitchen      Switzerland          2
5  0000P70322000031439             Food      Switzerland          2

Or you can create a function and apply it to each group with groupby and apply.
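To connect this back to the question, here is a minimal sketch of the "function per country" idea, assuming the raw dataset has the three columns shown in the example table. The helper names transactions_by_country and frequent_itemsets_by_country are hypothetical, not part of pandas or mlxtend. The sample frame is copied from the question's table, and a plain pd.DataFrame is used for the encoded data in place of the question's pd.SparseDataFrame.

```python
import pandas as pd


def transactions_by_country(dataset):
    """Return {country: list of item lists, one list per receipt}."""
    result = {}
    for country, group in dataset.groupby("StoreCountryName"):
        # Same transformation as in the question, restricted to one country
        result[country] = (
            group.groupby("ReceiptCode")["ItemCategoryName"].apply(list).tolist()
        )
    return result


def frequent_itemsets_by_country(dataset, min_support=0.10):
    """Run apriori separately on each country's transactions."""
    # Imported here so the splitting helper above works without mlxtend installed
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori

    itemsets = {}
    for country, transactions in transactions_by_country(dataset).items():
        te = TransactionEncoder()
        te_ary = te.fit(transactions).transform(transactions)
        encoded = pd.DataFrame(te_ary, columns=te.columns_)
        itemsets[country] = apriori(
            encoded, min_support=min_support, use_colnames=True
        )
    return itemsets


# Sample rows taken from the example table in the question
sample = pd.DataFrame(
    {
        "ReceiptCode": [
            "0000P70322000031467",
            "0000P70322000031867",
            "0000P70322000051467",
            "0000P70322000087468",
            "0000P70322000031469",
            "0000P70322000031439",
        ],
        "ItemCategoryName": [
            "Food", "Food", "Interior", "Kitchen", "Leisure", "Food",
        ],
        "StoreCountryName": [
            "Denmark", "Denmark", "Germany",
            "Switzerland", "Germany", "Switzerland",
        ],
    }
)

for country, transactions in transactions_by_country(sample).items():
    print(country, transactions)
```

Calling frequent_itemsets_by_country(dataset) should give one frequent-itemsets frame per country, and the association_rules step from the question can then be run on each of them. Note that min_support is evaluated against each country's own transactions, so the same threshold corresponds to a different absolute count in each country.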