Data preparation for association rules in R: from a data frame to transactions

Question  Votes: 0  Answers: 3

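My data comes from a SQL database and is in tabular form, where a single transaction spans multiple rows. I want to use not only the product field but also all the other columns in the data frame.

My data looks like this: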

transID <- c('1','1','2','3')
state <- c('TX','TX','CA','MA')
product <- c('Oranges','Banana','Fish','Cheese')
Month <- c('January','January','Febuary','March')
Place <- c('A','A','B','C')

transactions <- data.frame(transID,state,product,Month,Place)

transactions
  transID state product   Month Place
1       1    TX Oranges January     A
2       1    TX  Banana January     A
3       2    CA    Fish Febuary     B
4       3    MA  Cheese   March     C

Ideally, my data would look like this:

1 (TX,Oranges,Banana,January,A)
2 (CA,Fish,Febuary,B)
3 (MA, Cheese, March,C)

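What is the best way to convert this kind of data into transaction format?

I tried the following, but it only joins rows 1 and 2 together as one transaction: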

library(plyr)  # provides ddply()

transactionData <- ddply(transactions,c("transID"),
                         function(df1) paste(df1$state,
                                             df1$product,
                                             df1$Month,
                                             df1$Place,
                                             collapse = ","))
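
To see why this attempt repeats the shared values, it may help to look at what paste() does with the rows of transaction 1 on their own. The following is a minimal base R sketch (no plyr needed); the variable name row_wise is only illustrative.

```r
# Rows of transaction 1 from the sample data
state   <- c("TX", "TX")
product <- c("Oranges", "Banana")
Month   <- c("January", "January")
Place   <- c("A", "A")

# paste() first joins the vectors element-wise (row by row) with sep = " ",
# then collapse = "," glues the resulting row strings together.
row_wise <- paste(state, product, Month, Place, collapse = ",")

# Each row is kept in full, so TX, January and A appear twice
print(row_wise)
```

Nothing in this call removes duplicates, which is why the state, month and place show up once per row rather than once per transaction.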
r data-mining apriori
3 Answers
0
votes

How about a reshape like this?

reshape(transactions, v.names = "product", timevar = "product", idvar = "state", direction = "wide")
  transID state   Month Place product.Oranges product.Banana product.Fish product.Cheese
1       1    TX January     A         Oranges         Banana         <NA>           <NA>
3       2    CA Febuary     B            <NA>           <NA>         Fish           <NA>
4       3    MA   March     C            <NA>           <NA>         <NA>         Cheese

0
votes
Here is a base R solution:

stack(tapply(transactions[, -1], transactions[, 1, drop = F], FUN = function(DF) {
  paste(unique(unlist(DF), use.names = F), collapse = ',')
}))[, 2:1]

#  ind                      values
#1   1 TX,Oranges,Banana,January,A
#2   2           CA,Fish,Febuary,B
#3   3           MA,Cheese,March,C

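The main piece is the tapply() call, which splits the data by transID, then unlists the remaining columns of the data.frame and keeps only the unique values. This is the output of the tapply() call: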

                            1                             2                             3
"TX,Oranges,Banana,January,A"           "CA,Fish,Febuary,B"           "MA,Cheese,March,C"

The stack()[, 2:1] part is purely cosmetic: it turns the result into a tidy data.frame with the columns in a readable order.
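
The same split-then-unique idea can also be spelled out step by step with split(), lapply() and vapply(). This is only a sketch of the idea described above, not the answer's exact code; the names groups, items and collapsed are illustrative.

```r
# Sample data from the question
transactions <- data.frame(
  transID = c("1", "1", "2", "3"),
  state   = c("TX", "TX", "CA", "MA"),
  product = c("Oranges", "Banana", "Fish", "Cheese"),
  Month   = c("January", "January", "Febuary", "March"),
  Place   = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# One small data frame per transaction, holding every column except transID
groups <- split(transactions[, -1], transactions$transID)

# Within each group, flatten all columns into one vector and drop repeats
items <- lapply(groups, function(g) unique(unlist(g, use.names = FALSE)))

# Collapse each item vector into a single comma-separated string
collapsed <- vapply(items, paste, character(1), collapse = ",")

print(collapsed)
```

Keeping the intermediate items list around is handy if the per-transaction item vectors are needed later rather than the pasted strings.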

0
votes
This is a little awkward because data.frames store factors.

library("arules")

# make all columns into items
df <- data.frame(
  id = transactions$transID,
  items = factor(c(as.character(transactions$state),
                   as.character(transactions$product),
                   as.character(transactions$Month),
                   as.character(transactions$Place))))

# remove duplicated state, month and place entries
df <- df[!duplicated(df),]

# this is from the manual page '? transactions'
trans <- as(split(df[,"items"], df[,"id"]), "transactions")
inspect(trans)

    items                         transactionID
[1] {A,Banana,January,Oranges,TX} 1
[2] {B,CA,Febuary,Fish}           2
[3] {C,Cheese,MA,March}           3

I hope this helps.
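
The data.frame() call in this answer relies on id being recycled across the four stacked columns. A variant that makes the recycling explicit and works for any number of columns might look like the sketch below. The names long and item_lists are illustrative, and the arules step is guarded so that the base R part runs even when the package is not installed.

```r
# Sample data from the question
transactions <- data.frame(
  transID = c("1", "1", "2", "3"),
  state   = c("TX", "TX", "CA", "MA"),
  product = c("Oranges", "Banana", "Fish", "Cheese"),
  Month   = c("January", "January", "Febuary", "March"),
  Place   = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Stack every non-ID column into one long (id, items) table;
# the ID column is repeated once per stacked column
long <- data.frame(
  id    = rep(transactions$transID, times = ncol(transactions) - 1),
  items = unlist(transactions[, -1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Remove repeated (id, item) pairs such as the duplicated state, month and place
long <- long[!duplicated(long), ]

# One character vector of items per transaction
item_lists <- split(long$items, long$id)

# Coerce to arules transactions only if the package is available
if (requireNamespace("arules", quietly = TRUE)) {
  library("arules")
  trans <- as(item_lists, "transactions")
  inspect(trans)
}
```

Because the items are plain character vectors here, no factor conversion is needed before the split.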