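I'm new to using lists, so I apologize if this question sounds silly.
Starting from an initial base of 459,046 customers, I created a function that splits the base and stores it across multiple elements of a list.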
sampled_list <- baseSample(dataset = clv_df_cbs, sample.size = 10000, seed = 12345)
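Running this function (baseSample) gives you a new list of objects containing mutually exclusive groups of customers. Each group consists of 10,000 customers, except the last one, which may be smaller depending on the initial volume.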
> sampled_list <- baseSample(dataset = clv_df_cbs, sample.size = 10000, seed = 12345)
[1] "Seed: 12345"
[1] "Total groups created: 46"
[1] "Group size: 10000"
In this case, the output is a list of 46 elements stored in an object named sampled_list.
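For reference, the 46 follows directly from the grouping rule in baseSample, which assigns row k to group ceiling(k / sample.size): ceiling(459046 / 10000) = 46, i.e. 45 full groups plus a last group of 9,046 customers. A minimal base-R sketch of that arithmetic (group_sizes is a hypothetical helper, not part of the question's code):

```r
# Hypothetical helper: size of each group when rows 1..n are assigned to
# group ceiling(k / size), the same rule baseSample uses
group_sizes <- function(n, size) {
  tabulate(ceiling(seq_len(n) / size))
}

sizes <- group_sizes(459046, 10000)
length(sizes)   # 46 groups
sizes[1]        # 10000
sizes[46]       # 9046, the smaller last group
sum(sizes)      # 459046, so every customer lands in exactly one group
```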
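Now I'd like to pass each of these 46 elements to a BTYD model, which predicts the number of transactions over the next 90 days (based on the history in the input data).
The reason I can't pass the full dataset to the BTYD model is that the model relies heavily on MCMC, so the computation takes so long that the model never produces any output. I therefore decided to run the same model several times, each on a sufficiently large sample, to generate predictions until the whole customer base has been used as model input.
For each element, the following operations need to be performed: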
# Estimate parameters for element1 of the list
pggg.draws1 <- pggg.mcmc.DrawParameters(element1,
  mcmc = 1000,   # number of MCMC steps
  burnin = 250,  # number of initial MCMC steps which are discarded
  thin = 10,     # only every thin-th MCMC step will be returned
  chains = 2,    # number of MCMC chains to be run
  trace = 50)    # print logging step every trace iteration
# generate draws for holdout period
pggg.xstar.draws1 <- mcmc.DrawFutureTransactions(element1, pggg.draws1)
# conditional expectations
element1$xstar.pggg <- apply(pggg.xstar.draws1, 2, mean)
# P(active)
element1$pactive.pggg <- mcmc.PActive(pggg.xstar.draws1)
# P(alive)
element1$palive.pggg <- mcmc.PAlive(pggg.draws1)
# show estimates for first few customers
head(element1[, c("x", "t.x", "x.star", "xstar.pggg", "pactive.pggg", "palive.pggg")], 50)
# report median cohort-level parameter estimates
round(apply(as.matrix(pggg.draws1$level_2), 2, median), 3)
# report mean over median individual-level parameter estimates
median.est1 <- sapply(pggg.draws1$level_1, function(draw) {
  apply(as.matrix(draw), 2, median)
})
round(apply(median.est1, 1, mean), 3)
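The per-customer summaries above reduce a matrix of simulated future transactions to one number per customer. Because apply(pggg.xstar.draws1, 2, mean) yields one value per row of element1, the draws matrix presumably has draws in rows and customers in columns. A toy sketch under that assumption, with made-up numbers (not BTYDplus output), reading P(active) as the share of draws with at least one holdout transaction; this is our interpretation, not the BTYDplus source:

```r
# Toy draw matrix: 4 MCMC draws (rows) x 3 customers (columns), mirroring
# the orientation implied by apply(pggg.xstar.draws1, 2, mean)
draws <- matrix(c(0, 1, 2, 0,   # customer 1
                  0, 0, 0, 0,   # customer 2
                  3, 1, 0, 2),  # customer 3
                nrow = 4)

# Conditional expectation per customer: average over the draws
xstar <- apply(draws, 2, mean)   # same values as colMeans(draws)

# Share of draws with at least one future transaction, per customer
# (assumed reading of P(active), not the library's implementation)
pactive <- colMeans(draws > 0)
```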
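Ideally, the output would be stored directly in a new data.frame, so that I can retrieve the IDs and the predictions (along with whatever else was originally in the dataset).
Below is some simulated data to play with, built from a publicly available dataset.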
library(BTYDplus)
library(tidyverse)

data("groceryElog")
dataset <- elog2cbs(groceryElog, T.cal = "2006-12-01")

# FUNCTION baseSample ####
baseSample <- function(dataset, sample.size, seed = NULL) {
  seed.value <- if (is.null(seed)) {
    as.numeric(format(Sys.Date(), "%Y")) * 10000 +
      as.numeric(format(Sys.Date(), "%m")) * 100 +
      as.numeric(format(Sys.Date(), "%d"))
  } else {
    seed
  }
  set.seed(seed.value)
  # RE-ORDER DATA FRAME (SAME LENGTH)
  data <- with(dataset, dataset[order(sample(cust, nrow(dataset))), ])
  # BUILD A LIST OF DFs
  set.sample.size <- sample.size
  data$cycles_group <- paste0("sample_", ceiling(1:nrow(data) / set.sample.size))
  df_list <- split(data, data$cycles_group)
  print(paste0("Seed: ", seed.value))
  print(paste0("Total groups created: ", length(unique(data$cycles_group))))
  print(paste0("Group size: ", set.sample.size))
  return(df_list)
  #print(df_list)
}

# ** OUTPUT: Base split in lists ####
sampled_list <- baseSample(dataset = dataset, sample.size = 100, seed = 12345)
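Because baseSample shuffles the rows once and then cuts them into consecutive blocks, the groups should be mutually exclusive and together cover every customer. A small base-R check, sketched on a toy table (check_groups is a hypothetical helper; it assumes each list element has a cust column, as the elog2cbs output does):

```r
# Hypothetical helper: TRUE if the groups share no customer IDs and
# together contain exactly the IDs in `all_ids`
check_groups <- function(groups, all_ids, id_col = "cust") {
  ids <- unlist(lapply(groups, function(g) as.character(g[[id_col]])),
                use.names = FALSE)
  !anyDuplicated(ids) && setequal(ids, as.character(all_ids))
}

# Toy stand-in for a customer table: 25 customers split into groups of 10
toy <- data.frame(cust = paste0("c", 1:25), x = 0:24)
set.seed(12345)
shuffled <- toy[sample.int(nrow(toy)), ]
groups <- split(shuffled,
                paste0("sample_", ceiling(seq_len(nrow(shuffled)) / 10)))

sapply(groups, nrow)             # sample_1 = 10, sample_2 = 10, sample_3 = 5
check_groups(groups, toy$cust)   # TRUE
```

On the real data, check_groups(sampled_list, dataset$cust) should return TRUE. Note that split() orders the list by group name as text, so sample_10 comes before sample_2; the groups themselves are unaffected.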
Thanks
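In base R, you can use lapply to iterate a function over the elements of a list and get back a new list holding the result of each iteration. After using your sample code to generate the list named sampled_list: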
# turn the code for the operations you want to perform on each list element into a function
thingy <- function(i) {
# Estimate parameters for list element i
pggg.draws1 <- pggg.mcmc.DrawParameters(i,
mcmc = 1000, # number of MCMC steps
burnin = 250, # number of initial MCMC steps which are discarded
thin = 10, # only every thin-th MCMC step will be returned
chains = 2, # number of MCMC chains to be run
trace = 50) # print logging step every trace iteration
# generate draws for holdout period
pggg.xstar.draws1 <- mcmc.DrawFutureTransactions(i, pggg.draws1)
# conditional expectations
i$xstar.pggg <- apply(pggg.xstar.draws1, 2, mean)
# P(active)
i$pactive.pggg <- mcmc.PActive(pggg.xstar.draws1)
# P(alive)
i$palive.pggg <- mcmc.PAlive(pggg.draws1)
# show estimates for first few customers [commenting out for this iterated version]
# head(element1[, c("x", "t.x", "x.star", "xstar.pggg", "pactive.pggg", "palive.pggg")],50)
# report median cohort-level parameter estimates (print() is needed because a bare expression inside a function is not auto-printed)
print(round(apply(as.matrix(pggg.draws1$level_2), 2, median), 3))
# report mean over median individual-level parameter estimates
median.est1 <- sapply(pggg.draws1$level_1, function(draw) {
apply(as.matrix(draw), 2, median)
})
# get the bits you want in a named vector
z <- round(apply(median.est1, 1, mean), 3)
# convert that named vector of results into a one-row data frame to make collapsing easier
data.frame(as.list(z))
}
# now use lapply to iterate that function over the elements of your list
results <- lapply(sampled_list, thingy)
# now bind the results into a data frame
boundresults <- do.call(rbind, results)
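Note that thingy returns only the group-level parameter summary, so boundresults has one row per group and the per-customer predictions added to i are discarded. If you also want the IDs and predictions from the question, one option is to return both pieces from the per-group function and bind them separately. The runnable sketch below uses a hypothetical fit_group with placeholder arithmetic standing in for the MCMC steps; in real use its body would be the BTYDplus calls from thingy, returning i together with data.frame(as.list(z)).

```r
# Hypothetical stand-in for the per-group fit. In real use, the body would be
# the BTYDplus calls from `thingy`, returning the augmented `i` plus the
# one-row data.frame(as.list(z)) summary.
fit_group <- function(i) {
  i$xstar.pred <- i$x * 0.5                       # placeholder, NOT a BTYD prediction
  list(customers = i,
       params = data.frame(mean.x = mean(i$x)))   # placeholder group summary
}

# Toy list shaped like `sampled_list`: named data frames with `cust`, `x`
# and the `cycles_group` column that baseSample adds
toy_list <- list(
  sample_1 = data.frame(cust = c("a", "b"), x = c(2, 4), cycles_group = "sample_1"),
  sample_2 = data.frame(cust = "c", x = 6, cycles_group = "sample_2")
)

fits <- lapply(toy_list, fit_group)

# One row per customer, keeping the IDs and all original columns
customer_preds <- do.call(rbind, lapply(fits, `[[`, "customers"))
rownames(customer_preds) <- NULL

# One row per group (each `params` has exactly one row, so names line up)
group_params <- do.call(rbind, lapply(fits, `[[`, "params"))
group_params$group <- names(fits)
rownames(group_params) <- NULL
```

With the real model, customer_preds would hold cust and the original CBS columns alongside xstar.pggg, pactive.pggg and palive.pggg, one row per customer, while group_params plays the role of boundresults.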