How do I pass the elements of a list to a model in R?


I'm quite new to working with lists, so I apologize if this question sounds silly.

Starting from an initial base of 459,046 customers, I wrote a function that splits the base and stores the pieces as separate elements of a list.

sampled_list <- baseSample(dataset = clv_df_cbs, sample.size = 10000, seed = 12345)

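Running this function (baseSample) gives you a new list object containing mutually exclusive groups of customers (each group consists of 10,000 customers, except possibly the last one, which may be smaller depending on the initial volume).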

> sampled_list <- baseSample(dataset = clv_df_cbs, sample.size = 10000, seed = 12345)
[1] "Seed: 12345"
[1] "Total groups created: 46"
[1] "Group size: 10000"

In this case, the output is a list of 46 elements stored in an object called sampled_list.
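The group count follows from simple arithmetic: 459,046 customers in groups of 10,000 gives ceiling(459046 / 10000) = 46 groups, with the last group holding the remaining 9,046 customers. A quick check of that arithmetic, using the same ceiling(index / size) grouping rule that baseSample applies (the variable names here are illustrative only):

```r
n_customers <- 459046   # customer count stated in the question
sample_size <- 10000    # group size passed to baseSample

# Assign each row position to a group: rows 1..10000 -> 1, 10001..20000 -> 2, ...
group_id <- ceiling(seq_len(n_customers) / sample_size)

# Count how many rows fall into each group
group_sizes <- tabulate(group_id)

length(group_sizes)                # number of groups: 46
group_sizes[length(group_sizes)]   # size of the last group: 9046
```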

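Now I would like to pass each of these 46 elements to a BTYD model that forecasts the number of transactions in the next 90 days (based on what the model learns from the input).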

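The reason I cannot pass the full dataset to the BTYD model is that the model makes heavy use of MCMC, so the computation takes so long that the model never produces any output. I therefore decided to run the same model several times (on sufficiently large samples) to generate forecasts, until the entire base has been used as model input.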

The operations that need to be performed on each element are as follows:

# Estimate parameters for element1 of the list
pggg.draws1 <- pggg.mcmc.DrawParameters(element1, 
                                           mcmc = 1000, # number of MCMC steps
                                           burnin = 250, # number of initial MCMC steps which are discarded
                                           thin = 10, # only every thin-th MCMC step will be returned
                                           chains = 2, # number of MCMC chains to be run
                                           trace = 50) # print logging step every trace iteration

# generate draws for holdout period
pggg.xstar.draws1 <- mcmc.DrawFutureTransactions(element1, pggg.draws1)

# conditional expectations
element1$xstar.pggg <- apply(pggg.xstar.draws1, 2, mean)

# P(active)
element1$pactive.pggg <- mcmc.PActive(pggg.xstar.draws1)

# P(alive)
element1$palive.pggg <- mcmc.PAlive(pggg.draws1)

# show estimates for first few customers
head(element1[, c("x", "t.x", "x.star",
                            "xstar.pggg", "pactive.pggg", "palive.pggg")],50)

# report median cohort-level parameter estimates
round(apply(as.matrix(pggg.draws1$level_2), 2, median), 3)

# report mean over median individual-level parameter estimates
median.est1 <- sapply(pggg.draws1$level_1, function(draw) {
  apply(as.matrix(draw), 2, median)
})
round(apply(median.est1, 1, mean), 3)

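Ideally, the output would be stored directly in a new data.frame, so that I can retrieve the IDs and predictions (along with everything else originally included in the dataset).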

Below is some mock data, built from a public dataset, to play with.

library(BTYDplus)
library(tidyverse)
data("groceryElog")
dataset<-elog2cbs(groceryElog, T.cal = "2006-12-01") 


# FUNCTION baseSample ####
baseSample <- function(dataset, sample.size, seed=NULL) {
  seed.value <- if(is.null(seed)) {
    as.numeric(format(Sys.Date(),"%Y"))*10000+as.numeric(format(Sys.Date(),"%m"))*100+as.numeric(format(Sys.Date(),"%d"))
  } else {
    seed
  }

  set.seed(seed.value)

  # RE-ORDER DATA FRAME (SAME LENGTH)
  data <- with(dataset, dataset[order(sample(cust, nrow(dataset))),])

  # BUILD A LIST OF DFs 
  set.sample.size <- sample.size
  data$cycles_group <- paste0("sample_", ceiling(1:nrow(data)/set.sample.size))

  df_list <- split(data, data$cycles_group)

  print(paste0("Seed: ", seed.value))
  print(paste0("Total groups created: ", length(unique(data$cycles_group))))
  print(paste0("Group size: ", set.sample.size))
  return(df_list)
  #print(df_list)
}

# ** OUTPUT: Base split in lists ####
sampled_list <- baseSample(dataset = dataset, sample.size = 100, seed = 12345)

Thanks


r list
1 Answer

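In base R, you can use lapply to iterate a function over the elements of a list; it returns a new list holding the result of each iteration. After using your example code (the baseSample function) to generate the list: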

# turn the code for the operations you want to perform on each list element into a function
thingy <- function(i) {

  # Estimate parameters for the current list element (i)
  pggg.draws1 <- pggg.mcmc.DrawParameters(i, 
                                          mcmc = 1000, # number of MCMC steps
                                          burnin = 250, # number of initial MCMC steps which are discarded
                                          thin = 10, # only every thin-th MCMC step will be returned
                                          chains = 2, # number of MCMC chains to be run
                                          trace = 50) # print logging step every trace iteration

  # generate draws for holdout period
  pggg.xstar.draws1 <- mcmc.DrawFutureTransactions(i, pggg.draws1)

  # conditional expectations
  i$xstar.pggg <- apply(pggg.xstar.draws1, 2, mean)

  # P(active)
  i$pactive.pggg <- mcmc.PActive(pggg.xstar.draws1)

  # P(alive)
  i$palive.pggg <- mcmc.PAlive(pggg.draws1)

  # show estimates for first few customers [commenting out for this iterated version]
  # head(element1[, c("x", "t.x", "x.star", "xstar.pggg", "pactive.pggg", "palive.pggg")],50)

  # report median cohort-level parameter estimates
  # (wrapped in print(): inside a function the value would otherwise be silently discarded)
  print(round(apply(as.matrix(pggg.draws1$level_2), 2, median), 3))

  # report mean over median individual-level parameter estimates
  median.est1 <- sapply(pggg.draws1$level_1, function(draw) {
    apply(as.matrix(draw), 2, median)
  })

  # get the bits you want in a named vector
  z <- round(apply(median.est1, 1, mean), 3)

  # convert that named vector of results into a one-row data frame to make collapsing easier
  data.frame(as.list(z))

}

# now use lapply to iterate that function over the elements of your list
results <- lapply(sampled_list, thingy)

# now bind the results into a data frame
boundresults <- do.call(rbind, results)
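
Note that the function above returns only the averaged parameter estimates for each group. The question also asks for customer IDs and predictions in one data frame; one way to get that is to have the function return the augmented data frame itself and stack the pieces afterwards. A minimal sketch in base R, using a hypothetical stand-in fit_and_predict() in place of the BTYDplus calls so that it runs without fitting an MCMC model (the stand-in, its xstar column, and the toy data are assumptions for illustration only):

```r
# Toy customer table standing in for the CBS data (assumed columns: cust, x)
toy <- data.frame(cust = 1:25, x = rep(0:4, times = 5))

# Split into groups of at most 10 customers, mirroring the sampled_list structure
toy_list <- split(toy, paste0("sample_", ceiling(seq_len(nrow(toy)) / 10)))

# Hypothetical stand-in for the model step: adds one prediction column per customer.
# In the real workflow, the BTYDplus calls from the answer would go here.
fit_and_predict <- function(df) {
  df$xstar <- df$x * 0.5   # placeholder "prediction", not a real model
  df                       # return the whole data frame, IDs included
}

# Apply the function to every group, then stack the per-group data frames
predictions <- do.call(rbind, lapply(toy_list, fit_and_predict))

# rbind on a named list prefixes row names with the group name; reset if not needed
rownames(predictions) <- NULL

nrow(predictions)   # one row per original customer
head(predictions)
```

Because each group keeps its original columns, the stacked result still carries the customer ID next to every prediction, so it can be joined back to other customer data if needed.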