如何为我的 ML 模型添加超参数调整和交叉验证?

问题描述 投票:0回答:0

我有这个集成学习功能,它训练多个 lightGBM 模型,然后在随机森林元学习器中使用这些模型(对于每个单独样本)的预测分数作为预测特征,连同原始预测特征。

所以lightGBM模型的结果被用于随机森林分类器的学习。

我现在需要的是稍微调整一下代码,并添加一些超参数调整、10 折交叉验证和 lightGBM 树的早期停止。我怎样才能将它们添加到我的代码中?

ensemble_learning <- function(data) {
  # split data into train and test sets
  
  train <- as.matrix(data[[1]])
  test <- as.matrix(data[[2]])
  
  train[, -1] <- apply(train[, -1], 2, function(x) as.numeric(x))
  test[, -1] <- apply(test[, -1], 2, function(x) as.numeric(x))
  
  dtrain <- lgb.Dataset(train[, -1], label = train[, 1])
  dtest <- lgb.Dataset.create.valid(dtrain, test[,-1], label = test[,1])
  myparams <- list(
    objective = "binary",
    metric = "auc",
    min_data = 1L,
    learning_rate = 1.0,
    type = 'binary'
  )
  
  myvalids <- list(test = dtest)
  
  # create a list to store the base models
  base_models <- list()
  
  # train the base models using lightGBM
  for (i in 1:15) {
    
    model <- lgb.train(
      data = dtrain,
      params = myparams,
      nrounds = 100,
      valids = myvalids
    )
    
    # add the trained model to the list of base models
    base_models[[i]] <- model
  }
  
  # create a list to store the predictions from the base models
  base_preds <- list()
  
  # make predictions on the test set using the base models
  for (i in 1:15) {
    
    # make predictions using the i-th base model
    preds <- predict(base_models[[i]], test[,-1],  type = "response")
    
    # add the predictions to the list of base predictions
    base_preds[[i]] <- preds
  }
  
  # test <- test[complete.cases(test),]
  
  # combine the base predictions into a single data frame
  base_preds_df <- do.call(cbind, base_preds)
  rownames(base_preds_df) = rownames(test)
  base_preds_df = as.data.frame(base_preds_df)

  supertest = cbind(test, base_preds_df)
  rownames(supertest) = rownames(base_preds_df)
  supertest = as.data.frame(supertest)
  supertest[, -1] <- apply(supertest[, -1], 2, function(x) as.numeric(x))
  supertest[,1] = factor(supertest[,1])

  IND = createDataPartition(y = supertest[,1], p = 0.7, list = FALSE)
  data_train = supertest[IND, ]
  data_test = supertest[-IND,]
  
  train_features <- data_train[, -1] # Exclude the first column (target variable)
  train_target <- data_train[, 1] # First column (target variable)
  
  meta_model <- randomForest(train_features, train_target, ntree = 500, mtry = 5)
  
  # return the final ensemble model
  final = list(meta_model,data_test,data_train)
  return(final)
}
r machine-learning random-forest cross-validation
© www.soinside.com 2019 - 2024. All rights reserved.