Splitting a data frame with stratified sampling for decision-tree learning


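I want to create training and test sets using stratified sampling. I've looked around, but every package I found returns a data frame rather than an expression. The tree package, which I use to build the tree, requires the subset to be given as an expression (in the code below, a vector of row indices passed to the subset argument).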

Example code:

library(tree)
library(ISLR)
library(dplyr)

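# Binary response: "Yes" if Sales > 8, otherwise "No"
Carseats <- Carseats %>% mutate(High = factor(ifelse(Sales <= 8, "No", "Yes")))

set.seed(2)
# Simple random sample of 70% of the row indices (not stratified)
train_sample <- sample(nrow(Carseats), nrow(Carseats) * 0.7)
carseats_test <- Carseats[-train_sample,]

# subset= takes the index vector, not a data frame
tree.carseats <- tree(High~ . -Sales, Carseats, subset = train_sample)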

Can the code above be modified so that the sampling is stratified by High?

Answer

You can sample 70% of the rows separately within each class of High:

library(tree)
library(ISLR)
library(dplyr)

Carseats <- Carseats %>% mutate(High = factor(ifelse(Sales <= 8, "No", "Yes")))

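# Share of "Yes" in the full data
mean(Carseats$High == "Yes")
[1] 0.41

# Sample 70% of the rows within each level of High (stratified sampling)
train_sample <- Carseats %>%
  group_by(High) %>%
  sample_n(0.7 * n()) %>%   # sample_n() is superseded by slice_sample() in dplyr >= 1.0.0
  ungroup()

# Share of "Yes" in the training sample stays close to the full data
mean(train_sample$High == "Yes")
[1] 0.4086022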
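Note that train_sample above is a data frame of the sampled rows, not the vector of row indices that the question's tree(..., subset = train_sample) and Carseats[-train_sample,] expect. The sketch below keeps the same idea (sample 70% within each class, truncating per class as the 0.4086022 output implies) but returns indices, using only base R split() and sample.int(). The helper name stratified_indices is hypothetical, and the labels are a synthetic factor with the same 236 "No" / 164 "Yes" counts implied by the answer's 0.41 over Carseats' 400 rows, not the real Carseats data.

```r
# Hypothetical helper: row indices sampled separately within each level of `y`,
# taking floor(p * group size) rows per level.
stratified_indices <- function(y, p) {
  idx_by_level <- split(seq_along(y), y)  # row indices grouped by class
  picked <- lapply(idx_by_level, function(idx) {
    # index through sample.int() so a level with a single row is handled safely
    idx[sample.int(length(idx), floor(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Synthetic labels (assumption): 236 "No" and 164 "Yes", shuffled
set.seed(2)
y <- factor(sample(rep(c("No", "Yes"), times = c(236, 164))))

train_idx <- stratified_indices(y, 0.7)
length(train_idx)            # floor(0.7 * 236) + floor(0.7 * 164) = 165 + 114 = 279
mean(y[train_idx] == "Yes")  # 114 / 279, the same 0.4086022 as the dplyr version
test_idx <- setdiff(seq_along(y), train_idx)
```

With the question's data, train_sample <- stratified_indices(Carseats$High, 0.7) is an integer vector, so it should drop into tree(High ~ . -Sales, Carseats, subset = train_sample) and Carseats[-train_sample,] unchanged.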