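I'm new to R and machine learning, but I have a specific question I'm trying to answer.
I'm using my own data but following Matt Dancho's example of predicting employee attrition: http://www.business-science.io/business/2017/09/18/hr_employee_attrition.html
Following his update, I removed the zero-variance variables and scaled the data.
My problem is running explain() at the explainer step. When I run the original code (first) and a variant of it (second), I get the two errors below. Everything up to that point works.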
explanation <- lime::explain(
  as.data.frame(test_h2o[1:10,-1]),
  explainer = explainer,
  n_labels = 1,
  n_features = 4,
  kernel_width = 0.5)
This gives:
Error during wrapup: arguments imply differing number of rows: 50000, 0
while this variant:
explanation <- lime::explain(
  as.data.frame(test_h2o[1:500,-1]),
  explainer = explainer,
  n_labels = 1,
  n_features = 5,
  kernel_width = 1)
gives:
ERROR: Unexpected HTTP Status code: 500 Server Error (url = http://localhost:54321/3/PostFile?destination_frame=C%3A%2FUsers%2Fsim.s%2FAppData%2FLocal%2FTemp%2FRtmpykNkl1%2Ffileb203a8d4a58.csv_sid_afd3_26)
Error: lexical error: invalid char in json text.
<html> <head> <meta http-equiv=
(right here) ------^
If you have any thoughts or insight on this problem, or need me to provide more information, please let me know.
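Try this and let me know what you get. Note that it assumes your Excel file is stored in a folder named "data" inside your working directory. Use getwd() and setwd() to get/set the working directory (or use Projects in the RStudio IDE).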
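A quick way to confirm the file is where the read_excel() call in the code that follows expects it. This is only a sketch: `data_path` is a helper variable introduced here, and the setwd() path is a placeholder you would replace with your own project folder.

```r
# Relative path used by read_excel() in the answer's code
data_path <- file.path("data", "WA_Fn-UseC_-HR-Employee-Attrition.xlsx")

getwd()                 # print the current working directory
file.exists(data_path)  # TRUE if data/<file> exists under getwd()

# If FALSE, switch to your project folder first (placeholder path):
# setwd("path/to/your/project")
```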
library(h2o) # Professional grade ML pkg
library(tidyquant) # Loads tidyverse and several other pkgs
library(readxl) # Super simple excel reader
library(lime) # Explain complex black-box ML models
library(recipes) # Preprocessing for machine learning
hr_data_raw_tbl <- read_excel(path = "data/WA_Fn-UseC_-HR-Employee-Attrition.xlsx")
# Convert character columns to factors and put the outcome first
hr_data_organized_tbl <- hr_data_raw_tbl %>%
  mutate_if(is.character, as.factor) %>%
  select(Attrition, everything())
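# Preprocessing recipe: drop the ID column, remove zero-variance
# predictors, then center and scale the numeric columns
recipe_obj <- hr_data_organized_tbl %>%
  recipe(formula = Attrition ~ .) %>%
  step_rm(EmployeeNumber) %>%
  step_zv(all_predictors()) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(data = hr_data_organized_tbl)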
# Apply the prepped recipe (older recipes versions named this argument `newdata`)
hr_data_bake_tbl <- bake(recipe_obj, new_data = hr_data_organized_tbl)
# Start (or connect to) a local H2O cluster and upload the baked data
h2o.init()
hr_data_bake_h2o <- as.h2o(hr_data_bake_tbl)
hr_data_split <- h2o.splitFrame(hr_data_bake_h2o, ratios = c(0.7, 0.15), seed = 1234)
train_h2o <- h2o.assign(hr_data_split[[1]], "train" ) # 70%
valid_h2o <- h2o.assign(hr_data_split[[2]], "valid" ) # 15%
test_h2o <- h2o.assign(hr_data_split[[3]], "test" ) # 15%
y <- "Attrition"
x <- setdiff(names(train_h2o), y)
# Run H2O AutoML for 15 seconds, ranking models on the test frame
automl_models_h2o <- h2o.automl(
  x = x,
  y = y,
  training_frame = train_h2o,
  validation_frame = valid_h2o,
  leaderboard_frame = test_h2o,
  max_runtime_secs = 15
)
automl_leader <- automl_models_h2o@leader
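# Build the lime explainer on the training predictors only; selecting
# columns by name (x) excludes Attrition wherever it sits in the frame
explainer <- lime::lime(
  as.data.frame(train_h2o[, x]),
  model = automl_leader,
  bin_continuous = FALSE
)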
# Explain the first 10 test rows, using 500 permutations per row
explanation <- lime::explain(
  x = as.data.frame(test_h2o[1:10, x]),
  explainer = explainer,
  n_labels = 1,
  n_features = 4,
  n_permutations = 500,
  kernel_width = 1
)
explanation
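The main difference from the calls in the question is n_permutations = 500. A likely reason it helps, offered as a reading of the error messages rather than a confirmed diagnosis: lime::explain() creates n_permutations perturbed copies of every row passed to it and asks the model to predict on all of them, and lime's default is n_permutations = 5000. Ten test rows therefore become 50,000 rows, the same 50000 shown in the first error (which suggests the model returned 0 predictions for them), and 500 rows become 2.5 million rows that H2O has to upload through a temporary CSV, which is the step where the 500 Server Error appeared. The helper `permuted_rows()` below is hypothetical and only does this arithmetic:

```r
# Hypothetical helper: number of perturbed rows lime::explain() asks the
# model to score, assuming lime's default n_permutations = 5000
permuted_rows <- function(n_cases, n_permutations = 5000) {
  n_cases * n_permutations
}

permuted_rows(10)        # question, first call: 10 * 5000
permuted_rows(500)       # question, second call: 500 * 5000
permuted_rows(10, 500)   # answer: 10 * 500
```

If you need to explain many rows at once, lowering n_permutations (or explaining the rows in smaller batches) keeps the prediction request H2O has to handle much smaller.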