Error when creating a function: 'recursive indexing failed'


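I am trying to create a function that, given a data frame and a column, uses Rosner's test (EnvStats::rosnerTest) to identify outliers and returns a new data frame so that I can inspect each outlier.

I can do this without a function, but because my data frame has many variables, I would like a function to automate it. (My previous post shows the workflow for doing this one variable at a time.)

Here is my data: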

> dput(head(data))
structure(list(cap_date = structure(c(4856, 4860, 4860, 4861, 
4866, 4867), class = "Date"), cap_year = c(1983L, 1983L, 1983L, 
1983L, 1983L, 1983L), age_class = c("A", "S", "S", "A", "A", "A"), sex = 
c("F", "F", "F", "F", "F", "F"), alt = c(11, 12, 15.67000008, 7, 14.5, 
17.5), alb = c(2.599999905, 5.369999886, 4.670000076, 4.429999828, 3.75, 
3.700000048), alp = c(9, 86.33000183, 28, 170.6699982, 12, 82.5), 
tbil = c(0.200000003, 1.070000052, 0.430000007, 1.169999957, 
0.300000012, 0.400000006), bun = c(20, 17, 11.32999992, 56.33000183, 
7.5, 45), calcium = c(NA, 8.930000305, 8.800000191, 8.970000267, NA, 
7.550000191), crea = c(0.5, 0.569999993, 0.529999971, 0.600000024, 
1.049999952, 0.75), phos = c(2.75, 4.099999905, 4.96999979, 
5.329999924, 4.099999905, 7.400000095), pot = c(5.550000191, 
6.730000019, 3.869999886, 4.269999981, 3.049999952, 6.849999905), tp 
= c(4.449999809, 6.769999981, 5.800000191, 6.769999981, 5.75, 
6.400000095), sodium = c(NA, 142, 127, 138.3300018, 164, 139), glob = 
c(1.849999905, 1.400000095, 1.130000114, 2.340000153, 2, 
2.700000048), cortisol = c(4.24, 7.2231, 4.5431, NA, 6.0874, 4.8727), 
row = c(1L, 2L, 3L, 4L, 6L, 7L)), row.names = c(1L, 2L, 3L, 4L, 6L, 
7L), class = "data.frame")

Here is my code:

detect.outlier <- function(df, i, k) {  # i is a column/variable, and k is an input in the Rosner test

  plot(df$year, df[[i]], xlab = "Year", ylab = "Value") # I also want to print the plot

  ros.test <- rosnerTest(df[[i]], k)

  ros.results <- ros.test$all.stats

  ros.outliers <- ros.results %>% filter(Outlier) %>% select(Obs.Num) # filter by outlier = TRUE ; Obs.Num corresponds with row number in my data frame

  ros.outliers <- ros.outliers[,1]  # change from a data frame to a vector 

  outlier_df <- df[df$row %in% ros.outliers,]

  return(outlier_df %>% select(age_class, sex, i))

}

I try to run the function:

detect.outlier(data, alt, 20)

But I get an error:

Error during wrapup: recursive indexing failed at level 2

Error: no more error handlers available (recursive errors?); invoking 'abort' restart

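I am not sure what this means or how to fix it. Any help would be greatly appreciated. Thank you very much!

Edit: Sometimes when I run the function, I also get this error: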

Error in rosnerTest(data$variable, k) : 'x' must be a numeric vector

This seems strange, because when I run class(data$alt) it says the column is numeric.

r function tidyverse subset tidy
1 Answer

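Your function uses whatever you pass as i to look up the column. When you call detect.outlier(data, alt, 20), the value of i is the object alt, not the string "alt". So the code that runs inside detect.outlier() is

plot(df$year, df[[alt]], xlab = "Year", ylab = "Value")

whereas it should be

plot(df$year, df[["alt"]], xlab = "Year", ylab = "Value")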

You can fix this by quoting the column name in the call:

detect.outlier(data, "alt", 20)
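To illustrate the difference between a quoted and an unquoted column name, here is a minimal sketch in base R. The toy data frame and the helper get_column() are hypothetical and only stand in for the asker's data and function. If an object named alt happens to exist in the workspace (for example a numeric vector), [[ would try to use its values as a nested index, which may be what produced the "recursive indexing failed" message.

```r
# Toy data frame standing in for the asker's data (values are made up).
toy <- data.frame(alt = c(11, 12, 15.67), alb = c(2.6, 5.37, 4.67))

# Hypothetical helper: extracts one column by its name, given as a string.
get_column <- function(df, i) {
  df[[i]]
}

# Works: i is the string "alt", so [[ looks up the column of that name.
get_column(toy, "alt")

# With a bare name, R looks for an object called alt in the calling
# environment instead of a column. If no such object exists, the call
# fails, which tryCatch() turns into the string "error" here.
result <- tryCatch(get_column(toy, alt), error = function(e) "error")
```

Passing the name as a string keeps the function usable with [[ as written, without needing any tidy evaluation machinery.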

Your code apparently has another problem:

Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ

But this should already help you.
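A likely cause of the length mismatch, judging from the dput() output in the question, is that the data has a column named cap_year but none named year, so df$year evaluates to NULL. The sketch below uses a made-up toy data frame with the same column naming to show this.

```r
# Toy data with the same column naming as the question (values are made up).
toy <- data.frame(cap_year = c(1983L, 1983L, 1984L), alt = c(11, 12, 15.67))

# There is no column called year, so $ returns NULL (length 0),
# while toy[["alt"]] has length 3. plot() cannot pair these up.
is.null(toy$year)
length(toy$year)

# Using the column that actually exists gives x and y the same length.
plot(toy$cap_year, toy[["alt"]], xlab = "Year", ylab = "Value")
```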

Edit: You should also qualify the rosnerTest function with its package name, i.e. EnvStats::rosnerTest.
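Putting the points from this answer together, the function might look roughly like the sketch below. It assumes the EnvStats and dplyr packages are installed, that the x-axis column is meant to be cap_year, and that i is passed as a string. The matching of Obs.Num against the row column is carried over unchanged from the question and has not been checked here.

```r
# Sketch only: assumes the EnvStats and dplyr packages are installed.
detect.outlier <- function(df, i, k) {
  # i is a column name given as a string; k is passed on to rosnerTest()
  plot(df$cap_year, df[[i]], xlab = "Year", ylab = i)

  # Package-qualified call, so it works without library(EnvStats)
  ros.test <- EnvStats::rosnerTest(df[[i]], k = k)
  ros.results <- ros.test$all.stats

  # Observation numbers that the test flagged as outliers
  ros.outliers <- ros.results$Obs.Num[ros.results$Outlier]

  # Same matching rule as in the question: compare against the row column
  outlier_df <- df[df$row %in% ros.outliers, ]

  # all_of() selects the column whose name is stored in the string i
  dplyr::select(outlier_df, age_class, sex, dplyr::all_of(i))
}

# Example call with the column name quoted:
# detect.outlier(data, "alt", 20)
```

Using dplyr::all_of(i) in the final select() makes it explicit that i holds a column name rather than being a column itself.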
