Function that raises an error conditionally on the number of observations per group

Question (votes: 0, answers: 3)

I am working with the following example survey data:

df <- data.frame(county = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C"),
                  response = c(0,1,0,1,1,0,0,1,0,1,1,1,0,1,1),
                  value = sample(20:100, 15, replace=TRUE))

If n < 3 when response = 1 for a particular county, the data cannot be shared for confidentiality reasons. I am trying to write a "simple" function that raises an error or warning when a county has fewer than three such observations. Ideally, the warning/error message would name the problem county. County B above is an example of a county that does not meet the minimum confidentiality threshold.

I have tried iterations of a function like the one below, but it is clearly wrong.

my_funct <- function(x){
  if (any(x > 3)) stop("Error")
}

Any suggestions or direction would be greatly appreciated.

r
3 answers
0
votes
library(dplyr)  # the .by argument of filter() needs dplyr >= 1.1.0
errors <- df |> filter(sum(response == 1) < 3, .by = county) |> select(county) |> unique()

This gives:

> errors
  county
1      B

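For each county (.by = county), we keep the rows where the number of responses equal to 1 (sum(response == 1)) is fewer than three, then select only the county name, which leaves a single column of the unique counties that meet this criterion.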
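To turn that result into the error the question asks for, the same count can feed a stop() call. Below is a minimal base R sketch, not the answerer's code: stop_if_low and its min_n argument are hypothetical names, and the value column is left out because it is random.

```r
# Example data from the question (value column omitted, it is random)
df <- data.frame(
  county = rep(c("A", "B", "C"), each = 5),
  response = c(0, 1, 0, 1, 1,  0, 0, 1, 0, 1,  1, 1, 0, 1, 1)
)

# Hypothetical helper: error out and name every county below the threshold
stop_if_low <- function(data, min_n = 3) {
  # Number of rows with response == 1 in each county
  counts <- tapply(data$response == 1, data$county, sum)
  flagged <- names(counts)[counts < min_n]
  if (length(flagged) > 0) {
    stop("Counties below the threshold of ", min_n, ": ",
         toString(flagged), call. = FALSE)
  }
  invisible(data)
}
```

Calling stop_if_low(df) on this data stops with a message that names county B.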


0
votes

It sounds like you need to sum response by county group, then filter to keep the counties with n < 3.

library(dplyr)
df |> summarize(n = sum(response), .by = county) |>
  filter(n < 3)
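One caveat, offered as an assumption about real survey data rather than something in the example: sum(response) only counts the 1s if response is coded strictly 0/1 with no missing values. With an NA in a group the sum becomes NA, and filter() drops rows whose condition is NA, so that county would not be flagged. A small sketch with a made-up response vector:

```r
# Hypothetical responses for one county, including one missing answer
response <- c(1, NA, 0, 1)

n_plain <- sum(response)                     # NA: the missing value propagates
n_safe  <- sum(response == 1, na.rm = TRUE)  # 2: counts only the observed 1s
```

Whether a missing answer should count toward the threshold is a policy decision; the sketch only shows how the two expressions differ.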

Put that into a function that can issue a warning:

check_response_threshold <- function(df){
  too_low = df |> 
    summarize(n = sum(response), .by = county) |>
    filter(n < 3)
  if(nrow(too_low) > 0) {
    warning(
      "These counties have less than 3 responses:\n",
      toString(too_low$county)
    )
  }
  invisible()
}

Running it on the example data:

check_response_threshold(df)
# Warning message:
# In check_response_threshold(df) :
#   These counties have less than 3 responses:
# B
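
If the flagged counties also need to be removed before sharing, the check could return their names instead of nothing. The following is a rough base R sketch under that assumption; flag_low_counties and min_n are hypothetical names, not part of the answer above.

```r
# Example data from the question (value column omitted, it is random)
df <- data.frame(
  county = rep(c("A", "B", "C"), each = 5),
  response = c(0, 1, 0, 1, 1,  0, 0, 1, 0, 1,  1, 1, 0, 1, 1)
)

# Hypothetical variant: warn, then invisibly return the flagged county names
flag_low_counties <- function(data, min_n = 3) {
  counts <- tapply(data$response == 1, data$county, sum)
  too_low <- names(counts)[counts < min_n]
  if (length(too_low) > 0) {
    warning("These counties have fewer than ", min_n, " responses: ",
            toString(too_low), call. = FALSE)
  }
  invisible(too_low)
}

# Capture the flagged names (warning silenced here only to keep the demo quiet)
flagged <- suppressWarnings(flag_low_counties(df))

# Keep only the rows from counties that meet the threshold
shareable <- df[!df$county %in% flagged, ]
```

Returning the names keeps the check and the suppression step separate, so the caller decides what to do with a flagged county.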


0
votes

I'm not sure whether this is exactly what you are asking for, but you could do this:

library(dplyr)  # provides %>%, group_by(), summarise() and filter()

checkCounties <- function(data){
    grouped_df <- data %>% 
        group_by(county) %>% 
        summarise(ResponseCount = sum(response))
    
    insufficient_responses <- grouped_df %>% filter(ResponseCount < 3)
    
    if (nrow(insufficient_responses) > 0) {
        counties_list <- paste(insufficient_responses$county, collapse = ", ")
        stop("Error: The following counties have a response count of less than 3: ", counties_list)
    } else {
        print("All counties have sufficient responses.")
    }
}

checkCounties(df)

Output:

> checkCounties(df)
Error in checkCounties(df) : 
  Error: The following counties have a response count of less than 3: B

If you don't want an error to be raised, you can replace stop() with print():

print(paste0("Error: The following counties have a response count of less than 3: ", counties_list))
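
Another option, mentioned here only as a sketch, is message(): it reports without halting, writes to stderr rather than stdout, and can be silenced by the caller with suppressMessages(). The helper name report_low is hypothetical.

```r
# Hypothetical helper: report flagged counties without raising an error
report_low <- function(counties) {
  if (length(counties) > 0) {
    message("The following counties have a response count of less than 3: ",
            toString(counties))
  }
  invisible(counties)
}

report_low("B")                    # emits the message
suppressMessages(report_low("B"))  # same call, message silenced
```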