I am working with the following example survey data:
df <- data.frame(county = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C"),
response = c(0,1,0,1,1,0,0,1,0,1,1,1,0,1,1),
value = sample(20:100, 15, replace=TRUE))
If a county has fewer than three observations with response = 1 (n < 3), its data cannot be shared for confidentiality reasons. I am trying to write a "simple" function that raises an error or warning when a county falls below that threshold. Ideally, the warning/error message would name the problem county. County B above is an example of a county that does not meet the minimum confidentiality threshold.
I have tried iterations of a function like this, but it is clearly wrong.
my_funct <- function(x){
if (any(x > 3)) stop("Error")
}
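Any suggestions or direction would be greatly appreciated.
library(dplyr)  # the .by argument requires dplyr >= 1.1.0
errors <- df |>
  filter(sum(response == 1) < 3, .by = county) |>  # keep rows of counties with fewer than 3 responses equal to 1
  select(county) |>
  unique()
This gives: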
> errors
county
1 B
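Within each county (.by = county), we keep only the rows where the number of responses equal to 1 (sum(response == 1)) is fewer than three, then select just the county name and reduce it to a single column of the unique counties that meet that criterion.
It sounds like you need to sum response within each county group and then filter to keep the counties with n < 3. Because response is coded 0/1, sum(response) is the count of responses equal to 1.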
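For readers who would rather avoid the dplyr dependency, the same per-county count can be sketched in base R. This is only an illustration under the assumption that response is coded 0/1 as in the question; the names counts and flagged are made up for the example, and the random value column is left out because it plays no role in the check.

```r
# Example data from the question (the random `value` column is omitted)
df <- data.frame(
  county   = rep(c("A", "B", "C"), each = 5),
  response = c(0, 1, 0, 1, 1,  0, 0, 1, 0, 1,  1, 1, 0, 1, 1)
)

# Count the rows with response == 1 within each county
counts <- tapply(df$response == 1, df$county, sum)

# Names of the counties whose count is below the threshold of 3
flagged <- names(counts)[counts < 3]
flagged
```

Here flagged is a plain character vector, which is convenient to paste into a warning or error message.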
library(dplyr)
df |> summarize(n = sum(response), .by = county) |>
filter(n < 3)
Put that into a function that issues a warning:
check_response_threshold <- function(df){
too_low = df |>
summarize(n = sum(response), .by = county) |>
filter(n < 3)
if(nrow(too_low) > 0) {
warning(
"These counties have less than 3 responses:\n",
toString(too_low$county)
)
}
invisible()
}
Running it on the example data:
check_response_threshold(df)
# Warning message:
# In check_response_threshold(df) :
# These counties have less than 3 responses:
# B
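If the threshold might change, or if the flagged counties are needed later in a script, one possible variation is to make the minimum an argument and return the county names invisibly. The sketch below is not part of the answer above: check_threshold and min_n are hypothetical names, and it assumes dplyr 1.1.0 or later (for .by) and R 4.1 or later (for the native pipe).

```r
library(dplyr)

# Hypothetical variant: warn about counties below `min_n` and return them invisibly
check_threshold <- function(df, min_n = 3) {
  too_low <- df |>
    summarize(n = sum(response == 1), .by = county) |>  # responses equal to 1 per county
    filter(n < min_n)
  if (nrow(too_low) > 0) {
    warning(
      "Counties with fewer than ", min_n, " responses equal to 1: ",
      toString(too_low$county),
      call. = FALSE
    )
  }
  invisible(as.character(too_low$county))
}

# Example data from the question (the random `value` column is omitted)
df <- data.frame(
  county   = rep(c("A", "B", "C"), each = 5),
  response = c(0, 1, 0, 1, 1,  0, 0, 1, 0, 1,  1, 1, 0, 1, 1)
)

# Capture the flagged counties; suppressWarnings() is only used to keep the demo quiet
flagged <- suppressWarnings(check_threshold(df))
```

Returning the names invisibly keeps the console output unchanged while still letting calling code drop or mask those counties before sharing.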
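I don't know whether this is exactly what you are asking for, but you could do something like this:
library(dplyr)  # provides %>%, group_by(), summarise() and filter()
checkCounties <- function(data){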
grouped_df <- data %>%
group_by(county) %>%
summarise(ResponseCount = sum(response))
insufficient_responses <- grouped_df %>% filter(ResponseCount < 3)
if (nrow(insufficient_responses) > 0) {
counties_list <- paste(insufficient_responses$county, collapse = ", ")
stop("Error: The following counties have a response count of less than 3: ", counties_list)
} else {
print("All counties have sufficient responses.")
}
}
checkCounties(df)
Output:
> checkCounties(df)
Error in checkCounties(df) :
Error: The following counties have a response count of less than 3: B
If you would rather not raise an error, you can replace stop() with print():
print(paste("Error: The following counties have a response count of less than 3: ", counties_list))
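Another option, if the error should stay in place but must not halt a longer script, is to catch it with tryCatch(). The sketch below uses a small base R stand-in, check_counties_demo, which is a hypothetical helper written for this example rather than the checkCounties() function above; the same tryCatch() wrapper should work around any function that calls stop().

```r
# Hypothetical stand-in: stops when any county has fewer than `min_n` responses equal to 1
check_counties_demo <- function(data, min_n = 3) {
  counts <- tapply(data$response == 1, data$county, sum)
  flagged <- names(counts)[counts < min_n]
  if (length(flagged) > 0) {
    stop("Counties below the threshold: ", toString(flagged), call. = FALSE)
  }
  invisible(TRUE)
}

# Example data from the question (the random `value` column is omitted)
df <- data.frame(
  county   = rep(c("A", "B", "C"), each = 5),
  response = c(0, 1, 0, 1, 1,  0, 0, 1, 0, 1,  1, 1, 0, 1, 1)
)

# Catch the error, report it as a message, and carry on with a FALSE flag
result <- tryCatch(
  check_counties_demo(df),
  error = function(e) {
    message("Not shareable: ", conditionMessage(e))
    FALSE
  }
)
result
```

Compared with print(), this keeps the failure a real condition, so callers can choose whether to stop, log it, or skip the export.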