Row mean of randomly selected non-NA columns


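I have a data frame and, for each row, I want to randomly sample three columns (which three may differ from row to row) and take the mean of the three sampled values. As a further complication, many rows are entirely NA (and for other reasons I cannot drop them) or contain only 1 or 2 non-NA values. Based on this question and answer, I tried the following: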

df_new <- df %>%
  rowwise %>%
  mutate(inflo_mean = mean(sample(na.omit(c_across(everything())), 3)))

This does not work; I get an error from sample():

Error in `mutate()`:
ℹ In argument: `inflo_mean = mean(sample(na.omit(c_across(everything())), 3))`.
ℹ In row 1.
Caused by error in `sample.int()`:
! invalid first argument

I then tried breaking it into smaller steps and handling the different NA cases separately, and came up with the following:

df_new2 <- df %>%
  rowwise() %>%
  mutate(num_NAs = sum(!is.na(across(starts_with("Col_")))),
         v_inflo = list(na.omit(c_across((starts_with("Col_"))))),
         inflo_mean = case_when(num_NAs > 2 ~ mean(sample(v_inflo, 3)),
                                  num_NAs == 2 ~ mean(v_inflo),
                                  num_NAs == 1 ~ as.numeric(v_inflo),
                                  num_NAs == 0 ~ NA_real_,
                                  TRUE ~ NA_real_))

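Again, this does not work and I get the same error. I checked the data types of the columns and they are all integers. What could be the problem here? Or is there another way to solve this?

Sample data: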

> dput(df)
structure(list(Col_1 = c(NA, 77L, 82L, 172L), Col_2 = c(NA, 79L, 
NA, 135L), Col_3 = c(NA, 81L, NA, 131L), Col_4 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), Col_5 = c(NA, NA, NA, 
33L), Col_6 = c(NA, NA, NA, 104L), Col_7 = c(NA, NA, NA, 106L
), Col_8 = c(NA, NA, NA, 93L), Col_9 = c(NA, NA, NA, 50L), Col_10 = c(NA, 
NA, NA, 48L), Col_11 = c(NA, NA, NA, 96L), Col_12 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), Col_13 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), Col_14 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), Col_15 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))
r dataframe dplyr
2 Answers

0 votes

You can use mapply, like this:

df$inflo_mean <-
  mapply(
    \(x, k) mean(sample(na.omit(c(x)), k)),
    asplit(df, 1),
    pmin(rowSums(!is.na(df)), 3)
  )
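
One caveat worth checking with the snippet above: for a row with no non-NA values, mean() of an empty vector returns NaN rather than NA, and for a row with exactly one non-NA value, sample(x, k) treats a single number x >= 1 as the range 1:x (this is documented in ?sample). A minimal base R sketch that avoids both by sampling positions instead of values is shown here. The helper name mean_of_sample is hypothetical, and the data is a reduced subset of the question's columns, used only for illustration.

```r
# Subset of the question's sample data (only four of the Col_ columns)
df <- data.frame(
  Col_1 = c(NA, 77L, 82L, 172L),
  Col_2 = c(NA, 79L, NA, 135L),
  Col_3 = c(NA, 81L, NA, 131L),
  Col_5 = c(NA, NA, NA, 33L)
)

# Hypothetical helper: mean of up to k randomly chosen non-NA values
mean_of_sample <- function(x, k = 3) {
  x <- x[!is.na(x)]                      # keep non-NA values only
  if (length(x) == 0) return(NA_real_)   # all-NA row: return NA, not NaN
  # Sample positions rather than values, so a single value such as 82
  # is never expanded to 1:82 by sample()
  idx <- sample.int(length(x), min(k, length(x)))
  mean(x[idx])
}

set.seed(1)  # only to make the random draw reproducible
df$inflo_mean <- apply(as.matrix(df), 1, mean_of_sample)
```

With this sketch, rows with one or two non-NA values simply use all of the values available, which is an assumption about the desired behaviour rather than something stated in the question.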

0 votes

The problem is the rows that contain only NA values. If you use tryCatch() to catch the error and replace it with NA, your original code will work.

library(dplyr)
df <- structure(list(Col_1 = c(NA, 77L, 82L, 172L), Col_2 = c(NA, 79L, 
NA, 135L), Col_3 = c(NA, 81L, NA, 131L), Col_4 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), Col_5 = c(NA, NA, NA, 
33L), Col_6 = c(NA, NA, NA, 104L), Col_7 = c(NA, NA, NA, 106L
), Col_8 = c(NA, NA, NA, 93L), Col_9 = c(NA, NA, NA, 50L), Col_10 = c(NA, 
NA, NA, 48L), Col_11 = c(NA, NA, NA, 96L), Col_12 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), Col_13 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), Col_14 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), Col_15 = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))


df_new <- df %>%
  rowwise %>%
  mutate(inflo_mean = tryCatch(mean(sample(na.omit(c_across(everything())), 3)), error = function(e)NA))
df_new %>% select(inflo_mean, everything())
#> # A tibble: 4 × 16
#> # Rowwise: 
#>   inflo_mean Col_1 Col_2 Col_3 Col_4 Col_5 Col_6 Col_7 Col_8 Col_9 Col_10 Col_11
#>        <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>  <int>  <int>
#> 1       NA      NA    NA    NA    NA    NA    NA    NA    NA    NA     NA     NA
#> 2       79      77    79    81    NA    NA    NA    NA    NA    NA     NA     NA
#> 3       41.7    82    NA    NA    NA    NA    NA    NA    NA    NA     NA     NA
#> 4       95     172   135   131    NA    33   104   106    93    50     48     96
#> # ℹ 4 more variables: Col_12 <int>, Col_13 <int>, Col_14 <int>, Col_15 <int>
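
Note that row 3 in the output above has a single non-NA value (82) but reports 41.7. That is consistent with sample(82, 3) drawing from 1:82, which is how sample() treats a single number (see ?sample). Also, tryCatch() would turn a row with exactly two non-NA values into NA, because sampling three from two fails. A possible variant without tryCatch() is sketched below. The helper name pick_mean is hypothetical, the data is a reduced subset of the question's columns, and using all available values when fewer than three exist is an assumption about what is wanted.

```r
library(dplyr)

# Subset of the question's sample data (only four of the Col_ columns)
df <- tibble(
  Col_1 = c(NA, 77L, 82L, 172L),
  Col_2 = c(NA, 79L, NA, 135L),
  Col_3 = c(NA, 81L, NA, 131L),
  Col_5 = c(NA, NA, NA, 33L)
)

# Hypothetical helper: mean of up to k randomly chosen non-NA values
pick_mean <- function(x, k = 3) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)   # all-NA row
  # Index by sampled positions so a lone value is not read as 1:value
  mean(x[sample.int(length(x), min(k, length(x)))])
}

set.seed(1)  # only to make the random draw reproducible
df_new <- df %>%
  rowwise() %>%
  mutate(inflo_mean = pick_mean(c_across(starts_with("Col_")))) %>%
  ungroup()
```

Here row 3 yields 82 and row 2 yields 79, while row 1 stays NA.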

Created on 2024-02-08 with reprex v2.0.2
