Why does a custom function using dplyr give different results from the same code without a function wrapper?


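I'm writing a function that duplicates rows so that each group ends up with a specific number of rows. Starting from data like this: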

library(dplyr)

df1 <- tibble(
  Random_category = c(rep("A", 2), rep("B", 3), rep("C", 6)),
  ID = 1:11,
  Value = sample(1:100, 11, replace = TRUE)
)

   Random_category    ID Value
   <chr>           <int> <int>
 1 A                   1    92
 2 A                   2    11
 3 B                   3    42
 4 B                   4    33
 5 B                   5    93
 6 C                   6    79
 7 C                   7    82
 8 C                   8    46
 9 C                   9    77
10 C                  10    88
11 C                  11    58

I want to get to something like this:


   Random_category    ID Value
   <chr>           <int> <int>
 1 A                   2    60
 2 A                   2    60
 3 A                   1     8
 4 A                   2    60
 5 A                   1     8
 6 B                   3    31
 7 B                   4    13
 8 B                   4    13
 9 B                   5    91
10 B                   5    91
11 C                   6    19
12 C                   9    72
13 C                   7    26
14 C                  10    85
15 C                   8    67

My function looks like this:

duplicate_rows <- function(df, target_num_of_rows, group_name) {
  df %>%
    group_by({{group_name}}) %>%
    mutate(rows_to_duplicate = if_else(row_number() <= target_num_of_rows, ceiling(target_num_of_rows / n()), 0)) %>%
    slice(rep(row_number(), times = rows_to_duplicate)) %>%
    ungroup() %>%
    select(-rows_to_duplicate) %>%
    slice_sample(by = {{group_name}}, n = target_num_of_rows)
}

# Duplicate rows ensuring each group has exactly 5 rows
df_duplicated <- duplicate_rows(df1, 5, "Random_category")

But it gives me this instead:

  Random_category    ID Value `"Random_category"`
  <chr>           <int> <int> <chr>
1 A                   2    60 Random_category
2 A                   1     8 Random_category
3 B                   3    31 Random_category
4 B                   4    13 Random_category
5 B                   5    91 Random_category

Yet when I take the dplyr pipeline out of the function, it works fine:

df1 %>%
  group_by(Random_category) %>%
  mutate(rows_to_duplicate = if_else(row_number() <= 5, ceiling(5 / n()), 0)) %>%
  slice(rep(row_number(), times = rows_to_duplicate)) %>%
  ungroup() %>%
  select(-rows_to_duplicate) %>%
  slice_sample(by = Random_category, n = 5)

I suspect this has something to do with the group name, but I don't understand why.

r function dplyr
1 Answer

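Use backticks (or the bare column name) instead of quotes. The {{ }} operator injects the argument into group_by() as an expression, so the string "Random_category" is treated as a constant value rather than a column name: dplyr adds a new column filled with that string, and all rows end up in a single group.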

duplicate_rows(df1, 5, `Random_category`)
# # A tibble: 15 × 3
#    Random_category    ID Value
#    <chr>           <int> <int>
#  1 A                   2    11
#  2 A                   1    92
#  3 A                   1    92
#  4 A                   2    11
#  5 A                   1    92
#  6 B                   4    33
#  7 B                   5    93
#  8 B                   3    42
#  9 B                   4    33
# 10 B                   5    93
# 11 C                   8    46
# 12 C                   9    77
# 13 C                   7    82
# 14 C                  10    88
# 15 C                   6    79
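
If the function should accept the column name as a string instead, one possible approach is to select the column with all_of(). The sketch below first shows what the quoted string does inside group_by(), then defines a variant of the function. The name duplicate_rows_chr and the fixed Value column are assumptions made for illustration and are not part of the original question; slice_sample(by = ...) assumes dplyr 1.1.0 or later, as in the question.

```r
library(dplyr)

# Same shape as df1 in the question; Value is fixed here (assumption) so the
# example does not depend on random input data.
df1 <- tibble(
  Random_category = c(rep("A", 2), rep("B", 3), rep("C", 6)),
  ID = 1:11,
  Value = 101:111
)

# A quoted string inside group_by() is a constant, not a column reference:
# dplyr adds a column named "Random_category" (quotes included) and puts
# every row into one group.
by_string <- df1 %>% group_by("Random_category")
n_groups(by_string)  # 1
names(by_string)     # the original columns plus the extra constant column

# Hypothetical variant that takes the grouping column as a string.
duplicate_rows_chr <- function(df, target_num_of_rows, group_name) {
  df %>%
    # across(all_of()) looks the column up by its name stored in group_name
    group_by(across(all_of(group_name))) %>%
    mutate(rows_to_duplicate = if_else(
      row_number() <= target_num_of_rows,
      ceiling(target_num_of_rows / n()),
      0
    )) %>%
    slice(rep(row_number(), times = rows_to_duplicate)) %>%
    ungroup() %>%
    select(-rows_to_duplicate) %>%
    # `by` uses tidy-select, so all_of() works here as well
    slice_sample(by = all_of(group_name), n = target_num_of_rows)
}

result <- duplicate_rows_chr(df1, 5, "Random_category")
count(result, Random_category)  # each group should have 5 rows
```

With this variant the call from the question, using a quoted name, should return 15 rows. The original function still works unchanged when given a bare or backticked name.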