I'm writing a function that duplicates rows so that each group ends up with a specific number of rows. Starting from this:
library(dplyr)

df1 <- tibble(
  Random_category = c(rep("A", 2), rep("B", 3), rep("C", 6)),
  ID = 1:11,
  Value = sample(1:100, 11, replace = TRUE)
)
Random_category ID Value
<chr> <int> <int>
1 A 1 92
2 A 2 11
3 B 3 42
4 B 4 33
5 B 5 93
6 C 6 79
7 C 7 82
8 C 8 46
9 C 9 77
10 C 10 88
11 C 11 58
I want to get something like this:
Random_category ID Value
<chr> <int> <int>
1 A 2 60
2 A 2 60
3 A 1 8
4 A 2 60
5 A 1 8
6 B 3 31
7 B 4 13
8 B 4 13
9 B 5 91
10 B 5 91
11 C 6 19
12 C 9 72
13 C 7 26
14 C 10 85
15 C 8 67
My function looks like this:
duplicate_rows <- function(df, target_num_of_rows, group_name) {
  df %>%
    group_by({{group_name}}) %>%
    mutate(rows_to_duplicate = if_else(row_number() <= target_num_of_rows, ceiling(target_num_of_rows / n()), 0)) %>%
    slice(rep(row_number(), times = rows_to_duplicate)) %>%
    ungroup() %>%
    select(-rows_to_duplicate) %>%
    slice_sample(by = {{group_name}}, n = target_num_of_rows)
}
# Duplicate rows ensuring each group has exactly 5 rows
df_duplicated <- duplicate_rows(df1, 5, "Random_category")
But it gives me this instead:
Random_category ID Value `"Random_category"`
<chr> <int> <int> <chr>
1 A 2 60 Random_category
2 A 1 8 Random_category
3 B 3 31 Random_category
4 B 4 13 Random_category
5 B 5 91 Random_category
This happens even though the same dplyr code works fine when I run it outside the function:
df1 %>%
  group_by(Random_category) %>%
  mutate(rows_to_duplicate = if_else(row_number() <= 5, ceiling(5 / n()), 0)) %>%
  slice(rep(row_number(), times = rows_to_duplicate)) %>%
  ungroup() %>%
  select(-rows_to_duplicate) %>%
  slice_sample(by = Random_category, n = 5)
I suspect it has something to do with the group name, but I don't understand why.
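Use backticks instead of quotes (a bare, unquoted Random_category works too). With {{ }}, the argument is evaluated as an expression inside the data frame, so the string "Random_category" is treated as a constant value rather than a column name. group_by() then creates a new column named `"Random_category"` holding that one value, every row falls into a single group, and you get only 5 rows back.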
duplicate_rows(df1, 5, `Random_category`)
# # A tibble: 15 × 3
# Random_category ID Value
# <chr> <int> <int>
# 1 A 2 11
# 2 A 1 92
# 3 A 1 92
# 4 A 2 11
# 5 A 1 92
# 6 B 4 33
# 7 B 5 93
# 8 B 3 42
# 9 B 4 33
# 10 B 5 93
# 11 C 8 46
# 12 C 9 77
# 13 C 7 82
# 14 C 10 88
# 15 C 6 79
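If you would rather keep passing the column name as a string, one possible approach is to select the column with all_of() instead of embracing it. The sketch below is only an illustration: duplicate_rows_chr is a hypothetical name, and it assumes dplyr 1.1.0 or later (needed for the by argument that the original function already uses).

```r
library(dplyr)

# Hypothetical variant of duplicate_rows(): group_name is a character string
duplicate_rows_chr <- function(df, target_num_of_rows, group_name) {
  df %>%
    # across(all_of()) looks the column up by its name given as a string
    group_by(across(all_of(group_name))) %>%
    mutate(rows_to_duplicate = if_else(row_number() <= target_num_of_rows,
                                       ceiling(target_num_of_rows / n()), 0)) %>%
    slice(rep(row_number(), times = rows_to_duplicate)) %>%
    ungroup() %>%
    select(-rows_to_duplicate) %>%
    # `by` uses tidy-select, so all_of() accepts the string here as well
    slice_sample(by = all_of(group_name), n = target_num_of_rows)
}

set.seed(1)
df1 <- tibble(
  Random_category = c(rep("A", 2), rep("B", 3), rep("C", 6)),
  ID = 1:11,
  Value = sample(1:100, 11, replace = TRUE)
)

df_duplicated <- duplicate_rows_chr(df1, 5, "Random_category")
```

With this version the quoted call from the question should return 5 rows per category, and no extra `"Random_category"` column is created.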