问题是 有多少个 9 骰子的序列正在增加(例如 223444556)。好吧,我知道答案是由
choose(14,9)
给出的,但我只是想玩玩 dplyr。
一种快速但不优雅的方式:
library(tidyverse)
expand.grid(data.frame(matrix(rep(1:6,9),ncol=9))) %>%
filter(X1<=X2 & X2<=X3 &X3<=X4 &X4<=X5 &X5<=X6 &X6<=X7 &X7<=X8 &X8<=X9) %>% tally
我尝试了以下两种替代方案(没有显式引用变量名称),但它们都非常慢(并且消耗内存)。你能帮我使用 tidyverse 优化我的代码吗?
expand_grid(!!!data.frame(matrix(rep(1:6,9),ncol=9))) %>%
rownames_to_column(var = "grp") %>%
mutate(grp = as.numeric(grp)) %>%
pivot_longer (cols=!grp) %>%
group_by(grp) %>%
mutate(prev = lag(value)) %>%
filter(!is.na(prev)) %>%
transmute(dif=value-prev) %>%
summarize(res = all(dif >=0)) %>%
group_by(res) %>% summarize(n=n())
9 %>%
rerun(1:6) %>% crossing(!!!.,.name_repair = "minimal") %>%
set_names(glue::glue('c{1:ncol(.)}')) %>%
rowwise() %>%
mutate(asc = all(diff(c_across(cols = everything())>=0))) %>%
filter(asc==TRUE) %>% tally
这也很慢,但不消耗内存。
9 %>%
rerun(1:6) %>% crossing(!!!.,.name_repair = "minimal") %>%
set_names(glue::glue('c{1:ncol(.)}')) %>%
filter(pmap_lgl(.,~{
if(all(list(...) %>% flatten_dbl() %>% diff() >=0)) return(TRUE) else return(FALSE)
})) %>% tally
这是另一个更高效的解决方案,它使用自定义函数
get_nondecreasing_seqs
,但不使用 tidyverse 或 dplyr。
get_nondecreasing_seqs <- function(digits, current_vector = numeric(0), allowed_nums) {
res <- list()
move_to_next <- function(new_digit, current_vector) {
# Check if you have reached the desired length
if (length(current_vector) == digits) {
res[length(res) + 1] <<- list(current_vector)
} else {
# Try appending each allowed digit to the current vector
for (next_digit in allowed_nums[allowed_nums >= new_digit]) {
# Recursively call the function with the updated vector
move_to_next(next_digit, c(current_vector, next_digit))
}
}
}
move_to_next(1, current_vector)
return(res)
}
get_nondecreasing_seqs(9, allowed_nums = 1:6)
在使用expand.grid和tidyverse的原始方法中,首先生成所有可能的骰子组合,然后过滤非递减序列。此方法可能会导致大量内存开销,尤其是在处理大量骰子时,因为它会预先创建所有组合。
相比之下,自定义函数
get_nondecreasing_seqs
采用了更优化的方法。它动态构建序列,同时避免不必要的组合。它从第一个数字开始,仅递归地探索那些导致非递减序列的路径。这显着减少了迭代次数和内存使用量,尤其是当序列长度很大时。
效率增益来自于该函数动态探索有效数字并将其附加到当前向量,从而无需预先生成和存储所有可能的组合。这使得自定义函数在计算掷骰子的非递减序列时更加节省资源,并且速度更快。
这是一种依赖于
tidyverse
的 purrr
方法:
expand.grid(replicate(9, 1:6, FALSE)) %>%
filter(reduce(map2(.[, -length(.)], .[, -1], ~ .x <= .y), `&`)) %>%
tally()
这在管道环境中有点困难。我们都需要比较列
n
和 n + 1
,同时将 did 简化为逻辑向量。然后我们需要过滤原始数据集。
如果您仅对计数感兴趣,我们可以对逻辑向量求和。
expand.grid(replicate(9, 1:6, FALSE)) %>%
{sum(reduce(map2(.[, -length(.)], .[, -1], ~ .x <= .y), `&`))}
最后,如果您不介意多一个依赖性,matrixstats可以与您使用其中一种方法所做的事情并行:
library(matrixStats)
expand.grid(replicate(9, 1:6, FALSE)) %>%
{sum(rowAlls(rowDiffs(as.matrix(.)) >= 0L))}
n <- 9
success <- 0
throw <- function(last_throw = 1, hist = "") {
if (nchar(hist) >= 9) {
success <<- success + 1L
return(NULL)
}
for (i in last_throw:6) throw(last_throw = i, hist = paste0(hist, i))
}
success
# [1] 2002
简化:
throw <- function(last_throw = 1L, len = 0L) {
if (len == n) {
success <<- success + 1L
return(NULL)
}
for (i in last_throw:6L) throw(last_throw = i, len = len + 1L)
}
success
# [1] 2002
可以写个递归来实现,应该比
expand.grid
快很多
f1 <- function(k, v = 1:6) {
if (k == 1) {
return(as.list(v))
}
unlist(
lapply(
Recall(k - 1),
\(x) lapply(v[v >= x[k-1]], \(p) c(x, p))
),
recursive = FALSE
)
}
f2 <- function(k, v = 1:6) {
expand.grid(replicate(k, v, FALSE)) %>%
{
sum(reduce(map2(.[, -length(.)], .[, -1], ~ .x <= .y), `&`))
}
}
f3 <- function(digits, current_vector = numeric(0), allowed_nums= 1:6) {
res <- list()
move_to_next <- function(new_digit, current_vector) {
# Check if you have reached the desired length
if (length(current_vector) == digits) {
res[length(res) + 1] <<- list(current_vector)
} else {
# Try appending each allowed digit to the current vector
for (next_digit in allowed_nums[allowed_nums >= new_digit]) {
# Recursively call the function with the updated vector
move_to_next(next_digit, c(current_vector, next_digit))
}
}
}
move_to_next(1, current_vector)
return(res)
}
microbenchmark(
TIC = f1(9),
Cole = f2(9),
Brani = f3(9),
unit = "relative",
times = 10L
)
你会看到的
Unit: relative
expr min lq mean median uq max neval
TIC 1.649896 1.652466 1.513901 1.648688 1.489092 1.497073 10
Cole 148.716366 158.463960 128.162563 148.872885 134.907332 78.664232 10
Brani 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10
输出
out <- f1(9)
看起来像
> out <- f1(9)
> length(out)
[1] 2002
> head(out)
[[1]]
[1] 1 1 1 1 1 1 1 1 1
[[2]]
[1] 1 1 1 1 1 1 1 1 2
[[3]]
[1] 1 1 1 1 1 1 1 1 3
[[4]]
[1] 1 1 1 1 1 1 1 1 4
[[5]]
[1] 1 1 1 1 1 1 1 1 5
[[6]]
[1] 1 1 1 1 1 1 1 1 6