提高查找“增加掷骰子序列”的速度

问题描述 投票:0回答:4

问题是 有多少个 9 骰子的序列正在增加(例如 223444556)。好吧,我知道答案是由

choose(14,9)
给出的,但我只是想玩玩 dplyr。

一种快速但不优雅的方式:

library(tidyverse)

expand.grid(data.frame(matrix(rep(1:6,9),ncol=9))) %>% 
 filter(X1<=X2 & X2<=X3 &X3<=X4 &X4<=X5 &X5<=X6 &X6<=X7 &X7<=X8 &X8<=X9) %>% tally

我尝试了以下两种替代方案(没有显式引用变量名称),但它们都非常慢(并且消耗内存)。你能帮我使用 tidyverse 优化我的代码吗?

 expand_grid(!!!data.frame(matrix(rep(1:6,9),ncol=9))) %>% 
  rownames_to_column(var = "grp") %>% 
  mutate(grp = as.numeric(grp)) %>% 
  pivot_longer (cols=!grp) %>% 
  group_by(grp) %>% 
  mutate(prev = lag(value)) %>% 
  filter(!is.na(prev)) %>% 
  transmute(dif=value-prev) %>% 
  summarize(res = all(dif >=0)) %>% 
  group_by(res) %>% summarize(n=n())


 9 %>% 
    rerun(1:6) %>% crossing(!!!.,.name_repair = "minimal") %>%
    set_names(glue::glue('c{1:ncol(.)}')) %>% 
    rowwise() %>%
    mutate(asc = all(diff(c_across(cols = everything())>=0))) %>% 
    filter(asc==TRUE) %>% tally

这也很慢,但不消耗内存。

 9 %>% 
  rerun(1:6) %>% crossing(!!!.,.name_repair = "minimal")  %>% 
  set_names(glue::glue('c{1:ncol(.)}')) %>% 
  filter(pmap_lgl(.,~{
    if(all(list(...) %>% flatten_dbl() %>% diff() >=0)) return(TRUE) else return(FALSE)
  })) %>% tally
r performance dplyr purrr
4个回答
2
投票

这是另一个更高效的解决方案,它使用自定义函数

get_nondecreasing_seqs
,但不使用 tidyverse 或 dplyr。

get_nondecreasing_seqs <- function(digits, current_vector = numeric(0), allowed_nums) {
  res <- list()
  
  move_to_next <- function(new_digit, current_vector) {
    # Check if you have reached the desired length
    if (length(current_vector) == digits) {
      res[length(res) + 1] <<- list(current_vector)
    } else {
      # Try appending each allowed digit to the current vector
      for (next_digit in allowed_nums[allowed_nums >= new_digit]) {
        # Recursively call the function with the updated vector
        move_to_next(next_digit, c(current_vector, next_digit))
      }
    }
  }
  
  move_to_next(1, current_vector)
  
  return(res)
}

get_nondecreasing_seqs(9, allowed_nums = 1:6)

在使用expand.grid和tidyverse的原始方法中,首先生成所有可能的骰子组合,然后过滤非递减序列。此方法可能会导致大量内存开销,尤其是在处理大量骰子时,因为它会预先创建所有组合。

相比之下,自定义函数

get_nondecreasing_seqs
采用了更优化的方法。它动态构建序列,同时避免不必要的组合。它从第一个数字开始,仅递归地探索那些导致非递减序列的路径。这显着减少了迭代次数和内存使用量,尤其是当序列长度很大时。

效率增益来自于该函数动态探索有效数字并将其附加到当前向量,从而无需预先生成和存储所有可能的组合。这使得自定义函数在计算掷骰子的非递减序列时更加节省资源,并且速度更快。


1
投票

这是一种依赖于

tidyverse
purrr
方法:

expand.grid(replicate(9, 1:6, FALSE)) %>%
  filter(reduce(map2(.[, -length(.)], .[, -1], ~ .x <= .y), `&`)) %>%
  tally()

这在管道环境中有点困难。我们都需要比较列

n
n + 1
,同时将 did 简化为逻辑向量。然后我们需要过滤原始数据集。

如果您对计数感兴趣,我们可以对逻辑向量求和。

expand.grid(replicate(9, 1:6, FALSE)) %>%
  {sum(reduce(map2(.[, -length(.)], .[, -1], ~ .x <= .y), `&`))}

最后,如果您不介意多一个依赖性,可以与您使用其中一种方法所做的事情并行:

library(matrixStats)

expand.grid(replicate(9, 1:6, FALSE)) %>%
  {sum(rowAlls(rowDiffs(as.matrix(.)) >= 0L))}

1
投票
n <- 9
success <- 0

throw <- function(last_throw = 1, hist = "") {
  if (nchar(hist) >= 9)  {
    success <<- success + 1L
    return(NULL)
  }
  for (i in last_throw:6) throw(last_throw = i, hist = paste0(hist, i)) 
}
success
# [1] 2002

简化:

throw <- function(last_throw = 1L, len = 0L) {
  if (len == n)  {
    success <<- success + 1L
    return(NULL)
  }
  for (i in last_throw:6L) throw(last_throw = i, len = len + 1L) 
}
success
# [1] 2002

0
投票

可以写个递归来实现,应该比

expand.grid

快很多
f1 <- function(k, v = 1:6) {
    if (k == 1) {
        return(as.list(v))
    }
    unlist(
        lapply(
            Recall(k - 1),
            \(x) lapply(v[v >= x[k-1]], \(p) c(x, p))
        ),
        recursive = FALSE
    )
}

f2 <- function(k, v = 1:6) {
    expand.grid(replicate(k, v, FALSE)) %>%
        {
            sum(reduce(map2(.[, -length(.)], .[, -1], ~ .x <= .y), `&`))
        }
}

f3 <- function(digits, current_vector = numeric(0), allowed_nums= 1:6) {
    res <- list()

    move_to_next <- function(new_digit, current_vector) {
        # Check if you have reached the desired length
        if (length(current_vector) == digits) {
            res[length(res) + 1] <<- list(current_vector)
        } else {
            # Try appending each allowed digit to the current vector
            for (next_digit in allowed_nums[allowed_nums >= new_digit]) {
                # Recursively call the function with the updated vector
                move_to_next(next_digit, c(current_vector, next_digit))
            }
        }
    }

    move_to_next(1, current_vector)

    return(res)
}

microbenchmark(
    TIC = f1(9),
    Cole = f2(9),
    Brani = f3(9),
    unit = "relative",
    times = 10L
)

你会看到的

Unit: relative
  expr        min         lq       mean     median         uq       max neval
   TIC   1.649896   1.652466   1.513901   1.648688   1.489092  1.497073    10
  Cole 148.716366 158.463960 128.162563 148.872885 134.907332 78.664232    10
 Brani   1.000000   1.000000   1.000000   1.000000   1.000000  1.000000    10

输出

out <- f1(9)
看起来像

> out <- f1(9)

> length(out)
[1] 2002

> head(out)
[[1]]
[1] 1 1 1 1 1 1 1 1 1

[[2]]
[1] 1 1 1 1 1 1 1 1 2

[[3]]
[1] 1 1 1 1 1 1 1 1 3

[[4]]
[1] 1 1 1 1 1 1 1 1 4

[[5]]
[1] 1 1 1 1 1 1 1 1 5

[[6]]
[1] 1 1 1 1 1 1 1 1 6
© www.soinside.com 2019 - 2024. All rights reserved.