How can I quickly create a large vector with repeated elements?


I have a vector, and I want to create a new vector by taking its elements according to a sequence of indices:

set.seed(0)

n <- 1000
ncval1 <- as.integer(n)
ncval2 <- ncval1:1L
ncval3 <- sequence(ncval2, from = 1L, by = 1L)
x <- as.double(runif(n))

y <- x[ncval3]

This takes about 2.2 ms. Perhaps it can be sped up by exploiting the fact that the elements repeat.
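The index vector built by `sequence(n:1)` is 1..n, 1..n-1, ..., 1, so `y` is simply the concatenation of the prefixes of `x` of lengths n, n-1, ..., 1 (n(n+1)/2 elements in total). A minimal C++ sketch of exploiting that repetition, assuming a plain `std::vector<double>` input and 0-based indices (the name `prefix_concat` is hypothetical, and no speedup is claimed), copies each prefix as one contiguous block with `std::copy` instead of gathering element by element:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns x[0..n-1], x[0..n-2], ..., x[0] concatenated,
// i.e. the 0-based C++ counterpart of x[sequence(n:1)] in R.
std::vector<double> prefix_concat(const std::vector<double>& x, std::size_t n) {
    assert(n <= x.size());
    // total length: n + (n - 1) + ... + 1 = n * (n + 1) / 2
    std::vector<double> y(n * (n + 1) / 2);
    auto out = y.begin();
    // each block is a prefix of x, so it is copied as one contiguous range
    for (std::size_t len = n; len > 0; --len) {
        out = std::copy(x.begin(), x.begin() + static_cast<std::ptrdiff_t>(len), out);
    }
    return y;
}
```

For `x = {1, 2, 3}` and `n = 3` this yields `{1, 2, 3, 1, 2, 1}`, the same pattern as `x[sequence(3:1)]` in R. Whether block copying actually beats a plain gather depends on the compiler and hardware and would need to be measured.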

Tags: r, vector, profiling

Answer

You could use Rcpp:

Rcpp::sourceCpp(code='
  #include <Rcpp.h>
  // [[Rcpp::export]]
  Rcpp::NumericVector foo(int n) {
    // draw n values from the uniform distribution U(0, 1), same as runif(n) in R
    Rcpp::NumericVector r = Rcpp::runif(n);
    // length of result: n + (n - 1) + ... + 1 = n * (n + 1) / 2
    // (R_xlen_t instead of int so the count does not overflow for large n)
    R_xlen_t l = (R_xlen_t) n * (n + 1) / 2;
    // concatenate the prefixes r[0..n-1], r[0..n-2], ..., r[0]
    Rcpp::NumericVector a(l);
    R_xlen_t p = 0;
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n - i; j++) {
        a[p] = r[j];
        p = p + 1;
      }
    }
    return a;
  }
')
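The allocation size `l` in `foo()` is the triangular number n + (n - 1) + ... + 1, which has a closed form. Worked through for the question's n = 1000 (8 bytes per double), and for the point at which the count no longer fits in a 32-bit signed `int` (it overflows once n > 65535):

$$
l = \sum_{i=0}^{n} i = \frac{n(n+1)}{2}, \qquad n = 1000:\quad l = \frac{1000 \cdot 1001}{2} = 500500, \quad 500500 \times 8\ \text{B} = 4\,004\,000\ \text{B} \approx 4\ \text{MB}
$$

$$
\frac{65535 \cdot 65536}{2} = 2147450880 \;\le\; 2^{31} - 1 = 2147483647 \;<\; \frac{65536 \cdot 65537}{2} = 2147516416
$$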

Usage with n = 10:

> set.seed(0)
> foo(10)
 [1] 0.8966972 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819 0.8983897
 [8] 0.9446753 0.6607978 0.6291140 0.8966972 0.2655087 0.3721239 0.5728534
[15] 0.9082078 0.2016819 0.8983897 0.9446753 0.6607978 0.8966972 0.2655087
[22] 0.3721239 0.5728534 0.9082078 0.2016819 0.8983897 0.9446753 0.8966972
[29] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819 0.8983897 0.8966972
[36] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819 0.8966972 0.2655087
[43] 0.3721239 0.5728534 0.9082078 0.8966972 0.2655087 0.3721239 0.5728534
[50] 0.8966972 0.2655087 0.3721239 0.8966972 0.2655087 0.8966972
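In this output every block restarts at the first draw, 0.8966972. A small C++ sketch (the function `block_starts` is hypothetical) computes where each block begins, using 0-based offsets: block i has length n - i, so it starts at i*n - i*(i-1)/2, and the running sum below reaches n(n+1)/2 after the last block:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 0-based start offset of each block in the result of
// x[sequence(n:1)], where block i (i = 0, ..., n-1) has length n - i.
// Closed form of the running sum: start(i) = i*n - i*(i-1)/2.
std::vector<std::size_t> block_starts(std::size_t n) {
    std::vector<std::size_t> starts;
    starts.reserve(n);
    std::size_t offset = 0;
    for (std::size_t i = 0; i < n; ++i) {
        starts.push_back(offset);
        offset += n - i;  // skip over block i
    }
    // after the last block, offset equals the total length n*(n+1)/2
    assert(offset == n * (n + 1) / 2);
    return starts;
}
```

For n = 10 this gives 0, 10, 19, 27, 34, 40, 45, 49, 52, 54; adding 1 gives R's 1-based positions 1, 11, 20, 28, 35, 41, 46, 50, 53, 55, which is exactly where 0.8966972 reappears in the `foo(10)` output above.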

Benchmark:

n <- 1e3
microbenchmark::microbenchmark(
  OP={
    set.seed(0)
    ncval1 <- as.integer(n)
    ncval2 <- ncval1:1L
    ncval3 <- sequence(ncval2, from = 1L, by = 1L)
    x <- as.double(runif(n))
    x[ncval3]
  },
  foo={set.seed(0); foo(n)}, 
  check='identical'
)

$ Rscript --vanilla foo.R
Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval cld
   OP 2.109090 2.199845 3.119882 2.294714 4.213308 7.297789   100  a 
  foo 1.055756 1.190470 1.983916 1.318557 2.741124 6.850159   100   b

Based on the medians, foo() needs only about 57% of the time of the OP's code (1.32 ms vs. 2.29 ms).
