Vectorizing a for loop in R


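I'm trying to improve my code by taking advantage of R's vectorization, for example by using the apply family of functions instead of for loops. My datasets reach 300,000 records, and I'd like to reduce the time the script takes to run.

I've prepared a reprex along with the actual for loop; I just don't know whether the loop can be converted to a non-loop structure.

Here it is: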

df <- structure(list(time = structure(c(1500697800, 1500698100, 1500698400, 
                                        1500698700, 1500699000, 1500699300, 1500699600, 1500699900, 1500700200, 
                                        1500700500, 1500700800, 1500701100, 1500701400, 1500701700, 1500702000, 
                                        1500702300, 1500702600, 1500702900, 1500703200, 1500703500, 1500703800, 
                                        1500704100, 1500704400, 1500704700, 1500705000, 1500705300, 1500705600, 
                                        1500705900, 1500706200, 1500706500, 1500706800, 1500707100, 1500707400, 
                                        1500707700, 1500708000, 1500708300, 1500708600, 1500708900, 1500709200, 
                                        1500709500, 1500709800, 1500710100, 1500710400, 1500710700, 1500711000, 
                                        1500711300, 1500711600, 1500711900, 1500712200, 1500712500, 1500712800, 
                                        1500713100, 1500713400, 1500713700, 1500714000, 1500714300, 1500714600, 
                                        1500714900, 1500715200, 1500715500, 1500715800, 1500716100, 1500716400, 
                                        1500716700, 1500717000, 1500717300, 1500717600, 1500717900, 1500718200, 
                                        1500718500, 1500718800, 1500719100, 1500719400, 1500719700, 1500720000, 
                                        1500720300, 1500720600, 1500720900, 1500721200, 1500721500, 1500721800, 
                                        1500722100, 1500722400, 1500722700, 1500723000, 1500723300, 1500723600, 
                                        1500723900, 1500724200, 1500724500, 1500724800, 1500725100, 1500725400, 
                                        1500725700, 1500726000, 1500726300, 1500726600, 1500726900, 1500727200, 
                                        1500727500, 1500727800, 1500728100, 1500728400, 1500728700, 1500729000, 
                                        1500729300, 1500729600, 1500729900, 1500730200, 1500730500, 1500730800, 
                                        1500731100, 1500731400, 1500731700, 1500732000, 1500732300, 1500732600, 
                                        1500732900, 1500733200, 1500733500, 1500733800, 1500734100, 1500734400, 
                                        1500734700, 1500735000, 1500735300, 1500735600, 1500735900, 1500736200, 
                                        1500736500, 1500736800, 1500737100, 1500737400, 1500737700, 1500738000, 
                                        1500738300, 1500738600, 1500738900, 1500739200, 1500739500, 1500739800, 
                                        1500740100, 1500740400, 1500740700, 1500741000), class = c("POSIXct", 
                                                                                                   "POSIXt"), tzone = "UTC"), rate = c(8021.22624828867, 8022.17252092756, 
                                                                                                                                       4026.57093082574, 0, 0, 0, 0, 0, 0, 0, 0, 1092.48742657481, 0, 
                                                                                                                                       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                                                                                                                                       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                                                                                                                                       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                                                                                                                                       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                                                                                                                                       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                                                                                                                                       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2352.47712160156, 0, 0, 0, 0, 0, 
                                                                                                                                       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), is.rate = c("OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", 
                                                                                                                                                                                     "OFF", "OFF", "OFF", "OFF", "OFF", "OFF", "OFF")), class = c("tbl_df", 
                                                                                                                                                                                                                                                  "tbl", "data.frame"), row.names = c(NA, -145L))


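A quick explanation of the data: it has a time variable, a rate, and a flag (is.rate) that should be "ON" when the rate is not 0.

The idea of the for loop is to pick out the rate values greater than 0 and, in terms of time, "trail" the is.rate flag over the following hour. I know this sounds complicated, but it should make sense once you run the for loop on the reprex.

Speaking of the for loop, here it is: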

# The reprex above defines `df`, so the loop operates on `df`.
for (i in which(df$rate != 0)) {
  # 12 is the lag window: with 5-minute intervals, rows i..(i + 12) cover the next hour
  df$is.rate[i:(i + 12)] <- "ON"
}

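I'd love to optimize this code, ideally by removing the for loop entirely and using something like the apply family of functions, but I can't see how to structure the code.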

r performance for-loop optimization vectorization
2 answers
0 votes

I think you're looking to set "ON" wherever rate > 0, and to carry it over to the next 11 rows with a lag.
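
One way to read the lag idea above, offered as a rough sketch rather than the answerer's own code: keep a running count of non-zero rates and flag every row whose trailing window (the row itself plus the preceding `lag_rows` rows) contains at least one. The function name `flag_rolling`, the parameter `lag_rows`, and the toy vector are assumptions made for illustration. The default of 12 follows the loop in the question rather than the 11 mentioned in the answer, and the sketch assumes `rate` is non-empty and has no missing values.

```r
# Hypothetical helper: "ON" where any of the current row or the previous
# `lag_rows` rows has a non-zero rate, "OFF" otherwise.
flag_rolling <- function(rate, lag_rows = 12L) {
  n  <- length(rate)
  cs <- cumsum(rate != 0)                      # running count of non-zero rows
  # Running count as it stood (lag_rows + 1) rows earlier; 0 before the start.
  before <- c(rep(0, lag_rows + 1L), cs)[seq_len(n)]
  ifelse(cs - before > 0, "ON", "OFF")
}

# Toy example with a window of 2 following rows (values are illustrative).
x <- c(5, 0, 0, 0, 7, 0, 0, 0, 0, 0)
flag_rolling(x, lag_rows = 2L)
# "ON" "ON" "ON" "OFF" "ON" "ON" "ON" "OFF" "OFF" "OFF"
```

With the question's data this would be called as `df$is.rate <- flag_rolling(df$rate, 12L)`. Because the window looks backwards from each row, it never indexes past the end of the data.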


0 votes

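I think what you need to do is find the indices where rate != 0, create a sequence between each of those indices and inds + 12, and assign "ON" to is.rate at those indices.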
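
The steps described above can be sketched in base R roughly as follows. The function name `flag_after_nonzero`, the parameter `lag_rows`, and the toy vector are assumptions for illustration, not part of the original answer. The sketch also drops any index that would fall past the last row, which the loop in the question does not guard against.

```r
# Hypothetical helper: flag each non-zero row and the `lag_rows` rows after it.
flag_after_nonzero <- function(rate, lag_rows = 12L) {
  is_rate <- rep("OFF", length(rate))
  inds <- which(rate != 0)                     # rows with a non-zero rate
  # Each start index plus offsets 0..lag_rows, flattened into one vector.
  idx <- unique(as.vector(outer(inds, 0:lag_rows, `+`)))
  idx <- idx[idx <= length(rate)]              # discard rows past the end
  is_rate[idx] <- "ON"
  is_rate
}

# Toy example with a window of 2 following rows (values are illustrative).
x <- c(5, 0, 0, 0, 7, 0, 0, 0, 0, 0)
flag_after_nonzero(x, lag_rows = 2L)
# "ON" "ON" "ON" "OFF" "ON" "ON" "ON" "OFF" "OFF" "OFF"
```

Applied to the question's data, this would look like `df$is.rate <- flag_after_nonzero(df$rate, 12L)`. `outer()` builds a matrix with one row per non-zero rate, so memory grows with the number of non-zero rows times the window size; that is small for a window of 13 but worth keeping in mind for much wider windows.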
