How do I conditionally check and replace data in an xts object?


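Here is a reproducible dataset. The goal is to find runs of 1 or 2 consecutive non-NA values that sit between NAs and set them to NA. Runs longer than 2 should be left untouched.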

library(xts)

set.seed(55)
data <- rnorm(10)
dates <- as.POSIXct("2019-03-18 10:30:00", tz = "CET") + 0:9*60

R <- xts(x = data, order.by = dates)
colnames(R) <- "R-factor"
R[c(1, 3, 6, 10)] <- NA
R

Output:

                        R-factor
2019-03-18 10:30:00           NA
2019-03-18 10:31:00 -1.812376850
2019-03-18 10:32:00           NA
2019-03-18 10:33:00 -1.119221005
2019-03-18 10:34:00  0.001908206
2019-03-18 10:35:00           NA
2019-03-18 10:36:00 -0.505343855
2019-03-18 10:37:00 -0.099234393
2019-03-18 10:38:00  0.305353199
2019-03-18 10:39:00           NA

Expected result:

                        R-factor
2019-03-18 10:30:00           NA
2019-03-18 10:31:00           NA
2019-03-18 10:32:00           NA
2019-03-18 10:33:00           NA
2019-03-18 10:34:00           NA
2019-03-18 10:35:00           NA
2019-03-18 10:36:00 -0.505343855
2019-03-18 10:37:00 -0.099234393
2019-03-18 10:38:00  0.305353199
2019-03-18 10:39:00           NA

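I have written a function with a for loop that works fine on small datasets but is very slow. The original data contains 100,000+ data points, and the function had still not finished after more than 10 minutes.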

Can someone help me avoid the loop and speed this up?

r for-loop timestamp time-series xts
2 Answers
0
votes

I imagine there are more elegant solutions out there, but this one cuts the run time in half:

    R_df=as.data.frame(R)

    R_df$shift_1=c(R_df$`R-factor`[-1],NA) # next value (shifted one up)
    R_df$shift_2=c(NA,R_df$`R-factor`[-nrow(R_df)]) # previous value (shifted one down)

    # create new filtered variable: a value is kept only if at least one
    # neighbour is non-NA, so isolated single values become NA
    R_df$`R-factor_new`=ifelse(is.na(R_df$`R-factor`),NA,
                               ifelse((!is.na(R_df$shift_1))|(!is.na(R_df$shift_2)),
                                      R_df$`R-factor`,NA))

Benchmark:

                test replications elapsed relative user.self sys.self user.child sys.child
    2 ifelseapproach         1000    0.83    1.000      0.65     0.19         NA        NA
    1       original         1000    1.81    2.181      1.76     0.01         NA        NA
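The neighbour check above only clears values whose two neighbours are both NA, so a run of exactly two non-NA values survives it. A possible way to cover runs of length 1 and 2 in one vectorized pass is run-length encoding with base R's `rle`. The following is only a sketch: the helper name `na_short_runs` and its `max_len` argument are made up for illustration, it assumes a single-column xts object, and it treats a short run at the very start or end of the series the same as an interior one.

```r
library(xts)

# Hypothetical helper, not part of the original answers.
# Sets every run of at most `max_len` consecutive non-NA values to NA.
# Assumptions: `x` has a single column; runs touching the start or end
# of the series are handled like interior runs.
na_short_runs <- function(x, max_len = 2) {
  v <- as.numeric(coredata(x))             # values as a plain numeric vector
  runs <- rle(!is.na(v))                   # alternating runs of non-NA / NA
  short <- runs$values & runs$lengths <= max_len
  drop <- which(rep(short, runs$lengths))  # row positions inside short runs
  if (length(drop) > 0) x[drop] <- NA
  x
}

# Same example data as in the question
set.seed(55)
R <- xts(x = rnorm(10),
         order.by = as.POSIXct("2019-03-18 10:30:00", tz = "CET") + 0:9 * 60)
colnames(R) <- "R-factor"
R[c(1, 3, 6, 10)] <- NA

R_clean <- na_short_runs(R)
R_clean
```

On the example data this leaves only the three values from 10:36 to 10:38, which matches the expected result in the question. Because `rle` and `rep` work on whole vectors, no explicit loop over rows is needed.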

0
votes

Maybe try this, based on "Distance from the closest non NA value in a dataframe":

library(tidyverse)

set.seed(55)
data <- rnorm(10)
data[c(1, 3, 6, 10)] <- NA
dates <- as.POSIXct("2019-03-18 10:30:00", tz = "CET") + 0:9*60
time_table <- tibble(time = dates,data = data)

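# lengths of the alternating runs of NA / non-NA values
lengths_na <- time_table$data %>% is.na %>% rle  %>% pluck('lengths')

time_table %>% 
  # label every row with the id of the run it belongs to
  mutate(lengths_na =lengths_na %>% seq_along %>% rep(lengths_na)) %>% 
  group_by(lengths_na) %>%
  add_tally() %>%   # n = number of rows in the run
  ungroup() %>% 
  # runs shorter than 3 become NA, longer runs keep their values
  mutate(replace_sequence = if_else(condition = n < 3,true = NA_real_,false = data))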
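The pipeline above returns a tibble, while the question starts from an xts object. One way to get back to xts is to rebuild the object from the cleaned column and the time column. This is a sketch under assumptions: it restates the run-id logic with `add_count` instead of `group_by` plus `add_tally`, and the names `run_lengths`, `run_id`, `cleaned` and `R_clean` are illustrative only.

```r
library(xts)
library(dplyr)

# Same example data as in the answer above
set.seed(55)
data <- rnorm(10)
data[c(1, 3, 6, 10)] <- NA
dates <- as.POSIXct("2019-03-18 10:30:00", tz = "CET") + 0:9 * 60
time_table <- tibble(time = dates, data = data)

# Lengths of the alternating NA / non-NA runs
run_lengths <- rle(is.na(time_table$data))$lengths

cleaned <- time_table %>%
  mutate(run_id = rep(seq_along(run_lengths), run_lengths)) %>%  # one id per run
  add_count(run_id) %>%                                          # n = run length
  mutate(replace_sequence = if_else(n < 3, NA_real_, data))      # blank short runs

# Rebuild an xts object from the cleaned column (illustrative name)
R_clean <- xts(x = cleaned$replace_sequence, order.by = cleaned$time)
colnames(R_clean) <- "R-factor"
R_clean
```

Keeping the time column in the tibble throughout makes the round trip straightforward, since `order.by` can take it directly.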