Interpolating 2D data with data.table: filling NAs

Question (votes: 1, answers: 1)

I have two datasets with time steps t and heights h, which I have merged.

dataset_a <- data.table(t=rep(c(1,2,3,4,5,6,7,8,9), each=5),
                        h=rep(c(1:5)),
                        v=c(1:(5*9)))

One of them has measurement gaps, as well as values where we did measure but detected nothing (recorded as 0).

dataset_b <- data.table(t=rep(c(1,2,4,5,6,8,9), each=5),
                        h=rep(c(1:5)),
                        w=c(1:(5*7)))

dataset_b$w[12:20] <-0

Merging:

dataset_merged <- merge(dataset_a, dataset_b, all=TRUE, by = c('t', 'h'))

Now I would like to fill these gaps. How can I tell data.table to fill a pixel using its neighbouring values?

dataset_merged[is.na(w), 
               w:= mean(c(the value at this h one timestep earlier, the value at this h one timestep later))]

Thanks a lot!

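Edit: After Ben's very helpful comment, I had to adjust the reproducible example. His solution works, but not when the "framing" data (here, the first timestep) is missing. If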

dataset_b <- data.table(t=rep(c(2,4,5,6,8,9), each=5),
                        h=rep(c(1:5)),
                        w=c(1:(5*6)))
#removed the first timestep in this case
dataset_merged <- merge(dataset_a, dataset_b, all=TRUE, by = c('t', 'h'))


library(zoo)
dataset_merged[order(h,t)][, w := na.approx(w)] 

this produces

Error in `[.data.table`(dataset_merged[order(h, t)], , `:=`(w, na.approx(w))) : 
  Supplied 44 items to be assigned to 45 items of column 'w'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.

Leaving these as NA would be fine too, but how do I tell the function that? Unfortunately, the original data is not on a regular grid.

r data.table interpolation
1 answer
2 votes

Maybe try this approach. Sort the data.table by h (and then t) before interpolating, and convert w to numeric (double). Then use approx (base R), grouped with by = h.

dataset_merged[order(h,t)][, w:= as.numeric(w)][, w := approx(.I, w, .I)$y, by = h]

Output

    t h  v    w
 1: 1 1  1   NA
 2: 2 1  6  1.0
 3: 3 1 11  3.5
 4: 4 1 16  6.0
 5: 5 1 21 11.0
 6: 6 1 26 16.0
 7: 7 1 31 18.5
 8: 8 1 36 21.0
 9: 9 1 41 26.0
10: 1 2  2   NA
11: 2 2  7  2.0
12: 3 2 12  4.5
13: 4 2 17  7.0
14: 5 2 22 12.0
15: 6 2 27 17.0
16: 7 2 32 19.5
17: 8 2 37 22.0
18: 9 2 42 27.0
19: 1 3  3   NA
20: 2 3  8  3.0
21: 3 3 13  5.5
22: 4 3 18  8.0
23: 5 3 23 13.0
24: 6 3 28 18.0
25: 7 3 33 20.5
26: 8 3 38 23.0
27: 9 3 43 28.0
28: 1 4  4   NA
29: 2 4  9  4.0
30: 3 4 14  6.5
31: 4 4 19  9.0
32: 5 4 24 14.0
33: 6 4 29 19.0
34: 7 4 34 21.5
35: 8 4 39 24.0
36: 9 4 44 29.0
37: 1 5  5   NA
38: 2 5 10  5.0
39: 3 5 15  7.5
40: 4 5 20 10.0
41: 5 5 25 15.0
42: 6 5 30 20.0
43: 7 5 35 22.5
44: 8 5 40 25.0
45: 9 5 45 30.0
    t h  v    w
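
One caveat with the call above: .I is the row number, so the interpolation treats the rows of each group as equally spaced. That fits this example, where t runs from 1 to 9 without jumps after the merge, but since the question mentions that the real data is not on a regular grid, interpolating against t itself may be closer to what is wanted. A minimal sketch, assuming every h group has at least two non-NA values of w (the names dt and w_filled and the toy values are only illustrative):

```r
library(data.table)

# Irregular time steps: the gap between t = 1 and t = 2 is much smaller
# than the gap between t = 2 and t = 10.
dt <- data.table(t = rep(c(1, 2, 10), each = 2),
                 h = rep(1:2, times = 3),
                 w = c(0, 10, NA, NA, 9, 19))

# Sort in place so each h group is ordered by t.
setorder(dt, h, t)

# Interpolate w linearly along t within each height level.
# approx() ignores the NA pairs and, with the default rule = 1,
# returns NA for points outside the observed range.
dt[, w_filled := approx(x = t, y = w, xout = t)$y, by = h]

dt[]
# h = 1: w_filled is 0, 1, 9; h = 2: w_filled is 10, 11, 19
```

With row numbers as x, the gap at t = 2 in the first group would be filled with the midpoint 4.5 instead of 1. Note also that setorder() sorts dt in place, so the new column ends up in dt itself, whereas dt[order(h, t)] returns a sorted copy and leaves the original table unchanged.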

Addition (by the OP): if a group has only NA values for w, it has to be excluded.

Edit (5/28/20): To avoid calling approx when fewer than 2 values are available for interpolation, you can also try:

dataset_merged[order(h,t)
  ][, w:= as.numeric(w)
    ][, w := if(length(na.omit(w)) < 2) w else approx(.I, w, .I)$y, by = h]

Test case:

dataset_b <- data.table(t=rep(c(2,4,5,6,8,9), each=5),
                        h=1:5,
                        w=1:30)

dataset_b$w[c(FALSE, FALSE, TRUE, FALSE, FALSE)] <- NA  # logical index is recycled: every w at h == 3 becomes NA

dataset_merged <- merge(dataset_a, dataset_b, all=TRUE, by = c('t', 'h'))

Output

    t h  v    w
 1: 1 1  1   NA
 2: 2 1  6  1.0
 3: 3 1 11  3.5
 4: 4 1 16  6.0
 5: 5 1 21 11.0
 6: 6 1 26 16.0
 7: 7 1 31 18.5
 8: 8 1 36 21.0
 9: 9 1 41 26.0
10: 1 2  2   NA
11: 2 2  7  2.0
12: 3 2 12  4.5
13: 4 2 17  7.0
14: 5 2 22 12.0
15: 6 2 27 17.0
16: 7 2 32 19.5
17: 8 2 37 22.0
18: 9 2 42 27.0
19: 1 3  3   NA
20: 2 3  8   NA
21: 3 3 13   NA
22: 4 3 18   NA
23: 5 3 23   NA
24: 6 3 28   NA
25: 7 3 33   NA
26: 8 3 38   NA
27: 9 3 43   NA
28: 1 4  4   NA
29: 2 4  9  4.0
30: 3 4 14  6.5
31: 4 4 19  9.0
32: 5 4 24 14.0
33: 6 4 29 19.0
34: 7 4 34 21.5
35: 8 4 39 24.0
36: 9 4 44 29.0
37: 1 5  5   NA
38: 2 5 10  5.0
39: 3 5 15  7.5
40: 4 5 20 10.0
41: 5 5 25 15.0
42: 6 5 30 20.0
43: 7 5 35 22.5
44: 8 5 40 25.0
45: 9 5 45 30.0
    t h  v    w
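
As for the error in the question: zoo::na.approx() drops leading and trailing NAs by default (na.rm = TRUE), which is why it returned 44 values for 45 rows. Passing na.rm = FALSE keeps them as NA, so the result has the same length as the input. A hedged sketch of that variant, grouped by h (it requires the zoo package; dt and the toy values are illustrative):

```r
library(data.table)
library(zoo)

dt <- data.table(t = rep(1:4, times = 2),
                 h = rep(1:2, each = 4),
                 w = c(NA, 2, NA, 4,     # h = 1: leading NA, one interior gap
                       1, NA, NA, 7))    # h = 2: two interior gaps

# Sort in place so each h group is ordered by t.
setorder(dt, h, t)

# na.rm = FALSE keeps leading/trailing NAs instead of dropping them,
# so na.approx() returns one value per row of each group.
# x = t makes the interpolation follow the time axis.
dt[, w := na.approx(w, x = t, na.rm = FALSE), by = h]

dt[]
# h = 1: w is NA, 2, 3, 4; h = 2: w is 1, 3, 5, 7
```

How groups with fewer than two non-NA values are treated may depend on the zoo version, so keeping the length(na.omit(w)) < 2 guard from the edit above for such groups seems prudent.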