Creating a column of conditional lagged values with data.table in R


I have a data.table that looks like this:

df
     seller_id buyer_id       hs10 year mean_ln_c_hit
  1:         1        5 1200000000    5      4.407456
  2:         1       24 1500000000    2      4.173422
  3:         1        8 1500000000    3      4.532695
  4:         1        8 1500000000    5      4.830106
  5:         1       16 1500000000    5      4.830106
  6:         1       17 1800000000    3      3.920435
  7:         1       20 1800000000    5      4.357196
  8:         1       27 1800000000    5      4.357196
  9:         1        1 1900000000    1      4.835762
 10:         1        2 1900000000    1      4.835762
 11:         1        3 1900000000    1      4.835762
 12:         1        4 1900000000    1      4.835762
 13:         1        5 1900000000    1      4.835762
 14:         1        6 1900000000    1      4.835762
 15:         1        7 1900000000    1      4.835762
 16:         1        8 1900000000    1      4.835762
 17:         1        9 1900000000    1      4.835762
 18:         1       11 1900000000    1      4.835762
 19:         1       12 1900000000    1      4.835762
 20:         1       13 1900000000    1      4.835762
 21:         1       14 1900000000    1      4.835762
 22:         1       15 1900000000    1      4.835762
 23:         1       16 1900000000    1      4.835762
 24:         1       17 1900000000    1      4.835762
 25:         1       18 1900000000    1      4.835762
 26:         1       19 1900000000    1      4.835762
 27:         1       20 1900000000    1      4.835762
 28:         1       21 1900000000    1      4.835762
 29:         1       22 1900000000    1      4.835762
 30:         1       23 1900000000    1      4.835762
 31:         1       24 1900000000    1      4.835762
 32:         1       25 1900000000    1      4.835762
 33:         1       26 1900000000    1      4.835762
 34:         1       27 1900000000    1      4.835762
 35:         1       28 1900000000    1      4.835762
 36:         1       29 1900000000    1      4.835762
 37:         1       30 1900000000    1      4.835762
 38:         1        1 1900000000    2      4.409253
 39:         1        2 1900000000    2      4.409253
 40:         1        3 1900000000    2      4.409253
 41:         1        4 1900000000    2      4.409253
 42:         1        5 1900000000    2      4.409253
 43:         1        6 1900000000    2      4.409253
 44:         1        7 1900000000    2      4.409253
 45:         1        8 1900000000    2      4.409253
 46:         1        9 1900000000    2      4.409253
 47:         1       10 1900000000    2      4.409253
 48:         1       11 1900000000    2      4.409253
 49:         1       12 1900000000    2      4.409253
 50:         1       13 1900000000    2      4.409253
 51:         1       14 1900000000    2      4.409253
 52:         1       15 1900000000    2      4.409253
 53:         1       16 1900000000    2      4.409253
 54:         1       17 1900000000    2      4.409253
 55:         1       18 1900000000    2      4.409253
 56:         1       19 1900000000    2      4.409253
 57:         1       20 1900000000    2      4.409253
 58:         1       21 1900000000    2      4.409253
 59:         1       22 1900000000    2      4.409253
 60:         1       23 1900000000    2      4.409253
 61:         1       25 1900000000    2      4.409253
 62:         1       26 1900000000    2      4.409253
 63:         1       27 1900000000    2      4.409253
 64:         1       28 1900000000    2      4.409253
 65:         1       29 1900000000    2      4.409253
 66:         1       30 1900000000    2      4.409253
 67:         1        1 1900000000    3      4.514642
 68:         1        3 1900000000    3      4.514642
 69:         1        4 1900000000    3      4.514642
 70:         1        5 1900000000    3      4.514642
 71:         1        6 1900000000    3      4.514642
 72:         1        7 1900000000    3      4.514642
 73:         1        9 1900000000    3      4.514642
 74:         1       11 1900000000    3      4.514642
 75:         1       12 1900000000    3      4.514642
 76:         1       13 1900000000    3      4.514642
 77:         1       14 1900000000    3      4.514642
 78:         1       15 1900000000    3      4.514642
 79:         1       16 1900000000    3      4.514642
 80:         1       18 1900000000    3      4.514642
 81:         1       19 1900000000    3      4.514642
 82:         1       20 1900000000    3      4.514642
 83:         1       21 1900000000    3      4.514642
 84:         1       22 1900000000    3      4.514642
 85:         1       23 1900000000    3      4.514642
 86:         1       24 1900000000    3      4.514642
 87:         1       25 1900000000    3      4.514642
 88:         1       26 1900000000    3      4.514642
 89:         1       27 1900000000    3      4.514642
 90:         1       28 1900000000    3      4.514642
 91:         1       29 1900000000    3      4.514642
 92:         1       30 1900000000    3      4.514642
 93:         1        2 1900000000    5      4.698335
 94:         1        3 1900000000    5      4.698335
 95:         1        4 1900000000    5      4.698335
 96:         1        6 1900000000    5      4.698335
 97:         1        7 1900000000    5      4.698335
 98:         1        9 1900000000    5      4.698335
 99:         1       11 1900000000    5      4.698335
100:         1       12 1900000000    5      4.698335

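I want to use data.table functionality to create a new column called lag_mean_ln_c_hit that holds the one-year lag of the values in the mean_ln_c_hit column, conditional on seller_id and hs10. My best attempt is: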

df[!is.na(year), lag_mean_ln_c_hit:= 
                        (.SD[.(seller_id = seller_id, hs10 = hs10, year = year - 1), mean_ln_c_hit, on = c("seller_id", "hs10", "year"), allow.cartesian = TRUE ]) ]

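This more or less gets the NAs right in terms of where they appear (year = 1 is the first year, so the lag for every observation in the first year should be NA), but it does not capture the correct lags. It reports NA for every combination except where it returns 4.835762 (although 4.173422 is also reported once).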

Each seller_id has multiple buyer_id entries with the same mean_ln_c_hit value, so I have duplicates. Running the code also gives the following warning:

Warning message:
In `[.data.table`(df, !is.na(year), `:=`(lag_mean_ln_c_hit, (.SD[.(seller_id = seller_id,  :
  Supplied 1640 items to be assigned to 100 items of column 'lag_mean_ln_c_hit' (1540 unused)

Without allow.cartesian = TRUE, the code does not run.

Any ideas on how to solve this? It should be simple, but I can't figure it out.

r data.table
1 answer
It is not clear what the desired output is. If the lag must be exactly one year earlier, one option is:

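# one row per (seller_id, hs10, year), so each lookup returns a single value
seller <- unique(df, by=c("seller_id", "hs10", "year"))
# the year to look up for each row
df[, oneyearago := year - 1L]
# x.mean_ln_c_hit is the value from seller (the x side of the join)
df[, lag_mean_ln_c_hit := seller[.SD, on=.(seller_id, hs10, year=oneyearago), x.mean_ln_c_hit]]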
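
As a hedged illustration of the exact-lag option, the sketch below runs the same join on a small made-up table (the name `toy` and its values are assumptions, not the asker's data). Deduplicating with unique() first matters because the lookup table then has at most one row per (seller_id, hs10, year), so every row of the original table gets exactly one value back. Joining against a table that still contains the duplicates returns one row per duplicate match, which is consistent with the 1640 items reported in the warning in the question.

```r
library(data.table)

# Hypothetical toy data: two buyers share the same value in year 1,
# and year 4 is missing for hs10 = 10.
toy <- data.table(
  seller_id     = 1L,
  buyer_id      = c(1L, 2L, 1L, 1L, 1L),
  hs10          = c(10L, 10L, 10L, 10L, 20L),
  year          = c(1L, 1L, 2L, 5L, 3L),
  mean_ln_c_hit = c(4.8, 4.8, 4.4, 4.7, 3.9)
)

# One row per (seller_id, hs10, year), so each lookup has at most one match
seller <- unique(toy, by = c("seller_id", "hs10", "year"))

# Match each row to the seller row whose year equals year - 1
toy[, oneyearago := year - 1L]
toy[, lag_mean_ln_c_hit := seller[.SD, on = .(seller_id, hs10, year = oneyearago),
                                  x.mean_ln_c_hit]]

print(toy)
```

In this toy table only the year-2 row receives a lag (4.8). The year-5 row stays NA because there is no year 4 for that seller and product.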

Or, if any earlier year will do (at least one year before), you can use a non-equi join and pick the last match:

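# one row per (seller_id, hs10, year)
seller <- unique(df, by=c("seller_id", "hs10", "year"))
# year<year: seller's year (left) must be earlier than the row's year (right)
df[, lag_mean_ln_c_hit := seller[.SD, on=.(seller_id, hs10, year<year), x.mean_ln_c_hit, mult="last"]]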
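
A minimal sketch of the non-equi variant on made-up data (the name `toy` and its values are assumptions, not the asker's data). In `on=`, the condition `year < year` compares the lookup table's year on the left with the year of the row being filled on the right, so each row can match any earlier year for the same seller_id and hs10. The toy table is already sorted by year within each group, so the last match is the most recent earlier year.

```r
library(data.table)

# Hypothetical toy data: year 4 is missing for hs10 = 10
toy <- data.table(
  seller_id     = 1L,
  buyer_id      = c(1L, 2L, 1L, 1L, 1L),
  hs10          = c(10L, 10L, 10L, 10L, 20L),
  year          = c(1L, 1L, 2L, 5L, 3L),
  mean_ln_c_hit = c(4.8, 4.8, 4.4, 4.7, 3.9)
)

# One row per (seller_id, hs10, year)
seller <- unique(toy, by = c("seller_id", "hs10", "year"))

# Non-equi join: keep the last seller row with an earlier year
toy[, lag_mean_ln_c_hit := seller[.SD, on = .(seller_id, hs10, year < year),
                                  x.mean_ln_c_hit, mult = "last"]]

print(toy)
```

Here the year-5 row picks up the year-2 value (4.4). With an exact one-year lag that row would be NA, since year 4 is absent.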

Data:

library(data.table)
df <- fread("seller_id buyer_id hs10 year mean_ln_c_hit
1 5 1200000000 5 4.407456
1 24 1500000000 2 4.173422
1 8 1500000000 3 4.532695
1 8 1500000000 5 4.830106
1 16 1500000000 5 4.830106
1 17 1800000000 3 3.920435
1 20 1800000000 5 4.357196
1 27 1800000000 5 4.357196
1 1 1900000000 1 4.835762
1 2 1900000000 1 4.835762
1 3 1900000000 1 4.835762
1 4 1900000000 1 4.835762
1 5 1900000000 1 4.835762
1 6 1900000000 1 4.835762
1 7 1900000000 1 4.835762
1 8 1900000000 1 4.835762
1 9 1900000000 1 4.835762
1 11 1900000000 1 4.835762
1 12 1900000000 1 4.835762
1 13 1900000000 1 4.835762
1 14 1900000000 1 4.835762
1 15 1900000000 1 4.835762
1 16 1900000000 1 4.835762
1 17 1900000000 1 4.835762
1 18 1900000000 1 4.835762
1 19 1900000000 1 4.835762
1 20 1900000000 1 4.835762
1 21 1900000000 1 4.835762
1 22 1900000000 1 4.835762
1 23 1900000000 1 4.835762
1 24 1900000000 1 4.835762
1 25 1900000000 1 4.835762
1 26 1900000000 1 4.835762
1 27 1900000000 1 4.835762
1 28 1900000000 1 4.835762
1 29 1900000000 1 4.835762
1 30 1900000000 1 4.835762
1 1 1900000000 2 4.409253
1 2 1900000000 2 4.409253
1 3 1900000000 2 4.409253
1 4 1900000000 2 4.409253
1 5 1900000000 2 4.409253
1 6 1900000000 2 4.409253
1 7 1900000000 2 4.409253
1 8 1900000000 2 4.409253
1 9 1900000000 2 4.409253
1 10 1900000000 2 4.409253
1 11 1900000000 2 4.409253
1 12 1900000000 2 4.409253
1 13 1900000000 2 4.409253
1 14 1900000000 2 4.409253
1 15 1900000000 2 4.409253
1 16 1900000000 2 4.409253
1 17 1900000000 2 4.409253
1 18 1900000000 2 4.409253
1 19 1900000000 2 4.409253
1 20 1900000000 2 4.409253
1 21 1900000000 2 4.409253
1 22 1900000000 2 4.409253
1 23 1900000000 2 4.409253
1 25 1900000000 2 4.409253
1 26 1900000000 2 4.409253
1 27 1900000000 2 4.409253
1 28 1900000000 2 4.409253
1 29 1900000000 2 4.409253
1 30 1900000000 2 4.409253
1 1 1900000000 3 4.514642
1 3 1900000000 3 4.514642
1 4 1900000000 3 4.514642
1 5 1900000000 3 4.514642
1 6 1900000000 3 4.514642
1 7 1900000000 3 4.514642
1 9 1900000000 3 4.514642
1 11 1900000000 3 4.514642
1 12 1900000000 3 4.514642
1 13 1900000000 3 4.514642
1 14 1900000000 3 4.514642
1 15 1900000000 3 4.514642
1 16 1900000000 3 4.514642
1 18 1900000000 3 4.514642
1 19 1900000000 3 4.514642
1 20 1900000000 3 4.514642
1 21 1900000000 3 4.514642
1 22 1900000000 3 4.514642
1 23 1900000000 3 4.514642
1 24 1900000000 3 4.514642
1 25 1900000000 3 4.514642
1 26 1900000000 3 4.514642
1 27 1900000000 3 4.514642
1 28 1900000000 3 4.514642
1 29 1900000000 3 4.514642
1 30 1900000000 3 4.514642
1 2 1900000000 5 4.698335
1 3 1900000000 5 4.698335
1 4 1900000000 5 4.698335
1 6 1900000000 5 4.698335
1 7 1900000000 5 4.698335
1 9 1900000000 5 4.698335
1 11 1900000000 5 4.698335
1 12 1900000000 5 4.698335")
