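I have a very large data frame containing monthly price indices for 400 items over 20 years. For my analysis, I need to chain-link the indices across years so that they share a consistent base reference period. Essentially, each January index is calculated relative to the preceding month (December of the previous year). The image shows a simple example of how the linking works, with the January rows highlighted. The item_index column holds the original index and the new_index column holds the desired result. In this example, new_index for 2018:01 = D14 × E13 / 100, for 2018:02 = D15 × E14 / 100, and so on, until 2019:01 = D26 × E25 / 100, etc. I would really appreciate help writing code that does this efficiently.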
df <- structure(list(index_date = structure(c(17167, 17198, 17226,
17257, 17287, 17318, 17348, 17379, 17410, 17440, 17471, 17501,
17532, 17563, 17591, 17622, 17652, 17683, 17713, 17744, 17775,
17805, 17836, 17866, 17897, 17928, 17956, 17987, 18017, 18048,
18078, 18109, 18140, 18170, 18201, 18231, 18262, 18293, 18322,
18353, 18383, 18414, 18444, 18475, 18506, 18536, 18567, 18597
), class = "Date"), item_id = c(310405, 310405, 310405, 310405,
310405, 310405, 310405, 310405, 310405, 310405, 310405, 310405,
310405, 310405, 310405, 310405, 310405, 310405, 310405, 310405,
310405, 310405, 310405, 310405, 310405, 310405, 310405, 310405,
310405, 310405, 310405, 310405, 310405, 310405, 310405, 310405,
310405, 310405, 310405, 310405, 310405, 310405, 310405, 310405,
310405, 310405, 310405, 310405), base_date = c(201601, 201701,
201701, 201701, 201701, 201701, 201701, 201701, 201701, 201701,
201701, 201701, 201712, 201801, 201801, 201801, 201801, 201801,
201801, 201801, 201801, 201801, 201801, 201801, 201812, 201901,
201901, 201901, 201901, 201901, 201901, 201901, 201901, 201901,
201901, 201901, 201912, 202001, 202001, 202001, 202001, 202001,
202001, 202001, 202001, 202001, 202001, 202001), item_index = c(98.258,
99.397, 99.947, 96.607, 98.417, 102.261, 101.719, 102.018, 99.88,
100.447, 95.759, 95.334, 103.718, 100.758, 97.906, 93.305, 98.987,
96.349, 93.586, 100.091, 97.8, 95.633, 93.759, 92.471, 101.023,
97.782, 99.697, 94.008, 97.942, 98.874, 95.886, 99.385, 97.472,
95.792, 98.138, 95.18, 101.098, 99.525, 98.032, 99.571, 96.245,
93.816, 95.445, 99.266, 97.008, 99.151, 96.824, 92.32), new_index = c(98.258,
99.397, 99.947, 96.607, 98.417, 102.261, 101.719, 102.018, 99.88,
100.447, 95.759, 95.334, 98.87851812, 99.6280172873496, 96.8080019505672,
92.258601331866, 97.8768787314444, 95.2684634234388, 92.5364499677832,
98.9684975714892, 96.70319072136, 94.5604932336996, 92.7075098041308,
91.4339544907452, 92.3693238451855, 90.3205722422993, 92.0894447939346,
86.834553960382, 90.4683631604516, 91.3292452586887, 88.5692498621946,
91.8012525035377, 90.0342273383792, 88.4824226977801, 90.6494070351882,
87.9171224358476, 88.8824524401932, 88.4602607911023, 87.1332457761702,
88.5011467192248, 85.544916351064, 83.3859615812917, 84.8338567315424,
88.2300552392822, 86.2230894631826, 88.127840418976, 86.0595457506927,
82.0562800927864)), row.names = c(NA, -48L), class = c("tbl_df",
"tbl", "data.frame"))
It sounds like you want something like this:
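# The .by argument needs dplyr >= 1.1.0; first() is namespaced so the call works without library(dplyr)
dplyr::mutate(df, out = item_index * dplyr::first(new_index) / 100, .by = base_date)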
Output:
index_date item_id base_date item_index new_index out
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2017-01-01 310405 201601 98.3 98.3 96.5
2 2017-02-01 310405 201701 99.4 99.4 98.8
3 2017-03-01 310405 201701 99.9 99.9 99.3
4 2017-04-01 310405 201701 96.6 96.6 96.0
5 2017-05-01 310405 201701 98.4 98.4 97.8
6 2017-06-01 310405 201701 102. 102. 102.
7 2017-07-01 310405 201701 102. 102. 101.
8 2017-08-01 310405 201701 102. 102. 101.
9 2017-09-01 310405 201701 99.9 99.9 99.3
10 2017-10-01 310405 201701 100. 100. 99.8
# ℹ 38 more rows
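The one-liner above reads `new_index`, which is the target column and will not be available in the real data. Below is a minimal sketch that derives the chained values from `item_index` alone, following the rule described in the question: the first year is kept unchanged, each later January is linked to the chained December of the previous year, and the remaining months of that year are linked to the chained January. `chain_index` is a hypothetical helper name, not a dplyr function, and the sketch assumes one row per month with January and the previous December present for every year after the first.

```r
# Hypothetical helper: chain one item's monthly index across years.
# Assumption: one row per month; every year after the first has a January row
# and the year before it has a December row.
chain_index <- function(item_index, index_date) {
  ord <- order(index_date)                 # work in chronological order
  idx <- item_index[ord]
  yr  <- as.integer(format(index_date[ord], "%Y"))
  mon <- as.integer(format(index_date[ord], "%m"))

  out <- idx                               # first year stays as published
  for (y in sort(unique(yr))[-1]) {
    jan      <- which(yr == y & mon == 1)
    prev_dec <- which(yr == y - 1 & mon == 12)
    rest     <- which(yr == y & mon != 1)
    stopifnot(length(jan) == 1, length(prev_dec) == 1)

    out[jan]  <- idx[jan]  * out[prev_dec] / 100  # January links to chained December
    out[rest] <- idx[rest] * out[jan] / 100       # Feb-Dec link to chained January
  }

  res <- numeric(length(out))
  res[ord] <- out                          # restore the caller's row order
  res
}

# Small check using six rows taken from the question's data
dates <- as.Date(c("2017-12-01", "2018-01-01", "2018-02-01",
                   "2018-12-01", "2019-01-01", "2019-02-01"))
item  <- c(95.334, 103.718, 100.758, 92.471, 101.023, 97.782)
chained <- chain_index(item, dates)
print(chained)
```

On the full data this could be applied per item with something like `dplyr::mutate(df, chained = chain_index(item_index, index_date), .by = item_id)`. The loop runs once per year rather than once per row, so 20 years and 400 items should stay cheap, though that has not been benchmarked here.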