I have long-format data like this:
library(tidyverse)
df <- data.frame(
projection1 = c(2,4,3),
projection2 = c(3,1,4),
historical_data = c(2,3,4),
time = c(1,2,3)
) %>%
as_tibble() %>%
gather(key = key, value = val, projection1:historical_data) %>%
mutate(key = key %>% factor())
The data then looks like this:
# A tibble: 9 x 3
time key val
<dbl> <fct> <dbl>
1 1 projection1 2
2 2 projection1 4
3 3 projection1 3
4 1 projection2 3
5 2 projection2 1
6 3 projection2 4
7 1 historical_data 2
8 2 historical_data 3
9 3 historical_data 4
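Now, for each year (each value of time), I want to calculate the relative difference of the projection1 and projection2 values with respect to the historical_data value, expressed as a ratio. So I would like my data to end up like this: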
# A tibble: 9 x 4
time key val pct_diff
<dbl> <fct> <dbl> <dbl>
1 1 projection1 2 1
2 2 projection1 4 1.33
3 3 projection1 3 0.75
4 1 projection2 3 1.5
5 2 projection2 1 0.333
6 3 projection2 4 1
7 1 historical_data 2 1
8 2 historical_data 3 1
9 3 historical_data 4 1
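I always end up splitting and merging to obtain a new, seemingly redundant column holding values that already exist in the current dataframe / tibble, just so I can do the calculation. Is there an elegant dplyr or data.table solution? Perhaps you can point me to a question that has already been answered; I have not come across one myself.
Thanks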
Here is a simple approach using groups:
data.frame(
projection1 = c(2,4,3),
projection2 = c(3,1,4),
historical_data = c(2,3,4),
time = c(1,2,3)
) %>%
as_tibble() %>%
gather(key = key, value = val, projection1:historical_data) %>%
group_by(time) %>%
mutate(pct_diff = (val / val[key == "historical_data"]))
# Groups: time [3]
time key val pct_diff
<dbl> <chr> <dbl> <dbl>
1 1 projection1 2 1
2 2 projection1 4 1.33
3 3 projection1 3 0.75
4 1 projection2 3 1.5
5 2 projection2 1 0.333
6 3 projection2 4 1
7 1 historical_data 2 1
8 2 historical_data 3 1
9 3 historical_data 4 1
If you insist on the key column being a factor, you will have to modify the code above slightly.
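One way that modification might look, as a minimal sketch: convert key to a factor before grouping, as the question does. This assumes each time value has exactly one historical_data row; the comparison with == still works because R compares factors to strings by level label. The ungroup() at the end is an addition for convenience, not part of the answer above.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  projection1 = c(2, 4, 3),
  projection2 = c(3, 1, 4),
  historical_data = c(2, 3, 4),
  time = c(1, 2, 3)
) %>%
  gather(key = key, value = val, projection1:historical_data) %>%
  # key becomes a factor, as in the question
  mutate(key = factor(key)) %>%
  group_by(time) %>%
  # `==` matches on the factor's labels, so the lookup is unchanged
  mutate(pct_diff = val / val[key == "historical_data"]) %>%
  # drop the grouping so later steps see a plain tibble
  ungroup()
```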
Here is one possible approach using data.table, following jangorecki's comment to use == instead of the slower grep:
DT[, ratio := 1][key!="historical_data",
ratio := DT[key=="historical_data"][.SD, on=.(time), i.val/x.val]]
Or shorter, but possibly slower:
DT[, ratio := DT[key=="historical_data"][.SD, on=.(time), i.val/x.val]]
Output:
time key val ratio
1: 1 projection1 2 1.0000000
2: 2 projection1 4 1.3333333
3: 3 projection1 3 0.7500000
4: 1 projection2 3 1.5000000
5: 2 projection2 1 0.3333333
6: 3 projection2 4 1.0000000
7: 1 historical_data 2 1.0000000
8: 2 historical_data 3 1.0000000
9: 3 historical_data 4 1.0000000
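For comparison, the grouped idea from the dplyr answer can probably be written in data.table as well. This is a sketch rather than the join-based method shown above, and it assumes exactly one historical_data row per time value. The table is rebuilt with data.table() here only to keep the example self-contained.

```r
library(data.table)

# Same values as the data used in this answer, built without fread
DT <- data.table(
  time = rep(1:3, times = 3),
  key  = rep(c("projection1", "projection2", "historical_data"), each = 3),
  val  = c(2, 4, 3, 3, 1, 4, 2, 3, 4)
)

# Within each time group, divide every value by that group's historical value
DT[, ratio := val / val[key == "historical_data"], by = time]
```

Whether this is faster or slower than the join version has not been measured here; it mainly trades the self-join for a by-group lookup.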
Data:
library(data.table)
DT <- fread("time key val
1 projection1 2
2 projection1 4
3 projection1 3
1 projection2 3
2 projection2 1
3 projection2 4
1 historical_data 2
2 historical_data 3
3 historical_data 4")