Dividing values at a given time by the value of a specific factor level (dplyr, data.table)

Question (votes: 1, answers: 2)

I have long-format data like this:

library(tidyverse)

df <- data.frame(
  projection1 = c(2,4,3),
  projection2 = c(3,1,4),
  historical_data = c(2,3,4),
  time = c(1,2,3)
) %>% 
  as_tibble() %>% 
  gather(key = key, value = val, projection1:historical_data) %>% 
  mutate(key = key %>% factor())

The data then looks like this:

# A tibble: 9 x 3
   time key               val
  <dbl> <fct>           <dbl>
1     1 projection1         2
2     2 projection1         4
3     3 projection1         3
4     1 projection2         3
5     2 projection2         1
6     3 projection2         4
7     1 historical_data     2
8     2 historical_data     3
9     3 historical_data     4

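Now I want to compute, for each time point, the values of projection1 and projection2 relative to historical_data, that is, each val divided by the historical_data value at the same time. So I would like my data to end up like this: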

# A tibble: 9 x 4
   time key               val pct_diff
  <dbl> <fct>           <dbl>    <dbl>
1     1 projection1         2    1    
2     2 projection1         4    1.33 
3     3 projection1         3    0.75 
4     1 projection2         3    1.5  
5     2 projection2         1    0.333
6     3 projection2         4    1    
7     1 historical_data     2    1    
8     2 historical_data     3    1    
9     3 historical_data     4    1

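I always end up splitting and merging to get a new, seemingly redundant column that holds values already present in the current data frame / tibble, just so I can do the calculation. Is there an elegant dplyr or data.table solution? Perhaps you can point me to a question that has already been answered; I have not come across one myself.

Thanks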

r dataframe dplyr data.table
2 Answers
0
votes

Here is a simple approach using groups:

library(tidyverse)

data.frame(
  projection1 = c(2,4,3),
  projection2 = c(3,1,4),
  historical_data = c(2,3,4),
  time = c(1,2,3)
) %>% 
  as_tibble() %>% 
  gather(key = key, value = val, projection1:historical_data) %>%
  group_by(time) %>%
  mutate(pct_diff = (val  / val[key == "historical_data"]))

# Groups:   time [3]
   time key               val pct_diff
  <dbl> <chr>           <dbl>    <dbl>
1     1 projection1         2    1    
2     2 projection1         4    1.33 
3     3 projection1         3    0.75 
4     1 projection2         3    1.5  
5     2 projection2         1    0.333
6     3 projection2         4    1    
7     1 historical_data     2    1    
8     2 historical_data     3    1    
9     3 historical_data     4    1 

If you insist that the key column be a factor, you will have to modify the code above slightly.
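A minimal sketch of what that modification could look like, assuming the only change needed is converting key to a factor before grouping. Comparing a factor with a string via == works on the level labels, so the filter inside mutate() stays unchanged. The names df and result are chosen here for illustration only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  projection1 = c(2, 4, 3),
  projection2 = c(3, 1, 4),
  historical_data = c(2, 3, 4),
  time = c(1, 2, 3)
)

result <- df %>%
  as_tibble() %>%
  gather(key = key, value = val, projection1:historical_data) %>%
  mutate(key = factor(key)) %>%   # key is now <fct> instead of <chr>
  group_by(time) %>%
  # Within each time point, divide every value by the historical value.
  # Assumes exactly one historical_data row per time point.
  mutate(pct_diff = val / val[key == "historical_data"]) %>%
  ungroup()

result
```

If a time point had no historical_data row, or more than one, the subset would not have length one and the division would fail or recycle, so this sketch is only suitable for data shaped like the example.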


2
votes

Here is one possible approach using data.table. Following jangorecki's comment, it uses == rather than the slower grep:

DT[, ratio := 1][key!="historical_data", 
    ratio := DT[key=="historical_data"][.SD, on=.(time), i.val/x.val]]

Or shorter, but possibly slower:

DT[, ratio := DT[key=="historical_data"][.SD, on=.(time), i.val/x.val]]

Output:

   time             key val     ratio
1:    1     projection1   2 1.0000000
2:    2     projection1   4 1.3333333
3:    3     projection1   3 0.7500000
4:    1     projection2   3 1.5000000
5:    2     projection2   1 0.3333333
6:    3     projection2   4 1.0000000
7:    1 historical_data   2 1.0000000
8:    2 historical_data   3 1.0000000
9:    3 historical_data   4 1.0000000
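For comparison, the grouped idea from the dplyr answer can also be sketched in data.table. This is not the join-based method shown above but a variant that groups by time, and it assumes exactly one historical_data row per time point. The table is rebuilt inline here so the snippet runs on its own.

```r
library(data.table)

DT <- data.table(
  time = rep(1:3, times = 3),
  key  = rep(c("projection1", "projection2", "historical_data"), each = 3),
  val  = c(2, 4, 3, 3, 1, 4, 2, 3, 4)
)

# Within each time point, divide every value by the historical value.
# := adds the column by reference and keeps the original row order.
DT[, ratio := val / val[key == "historical_data"], by = time]

DT
```

Which of the two forms is faster on large data is not something this sketch establishes; that would need benchmarking on the real table.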

Data:

library(data.table)
DT <- fread("time key val
1 projection1         2
2 projection1         4
3 projection1         3
1 projection2         3
2 projection2         1
3 projection2         4
1 historical_data     2
2 historical_data     3
3 historical_data     4")