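I want to study the effect of group A on certain dependent variables of group B, which I call "target_n". Because of how the data were generated, my dataset carries "layer" information and is sorted by group. This means that the rows with Group=="B" hold B's values in the "target_n" columns, and the rows with Group=="A" hold A's values in the "X_n" columns. Group "C" is essentially an "other" category, but I need its values on the same rows as A and B, to make sure that the effect of A is on B and not on C. The following should make this clearer: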
My data (df) are structured as follows:
df<-data.frame(
"Date"=c("1990-03","2000-01","2010-09","1990-03","2000-01","2010-09","1990-03","2000-01","2010-09"),
"Group"=c("A","A","A","B","B","B","C","C","C"),
"X_1_A"=c(9,4,7,NA,NA,NA,NA,NA,NA),
"X_2_A"=c(1,2,6,NA,NA,NA,NA,NA,NA),
"target_1_B"=c(NA,NA,NA,0,2,9,NA,NA,NA),
"target_2_B"=c(NA,NA,NA,9,2,1,NA,NA,NA),
"target_1_C"=c(NA,NA,NA,NA,NA,NA,5,3,1),
"target_2_C"=c(NA,NA,NA,NA,NA,NA,1,9,2)
)
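What I want is to compute new variables for group "A" and group "C" so that everything sits on the same row. If I did this by hand, I would take A's "X_1" score for the date "1990-03" and assign it, in a new A column, to B's row for the same date.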
So in the end, my data would look like this:
df<-data.frame(
"Date"=c("1990-03","2000-01","2010-09","1990-03","2000-01","2010-09","1990-03","2000-01","2010-09"),
"Group"=c("A","A","A","B","B","B","C","C","C"),
"X_1_A"=c(9,4,7,NA,NA,NA,NA,NA,NA),
"X_2_A"=c(1,2,6,NA,NA,NA,NA,NA,NA),
"target_1_B"=c(NA,NA,NA,0,2,9,NA,NA,NA),
"target_2_B"=c(NA,NA,NA,9,2,1,NA,NA,NA),
"target_1_C"=c(NA,NA,NA,NA,NA,NA,5,3,1),
"target_2_C"=c(NA,NA,NA,NA,NA,NA,1,9,2),
"NEW_X_1_A"=c(NA,NA,NA,9,4,7,NA,NA,NA),
"NEW_X_2_A"=c(NA,NA,NA,1,2,6,NA,NA,NA),
"NEW_target_1_C"=c(NA,NA,NA,5,3,1,NA,NA,NA),
"NEW_target_2_C"=c(NA,NA,NA,1,9,2,NA,NA,NA)
)
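(I have many of these "X_" variables and exactly the same number of "target_" variables. I also don't have just this one set of A, B and C, but A1, A2, A3, C1, C2, C3 and even more Bs. Each set A1, B1, C1 also has its own set of dates, which does not match the dates of the other sets. That should not be a problem, though, because I can simply cut my dataset horizontally into sets, apply the trick to each of them separately, and merge them again.)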
But how do I bring the values of A and C into B's rows, conditional on Group=="B" and matched on date?
Using data.table, you can try:
df<-data.frame(
"Date"=c("1990-03","2000-01","2010-09","1990-03","2000-01","2010-09","1990-03","2000-01","2010-09"),
"Group"=c("A","A","A","B","B","B","C","C","C"),
"X1_A"=c(9,4,7,NA,NA,NA,NA,NA,NA),
"X2_A"=c(1,2,6,NA,NA,NA,NA,NA,NA),
"target_value_1_B"=c(NA,NA,NA,0,2,9,NA,NA,NA),
"target_value_2_B"=c(NA,NA,NA,9,2,1,NA,NA,NA),
"target_value_1_C"=c(NA,NA,NA,NA,NA,NA,5,3,1),
"target_value_2_C"=c(NA,NA,NA,NA,NA,NA,1,9,2)
)
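library(data.table)
# Fill the new columns on the B rows only. The A (or C) values are recycled
# by position, so this relies on every group listing the same dates in the same order.
setDT(df)[, `:=`(
  NEW_X1 = ifelse(Group == "B", X1_A[Group == "A"], NA),
  NEW_X2 = ifelse(Group == "B", X2_A[Group == "A"], NA),
  NEW_target_value_1_C = ifelse(Group == "B", target_value_1_C[Group == "C"], NA),
  NEW_target_value_2_C = ifelse(Group == "B", target_value_2_C[Group == "C"], NA)
)]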
The result looks like this:
df
Date Group X1_A X2_A target_value_1_B target_value_2_B target_value_1_C target_value_2_C NEW_X1 NEW_X2 NEW_target_value_1_C NEW_target_value_2_C
1: 1990-03 A 9 1 NA NA NA NA NA NA NA NA
2: 2000-01 A 4 2 NA NA NA NA NA NA NA NA
3: 2010-09 A 7 6 NA NA NA NA NA NA NA NA
4: 1990-03 B NA NA 0 9 NA NA 9 1 5 1
5: 2000-01 B NA NA 2 2 NA NA 4 2 3 9
6: 2010-09 B NA NA 9 1 NA NA 7 6 1 2
7: 1990-03 C NA NA NA NA 5 1 NA NA NA NA
8: 2000-01 C NA NA NA NA 3 9 NA NA NA NA
9: 2010-09 C NA NA NA NA 1 2 NA NA NA NA
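Note that the `ifelse()` calls above line the values up by position: the three A (or C) values are recycled across all nine rows, which works here because every group has the same dates in the same order. If that ordering is not guaranteed in your real data, matching on Date explicitly may be safer. The following is only a minimal sketch of that idea, assuming the `_A` and `_C` suffixes identify the source columns; the helper names (`a_cols`, `c_cols`, `a_vals`, `c_vals`, `b_rows`, `result`) and the `NEW_<column>` naming are my own choices for illustration, not something from the question.

```r
library(data.table)

# Same toy data as above, rebuilt so this block runs on its own
dt <- data.table(
  Date  = rep(c("1990-03", "2000-01", "2010-09"), times = 3),
  Group = rep(c("A", "B", "C"), each = 3),
  X1_A  = c(9, 4, 7, rep(NA, 6)),
  X2_A  = c(1, 2, 6, rep(NA, 6)),
  target_value_1_B = c(rep(NA, 3), 0, 2, 9, rep(NA, 3)),
  target_value_2_B = c(rep(NA, 3), 9, 2, 1, rep(NA, 3)),
  target_value_1_C = c(rep(NA, 6), 5, 3, 1),
  target_value_2_C = c(rep(NA, 6), 1, 9, 2)
)

# Assumption: columns ending in _A / _C are the ones to copy onto the B rows
a_cols <- grep("_A$", names(dt), value = TRUE)
c_cols <- grep("_C$", names(dt), value = TRUE)

# Pull the A and C values out as small lookup tables keyed by Date
a_vals <- dt[Group == "A", c("Date", a_cols), with = FALSE]
c_vals <- dt[Group == "C", c("Date", c_cols), with = FALSE]
setnames(a_vals, a_cols, paste0("NEW_", a_cols))
setnames(c_vals, c_cols, paste0("NEW_", c_cols))

# Left-join both lookups onto the B rows, matching on Date
b_rows <- dt[Group == "B"]
b_rows <- merge(b_rows, a_vals, by = "Date", all.x = TRUE)
b_rows <- merge(b_rows, c_vals, by = "Date", all.x = TRUE)

# Put the untouched A and C rows back; their NEW_ columns are filled with NA
result <- rbind(dt[Group != "B"], b_rows, fill = TRUE)
setorder(result, Group, Date)
result
```

Because the match is on Date, a B row whose date is missing from A or C simply gets NA in the corresponding NEW_ columns instead of a value taken from the wrong date. For the A1/B1/C1, A2/B2/C2 sets mentioned in the question, the same steps could presumably be applied to each set separately before binding the pieces together again.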