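I have the following data frame with three columns: FIRM, YEAR, and DUMMY (0/1). For each FIRM, I want to scan all years and find the first case where DUMMY takes the value 1 more than once in consecutive rows. I then want to create a new column that is 0 in the years of that run, -1, -2, -3, ... in the years before it, and 1, 2, 3, ... in the years after it.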
---------------------------------
| FIRM | YEAR | DUMMY | NEW_COL |
---------------------------------
| A    | 2006 | 0     | 0       |
| A    | 2007 | 1     | 0       |
| A    | 2008 | 0     | 0       |
| B    | 2006 | 0     | 0       |
| B    | 2007 | 0     | -1      |
| B    | 2008 | 1     | 0       |
| B    | 2009 | 1     | 0       |
| B    | 2010 | 0     | 1       |
| B    | 2011 | 0     | 2       |
| B    | 2012 | 1     | 3       |
| B    | 2013 | 1     | 4       |
---------------------------------
Here is a data.table solution.
Based on your description, I think the value for firm B in 2006 should be -2.
library(data.table)
dt <- fread(' FIRM YEAR DUMMY NEW_COL
A 2006 0 0
A 2007 1 0
A 2008 0 0
B 2006 0 0
B 2007 0 -1
B 2008 1 0
B 2009 1 0
B 2010 0 1
B 2011 0 2
B 2012 1 3
B 2013 1 4 ')
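# Per firm, split DUMMY into runs of equal values; flag runs of 1s longer than one row and number every run
dt[, c("flag", "grp") := .((.N > 1) & (DUMMY == 1), .GRP),
   by = .(FIRM, rleid(DUMMY))]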
dt
#>     FIRM YEAR DUMMY NEW_COL  flag grp
#>  1:    A 2006     0       0 FALSE   1
#>  2:    A 2007     1       0 FALSE   2
#>  3:    A 2008     0       0 FALSE   3
#>  4:    B 2006     0       0 FALSE   4
#>  5:    B 2007     0      -1 FALSE   4
#>  6:    B 2008     1       0  TRUE   5
#>  7:    B 2009     1       0  TRUE   5
#>  8:    B 2010     0       1 FALSE   6
#>  9:    B 2011     0       2 FALSE   6
#> 10:    B 2012     1       3  TRUE   7
#> 11:    B 2013     1       4  TRUE   7
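# Within each firm, set the first flagged run to 0 and any later flagged run to a placeholder 99
dt[flag == TRUE, result := fifelse(grp == min(grp), 0, 99), by = .(FIRM)]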
dt
#>     FIRM YEAR DUMMY NEW_COL  flag grp result
#>  1:    A 2006     0       0 FALSE   1     NA
#>  2:    A 2007     1       0 FALSE   2     NA
#>  3:    A 2008     0       0 FALSE   3     NA
#>  4:    B 2006     0       0 FALSE   4     NA
#>  5:    B 2007     0      -1 FALSE   4     NA
#>  6:    B 2008     1       0  TRUE   5      0
#>  7:    B 2009     1       0  TRUE   5      0
#>  8:    B 2010     0       1 FALSE   6     NA
#>  9:    B 2011     0       2 FALSE   6     NA
#> 10:    B 2012     1       3  TRUE   7     99
#> 11:    B 2013     1       4  TRUE   7     99
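dt[, result := lapply(.SD, function(x) {
  if (any(!is.na(x))) {
    # First and last row of the first flagged run (the only rows equal to 0 at this point)
    position_0_head <- head(which(x == 0), 1)
    position_0_tail <- tail(which(x == 0), 1)
    # Years up to the run: negative offsets from the run's first year
    x[1:position_0_head] <- 0 - (YEAR[position_0_head] - YEAR[1:position_0_head])
    # Years from the run's end onward: positive offsets (this also overwrites the 99 placeholders)
    x[position_0_tail:length(x)] <- 0 + (YEAR[position_0_tail:length(x)] - YEAR[position_0_tail])
  } else {
    # No repeated run of 1s for this firm: the whole column is 0
    x <- 0
  }
  x
}), .SDcols = "result", by = .(FIRM)]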
dt[,.SD,.SDcols = !c("flag","grp")]
#>     FIRM YEAR DUMMY NEW_COL result
#>  1:    A 2006     0       0      0
#>  2:    A 2007     1       0      0
#>  3:    A 2008     0       0      0
#>  4:    B 2006     0       0     -2
#>  5:    B 2007     0      -1     -1
#>  6:    B 2008     1       0      0
#>  7:    B 2009     1       0      0
#>  8:    B 2010     0       1      1
#>  9:    B 2011     0       2      2
#> 10:    B 2012     1       3      3
#> 11:    B 2013     1       4      4
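The same steps can also be folded into one per-firm helper. The following is only a sketch of that idea, not part of the answer above: `event_time` and `dt2` are hypothetical names introduced here, the helper uses base R's `rle()` instead of `rleid()`, and it assumes rows are already sorted by YEAR within each FIRM.

```r
library(data.table)

# Hypothetical helper: offset of each year relative to the first run of
# consecutive 1s (run length > 1) within a single firm.
event_time <- function(year, dummy) {
  out <- integer(length(dummy))          # default: 0 everywhere
  r <- rle(dummy)
  hit <- which(r$values == 1 & r$lengths > 1)
  if (length(hit) == 0L) return(out)     # no repeated run of 1s in this firm

  last  <- cumsum(r$lengths)[hit[1]]     # last row of the first qualifying run
  first <- last - r$lengths[hit[1]] + 1L # first row of that run
  idx <- seq_along(dummy)

  # Negative offsets before the run, positive offsets after it, 0 inside it
  out[idx < first] <- as.integer(year[idx < first] - year[first])
  out[idx > last]  <- as.integer(year[idx > last] - year[last])
  out
}

# Same data as in the question, rebuilt without the NEW_COL column
dt2 <- data.table(
  FIRM  = c(rep("A", 3), rep("B", 8)),
  YEAR  = c(2006:2008, 2006:2013),
  DUMMY = c(0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1)
)
dt2[, result := event_time(YEAR, DUMMY), by = FIRM]
```

On this data the helper should give the same result column as the output above. If the rows might not be ordered, sorting first (for example with `setorder(dt2, FIRM, YEAR)`) would be needed before calling it.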
Created on 2020-04-25 by the reprex package (v0.3.0)