Suppose you have the following data frame:
df <- data.frame(year=c(rep(2010,12),rep(2011,12),rep(2012,12)),
                 country=c(rep("DEU",4),rep("ITA",4),rep("USA",4),
                           rep("DEU",4),rep("ITA",4),rep("USA",4),
                           rep("DEU",4),rep("ITA",4),rep("USA",4)),
                 industry=c(rep(1:4,9)),
                 stock1=c(rep(0,24),0,0,2,4,1,0,1,2,3,3,3,5),
                 stock2=c(rep(0,24),0,3,3,4,5,0,1,1,2,2,2,5))
and you want the following result:
df2 <- data.frame(year=c(rep(2010,12),rep(2011,12),rep(2012,12)),
                  country=c(rep("DEU",4),rep("ITA",4),rep("USA",4),
                            rep("DEU",4),rep("ITA",4),rep("USA",4),
                            rep("DEU",4),rep("ITA",4),rep("USA",4)),
                  industry=c(rep(1:4,9)),
                  stock1=c(rep(NA,24),0,0,2,4,1,0,1,2,3,3,3,5),
                  stock2=c(rep(NA,24),0,3,3,4,5,0,1,1,2,2,2,5))
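The idea is that if, in a given year, a particular country reports a stock2 of zero for every industry, those zeros should be replaced with NA (not available) in both stock1 and stock2. My attempt is as follows: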
library(dplyr)
df2 = df %>%
  group_by(country, year, industry) %>%
  mutate(
    stock1 = ifelse(all(stock2 == 0), NA, stock1),
    stock2 = ifelse(all(stock2 == 0), NA, stock2)
  )
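A quick way to see why this attempt does not reproduce df2 is to look at the group sizes. The sketch below is only illustrative (the names `sizes`, `attempt` and `wrong` are hypothetical). Because `industry` is part of the grouping, every group holds exactly one row, so `all(stock2 == 0)` is evaluated row by row, and the 2012 rows that happen to have a single zero in stock2 (DEU industry 1, ITA industry 2) are set to NA as well. It also shows a second pitfall: removing `industry` from `group_by()` is not enough on its own, because `ifelse()` returns a result with the length of its test (here 1), which `mutate()` then recycles across the whole group.

```r
library(dplyr)

# Data from the question (same values, written more compactly)
df <- data.frame(year = c(rep(2010, 12), rep(2011, 12), rep(2012, 12)),
                 country = rep(rep(c("DEU", "ITA", "USA"), each = 4), 3),
                 industry = rep(1:4, 9),
                 stock1 = c(rep(0, 24), 0, 0, 2, 4, 1, 0, 1, 2, 3, 3, 3, 5),
                 stock2 = c(rep(0, 24), 0, 3, 3, 4, 5, 0, 1, 1, 2, 2, 2, 5))

# 1. With industry in the grouping, every group has exactly one row,
#    so all(stock2 == 0) only tests each row on its own.
sizes <- df %>% count(year, country, industry)
range(sizes$n)

attempt <- df %>%
  group_by(country, year, industry) %>%
  mutate(
    stock1 = ifelse(all(stock2 == 0), NA, stock1),
    stock2 = ifelse(all(stock2 == 0), NA, stock2)
  ) %>%
  ungroup()

# The 2012 rows with a single zero in stock2 are also set to NA
attempt %>% filter(year == 2012, is.na(stock2))

# 2. Dropping industry alone is not enough: ifelse() with a length-1 test
#    returns a length-1 value, which mutate() recycles to the whole group.
wrong <- df %>%
  group_by(country, year) %>%
  mutate(stock1 = ifelse(all(stock2 == 0), NA, stock1)) %>%
  ungroup()

# DEU 2012 stock1 collapses to its first value repeated four times
wrong %>% filter(year == 2012, country == "DEU") %>% pull(stock1)
```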
Thanks!
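You can try this approach. The key is to group by year and country only (not by industry), so that `all(stock2 == 0)` is evaluated across all industries of a country in a given year. The `.by` argument of `mutate()` requires dplyr 1.1.0 or later: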
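library(dplyr)  # `.by` requires dplyr >= 1.1.0

df %>%
  mutate(
    ind = all(stock2 == 0),                        # one flag per year/country: is stock2 zero in every industry?
    across(stock1:stock2, ~ if_else(ind, NA, .)),  # if so, replace stock1 and stock2 with NA
    .by = c(year, country)                         # group by year and country, not industry
  ) %>%
  select(-ind)                                     # drop the helper column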
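To check that this reproduces df2, and to cover dplyr versions older than 1.1.0 that lack `.by`, here is a minimal sketch (the names `result` and `result_old` are illustrative). The older variant computes the all-zero flag with `group_by()` and uses `NA_real_`, since `if_else()` before dplyr 1.1.0 required `true` and `false` to have the same type; it still assumes dplyr 1.0.0 or later for `across()`.

```r
library(dplyr)

# Data and expected result from the question
df <- data.frame(year = c(rep(2010, 12), rep(2011, 12), rep(2012, 12)),
                 country = rep(rep(c("DEU", "ITA", "USA"), each = 4), 3),
                 industry = rep(1:4, 9),
                 stock1 = c(rep(0, 24), 0, 0, 2, 4, 1, 0, 1, 2, 3, 3, 3, 5),
                 stock2 = c(rep(0, 24), 0, 3, 3, 4, 5, 0, 1, 1, 2, 2, 2, 5))
df2 <- data.frame(year = df$year, country = df$country, industry = df$industry,
                  stock1 = c(rep(NA, 24), 0, 0, 2, 4, 1, 0, 1, 2, 3, 3, 3, 5),
                  stock2 = c(rep(NA, 24), 0, 3, 3, 4, 5, 0, 1, 1, 2, 2, 2, 5))

# The answer's approach (dplyr >= 1.1.0)
result <- df %>%
  mutate(
    ind = all(stock2 == 0),
    across(stock1:stock2, ~ if_else(ind, NA, .)),
    .by = c(year, country)
  ) %>%
  select(-ind)

all.equal(result, df2)  # TRUE when result matches the expected df2

# Variant for dplyr < 1.1.0 (no `.by`)
result_old <- df %>%
  group_by(year, country) %>%
  mutate(ind = all(stock2 == 0)) %>%  # length-1 flag recycled to the group size
  ungroup() %>%
  mutate(across(stock1:stock2, ~ if_else(ind, NA_real_, .))) %>%
  select(-ind) %>%
  as.data.frame()

all.equal(result_old, df2)
```

Computing the flag once per group and applying it with a vectorised `if_else()` avoids the length-1 `ifelse()` recycling issue from the question.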