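Create a variable that is 0 at the first non-NA value of another variable, then count up/down from 0 for the other rows, grouped by a third variable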

Question

I have the following df:

df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
         year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
         score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA))

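What I want to do: create a new variable, years_from_implementation, that is 0 in the first year in which a country has a non-NA value for score, and that gives every other row its distance in years from that year (negative before it, positive after it).

In other words, hard-coded, I would like it to return the following df: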

df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
         year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
         score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA),
         years_from_implementation = c(-4,-3,-2,-1,0,1,2,3,4,0,1))

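All of this should be done grouped by country.

I tried combining df <- mutate(df, before_after = case_when(!is.na(score) ~ 0)) with the fill command, but could not get anywhere.

A tidyverse solution would be preferred, but really any help would be greatly appreciated.

Thanks in advance!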

Answer 1

Here is a dplyr option

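library(dplyr)
df %>%
    group_by(country) %>%
    # row position minus the position of the first non-NA score in the group;
    # which(!is.na(score))[1] stays a single index even if that score value repeats later
    mutate(years_from_implementation = row_number() - which(!is.na(score))[1]) %>%
    ungroup()
# A tibble: 11 x 4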
#   country  year score years_from_implementation
#   <chr>   <dbl> <dbl>                     <int>
# 1 US       1999    NA                        -4
# 2 US       2000    NA                        -3
# 3 US       2001    NA                        -2
# 4 US       2002    NA                        -1
# 5 US       2003   426                         0
# 6 US       2004    NA                         1
# 7 US       2005    NA                         2
# 8 US       2006   430                         3
# 9 US       2007    NA                         4
#10 Mex      2000   450                         0
#11 Mex      2001    NA                         1
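The option above counts row positions, so it matches the requested output only when each country has exactly one row per consecutive year. As a minimal sketch of a variant that subtracts the actual year values instead (assuming the rows are already ordered by year within each country; the name result is only illustrative):

```r
library(dplyr)

df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
             year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
             score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA))

result <- df %>%
    group_by(country) %>%
    # year of the first row with a non-NA score, subtracted from every year in the group
    mutate(years_from_implementation = year - year[which(!is.na(score))[1]]) %>%
    ungroup()

result
```

Because the difference is taken on year itself, a country with a missing year would still get the correct distance. A country with no non-NA score at all gets NA in every row.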

Answer 2

We can find the row index at which the first non-NA score occurs, then create, for each group, a sequence from 1 - index to n() - index.

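library(dplyr)

df %>%
   group_by(country) %>%
   # index: position of the first non-NA score within each country
   mutate(index = which.max(!is.na(score)),
          # sequence running from 1 - index to n() - index, so the index row gets 0
          years_from_implementation = (1 - index[1]):(n() - index[1])) %>%
   # drop the helper column
   select(-index)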

# country  year score years_from_implementation
#   <chr>   <dbl> <dbl>                     <int>
# 1 US       1999    NA                        -4
# 2 US       2000    NA                        -3
# 3 US       2001    NA                        -2
# 4 US       2002    NA                        -1
# 5 US       2003   426                         0
# 6 US       2004    NA                         1
# 7 US       2005    NA                         2
# 8 US       2006   430                         3
# 9 US       2007    NA                         4
#10 Mex      2000   450                         0
#11 Mex      2001    NA                         1
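One edge case worth noting: which.max() returns 1 when every value is FALSE, so a country whose score is entirely NA would be treated as if its first row were the implementation year. A minimal sketch of a guard for that case, using a small made-up data set (df2, the "Can" rows, and result2 are hypothetical and not part of the question):

```r
library(dplyr)

# Hypothetical data: "Can" has no non-NA score at all
df2 <- tibble(country = c("US", "US", "US", "Can", "Can"),
              year = c(1999, 2000, 2001, 2000, 2001),
              score = c(NA, 426, NA, NA, NA))

result2 <- df2 %>%
    group_by(country) %>%
    # only compute the offset when the group has at least one non-NA score;
    # otherwise return NA for the whole group
    mutate(years_from_implementation =
               if (any(!is.na(score))) row_number() - which.max(!is.na(score))
               else NA_integer_) %>%
    ungroup()

result2
```

Whether an all-NA country should get NA or be dropped depends on the analysis; NA is used here only as one reasonable choice.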
