我具有以下大型数据集的示例数据集-
isin directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09
US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21
我的问题是-我想基于变量ROLE_START
和ROLE_END
创建New Rows。的数量假设数据按ROLE_START
和ROLE_END
分组,则要创建的行取决于MINIMUM isin
和MAXIMUM directorid
。例如-对于isin
US6819771048和directorid
340769,MINIMUM ROLE_START
年是1995年,而[[MAXIMUM ROLE_END
年是2007年。所以我需要为1995年的每一年打开行-2007和这些年份应存储在YEAR
变量中。 [请注意,在上面的示例中,在1995-2007年之间没有中断,因为从ROLE_START
和ROLE_END
可以清楚地看到所有年份都包括在内。如果任何一年之间有休息时间,则应排除这些休息时间。对于上述示例数据集,我的预期数据集应如下所示-
isin directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole YEAR
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1995
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1996
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1997
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1998
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1999
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2000
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2001
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2002
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2003
US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24 2003
US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24 2004
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2004
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2005
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2006
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2007
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1986
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1987
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1988
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1989
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1990
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1991
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1992
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1993
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1994
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1995
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1996
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1997
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1998
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1999
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2000
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2001
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2002
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2003
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2004
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2005
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2006
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2007
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2008
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2009
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2010
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2011
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2012
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2013
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2014
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2015
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2016
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1976
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1977
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1978
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1979
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1980
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1981
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1982
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1983
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1984
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1985
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1986
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2016
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2017
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2018
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2019
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2020
ROLE_START
和ROLE_END
之间创建一个序列,并在不同的行中获取数据。 library(dplyr)
df %>%
mutate(YEAR = purrr::map2(ROLE_START, ROLE_END, seq)) %>%
tidyr::unnest(YEAR)