I have the following DataFrame:
from_year to_year id gender
1990 1993 1 Female
1987 1992 2 Male
2000 2000 3 Male
2010 2011 4 Female
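To make the question reproducible, here is a minimal sketch that rebuilds the input table above as a pandas DataFrame. The variable name `df` matches the answer below; the column dtypes (integer years, string gender) are an assumption.

```python
import pandas as pd

# Input data from the question, rebuilt as a DataFrame
df = pd.DataFrame({
    'from_year': [1990, 1987, 2000, 2010],
    'to_year': [1993, 1992, 2000, 2011],
    'id': [1, 2, 3, 4],
    'gender': ['Female', 'Male', 'Male', 'Female'],
})
print(df)
```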
I want to expand it into the following year-by-year DataFrame:
id year gender
1 1990 Female
1 1991 Female
1 1992 Female
1 1993 Female
2 1987 Male
2 1988 Male
2 1989 Male
2 1990 Male
2 1991 Male
2 1992 Male
3 2000 Male
4 2010 Female
4 2011 Female
What is the most efficient way to transform the top DataFrame into the bottom one with Python pandas?
Here is one way:
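import numpy as np
import pandas as pd

res = (
    # repeat each row (to_year - from_year + 1) times
    df.reindex(np.repeat(df.index, df['to_year'].sub(df['from_year']).add(1)))
    # year = from_year + 0, 1, 2, ... within each id group
    .pipe(lambda x:
        x.assign(year=x['from_year'].add(x.groupby('id').cumcount()))
    )
    .loc[:, ['id', 'year', 'gender']]
    .reset_index(drop=True)
)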
Output:
id year gender
0 1 1990 Female
1 1 1991 Female
2 1 1992 Female
3 1 1993 Female
4 2 1987 Male
5 2 1988 Male
6 2 1989 Male
7 2 1990 Male
8 2 1991 Male
9 2 1992 Male
10 3 2000 Male
11 4 2010 Female
12 4 2011 Female
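As an aside, the same table can be built without groupby by computing each row's offset directly with NumPy. This is a sketch of an alternative technique, not the answer's method, and it has not been benchmarked against it. One difference worth noting: grouping by 'id' assumes each id occupies a single input row, whereas these offsets are computed per input row, so duplicate ids would not be an issue.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'from_year': [1990, 1987, 2000, 2010],
    'to_year': [1993, 1992, 2000, 2011],
    'id': [1, 2, 3, 4],
    'gender': ['Female', 'Male', 'Male', 'Female'],
})

# Number of output rows per input row (inclusive year range)
counts = (df['to_year'] - df['from_year'] + 1).to_numpy()

# Start position of each input row's block in the output, repeated per output row
starts = np.repeat(np.cumsum(counts) - counts, counts)
# Offset of each output row within its block: 0 .. counts[i] - 1
offsets = np.arange(counts.sum()) - starts

# Repeat the id/gender rows, then insert the computed year as the second column
out = df.loc[df.index.repeat(counts), ['id', 'gender']].reset_index(drop=True)
out.insert(1, 'year', np.repeat(df['from_year'].to_numpy(), counts) + offsets)
print(out)
```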
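Explanation:
Series.sub and Series.add compute how many years each row spans (to_year - from_year + 1). np.repeat repeats each index value that many times, and df.reindex turns those repeated labels into repeated rows. Inside df.pipe, df.assign adds a new column "year", filled with "from_year" plus the position of each row within its "id" group (df.groupby followed by groupby.cumcount, which numbers the rows of each group 0, 1, 2, ...). Finally, df.loc selects the desired columns and df.reset_index(drop=True) replaces the duplicated index with a fresh 0..n-1 index.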
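To see what each step of the explanation produces, here is a sketch that unrolls the chained expression into named intermediate variables (`counts`, `expanded`, `offset` are illustrative names, not part of the answer). It rebuilds `df` from the question so it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'from_year': [1990, 1987, 2000, 2010],
    'to_year': [1993, 1992, 2000, 2011],
    'id': [1, 2, 3, 4],
    'gender': ['Female', 'Male', 'Male', 'Female'],
})

# Step 1: years spanned by each row (inclusive), i.e. 4, 6, 1, 2
counts = df['to_year'].sub(df['from_year']).add(1)

# Step 2: repeat each index label counts[i] times and reindex to duplicate rows
expanded = df.reindex(np.repeat(df.index, counts))

# Step 3: 0, 1, 2, ... within each id group (index matches `expanded`)
offset = expanded.groupby('id').cumcount()

# Step 4: year = from_year + offset, then keep the wanted columns
expanded = expanded.assign(year=expanded['from_year'].add(offset))
res = expanded.loc[:, ['id', 'year', 'gender']].reset_index(drop=True)
print(res)
```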