How do I expand/add rows by year in Python?

Question · Votes: 0 · Answers: 2

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'grade': ['A','C','B'], 'year': [2018,2015,2017], 'label': [1,2,3]})

  grade  year  label
0     A  2018      1
1     C  2015      2
2     B  2017      3

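I want to expand the DataFrame based on the year column, which holds the most recent year for each label. Basically, each label should get 4 additional rows so that it covers its 5 most recent years in total.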

Expected output:

print(df_expanded)

   grade  year  label
0      A  2018      1
1      A  2017      1
2      A  2016      1
3      A  2015      1
4      A  2014      1
5      C  2015      2
6      C  2014      2
7      C  2013      2
8      C  2012      2
9      C  2011      2
10     B  2017      3
11     B  2016      3
12     B  2015      3
13     B  2014      3
14     B  2013      3

What I have tried:

pieces = []  # collect one expanded frame per label
for lab in df['label'].unique():
    grp = df.loc[df['label'] == lab]
    yr = grp['year'].iloc[0]
    # the 5 most recent years for this label, newest first
    df_year = pd.DataFrame({'year': list(reversed(range(yr - 4, yr + 1)))})
    # how='right' preserves the order of df_year; only the first row has grade/label filled
    df_merged = pd.merge(grp, df_year, how='right', on='year')
    df_merged = df_merged.ffill()
    pieces.append(df_merged)

df_expanded = pd.concat(pieces, axis=0).reset_index(drop=True)
df_expanded['label'] = df_expanded['label'].astype(int)

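My for-loop approach works. However, it runs very slowly on my real dataset, which contains about 30,000 labels. I suspect there must be a better way to do this. Many thanks!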

python pandas
2 Answers

1 vote

You can try:

(pd.concat(df.assign(year=df['year'].sub(i)) for i in range(5))
   .sort_index(kind='mergesort')  # stable sort keeps the newest-first order within each original row
   .reset_index(drop=True)
)

Output:

   grade  year  label
0      A  2018      1
1      A  2017      1
2      A  2016      1
3      A  2015      1
4      A  2014      1
5      C  2015      2
6      C  2014      2
7      C  2013      2
8      C  2012      2
9      C  2011      2
10     B  2017      3
11     B  2016      3
12     B  2015      3
13     B  2014      3
14     B  2013      3
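
The same idea can be wrapped in a small helper so that the window length becomes a parameter. This is only a minimal sketch and not part of the original answer: the function name `expand_years` and its `n_years` and `year_col` arguments are made up for illustration. A stable sort (`kind='mergesort'`) is used so that rows sharing an index label keep their newest-first order.

```python
import pandas as pd

def expand_years(df, n_years=5, year_col='year'):
    """Hypothetical helper: repeat each row for its year and the n_years - 1 years before it."""
    # One shifted copy of the whole frame per offset 0 .. n_years - 1
    shifted = [df.assign(**{year_col: df[year_col] - i}) for i in range(n_years)]
    return (pd.concat(shifted)
              .sort_index(kind='mergesort')  # stable: copies of a row stay in offset order
              .reset_index(drop=True))

df = pd.DataFrame({'grade': ['A', 'C', 'B'], 'year': [2018, 2015, 2017], 'label': [1, 2, 3]})
df_expanded = expand_years(df)
```

Calling `expand_years(df, n_years=3)` would give 3 rows per label instead of 5.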

1 vote

Comprehension

pd.DataFrame(
    [
        (g, y, l) for g, Y, l in zip(*map(df.get, df))
                  for y in range(Y, Y - 5, -1)
    ],
    columns=df.columns
)

   grade  year  label
0      A  2018      1
1      A  2017      1
2      A  2016      1
3      A  2015      1
4      A  2014      1
5      C  2015      2
6      C  2014      2
7      C  2013      2
8      C  2012      2
9      C  2011      2
10     B  2017      3
11     B  2016      3
12     B  2015      3
13     B  2014      3
14     B  2013      3
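
For readers unfamiliar with the idiom: `map(df.get, df)` walks over the column names and fetches each column, and `zip(*...)` then regroups those columns into one tuple per row. The sketch below illustrates this and shows `DataFrame.itertuples` as what should be an equivalent, more explicit spelling. The variable names `rows`, `rows_alt` and `expanded` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'grade': ['A', 'C', 'B'], 'year': [2018, 2015, 2017], 'label': [1, 2, 3]})

# One tuple per row, in column order (grade, year, label)
rows = list(zip(*map(df.get, df)))

# The same row tuples via itertuples, without the index and as plain tuples
rows_alt = list(df.itertuples(index=False, name=None))

# The comprehension from the answer, written on top of itertuples
expanded = pd.DataFrame(
    [(g, y, l) for g, Y, l in rows_alt for y in range(Y, Y - 5, -1)],
    columns=df.columns,
)
```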

explode

df.assign(year=[range(y, y - 5, -1) for y in df.year]).explode('year')

  grade  year  label
0     A  2018      1
0     A  2017      1
0     A  2016      1
0     A  2015      1
0     A  2014      1
1     C  2015      2
1     C  2014      2
1     C  2013      2
1     C  2012      2
1     C  2011      2
2     B  2017      3
2     B  2016      3
2     B  2015      3
2     B  2014      3
2     B  2013      3
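
Note that the result above keeps the original index (0, 0, 0, ...), and `explode` typically leaves the exploded column with `object` dtype. If a fresh index and an integer `year` column are wanted, something along these lines should work. It is a sketch, assuming pandas 0.25 or later (where `DataFrame.explode` was introduced); it uses lists rather than `range` objects purely as a conservative choice.

```python
import pandas as pd

df = pd.DataFrame({'grade': ['A', 'C', 'B'], 'year': [2018, 2015, 2017], 'label': [1, 2, 3]})

df_expanded = (
    df.assign(year=[list(range(y, y - 5, -1)) for y in df['year']])  # one list of 5 years per row
      .explode('year')           # one output row per year in each list
      .astype({'year': int})     # cast the exploded column back to integers
      .reset_index(drop=True)    # replace the repeated 0, 0, 0, ... index
)
```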