Reshaping an entire DataFrame

Question   Votes: 2   Answers: 1

I have a DataFrame like this:

id|sub1 |sub2  (header)
1|Rating:2,Grade:C,Semester:3   |Rating:1,Grade:A,Semester:2    
2|Rating:3,Grade:A,Semester:2   |Rating:2,Grade:B,Semester:1

I would like it to look like this:

id|sem|sub|grade|rating
1|3|sub1|C|2
1|2|sub2|A|1
2|2|sub1|A|3
2|1|sub2|B|2

I tried:

df.transpose()

Can you suggest a better approach?

python pandas dataframe machine-learning apache-spark-sql
1 Answer
Votes: 0

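We can use a regular expression plus a few column assignments. Each packed string is split into its three fields with str.extract, the rows are tagged with the subject they came from, and the two pieces are stacked with pd.concat: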

import pandas as pd

# Three capture groups, in order: rating digit, grade letter, semester digit
pat = r'Rating:(\d)\W+Grade:(\w)\W+Semester:(\d)'

df.set_index('id',inplace=True)

a = df.sub1.str.extract(pat)
b = df['sub2  (header)'].str.extract(pat)

a['sub'] = 'sub1'
b['sub'] = 'sub2'

df_new = pd.concat([a,b])

df_new.rename(columns={0 : 'Rating', 1 : 'Grade', 2 : 'Semester'},inplace=True)

print(df_new)
     Rating Grade Semester sub
id                          
1       2     C      3  sub1
2       3     A      2  sub1
1       1     A      2  sub2
2       2     B      1  sub2
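
The output above has the right values but not quite the layout asked for in the question (id|sem|sub|grade|rating, ordered by id). As a minimal sketch of one way to get there, the example below reshapes with `melt` first and then applies a regex with named groups. It assumes the columns are named plain `sub1` and `sub2`, rebuilds the sample data from the question, and uses illustrative names (`long_df`, `parsed`, `result`) that do not appear in the original answer. The pattern is also loosened to `\d+` and `\w+` so multi-character values would still match.

```python
import pandas as pd

# Sample data rebuilt from the question; column names 'sub1' and 'sub2'
# are an assumption about how the real frame is labelled.
df = pd.DataFrame({
    "id": [1, 2],
    "sub1": ["Rating:2,Grade:C,Semester:3", "Rating:3,Grade:A,Semester:2"],
    "sub2": ["Rating:1,Grade:A,Semester:2", "Rating:2,Grade:B,Semester:1"],
})

# Named groups become the column names of the frame returned by str.extract.
pat = r"Rating:(?P<rating>\d+)\W+Grade:(?P<grade>\w+)\W+Semester:(?P<sem>\d+)"

# Wide to long: one row per (id, subject) pair, packed string kept in 'raw'.
long_df = df.melt(id_vars="id", var_name="sub", value_name="raw")

# Split each packed string into rating / grade / sem columns.
parsed = long_df["raw"].str.extract(pat)

# Attach the parsed fields, convert the numeric ones, then order rows and columns.
result = (
    pd.concat([long_df[["id", "sub"]], parsed], axis=1)
    .astype({"sem": int, "rating": int})
    .sort_values(["id", "sub"])
    .reset_index(drop=True)
    [["id", "sem", "sub", "grade", "rating"]]
)

print(result)
```

Because `melt` handles every non-id column at once, the same sketch should carry over to frames with more subject columns without adding one `str.extract` call per column.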