I have a DataFrame like this (the first row is the header):
id|sub1|sub2
1|Rating:2,Grade:C,Semester:3|Rating:1,Grade:A,Semester:2
2|Rating:3,Grade:A,Semester:2|Rating:2,Grade:B,Semester:1
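To experiment with the sample, one way to load the pipe-separated table is `pd.read_csv` over an `io.StringIO` buffer. This is a minimal sketch that assumes the table is pasted into a string; the name `raw` is hypothetical. The `strip` calls only matter if the pasted text keeps spaces around the `|` separators.

```python
import io

import pandas as pd

# The sample table from the question; the first row is the header
raw = """id|sub1|sub2
1|Rating:2,Grade:C,Semester:3|Rating:1,Grade:A,Semester:2
2|Rating:3,Grade:A,Semester:2|Rating:2,Grade:B,Semester:1
"""

# sep='|' so the commas inside each cell are kept as part of the value
df = pd.read_csv(io.StringIO(raw), sep='|')

# Guard against stray spaces around the separators in pasted data
df.columns = df.columns.str.strip()
df[['sub1', 'sub2']] = df[['sub1', 'sub2']].apply(lambda s: s.str.strip())
print(df)
```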
I'd like to reshape it into one row per id and subject, like this:
id|sem|sub|grade|rating
1|3|sub1|C|2
1|2|sub2|A|1
2|2|sub1|A|3
2|1|sub2|B|2
I tried:
df.transpose()
but that doesn't produce this layout. Can you suggest a better approach?
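We can pull the three fields out of each column with a regular expression (`str.extract`), tag each result with the column it came from, and concatenate: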
import pandas as pd

# One capture group per field; \d+ also handles multi-digit values
pat = r'Rating:(\d+)\W+Grade:(\w+)\W+Semester:(\d+)'
df = df.set_index('id')
a = df['sub1'].str.extract(pat)  # unnamed groups -> columns 0, 1, 2
b = df['sub2'].str.extract(pat)
a['sub'] = 'sub1'  # remember which column each row came from
b['sub'] = 'sub2'
df_new = pd.concat([a, b])
df_new = df_new.rename(columns={0: 'Rating', 1: 'Grade', 2: 'Semester'})
print(df_new)
   Rating Grade Semester   sub
id
1       2     C        3  sub1
2       3     A        2  sub1
1       1     A        2  sub2
2       2     B        1  sub2
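The result above keeps `id` as the index and is grouped by subject rather than by id. A sketch that produces exactly the requested layout, and works for any number of `subN` columns, is to `melt` to long form first and then extract with named groups. This swaps in `DataFrame.melt` in place of the manual per-column `concat`; the intermediate names `long`, `fields` and `out` are hypothetical, and the frame is rebuilt inline so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'sub1': ['Rating:2,Grade:C,Semester:3', 'Rating:3,Grade:A,Semester:2'],
    'sub2': ['Rating:1,Grade:A,Semester:2', 'Rating:2,Grade:B,Semester:1'],
})

# Wide -> long: one row per (id, subject); every non-id column is melted
long = df.melt(id_vars='id', var_name='sub', value_name='raw')

# Named groups become the column names of the extracted frame
pat = r'Rating:(?P<rating>\d+)\W+Grade:(?P<grade>\w+)\W+Semester:(?P<sem>\d+)'
fields = long['raw'].str.extract(pat)

out = (
    pd.concat([long[['id', 'sub']], fields], axis=1)  # aligned on the row index
    .astype({'rating': int, 'sem': int})               # extracted values are strings
    .sort_values(['id', 'sub'])
    .reset_index(drop=True)
    .loc[:, ['id', 'sem', 'sub', 'grade', 'rating']]   # requested column order
)
print(out)
```

Sorting on `['id', 'sub']` gives the row order from the question (id 1 with sub1 then sub2, then id 2), and the `astype` step turns `rating` and `sem` back into integers.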