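I am struggling to transpose and reshape some data. I have data like the following.
nickname:Nick    Gavin
nickname:Nick    job:Teacher
nickname:Nick    duties:teaching_math
nickname:Bob     Marcus
nickname:Bob     job:Musician
nickname:Bob     duties:plays_piano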
I want to change it to:
Nick      Teacher     teaching_math
Gavin     Teacher     teaching_math
Bob       Musician    plays_piano
Marcus    Musician    plays_piano
Any help would be greatly appreciated!
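import numpy as np

#get the names, remove the 'nickname:' prefix
df[0] = df[0].str.split(':').str[-1]
#create temp column holding the nickname on rows where column 1 is a plain name (no ':')
df['temp'] = np.where(~df[1].str.contains('[:]'),df[0],np.nan)
#remove the 'job:' / 'duties:' prefix
#(str.lstrip takes a set of characters, not a prefix: lstrip('duties:') would turn 'duties:teaching_math' into 'aching_math')
df[1] = df[1].str.replace(r'^(?:job|duties):', '', regex=True).str.strip()
#fill temp from the column to its left so each name has job and duties beneath
df = df.ffill(axis=1)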
#group by col 0
#combine the words of each column into one comma-separated string
#stack the two columns into rows
#split into separate columns
#and drop the old index
final = (df
         .groupby(0)
         .agg(lambda x: x.str.cat(sep=','))
         .stack()
         .str.split(',', expand=True)
         .reset_index(drop=True))
final
        0         1              2
0  Marcus  Musician    plays_piano
1     Bob  Musician    plays_piano
2   Gavin   Teacher  teaching_math
3    Nick   Teacher  teaching_math
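For anyone who wants to run the steps end to end, here is a minimal self-contained sketch. The question does not show how the data is loaded, so the two-column DataFrame with integer column labels 0 and 1 is an assumption, and the prefix is removed here with a plain split on ':' rather than by naming each prefix.

```python
import numpy as np
import pandas as pd

# Hypothetical input: assumes the raw text was read into two columns
# labelled 0 and 1 (the original post does not show the loading step).
df = pd.DataFrame({
    0: ['nickname:Nick'] * 3 + ['nickname:Bob'] * 3,
    1: ['Gavin', 'job:Teacher', 'duties:teaching_math',
        'Marcus', 'job:Musician', 'duties:plays_piano'],
})

# keep only the text after 'nickname:'
df[0] = df[0].str.split(':').str[-1]
# rows whose second column has no ':' hold the other name;
# copy the nickname into 'temp' on those rows, NaN elsewhere
df['temp'] = np.where(~df[1].str.contains(':'), df[0], np.nan)
# keep only the text after 'job:' / 'duties:' (plain names are unchanged)
df[1] = df[1].str.split(':').str[-1].str.strip()
# fill the NaN cells in 'temp' from column 1, row by row
df = df.ffill(axis=1)

final = (df
         .groupby(0)                          # one group per nickname
         .agg(lambda x: x.str.cat(sep=','))   # join each column's values
         .stack()                             # one row per (nickname, column)
         .str.split(',', expand=True)         # back into three columns
         .reset_index(drop=True))
print(final)
```

The result should contain the same four rows as the output above; the row order follows the groupby sort, so treat it as incidental rather than guaranteed.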