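I have a dataframe that contains the columns "id", "company name" and "company type", as well as columns such as "Year 1 Males Total", "Year 1 Females Total", "Year 2 Males Total", "Year 2 Females Total", and so on. How can I convert this wide format to long format so that I keep the columns "id", "company name", "company type" and get "Year 1", "Year 2", etc., plus an additional "Gender" column, with each original row split into two rows, one for males and one for females?
I tried the following. It melts the years correctly, but all the other columns are NaN in every row: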
df_copy = df.copy()
for i in range(1,12):
    df_copy = pd.concat([df_copy, df.melt(value_vars=[f'Year {i} Males Total', f'Year {i} Females Total'], var_name='Gender', value_name=f'Year {i}')], axis=1)
    df_copy.drop(columns=[f'Year {i} Males Total',f'Year {i} Females Total', f'Year {i} Total'],axis=1,inplace=True)
Use:
# move the identifier columns into a MultiIndex on the rows
df1 = df.set_index(['id','company name','company type'])
# build a MultiIndex on the columns: extract the year number and the Males/Females substring
df1.columns = pd.MultiIndex.from_frame(df1.columns.str.extract(r'(\d+)\s+(Males|Females)'))
# stack the Gender level into rows, leaving the years as columns named 'Year 1', 'Year 2', ...
df1 = df1.rename_axis([None, 'Gender'], axis=1).stack().add_prefix('Year ').reset_index()
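As for why the attempt in the question produced NaN: melt without id_vars drops the identifier columns, and the melted frame has twice as many rows as df, so concat along axis=1 aligns on the row index and leaves the extra rows empty in the original columns. A possible alternative to the stack approach is to melt once with id_vars and then pivot the years back into columns. The following is only a minimal sketch on made-up sample data (the values and the helper names id_cols, long_df and result are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "id": [1, 2],
    "company name": ["A", "B"],
    "company type": ["x", "y"],
    "Year 1 Males Total": [10, 30],
    "Year 1 Females Total": [20, 40],
    "Year 2 Males Total": [50, 70],
    "Year 2 Females Total": [60, 80],
})

id_cols = ["id", "company name", "company type"]

# Melt all remaining columns at once; id_vars keeps the identifiers on every row.
long_df = df.melt(id_vars=id_cols, var_name="column", value_name="value")

# Split the old column name into the year label and the gender.
parts = long_df["column"].str.extract(r"(Year \d+)\s+(Males|Females)")
long_df["Year"] = parts[0]
long_df["Gender"] = parts[1]

# Rows whose name did not match (for example a "Year N Total" column) are dropped.
long_df = long_df.dropna(subset=["Gender"])

# Pivot the year labels back into columns, one row per identifier and gender.
result = (
    long_df.pivot(index=id_cols + ["Gender"], columns="Year", values="value")
    .rename_axis(columns=None)
    .reset_index()
)
print(result)
```

The year columns come out in string order here, so with more than nine years "Year 10" would sort before "Year 2"; reorder the columns afterwards if that matters.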