How to melt multiple pairs of columns in a Pandas DataFrame, where each pair holds the male and female values for one year


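I have a DataFrame with columns such as "id", "company name", and "company type", plus columns such as "Year 1 Males Total", "Year 1 Females Total", "Year 2 Males Total", "Year 2 Females Total", and so on. How can I convert this wide format to a long format that keeps the leading columns "id", "company name", "company type", has one column per year ("Year 1", "Year 2", and so on), and adds a "Gender" column, so that each original row is split into two rows, one for males and one for females?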

I tried the following. It melts the years correctly, but in every row all the other columns are NaN:

df_copy = df.copy()
for i in range(1,12):
    df_copy = pd.concat([df_copy, df.melt(value_vars=[f'Year {i} Males Total', f'Year {i} Females Total'], var_name='Gender', value_name=f'Year {i}')], axis=1)
    df_copy.drop(columns=[f'Year {i} Males Total',f'Year {i} Females Total', f'Year {i} Total'],axis=1,inplace=True)
python-3.x pandas dataframe data-manipulation
1 Answer

Use:

import pandas as pd

# Move the identifier columns into the index so only the year/gender columns remain
df1 = df.set_index(['id', 'company name', 'company type'])
# Build a two-level column MultiIndex: the year number and 'Males' or 'Females', extracted from each column name
df1.columns = pd.MultiIndex.from_frame(df1.columns.str.extract(r'(\d+)\s+(Males|Females)'))
# Stack the gender level into rows; the year numbers stay as columns and get a 'Year ' prefix
df1 = df1.rename_axis([None, 'Gender'], axis=1).stack().add_prefix('Year ').reset_index()
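
To try the reshape end to end, here is a minimal sketch on made-up data. The sample values and the names `id_cols`, `gender_cols`, `parts`, and `long_df` are assumptions for illustration only. It also assumes the frame has per-year "Year N Total" columns (the question's code drops them), so those are filtered out first; otherwise the regex would not match them and they would end up under a NaN column label.

```python
import pandas as pd

# Hypothetical sample data in the layout the question describes (values invented).
df = pd.DataFrame({
    "id": [1, 2],
    "company name": ["Acme", "Globex"],
    "company type": ["Private", "Public"],
    "Year 1 Males Total": [10, 30],
    "Year 1 Females Total": [12, 28],
    "Year 1 Total": [22, 58],
    "Year 2 Males Total": [11, 31],
    "Year 2 Females Total": [13, 29],
    "Year 2 Total": [24, 60],
})

id_cols = ["id", "company name", "company type"]

# Assumption: combined "Year N Total" columns are not wanted in the result,
# so keep only the identifier columns and the Males/Females columns.
gender_cols = [c for c in df.columns if "Males" in c or "Females" in c]
df1 = df[id_cols + gender_cols].set_index(id_cols)

# Split each remaining column name into (year number, gender).
parts = df1.columns.str.extract(r"(\d+)\s+(Males|Females)")
df1.columns = pd.MultiIndex.from_frame(parts)

# Move the gender level into rows; the year numbers remain as columns.
long_df = (
    df1.rename_axis([None, "Gender"], axis=1)
       .stack()
       .add_prefix("Year ")
       .reset_index()
)
print(long_df)
```

The result should have one row per company and gender, with the columns "id", "company name", "company type", "Gender", "Year 1", "Year 2". The order of the Males and Females rows may differ between pandas versions, because the behavior of `stack()` changed in recent releases.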