Reading an Excel file with multi-index headers using pandas

Question

I have an Excel file whose first 3 rows contain the header names. I want to read it into pandas, but I'm struggling with the multi-index header.

                                     PLAN 2023                      
             Traffic per channel                   Traffic Share per Channel        
month week   All Traffic red green orange          red green orange
jan    1     100         50  30    20              50% 30%   20%

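For "month" and "week", the header name is stored in row 3, but for the other columns it is spread across rows 1, 2, and 3. Also, the row number is not fixed, so I need to read the data by header.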

The final expected output should look like this:

month   week   plan_2023_Traffic_per_channel_All  .....plan_2023_Traffic_Share_per_channel_orange
jan     1                     100                                            20%

My script is below. For simplicity, I only print one value:

import pandas as pd

# Load the Excel file
df = pd.read_excel('test_3.xlsx', sheet_name='WEEK - 2023', header=None)

# Set the first 3 rows as the header
header = df.iloc[:3,:].fillna(method='ffill', axis=1)
df.columns = pd.MultiIndex.from_arrays(header.values)
df = df.iloc[3:,:]

# Select only the specified columns
df = df.loc[:, ('month', 'week', ('PLAN 2023', 'Traffic per channel', 'red'))]

# Rename the columns to remove the multi-level header
df.columns = ['month', 'week', 'P_traffic_red']

# Print the final data frame
print(df)

Thank you in advance.

Answer 1

You can try:

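import pandas as pd

df = pd.read_excel('test_3.xlsx', header=None)

# Forward-fill merged header cells across each row, then join the 3 levels per column
cols = df.iloc[:3].ffill(axis=1).apply(lambda x: '_'.join(x.dropna()))
# Drop the header rows and use the joined names as column labels
df = df.iloc[3:].set_axis(cols, axis=1)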

Output:

>>> df
  statMonthName statWeek Plan 2023_Traffic per channel_All Traffic  ... Plan 2023_Traffic Share per Chanel_red Plan 2023_Traffic Share per Chanel_green Plan 2023_Traffic Share per Chanel_orange
3           jan        1                                       100  ...                                    50%                                      30%                                       20%

[1 rows x 9 columns]

>>> df.columns
Index(['statMonthName', 'statWeek',
       'Plan 2023_Traffic per channel_All Traffic',
       'Plan 2023_Traffic per channel_red',
       'Plan 2023_Traffic per channel_green',
       'Plan 2023_Traffic per channel_orange',
       'Plan 2023_Traffic Share per Chanel_red',
       'Plan 2023_Traffic Share per Chanel_green',
       'Plan 2023_Traffic Share per Chanel_orange'],
      dtype='object')
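
To make the idea easier to try without the workbook, here is a minimal sketch of the same forward-fill-and-join technique on an in-memory frame. The `raw` frame is a hypothetical stand-in for what `pd.read_excel(..., header=None)` would return for the sample in the question, and `flatten_header` is an illustrative helper name, not a pandas function. It also replaces spaces with underscores, on the assumption that this is closer to the naming the question asks for; the original capitalization is kept.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., header=None): the first three
# rows hold the header levels, with NaN where cells were merged or left empty.
raw = pd.DataFrame([
    [np.nan, np.nan, "PLAN 2023", np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    [np.nan, np.nan, "Traffic per channel", np.nan, np.nan, np.nan,
     "Traffic Share per Channel", np.nan, np.nan],
    ["month", "week", "All Traffic", "red", "green", "orange",
     "red", "green", "orange"],
    ["jan", 1, 100, 50, 30, 20, "50%", "30%", "20%"],
])


def flatten_header(frame, n_header_rows=3, sep="_"):
    """Join the top n_header_rows into one flat name per column."""
    # Spread each merged header cell to the right over the columns it covers.
    header = frame.iloc[:n_header_rows].ffill(axis=1)
    # Join the non-empty levels of every column from top to bottom.
    names = [sep.join(str(v) for v in header[col].dropna())
             for col in header.columns]
    # Assumption: spaces become underscores to match the requested naming.
    names = [name.replace(" ", "_") for name in names]
    # Keep only the data rows and attach the flat names.
    body = frame.iloc[n_header_rows:].reset_index(drop=True)
    body.columns = names
    return body


flat = flatten_header(raw)

# Columns can now be selected by name, regardless of their position.
subset = flat[["month", "week", "PLAN_2023_Traffic_per_channel_red"]]
print(subset)
```

Because the selection is done by the joined name, it keeps working if columns are reordered in the sheet, as long as the three header rows themselves stay at the top.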
