pandas read_excel: handling missing values in a MultiIndex

Question

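I am trying to use pandas read_excel() to read an Excel file with multi-indexed rows, where the second level of the index contains missing values. This kind of multi-index is not uncommon in statistical data. How can I prevent read_excel() from filling in the missing values in the index?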

To illustrate, consider the following example:

In [1]: import pandas as pd

In [2]: m_indx = pd.MultiIndex.from_tuples(
   ...:     [ ('foo','',),
   ...:       ('foo','of which bar',),
   ...:       ('baz','',),
   ...:       ('baz','of which qux',),
   ...:     ]
   ...: )

In [3]: df = pd.DataFrame([[10,],[5,],[15,],[3,]], columns=['Volume'], index=m_indx)

In [4]: df
Out[4]: 
                  Volume
foo                   10
    of which bar       5
baz                   15
    of which qux       3

In [5]: df.to_excel("test.xlsx")

In [6]: pd.read_excel('test.xlsx', index_col=[0,1])
Out[6]: 
                  Volume
foo NaN               10
    of which bar       5
baz of which bar      15
    of which qux       3

In the third row, 'of which bar' is repeated under 'baz'. I want to suppress this, because it is not in the Excel file read from disk. (I am using Python 3.7.7 and pandas 1.0.3.)

pandas multi-index
1 Answer

You can try calling read_excel without the index_col argument and then set the index after the import:

pd.read_excel('test.xlsx').set_index(['Unnamed: 0','Unnamed: 1'])

Output:

                         Volume
Unnamed: 0 Unnamed: 1          
foo        NaN               10
NaN        of which bar       5
baz        NaN               15
NaN        of which qux       3
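
If the goal is to get back the index from the question ('foo' repeated on the first level, empty strings on the second), one possible follow-up is to forward-fill only the first index column and blank out the rest before calling set_index. The sketch below is an assumption about what is wanted, not part of the original answer. The helper name restore_index is hypothetical, and the raw frame is built by hand as a stand-in for the output of pd.read_excel('test.xlsx') so that the example runs without the file.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_excel('test.xlsx') called without index_col
# (assumed to match the frame shown in the answer above).
raw = pd.DataFrame({
    "Unnamed: 0": ["foo", np.nan, "baz", np.nan],
    "Unnamed: 1": [np.nan, "of which bar", np.nan, "of which qux"],
    "Volume": [10, 5, 15, 3],
})


def restore_index(frame, level_cols):
    """Hypothetical helper: forward-fill the first level only,
    replace NaN in the remaining levels with '', then set the index."""
    out = frame.copy()
    first, rest = level_cols[0], list(level_cols[1:])
    # Merged cells in the first level come back as NaN; fill them down.
    out[first] = out[first].ffill()
    # Lower levels keep their gaps, stored as empty strings instead of NaN.
    out[rest] = out[rest].fillna("")
    out = out.set_index(list(level_cols))
    # Drop the auto-generated 'Unnamed: N' level names.
    out.index.names = [None] * len(level_cols)
    return out


df = restore_index(raw, ["Unnamed: 0", "Unnamed: 1"])
print(df)
```

With a real file, raw would instead come from pd.read_excel('test.xlsx'); the column names 'Unnamed: 0' and 'Unnamed: 1' are taken from the output above and may differ for other workbooks.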