pandas read_excel: handling missing values in a MultiIndex

Question

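I am trying to use pandas read_excel() to read an Excel file whose rows carry a MultiIndex in which the second level contains missing values. This kind of MultiIndex is not uncommon in statistical data. How can I stop read_excel() from filling in the missing values in the index?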

To illustrate, consider the following example:

In [1]: import pandas as pd

In [2]: m_indx = pd.MultiIndex.from_tuples(
   ...:     [ ('foo','',),
   ...:       ('foo','of which bar',),
   ...:       ('baz','',),
   ...:       ('baz','of which qux',),
   ...:     ]
   ...: )

In [3]: df = pd.DataFrame([[10,],[5,],[15,],[3,]], columns=['Volume'], index=m_indx)

In [4]: df
Out[4]: 
                  Volume
foo                   10
    of which bar       5
baz                   15
    of which qux       3

In [5]: df.to_excel("test.xlsx")

In [6]: pd.read_excel('test.xlsx', index_col=[0,1])
Out[6]: 
                  Volume
foo NaN               10
    of which bar       5
baz of which bar      15
    of which qux       3

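In the third row, 'of which bar' has been duplicated into a cell that is empty in the Excel file on disk, and I want to suppress this. (I am using Python 3.7.7 and pandas 1.0.3.)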

Answer 1

You can try calling read_excel without the index_col argument and then setting the index after the import:

pd.read_excel('test.xlsx').set_index(['Unnamed: 0','Unnamed: 1'])

Output:

                         Volume
Unnamed: 0 Unnamed: 1          
foo        NaN               10
NaN        of which bar       5
baz        NaN               15
NaN        of which qux       3
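With this approach the first level also comes back as NaN on the child rows. If you want the original structure back (parent label repeated, child label blank), one option is to forward-fill only the first index column before calling set_index. The sketch below assumes the layout shown in the output above (index columns named 'Unnamed: 0' and 'Unnamed: 1'); the helper name rebuild_multiindex is hypothetical, and the in-memory raw frame is a stand-in for what pd.read_excel('test.xlsx') returns, so the example runs without an Excel engine installed.

```python
import numpy as np
import pandas as pd


def rebuild_multiindex(raw: pd.DataFrame, level_cols: list) -> pd.DataFrame:
    """Hypothetical helper: build a MultiIndex from the flat read_excel frame.

    Only the first index column is forward-filled, so each child row
    inherits its parent label (e.g. 'foo'). The remaining index columns
    keep their blanks, which are mapped back to '' to match the index
    that was originally written with to_excel.
    """
    out = raw.copy()
    # Blank parent cells belong to the nearest non-blank parent above them.
    out[level_cols[0]] = out[level_cols[0]].ffill()
    # Blank child cells stay blank: turn NaN back into the empty string.
    out[level_cols[1:]] = out[level_cols[1:]].fillna("")
    out = out.set_index(level_cols)
    # Drop the auto-generated 'Unnamed: n' level names.
    out.index.names = [None] * len(level_cols)
    return out


# Stand-in for pd.read_excel('test.xlsx') (read without index_col),
# mirroring the output shown above.
raw = pd.DataFrame(
    {
        "Unnamed: 0": ["foo", np.nan, "baz", np.nan],
        "Unnamed: 1": [np.nan, "of which bar", np.nan, "of which qux"],
        "Volume": [10, 5, 15, 3],
    }
)

df = rebuild_multiindex(raw, ["Unnamed: 0", "Unnamed: 1"])
print(df)
```

Against the real file this would be rebuild_multiindex(pd.read_excel('test.xlsx'), ['Unnamed: 0', 'Unnamed: 1']). Forward-filling only level 0 is a deliberate choice: it assumes blanks in the first column always mean "same parent as above", while blanks in the second column are genuine empty labels.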
