pandas reindex with method='pad' does not work correctly on a MultiIndex

Question (votes: 1, answers: 2)

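I want to reindex a DataFrame whose MultiIndex is built from two columns (a string and a datetime64). However, method='pad' does not behave the way I expect.

My code is as follows: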

from datetime import datetime

import numpy as np
import pandas as pd

df = pd.read_csv('./Artikelstatus.csv',
                 sep=';',
                 parse_dates=['VON_DTM'],
                 infer_datetime_format=True,
                 usecols=['Artikel','ATTRIBUT','VON_DTM'],
                 dtype={'Artikel': 'str', 'ATTRIBUT': 'str'})

# Truncate each timestamp to midnight so there is one key per day
df["normalized_date"] = df["VON_DTM"].dt.floor("D")

min_dat = min(df['normalized_date'])
max_dat = np.datetime64(datetime.now().date())

articles = df['Artikel'].unique()
# One entry per day from the first status date up to (but excluding) today
dates = np.arange(min_dat, max_dat, step=np.timedelta64(1,'D'))
# Keep the first status per article and day
df = df.set_index(['Artikel','VON_DTM']).groupby(['Artikel','normalized_date']).first()
# Full article x day grid to reindex onto
index = pd.MultiIndex.from_product([articles, dates],names=['Artikel', 'normalized_date'])

df.info()
print(df.query('Artikel == "00017"'))
# This is the step that misbehaves
df = df.reindex(index, method='pad')
df.info()
print(df.query('Artikel == "00017" & normalized_date >= "2018-10-01" & normalized_date <= "2018-11-25"'))

The output is listed below:

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 13265 entries, (00017, 2018-10-01 00:00:00) to (25003, 2018-11-22 00:00:00)
Data columns (total 1 columns):
ATTRIBUT    13265 non-null object
dtypes: object(1)
memory usage: 170.5+ KB
                        ATTRIBUT
Artikel normalized_date         
00017   2018-10-01             0
        2018-11-21             3
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 834065 entries, (00017, 2018-10-01 00:00:00) to (25003, 2020-03-18 00:00:00)
Data columns (total 1 columns):
ATTRIBUT    833877 non-null object
dtypes: object(1)
memory usage: 9.6+ MB
                        ATTRIBUT
Artikel normalized_date         
00017   2018-10-01             3
        2018-10-02             3
...
        2018-11-01             3
        2018-11-02           NaN
        2018-11-03           NaN
        2018-11-04           NaN
        2018-11-05           NaN
        2018-11-06             0
        2018-11-07             3
        2018-11-08             3
...
        2018-11-25             3

I expected ATTRIBUT to start at 0 and change to 3 on 2018-11-21. Am I missing something?

python pandas dataframe multi-index reindex
2 answers
0
votes

It seems I have found a workaround. Replace

df = df.reindex(index, method='pad')

with

df = df.reindex(index)
df["ATTRIBUT"] = df["ATTRIBUT"].ffill()  # same as fillna(method='pad'), which newer pandas versions deprecate

Now it works as expected. But I still cannot explain the odd behaviour of the first approach.
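To make the two-step workaround easier to check, here is a minimal sketch on toy data. The single article, the five-day range and the attribute values are made up for illustration and do not come from the original CSV; `.ffill()` is used as the equivalent of `fillna(method='pad')`.

```python
import pandas as pd

# Toy stand-in for the real data: one article with a status on day 0 and day 3.
dates = pd.date_range("2018-10-01", periods=5, freq="D")
names = ["Artikel", "normalized_date"]
df = pd.DataFrame(
    {"ATTRIBUT": ["0", "3"]},
    index=pd.MultiIndex.from_arrays([["00017", "00017"], dates[[0, 3]]], names=names),
)

# Full article x day grid, as in the question.
full_index = pd.MultiIndex.from_product([["00017"], dates], names=names)

# Step 1: reindex without a fill method, so the missing days become NaN.
expanded = df.reindex(full_index)

# Step 2: forward-fill the gaps in the column.
filled = expanded.copy()
filled["ATTRIBUT"] = filled["ATTRIBUT"].ffill()

print(filled)
```

With this toy input the value "0" is carried forward until the day the status changes to "3", which is the behaviour the question asked for.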


0
votes

Edit: this is not an answer.

print(df.query('Artikel == "16806"'))

Output:

                        ATTRIBUT
Artikel normalized_date         
16806   2018-10-01             2
        2018-10-02             2
        2018-10-03             2
        2018-10-04             2
        2018-10-05             2
        2018-10-06             2
        2018-10-07             2
        2018-10-08             2
...

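But this article never had an attribute value of 2. It seems the value was copied from the previous article number, 16521. Is there any way to fix this?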
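One possible way around this, assuming the stray values come from a forward fill that runs down the whole column and therefore crosses from one article into the next, is to fill within each article separately. The sketch below uses two made-up articles, "A" and "B", and invented values; it is not the original data.

```python
import pandas as pd

dates = pd.date_range("2018-10-01", periods=4, freq="D")
names = ["Artikel", "normalized_date"]

# Hypothetical data: article A has a status from day 0, article B only from day 2.
sparse = pd.DataFrame(
    {"ATTRIBUT": ["2", "5"]},
    index=pd.MultiIndex.from_arrays([["A", "B"], dates[[0, 2]]], names=names),
)

full_index = pd.MultiIndex.from_product([["A", "B"], dates], names=names)
expanded = sparse.reindex(full_index)

# A plain forward fill ignores the article level, so B's first two days
# inherit A's last value.
leaky = expanded["ATTRIBUT"].ffill()

# Grouping by the article level restarts the fill for every article, so days
# before B's first status stay NaN.
per_article = expanded.groupby(level="Artikel")["ATTRIBUT"].ffill()

print(pd.DataFrame({"plain_ffill": leaky, "per_article_ffill": per_article}))
```

Days before an article's first status remain NaN with the grouped fill; whether to leave them empty or fill them with a default is a separate decision.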
