Create a column based on the values of another column


Consider this DataFrame:

import pandas as pd
import numpy as np

values = [0, 22, 30, 0, 20, 22, 11, 0, 13]
index = pd.date_range(start = '2023-10-1', periods = len(values))

df = pd.DataFrame({'values':values }, index = index)

df
           values
2023-10-01  0
2023-10-02  22
2023-10-03  30
2023-10-04  0
2023-10-05  20
2023-10-06  22
2023-10-07  11
2023-10-08  0
2023-10-09  13

Goal: create a new column that counts how many days have passed since the last 0 in the 'values' column.

I can do this with a for loop:

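zero_indices = df[df['values'] == 0].index
df['days'] = np.nan

# Use .loc rather than chained indexing (df['days'][...] = ...), which may write to a copy.
# Label slices on a DatetimeIndex include both endpoints, so each segment's
# last row (the next zero) is reset to 0 by the following iteration.
for i in range(len(zero_indices) - 1):
    segment = slice(zero_indices[i], zero_indices[i + 1])
    df.loc[segment, 'days'] = np.arange(len(df.loc[segment]))
df.loc[zero_indices[-1]:, 'days'] = np.arange(len(df.loc[zero_indices[-1]:]))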


           values   days
2023-10-01  0   0.00
2023-10-02  22  1.00
2023-10-03  30  2.00
2023-10-04  0   0.00
2023-10-05  20  1.00
2023-10-06  22  2.00
2023-10-07  11  3.00
2023-10-08  0   0.00
2023-10-09  13  1.00

Question: how can I do this in a vectorized (faster) way?

python pandas numpy vectorization
1 Answer

There are many ways to do this. One solution is to use groupby with cumcount:

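# df.values is the DataFrame's underlying array, not the 'values' column, so use brackets
df['temp'] = (df['values'] == 0).cumsum()  # group id that increases by 1 at every 0
df.groupby('temp').cumcount()  # position within each group = days since the last 0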

Output:

2023-10-01    0
2023-10-02    1
2023-10-03    2
2023-10-04    0
2023-10-05    1
2023-10-06    2
2023-10-07    3
2023-10-08    0
2023-10-09    1
Freq: D, dtype: int64
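
For reference, the same idea can be written end to end without the helper column. This is only a sketch under a few assumptions: the names is_zero and group_id are illustrative, and treating rows before the first 0 as missing is a choice made here to match the loop version, not something the answer specifies.

```python
import numpy as np
import pandas as pd

values = [0, 22, 30, 0, 20, 22, 11, 0, 13]
index = pd.date_range(start='2023-10-01', periods=len(values))
df = pd.DataFrame({'values': values}, index=index)

# True at every row holding a 0; the running sum then gives each run of rows
# that starts at a 0 its own group id (1, 1, 1, 2, 2, 2, 2, 3, 3 here).
is_zero = df['values'].eq(0)
group_id = is_zero.cumsum()

# cumcount numbers the rows inside each group from 0, i.e. rows since the last 0.
days = df.groupby(group_id).cumcount()

# Assumption: rows before the first 0 have no "last 0", so mark them as missing
# to match the loop version (group_id is 0 only for such rows).
df['days'] = days.where(group_id > 0)

print(df)
```

Note that this counts rows, which equals days only because the index has exactly one row per day. If the dates had gaps, the difference between each timestamp and the forward-filled timestamp of the last 0 would probably be the safer quantity to compute.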