Pandas: new column using an equation

Question  Votes: 0  Answers: 1

I have the following dataset:

import numpy as np
import pandas as pd

df = pd.DataFrame({'index': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
                   'avg': [130, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 135, np.nan, np.nan, np.nan, np.nan, np.nan, 136, np.nan, np.nan],
                   'slope': [.02, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, .08, np.nan, np.nan, np.nan, np.nan, np.nan, .03, np.nan, np.nan]})

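I want to create a new column "fit" that fits a linear equation between successive integer (non-NaN) values in "avg". I wrote the following code: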

df.loc[0,'fit'] = df.loc [0,'avg']
def fitt ():
    for i in range (0, len(df)):
        if df.loc [i,'avg'] > 0:
            a = df.loc[i,'index']
            b = df.loc [i,'slope']
            c= df.loc [i,'avg']
            df.loc [i,'fit'] = df.loc [i, 'avg']
            continue
        while df.loc [i,'avg'] == np.NaN:
            df.loc[i,'fit'] = c + b * (i-a)
            
    return df

The output column "fit" should contain the following values:

df['fit'] = [130, 130.02, 130.04, 130.06, 130.08, 130.10, 130.12, 130.14, 135, 135.08, 135.16, 135.24, 135.32, 135.40, 136, 136.03, 136.06]

I would like to know how to get this code working correctly. Any help is much appreciated.

python pandas dataframe numpy data-fitting
1 Answer
1 vote

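If you first propagate each slope forward over the missing values that follow it, you can compute the "fit" values incrementally: each missing row is the previous fitted value plus the slope.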

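# Carry each known slope forward over the NaN rows that follow it.
df['slope'] = df['slope'].ffill()
fit = df['avg'].to_numpy(dtype=float).copy()
slope = df['slope'].to_numpy()
missing = df['avg'].isna().to_numpy()

# Assumes the first row has a known 'avg', so fit[i - 1] is always defined.
for i in range(len(df)):
    if missing[i]:
        fit[i] = fit[i - 1] + slope[i]

df['fit'] = fit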
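
The same result can probably be obtained without an explicit loop. The sketch below is an alternative to the answer above, not part of it. It forward-fills the last known "avg", "slope" and "index" for each row and then applies the c + b * (i - a) formula from the question. The names `known`, `anchor_avg`, `anchor_slope` and `anchor_index` are made up for illustration, and the code assumes the first row always has a known "avg".

```python
import numpy as np
import pandas as pd

# Same data as in the question, written compactly.
df = pd.DataFrame({
    'index': list(range(10, 27)),
    'avg': [130] + [np.nan] * 7 + [135] + [np.nan] * 5 + [136] + [np.nan] * 2,
    'slope': [0.02] + [np.nan] * 7 + [0.08] + [np.nan] * 5 + [0.03] + [np.nan] * 2,
})

# Rows where 'avg' is known act as anchors for the rows that follow them.
known = df['avg'].notna()

# For every row, look up the most recent anchor's avg, slope and index.
anchor_avg = df['avg'].ffill()
anchor_slope = df['slope'].ffill()
anchor_index = df['index'].where(known).ffill()

# fit = c + b * (i - a), evaluated for all rows at once.
df['fit'] = anchor_avg + anchor_slope * (df['index'] - anchor_index)
```

On anchor rows the distance to the anchor is zero, so "fit" equals "avg" there. Unlike the cumulative loop, each value is computed directly from its anchor, so floating-point rounding does not accumulate along a run of missing rows.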