Rewriting an apply function in pandas using simple DataFrame calculations

Problem description

I have this code, which computes the value of a column from the values of some existing columns in a pandas dataframe.

import pandas

def get_prj_yield(row):
    try:
        prj_yield = row['prj_rev'] / (row['ds'] + row['otb_demand'])
        # Fall back to otb_rev / otb_demand when the first ratio is null
        if pandas.isnull(prj_yield):
            prj_yield = row['otb_rev'] / row['otb_demand']
        return prj_yield
    except ZeroDivisionError:
        return 0

It is called on the dataframe using the apply function:

df['prj_yield'] = output_df.apply(get_prj_yield, axis=1)

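The existing dataframe has more than 1M rows, and I would like to know whether this function can be rewritten using only simple dataframe calculations. Would that improve resource consumption?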

python pandas dataframe apply
1 Answer

Don't loop; use vectorized code:

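import numpy as np

s1 = df['prj_rev'].div(df['ds'] + df['otb_demand'])
s2 = df['otb_rev'].div(df['otb_demand'])

# Where s1 is 0, use s2 instead; then turn infinities from division by zero into 0
df['prj_yield'] = s1.mask(s1.eq(0), s2).replace({np.inf: 0, -np.inf: 0})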

Alternative:

import numpy as np

s1 = df['prj_rev'].div(df['ds'] + df['otb_demand'])
s2 = df['otb_rev'].div(df['otb_demand'])
s3 = s1.mask(s1.eq(0), s2)

df['prj_yield'] = s3.where(np.isfinite(s3), 0)
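
To try this on concrete data, here is a minimal self-contained sketch. It assumes a small frame built by hand from the input columns of the example output, purely for illustration, and applies the alternative form:

```python
import numpy as np
import pandas as pd

# Input columns copied from the example output table (illustrative data only).
df = pd.DataFrame({
    "prj_rev":    [0, 2, 2, 2, 0, -2],
    "ds":         [2, 0, 2, 2, 1, 1],
    "otb_demand": [2, 2, 0, 2, 0, -1],
    "otb_rev":    [2, 2, 2, 0, 1, 1],
})

# Element-wise ratios; pandas yields inf/-inf/NaN on division by zero
# instead of raising ZeroDivisionError.
s1 = df["prj_rev"].div(df["ds"] + df["otb_demand"])
s2 = df["otb_rev"].div(df["otb_demand"])

# Where the first ratio is 0, take the second ratio instead.
s3 = s1.mask(s1.eq(0), s2)

# Keep finite values and replace everything else (inf, -inf, NaN) with 0.
df["prj_yield"] = s3.where(np.isfinite(s3), 0)

print(df)
```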

Example output:

   prj_rev  ds  otb_demand  otb_rev  prj_yield
0        0   2           2        2        1.0
1        2   0           2        2        1.0
2        2   2           0        2        1.0
3        2   2           2        0        0.5
4        0   1           0        1        0.0
5       -2   1          -1        1        0.0
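
One difference from the question is worth noting: get_prj_yield falls back to otb_rev / otb_demand when the first ratio is null, while the snippets above fall back when it equals 0. If the null check is what you need, a hedged sketch could look like the following. The sample data is hypothetical, and turning every non-finite result (inf, -inf, leftover NaN) into 0 is an assumption based on the "return 0" branch of the original function.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "prj_rev":    [4.0, np.nan, 0.0, 2.0],
    "ds":         [1.0, 1.0, 2.0, 0.0],
    "otb_demand": [1.0, 2.0, 2.0, 0.0],
    "otb_rev":    [3.0, 6.0, 5.0, 1.0],
})

s1 = df["prj_rev"].div(df["ds"] + df["otb_demand"])
s2 = df["otb_rev"].div(df["otb_demand"])

# Fall back to s2 only where s1 is missing (NaN), mirroring pandas.isnull
# in the question's function.
s3 = s1.mask(s1.isna(), s2)

# Assumption: any non-finite result should become 0.
df["prj_yield"] = s3.where(np.isfinite(s3), 0)

print(df)
```

On the example table above, this variant would leave row 0 at 0.0 rather than 1.0, because 0 / 4 is zero, not null.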