Pandas Groupby - 运行自函数 - 然后转换(应用)

问题描述 投票:0回答:1

我需要对每组进行回归,然后将系数传递到新列

b
。这是我的代码:

自定义功能:

def simplereg(g, y, x):
    try:
        xvar = sm.add_constant(g[x])
        yvar = g[y]
        model = sm.OLS(yvar, xvar, missing='drop').fit()
        b = model.params[x]
        return pd.Series([b*100]*len(g))
    except Exception as e:
        return pd.Series([np.NaN]*len(g))

创建样本数据:

import pandas as pd
import numpy as np

# Setting the parameters
gvkeys = ['A', 'B', 'C', 'D']  # Possible values for gvkey
years = np.arange(2000, 2020)  # Possible values for year

# Number of rows for each gvkey, ensuring 5-7 observations for each
num_rows_per_gvkey = np.random.randint(5, 8, size=len(gvkeys))
total_rows = sum(num_rows_per_gvkey)

# Creating the DataFrame
np.random.seed(0)  # For reproducibility
df = pd.DataFrame({
    'gvkey': np.repeat(gvkeys, num_rows_per_gvkey),
    'year': np.random.choice(years, size=total_rows),
    'y': np.random.rand(total_rows),
    'x': np.random.rand(total_rows)
})

df.sort_values(by='year', ignore_index=True, inplace=True) # make sure if the code can handle even data without sort

运行

groupby
代码:

df['b'] = df.groupby('gvkey').apply(simplereg, y='y', x='x')

但是,代码返回列“b”且全部不适用 请问问题出在哪里以及如何解决?

谢谢你

python python-3.x pandas group-by
1个回答
0
投票

捕获所有异常是不是一个坏主意?

显然是的。此外,您不会显示任何错误消息。如果您的函数返回

NaN
值,则可能是因为您的代码引发了异常。

如果我

import statsmodels.api as sm
并进行一些小的更改,你的代码对我来说效果很好:

import statsmodels.api as sm

def simplereg(g, y, x):
    try:
        xvar = sm.add_constant(g[x])
        yvar = g[y]
        model = sm.OLS(yvar, xvar, missing='drop').fit()
        b = model.params[x]
        return pd.Series([b*100]*len(g), index=g.index)  # reindex here
    except Exception as e:
        print(e)  # At least, print exception here
        return pd.Series([np.NaN]*len(g), index=g.index)  # reindex here

# drop the first group level (gvkey) to align indexes
df['b'] = df.groupby('gvkey').apply(simplereg, y='y', x='x').droplevel('gvkey')

输出:

>>> df
   gvkey  year         y         x          b
0      A  2000  0.799159  0.128926 -55.326856
1      B  2001  0.774234  0.253292 -68.351309
2      A  2003  0.461479  0.315428 -55.326856
3      A  2003  0.780529  0.363711 -55.326856
4      B  2004  0.521848  0.208877 -68.351309
5      C  2005  0.612096  0.656330   6.342994
6      D  2005  0.060225  0.096098  36.320231
7      B  2006  0.414662  0.161310 -68.351309
8      C  2006  0.456150  0.466311   6.342994
9      A  2007  0.118274  0.570197 -55.326856
10     C  2007  0.568434  0.244426   6.342994
11     C  2008  0.943748  0.196582   6.342994
12     D  2009  0.681820  0.368725  36.320231
13     B  2009  0.639921  0.438602 -68.351309
14     A  2012  0.870012  0.670638 -55.326856
15     B  2012  0.264556  0.653108 -68.351309
16     C  2013  0.616934  0.138183   6.342994
17     C  2014  0.018790  0.158970   6.342994
18     A  2015  0.978618  0.210383 -55.326856
19     D  2015  0.666767  0.976459  36.320231
20     D  2016  0.437032  0.097101  36.320231
21     C  2017  0.617635  0.110375   6.342994
22     B  2018  0.944669  0.102045 -68.351309
23     B  2019  0.143353  0.988374 -68.351309
24     D  2019  0.359508  0.820993  36.320231
25     D  2019  0.697631  0.837945  36.320231
© www.soinside.com 2019 - 2024. All rights reserved.