我需要在 Python 中计算滚动窗口回归,其中针对 HAC 校正了标准误差(Newey-West,1987)。我知道statsmodels有一个滚动窗口回归(Rolling Regression)的函数,但是这个函数中HAC无法修正标准误。因此,我定义了自己的功能。
我有一个面板数据集,有 108,768 行(410 只基金),结构如下:
funds = pd.DataFrame({
"Fund": ["A", "A", "A", "A", "B", "B", "B", "B"],
"Excess_Return": [np.NaN, 0.172, 0.0465, 0.039, 0.003995, -0.022139, 0.009518, 0.03233],
"Regression_Constant": [1,1,1,1,1,1,1,1],
"RMRF": [0.0118,0.0557,0.0129,0.0403,0.0118,0.0557,0.0129,0.0403],
"SMB": [0.0445,0.1838,-0.1539,-0.0496,0.0445,0.1838,-0.1539,-0.0496],
"HML": [-0.0189,-0.0981,0.0823,0.0725,-0.0189,-0.0981,0.0823,0.0725],
"RMW": [-0.0629,-0.1876,0.1182,0.0767,-0.0629,-0.1876,0.1182,0.0767],
"CMA": [0.0474,-0.0035,-0.0161,0.0562,0.0474,-0.0035,-0.0161,0.0562]})
函数定义如下:
min_t = 30
t = 36
def process(x):
if x["Excess_Return"].count() >= min_t:
reg = smf.ols("Excess_Return ~ RMRF + SMB + HML + RMW + CMA", data = x).fit(cov_type="HAC", cov_kwds={"maxlags":1})
return [
reg.params[0],
reg.params["RMRF"],
reg.params["SMB"],
reg.params["HML"],
reg.params["RMW"],
reg.params["CMA"],
# tvalues
reg.tvalues[0],
reg.tvalues["RMRF"],
reg.tvalues["SMB"],
reg.tvalues["HML"],
reg.tvalues["RMW"],
reg.tvalues["CMA"],
]
# Else return NaN
return [np.nan] * 10
要运行函数并在数据框中获取结果,我运行以下命令:
df_1 = funds.join(
# join new DataFrame back to original
pd.DataFrame(
(process(x) for x in funds.rolling(t)),
columns=["alpha", "Beta_RMRF", "Beta_SMB", "Beta_HML", "Beta_RMW", "Beta_CMA",
"t_alpha", "t_RMRF", "t_SMB", "t_HML", "t_RMW", "t_CMA"]
)
)
这将返回回归的正确 t 统计量。但是,由于我有多个基金,我需要按名称对基金进行分组,即在对基金 B 进行回归时不考虑基金 A 的回报。有人有解决方案如何重新排列代码吗?
下面是来自 statsmodels 的 RollingOLS 的工作示例。灵感来自这个问题的answer on Rolling OLS Regressions and Predictions by Group。
这可以很容易地修改为您的面板数据以执行滚动窗口回归。在
model.fit()
中指定协方差结构。参见https://www.statsmodels.org/dev/generated/statsmodels.regression.rolling.RollingOLS.fit.html#statsmodels.regression.rolling.RollingOLS.fit.
from statsmodels.regression.rolling import RollingOLS
from statsmodels.tools.tools import add_constant
import statsmodels.api as sm
import pandas as pd
import numpy as np
data = sm.datasets.grunfeld.load()
df_grunfeld = pd.DataFrame(data.data)
df_grunfeld.set_index(['firm'], append=True, inplace=True)
# Simple Model
# $$invest = \beta_0 + \beta_1 value$$
def invest_params(df_gf, intercept=False):
"""
Function to operate on the data of a single firm.
Assumes df_gf has the columns 'invest' and 'value' available.
Returns a dataframe containing model parameters
"""
# we should have at least k + 1 observations
min_obs = 3 if intercept else 2
wndw = 8
# if there are less than min_obs rows in df_gf, RollingOLS will throw an error
# Instead, handle this case separately
if df_gf.shape[0] < min_obs:
cols = ['coef_intercept', 'coef_value'] if intercept else ['coef_value']
return pd.DataFrame(index=df_gf.index, columns=cols)
y = df_gf['invest']
x = add_constant(df_gf['value']) if intercept else df_gf['value']
model = RollingOLS(y, x, expanding=True, min_nobs=min_obs, window=wndw).fit()
parameters = model.params
params_shifted = model.params.shift(1)
mse = model.mse_resid
parameters['invest_hat'] = (parameters.mul(add_constant(df_gf['value']), axis=0)\
.sum(axis=1, min_count=1)).to_frame('invest_hat')
parameters['invest_hat_shift'] = (params_shifted.mul(add_constant(df_gf['value']), axis=0)\
.sum(axis=1, min_count=1)).to_frame('invest_hat_shift')
parameters['mse'] = mse
parameters['rmse'] = np.sqrt(mse)
parameters['nobs'] = model.nobs
parameters['ssr'] = model.ssr
parameters['t_const'] = model.tvalues['const']
parameters['t_value'] = model.tvalues['value']
parameters.rename(columns = {'const' : 'b0', 'value' : 'b1'}, inplace = True)
parameters['r2_adj'] = model.rsquared_adj
return parameters
grouped = df_grunfeld.groupby('firm')
df_params = grouped.apply(lambda x: invest_params(x, True))
df_grunfeld_output = df_grunfeld.join(df_params, rsuffix='_coef')