Problems with time series analysis in Python


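I want to run an event study analysis, but I can't seem to build a simple prediction model with time as the independent variable. I have been using this as a guide.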

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt

#sample data
units = [0.916301354, 0.483947819, 0.551258976, 0.147971439, 0.617461504, 0.957460424, 0.905076453, 0.274261518, 0.861609383, 0.285914819, 0.989686616, 0.86614591, 0.074250832, 0.209507105, 0.082518752, 0.215795111, 0.953852132, 0.768329343, 0.380686392, 0.623940323, 0.155944248, 0.495745862, 0.0845513, 0.519966471, 0.706618333, 0.872300766, 0.70769554, 0.760616731, 0.213847926, 0.703866155, 0.802862491, 0.52468101, 0.352283626, 0.128962646, 0.684358794, 0.360520106, 0.889978575, 0.035806225, 0.15459103, 0.227742501, 0.06248614, 0.903500165, 0.13851151, 0.664684486, 0.011042697, 0.86353796, 0.971852899, 0.487774978, 0.547767217, 0.153629408, 0.076994094, 0.230693561, 0.961345948]
begin_date = '2022-8-01'
df = pd.DataFrame({'date':pd.date_range(begin_date, periods=len(units)),'units':units})


# Create estimation data set
est_data = df['2022-08-01':'2022-08-30']

# And observation data
obs_data = df['2022-09-01':'2022-09-14']

# Estimate a model predicting stock price with market return
m = smf.ols('variable ~ date', data = est_data).fit()

# Get AR
# Using mean of estimation return
var_return = np.mean(est_data['variable'])
obs_data['AR_mean'] = obs_data['variable'] - var_return

# Then using model fit with estimation data
obs_data['risk_pred'] = m.predict()

obs_data['AR_risk'] = obs_data['variable'] - obs_data['risk_pred']

# Graph the results
sns.lineplot(x = obs_data['date'],y = obs_data['AR_risk'])
plt.show()

As it stands, it does not recognize the date as a variable (see attached image).

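I tried using the index as a counter and keeping the date as a separate variable, but when it reaches the "predict" step, it doesn't know how to predict dates it has never seen before.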
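A likely reason, assuming the formula is parsed by patsy (the formula engine behind statsmodels.formula.api): patsy treats any column whose dtype is not numeric as categorical, and datetime64 is not a numeric dtype. So `date` becomes one dummy variable per day instead of a single time variable, which would also explain why dates outside the estimation window cannot be predicted. A quick check on the first six values of the sample data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# First six values of the sample data from the question
units = [0.916301354, 0.483947819, 0.551258976,
         0.147971439, 0.617461504, 0.957460424]
df = pd.DataFrame({"date": pd.date_range("2022-08-01", periods=len(units)),
                   "units": units})

# A datetime64 column is not numeric, so patsy encodes it as a categorical
# factor: an intercept plus one dummy for every date after the first.
m = smf.ols("units ~ date", data=df).fit()
print(list(m.params.index))
```

The printed names should be `Intercept` followed by one `date[T.<date>]` term per remaining date, i.e. as many parameters as observations.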

python indexing time-series regression predict
1 Answer

There are quite a few errors in your code. I explain them one by one below (see the comments between ''' '''):

'''
Small note: here you name the column "units", but below you use a column called "variable", so every later line that references "variable" will fail as written.
Not a big problem (you were most probably reading the data from a file anyway), just something to keep in mind.
'''
df = pd.DataFrame({'date':pd.date_range(begin_date, periods=len(units)),'units':units})
'''
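The following two lines do not work like that.
First, the DataFrame has the default integer index, not a DatetimeIndex, so it cannot be sliced with date strings.
Second, slice rows explicitly: use .loc for label-based slicing (e.g. after df = df.set_index('date')) or .iloc for positional slicing.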
'''
# Create estimation data set
est_data = df['2022-08-01':'2022-08-30'] 
# And observation data
obs_data = df['2022-09-01':'2022-09-14']
'''
Here you fit the model on est_data.
Calling m.predict() without arguments returns the fitted values for est_data, not predictions for any other dates.
This will be important later.
'''
# Estimate a model predicting stock price with market return
m = smf.ols('variable ~ date', data = est_data).fit()
# Get AR
# Using mean of estimation return
'''
You don't need np.mean for this; est_data['variable'].mean() does the same.
You also don't need to store the mean in a separate variable (var_return).
You can subtract it directly: obs_data['variable'] - est_data['variable'].mean()
'''
var_return = np.mean(est_data['variable'])
obs_data['AR_mean'] = obs_data['variable'] - var_return
'''
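This will not always work, and in this case it does not.
m.predict() without arguments returns the fitted values for est_data, one per row of est_data, while obs_data covers a different and shorter period.
Even if the lengths matched, these would be in-sample values for the estimation dates, not predictions for the observation dates; to predict obs_data you have to pass it explicitly, e.g. m.predict(obs_data).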
'''
obs_data['risk_pred'] = m.predict()
obs_data['AR_risk'] = obs_data['variable'] - obs_data['risk_pred']
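A minimal sketch of the slicing and mean fixes described in the comments above, assuming the column is named `variable` (as in the rest of the code) and the frame is indexed by date. With a DatetimeIndex, `.loc` accepts date strings and includes both endpoints; `.copy()` gives independent frames, so adding columns afterwards does not trigger pandas' SettingWithCopyWarning.

```python
import pandas as pd

# Sample data from the question; the column is named "variable" here so that
# it matches the name used in the rest of the code.
units = [
    0.916301354, 0.483947819, 0.551258976, 0.147971439, 0.617461504,
    0.957460424, 0.905076453, 0.274261518, 0.861609383, 0.285914819,
    0.989686616, 0.86614591, 0.074250832, 0.209507105, 0.082518752,
    0.215795111, 0.953852132, 0.768329343, 0.380686392, 0.623940323,
    0.155944248, 0.495745862, 0.0845513, 0.519966471, 0.706618333,
    0.872300766, 0.70769554, 0.760616731, 0.213847926, 0.703866155,
    0.802862491, 0.52468101, 0.352283626, 0.128962646, 0.684358794,
    0.360520106, 0.889978575, 0.035806225, 0.15459103, 0.227742501,
    0.06248614, 0.903500165, 0.13851151, 0.664684486, 0.011042697,
    0.86353796, 0.971852899, 0.487774978, 0.547767217, 0.153629408,
    0.076994094, 0.230693561, 0.961345948,
]
df = pd.DataFrame({"date": pd.date_range("2022-08-01", periods=len(units)),
                   "variable": units})

# Index by date so that date strings can be used for label-based slicing.
df = df.set_index("date")

# .loc slicing on a DatetimeIndex includes both endpoints.
est_data = df.loc["2022-08-01":"2022-08-30"].copy()
obs_data = df.loc["2022-09-01":"2022-09-14"].copy()

# Mean-adjusted abnormal returns: Series.mean() replaces np.mean(...).
obs_data["AR_mean"] = obs_data["variable"] - est_data["variable"].mean()

print(len(est_data), len(obs_data))
```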

I am currently fixing the errors and will give you a working example soon. To do that, please answer the following question:

  • Do you really want to fit the model on est_data? If so, how would you combine it with obs_data?
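For reference, one possible answer to that question, sketched under assumptions that are not from the thread: convert the date into a number (`t`, a hypothetical column holding days since the first date), fit the trend on `est_data` only, and pass `obs_data` to `m.predict` explicitly so the fitted line is extrapolated to the observation dates. Because `t` is numeric, dates the model has never seen are not a problem.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Sample data from the question, with the column named "variable".
units = [
    0.916301354, 0.483947819, 0.551258976, 0.147971439, 0.617461504,
    0.957460424, 0.905076453, 0.274261518, 0.861609383, 0.285914819,
    0.989686616, 0.86614591, 0.074250832, 0.209507105, 0.082518752,
    0.215795111, 0.953852132, 0.768329343, 0.380686392, 0.623940323,
    0.155944248, 0.495745862, 0.0845513, 0.519966471, 0.706618333,
    0.872300766, 0.70769554, 0.760616731, 0.213847926, 0.703866155,
    0.802862491, 0.52468101, 0.352283626, 0.128962646, 0.684358794,
    0.360520106, 0.889978575, 0.035806225, 0.15459103, 0.227742501,
    0.06248614, 0.903500165, 0.13851151, 0.664684486, 0.011042697,
    0.86353796, 0.971852899, 0.487774978, 0.547767217, 0.153629408,
    0.076994094, 0.230693561, 0.961345948,
]
df = pd.DataFrame({"date": pd.date_range("2022-08-01", periods=len(units)),
                   "variable": units})

# Hypothetical numeric time variable: days elapsed since the first date.
# patsy treats this as a number, not as one category per date.
df["t"] = (df["date"] - df["date"].min()).dt.days

df = df.set_index("date")
est_data = df.loc["2022-08-01":"2022-08-30"].copy()
obs_data = df.loc["2022-09-01":"2022-09-14"].copy()

# Fit a linear time trend on the estimation window only.
m = smf.ols("variable ~ t", data=est_data).fit()

# Predict the observation window by passing it in explicitly;
# m.predict() without arguments would only return in-sample fitted values.
obs_data["risk_pred"] = m.predict(obs_data)
obs_data["AR_risk"] = obs_data["variable"] - obs_data["risk_pred"]

print(obs_data[["variable", "risk_pred", "AR_risk"]])
```

A linear time trend only mirrors the question's `variable ~ date` formula; in a typical market-model event study the regressor would be a market return instead.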