Pandas - How to perform an OLS regression of values against time within each group of a DataFrame?


I have hourly readings in a DataFrame of the following form:

Date_Time             Temp           
2001-01-01 00:00:00  -1.3
2001-01-01 01:00:00  -2.1
2001-01-01 02:00:00  -1.9
2001-01-01 03:00:00  -2.2
2001-01-01 04:00:00  -2.8
2001-01-01 05:00:00  -2.0
2001-01-01 06:00:00  -2.2
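To make the example reproducible, the table can be read straight from the pasted text. This is a minimal sketch that assumes the columns are separated by two or more spaces, so the single space inside each timestamp is kept, and that `Date_Time` should become a DatetimeIndex:

```python
import io

import pandas as pd

# The table from the question, pasted as text (columns separated by 2+ spaces)
raw = """Date_Time             Temp
2001-01-01 00:00:00  -1.3
2001-01-01 01:00:00  -2.1
2001-01-01 02:00:00  -1.9
2001-01-01 03:00:00  -2.2
2001-01-01 04:00:00  -2.8
2001-01-01 05:00:00  -2.0
2001-01-01 06:00:00  -2.2
"""

# A regex separator needs the Python parser engine;
# parse Date_Time as datetimes and use it as the index
df = pd.read_csv(io.StringIO(raw), sep=r"\s{2,}", engine="python",
                 parse_dates=["Date_Time"], index_col="Date_Time")
print(df)
```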

I want to group the readings into bins of N hours (e.g. 3) and determine the OLS slope of temperature against time for each group.

I know how to group the DataFrame:

import pandas as pd
grouped = df.groupby(pd.Grouper(freq='3h'))['Temp']
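Grouping with `pd.Grouper(freq='3h')` only defines the bins; each bin still has to be reduced with an aggregation such as `size()`, `agg()` or `apply()`. As a quick check, assuming `Date_Time` is a DatetimeIndex, this sketch shows how the seven sample readings fall into 3-hour bins:

```python
import pandas as pd

# Sample readings from the question, indexed by their timestamps
temps = pd.Series(
    [-1.3, -2.1, -1.9, -2.2, -2.8, -2.0, -2.2],
    index=pd.date_range("2001-01-01 00:00", periods=7, freq="h"),
    name="Temp",
)

# 3-hour bins starting at midnight: [00:00, 03:00), [03:00, 06:00), [06:00, 09:00)
counts = temps.groupby(pd.Grouper(freq="3h")).size()
print(counts)
```

The last bin holds only the 06:00 reading, so no slope can be fitted there and any per-bin regression will return NaN for it.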

But after that I'm stuck and can't figure out where to go next. Can someone help me get there?

python pandas pandas-groupby least-squares
1 Answer

The beta (slope) of a simple, single-variable OLS regression is just cov(x, y) / var(x).
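Written out for one group of n readings, with x_i the elapsed time and y_i the temperature, the least-squares slope reduces to that ratio. The 1/(n-1) factor of the sample covariance and variance cancels, so any normalisation works as long as both use the same one. At least two readings at distinct times are needed, otherwise var(x) is zero or undefined.

```latex
\hat{\beta}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
```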

With that in mind:

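import numpy as np
import pandas as pd

# Generate test data: 50 random readings at 15-minute intervals
df = pd.DataFrame(np.random.rand(50),
                  index=pd.date_range(start='2018-01-01', periods=50, freq='15min'),
                  columns=['Temp'])
# Copy the index into a column so it can be used as the regressor
df['DateTime'] = df.index

# Choose the first timestamp as the reference date (it doesn't matter which date it is);
# this just converts the dates to numbers: elapsed seconds since the reference
reference_dt = df['DateTime'].iloc[0]
df['DateTime'] = (df['DateTime'] - reference_dt).dt.total_seconds()

grouped = df.groupby(pd.Grouper(freq='3h'))
# Per-group variance of the time variable
var = grouped['DateTime'].var()
# Per-group covariance between time and temperature:
# the 'Temp' row / 'DateTime' column of each group's covariance matrix
cov = grouped.cov().xs('Temp', level=1)['DateTime']

# OLS slope per group, in Temp units per second
beta = cov / var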
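As a cross-check of the cov/var formula, the slope can also be fitted per bin with `np.polyfit(..., deg=1)`, a different but equivalent least-squares routine. The sketch below applies it to the question's hourly readings. The helper name `slope_per_hour` is hypothetical, and time is measured in hours so the slope comes out in temperature units per hour:

```python
import numpy as np
import pandas as pd

# Hourly readings from the question
temps = pd.Series(
    [-1.3, -2.1, -1.9, -2.2, -2.8, -2.0, -2.2],
    index=pd.date_range("2001-01-01 00:00", periods=7, freq="h"),
    name="Temp",
)


def slope_per_hour(group: pd.Series) -> float:
    """Hypothetical helper: OLS slope of one bin against elapsed hours.

    Returns NaN when the bin has fewer than two readings.
    """
    if len(group) < 2:
        return np.nan
    # Elapsed hours since the bin's first reading (the origin does not affect the slope)
    hours = (group.index - group.index[0]).total_seconds() / 3600
    # np.polyfit with deg=1 returns the coefficients [slope, intercept]
    slope, _intercept = np.polyfit(hours.to_numpy(), group.to_numpy(), deg=1)
    return slope


slopes = temps.groupby(pd.Grouper(freq="3h")).apply(slope_per_hour)
print(slopes)
```

For the seven sample readings this gives approximately -0.3 for the 00:00 to 02:00 bin, 0.1 for 03:00 to 05:00, and NaN for the lone 06:00 reading. Calling a Python function per group with `apply` is slower than the vectorised cov/var approach on large data, but it makes the per-group fit explicit.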
© www.soinside.com 2019 - 2024. All rights reserved.