Why/how does my random forest model get an R-squared of -10?


Here is the head of my data:

       logger        year  avg_max_temp  avg_min_temp  tot_precipitation   yield
0     072.txt  1985-01-01         15.37          4.33              77.43  225447
1     187.txt  1985-01-01         19.24          7.88             146.40  225447
2     338.txt  1985-01-01         14.43          2.97              95.16  225447
3     280.txt  1985-01-01         16.98          6.51             114.02  225447
4     436.txt  1985-01-01         17.13          6.78             124.63  225447
...       ...         ...           ...           ...                ...     ...
4786  552.txt  2014-01-01         13.60          3.29              88.02  361091
4787  769.txt  2014-01-01         15.17          2.11              89.00  361091
4788  822.txt  2014-01-01         13.49          2.37              82.22  361091
4789  830.txt  2014-01-01         17.09          4.31              84.66  361091
4790  312.txt  2014-01-01         14.70          2.88              99.43  361091

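My PI just asked me to use modeling to examine the relationship between the target (yield) and the three numeric features. Note that there is only one yield value per year, but each year has weather observations from 167 weather stations. I treated this as a time series analysis and did the following: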

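import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# df is the DataFrame shown above
df['year'] = pd.to_datetime(df['year'], format='%Y')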
df = df.set_index('year')

# Set aside an 8-year test period
train = df.loc[df.index < '2006-01-01']
test = df.loc[df.index >= '2006-01-01']


# Create training and testing features
features = ['avg_max_temp', 'avg_min_temp', 'tot_precipitation']
target = 'yield'

X_train = train[features]
y_train = train[target]

X_test = test[features]
y_test = test[target]

# Create and score model

rf = RandomForestRegressor()
rf.fit(X_train, y_train)
rf.score(X_test, y_test)  # for regressors, score() returns R^2 on the given data

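Unfortunately, this gives a result of -10.55. I believe the score of a random forest in scikit-learn is R-squared, so something must have gone wrong here. Any ideas about what went wrong would be greatly appreciated. Thanks in advance.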
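For context on how a value like this is even possible: as far as I understand, the R-squared reported by score() is computed as 1 - SS_res / SS_tot, where SS_tot is measured around the mean of the test targets, so it is not a squared quantity and goes negative whenever the predictions are worse than simply predicting that mean. Below is a minimal pure-Python sketch of that formula. The helper name r_squared and all the numbers are made up for illustration and are not taken from my data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of y_true
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets for a held-out period (illustrative only)
y_true = [340.0, 350.0, 360.0]

# Predictions that follow the trend but sit 20 units too low
y_pred_offset = [320.0, 330.0, 340.0]

# Baseline: always predict the mean of the held-out targets
y_pred_mean = [350.0, 350.0, 350.0]

print(r_squared(y_true, y_true))         # perfect predictions
print(r_squared(y_true, y_pred_mean))    # mean baseline
print(r_squared(y_true, y_pred_offset))  # systematically offset predictions
```

In this toy case the systematic offset alone is enough to push the score well below zero, because the targets vary little around their own mean compared with the size of the offset.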

python machine-learning scikit-learn data-science random-forest