Update: np.arange(min(indp),max(indp),0.01) raises ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Question (votes: 0, answers: 1)

I have a random forest regressor for crop yield prediction with 5 features, ['Precipitation', 'Min_Temp', 'Cloud_Cover', 'Vapour_pressure', 'Area']. For the given dataset my dependent variable is Production. This code gives the following error:

 x_grid = np.arange(min(indp),max(indp),0.01)
 ValueError :  The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.ensemble import RandomForestRegressor


df = pd.read_csv('dataset.csv')



X = df[['Precipitation' ,'Min_Temp' ,'Cloud_Cover'  ,'Vapour_pressure'  ,'Area']] 
Y = df['Production']
 

def Preprocessing(df,drop_indx_list):
    for indx in  drop_indx_list:
        df=df.drop(indx)
        df.reset_index()
    df.loc[df.Min_Temp>40,'Min_Temp']=df.Min_Temp.mean()
    return df

def DataVisualization(chart_list):
    colors = ['blue','red','green','black','yellow']
    c = 0
    for chrt in chart_list:
        t1,x,t2 = chrt.split()
        plt.scatter(df[t1],df[t2],color=colors[c])
        plt.title(chrt)
        plt.xlabel(t1)
        plt.ylabel(t2)
        plt.grid()
        plt.show()
        c+=1

drop_indx_list = [4]
df =  Preprocessing(df,drop_indx_list)
viusal_list = ['Precipitation Vs Production','Min_Temp Vs Production','Cloud_Cover Vs Production',
'Vapour_pressure  Vs Production','Area Vs Production']

x_train,x_test,y_train,y_test=train_test_split(X,Y, test_size=0.2, random_state=1)
reg=linear_model.LinearRegression()
reg.fit(x_train,y_train)
  




# prediction
y_pred=reg.predict(x_test)
for i in range(len(y_pred)):
    print('data point number : ',i," prediction : ",y_pred[i],'\n')

# Coefficients
print('\nCoefficients: ', reg.coef_,'\n')

# R-squared score
print('\nR-squared score: ', r2_score(y_test,y_pred),'\n')

# DataVisualization(viusal_list)

# Random forest
indp = df.iloc[:,1:6].values



dep = df.iloc[:,6].values

reg = RandomForestRegressor(n_estimators=10,random_state=0)
reg.fit(indp,dep)
y_pred_reg = reg.predict(indp) #fix1

x_grid = np.arange(min(indp),max(indp),0.01) # current error
x_grid = x_grid.reshape((len(x_grid),1))
plt.scatter(indp,dep,color = 'red')
plt.plot(x_grid,reg.predict(x_grid),color ='blue')
plt.title('Random forest regression')
plt.xlabel('X-axis')
plt.ylabel('Production')

Dataset:

Dist,Precipitation,Min_Temp,Cloud_Cover,Vapour_pressure,Area,Production
Bidar,622.438,27.643,35.241,17.953,4709,9043
Bangalore,748.194,25.263,49.134,21.56,18790,20981
Belgaum,1334.194,21.254,39.728,22.5509,4398,6054
Bellary,574.325,26.407,38.466,20.008,3768,5903
Bengalore Rural,733.003,25.228000,47.620000,21.241000,140213,534214
Kolar,724.545,25.464,47.029,20.63,2278,2759
Dharwad,1623.548,26.148,38.267,23.652,8395,10986
Koppal,724.545,26.871005,41.039,19.992,3084,3952
Chikmagalur,1923.742,26.459,44.842,24.717,1650,2958
Chitradurga,674.17,25.214,41.364,20.82,3026,3325
Haveri,1473.343,25.817,41.292,23.168,10659,9865
Chamrajanagar,1334.754,25.089,50.77,23.079,3485,4120
Mandya,1477.249,24.567,49.54775,22.234,11349,18957
Mysore,2242.378,25.76766667,50.57941667,24.64266667,3462,4539
Raichur,450.113,27.42241667,35.76258333,18.93741667,4586,6145
Kodaku,1691.933,25.426,46.353,23.975,17856,15362
Hassan,2200.349,25.348,47.12,24.008,10487,7586
Devanagare,1060.343,25.509,40.929,22.042,2459,1865
Gulbarga,525.402,27.851,35.109,18.662,10487,7895

I tried this, but it did not help (link to a similar question).

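Edit: fixed the earlier problem by calling reg.predict(indp) instead of passing [[6.5]].

But now the line

 np.arange(min(indp),max(indp),0.01)
gives me another error: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()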
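
The error can be reproduced without the model or the CSV. Below is a minimal sketch, assuming a small made-up 2-D array in place of indp. Python's built-in min() walks the rows and compares them with <, and that comparison yields one boolean per column rather than a single True/False.

```python
import numpy as np

# Made-up stand-in for indp: 3 rows, 2 feature columns (not the real dataset).
indp = np.array([[622.438, 27.643],
                 [748.194, 25.263],
                 [1334.194, 21.254]])

# Comparing two rows gives one boolean per column, not a single truth value.
row_cmp = indp[1] < indp[0]

# Built-in min() needs a single truth value from that comparison, so it raises.
try:
    min(indp)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(row_cmp)
print(error_message)
```

The same thing happens with the built-in max(), since it compares rows in the same way.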

python pandas numpy regression random-forest
1 Answer
0 votes

Use np.arange(np.min(indp),np.max(indp),0.1) instead.
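
np.min() and np.max() reduce over every element of the array and return scalars, which is why this call no longer raises. Note that the resulting grid mixes all five feature columns into one range, while the forest was fitted on five features, so reg.predict() on a single-column grid would still fail on the feature count. One possible way around that, sketched here as an assumption rather than the asker's intended plot, is to sweep one feature at a time and hold the others at their column means. The array below is made up, and feature_grid is a hypothetical helper name.

```python
import numpy as np

# Made-up stand-in for indp: 3 rows, 2 feature columns (not the real dataset).
indp = np.array([[622.438, 27.643],
                 [748.194, 25.263],
                 [1334.194, 21.254]])

# np.min / np.max return scalars over the whole array, so np.arange accepts them.
x_all = np.arange(np.min(indp), np.max(indp), 0.1)


def feature_grid(X, col, n_points=50):
    """Hypothetical helper: vary one column over its range, fix the rest at their means."""
    grid = np.tile(X.mean(axis=0), (n_points, 1))
    grid[:, col] = np.linspace(X[:, col].min(), X[:, col].max(), n_points)
    return grid


# Same number of columns as indp, so a model fitted on indp could predict on it.
grid0 = feature_grid(indp, col=0)
print(x_all.shape, grid0.shape)
```

With the real data, reg.predict(feature_grid(indp, col)) could then be plotted against that grid's col-th column, one plot per feature.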
