Machine learning in Python: predicting the unseen input that gives the best output (any algorithm is fine)


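I wrote a machine learning model with the random forest algorithm (any algorithm would do) and trained it successfully on my dataset. It accurately predicts the outputs (y) for my test-set inputs (x), so up to this point everything works. Now I want to add a part that predicts a new x value, one that is not given in the data, for which the model thinks y will be at its maximum. In other words, I want it to optimize y over all possible x values without my having to supply x as input. Here is my code in case it is needed (it contains nothing for the part I want to add; it only does training and testing):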



import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor


excel_file_path = r'file_location'
df = pd.read_excel(excel_file_path)
df.columns=['Chordwise_Portion','Deformation','TSR', 'CP/CP_baseline']
df['CP/CP_baseline'] = pd.to_numeric(df['CP/CP_baseline'], errors='coerce')

training_set = df.iloc[0:350, 0:4]
test_set = df.iloc[350:649, 0:4]


def scale_dataset(dataframe):
  X = dataframe[dataframe.columns[:-1]].values
  y = dataframe[dataframe.columns[-1]].values
  scaler = StandardScaler()
  X = scaler.fit_transform(X)
  data = np.hstack((X, np.reshape(y, (-1, 1))))
  return data, X, y

train, X_train, y_train = scale_dataset(training_set)
test, X_test, y_test = scale_dataset(test_set)


#RF
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)


# Calculate R-squared
r2 = r2_score(y_test, y_pred)
r2 = r2*100
print()
print(f'R-squared : {r2} %')
print()


#optimum prediction
max_train = max(y_train)
index_max_train = np.where(y_train == np.max(y_train)) ###
max_pred = max(y_pred)
index_max_pred = [np.where(y_pred == max_pred)[0][0]+350]
data_pred = df.iloc[index_max_pred,0:3] ###

######
max_value = max(max_train, max_pred)
if max_value == max_train:
  max_data = df.iloc[index_max_train]
else:
  max_data = data_pred

print("The maximum OVERALL value of Cp/Cp_baseline is: ", max_value, " for the following conditions: ")
print(max_data)
print()
print("The maximum PREDICTED value of Cp/Cp_baseline is: ", max_pred, " for the following conditions: ")
print(data_pred)

# Plot the results
plt.scatter(X_test[:, 0], y_test, label='True data')
plt.scatter(X_test[:,0], y_pred, color='r', label='Predicted data')
plt.xlabel('Chordwise_portion')
plt.ylabel('Cp/Cp_baseline')
title = "Random forest Algorithm - R^2 = {:.4f} %".format(r2)

plt.title(title)
plt.legend()
plt.show()

python machine-learning optimization random-forest predict
1 Answer

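To have your model find the best y for new, unseen x values, use the trained model to predict over a range of candidate x values and then pick the x whose predicted y is largest. Any candidate x must be scaled with the same scaler that was fitted on the training features before it is passed to the model. You can modify your code as follows.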

import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor

excel_file_path = r'file_location'
df = pd.read_excel(excel_file_path)
df.columns = ['Chordwise_Portion', 'Deformation', 'TSR', 'CP/CP_baseline']
df['CP/CP_baseline'] = pd.to_numeric(df['CP/CP_baseline'], errors='coerce')

# Split data into training and testing sets
training_set = df.iloc[0:350, 0:4]
test_set = df.iloc[350:649, 0:4]

# Define function to scale dataset; reuse a fitted scaler when one is passed in
def scale_dataset(dataframe, scaler=None):
    X = dataframe[dataframe.columns[:-1]].values
    y = dataframe[dataframe.columns[-1]].values
    if scaler is None:
        scaler = StandardScaler().fit(X)  # fit on the training features only
    X = scaler.transform(X)
    data = np.hstack((X, np.reshape(y, (-1, 1))))
    return data, X, y, scaler

# Scale the training set, then apply the same scaler to the test set
train, X_train, y_train, scaler = scale_dataset(training_set)
test, X_test, y_test, _ = scale_dataset(test_set, scaler)

# Train RandomForestRegressor
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)

# Calculate R-squared
r2 = r2_score(y_test, y_pred)
r2 = r2 * 100
print(f'R-squared: {r2}%')

# Predict y for every row of candidate inputs and return the row with the largest prediction.
# candidates can be any DataFrame laid out like df (features first, target last);
# the rows of df are used here.
def predict_max_y(model, scaler, candidates):
    X_new = scaler.transform(candidates.iloc[:, :-1].values)  # same scaling as in training
    y_new = model.predict(X_new)  # Predict CP/CP_baseline for all inputs
    max_index = np.argmax(y_new)  # Index of maximum predicted value
    max_data = candidates.iloc[max_index]  # Corresponding input conditions
    return y_new[max_index], max_data

max_pred, max_data = predict_max_y(rf_model, scaler, df)
print("The maximum predicted value of Cp/Cp_baseline is:", max_pred)
print("For the following conditions:")
print(max_data)

# Plot the results
plt.scatter(X_test[:, 0], y_test, label='True data')
plt.scatter(X_test[:, 0], y_pred, color='r', label='Predicted data')
plt.xlabel('Chordwise_portion')
plt.ylabel('Cp/Cp_baseline')
title = "Random forest Algorithm - R^2 = {:.4f} %".format(r2)
plt.title(title)
plt.legend()
plt.show()
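
The code above only ranks rows that already exist in df. To search over x values that are not in the dataset, one option is to build a regular grid of candidate inputs and evaluate the model on all of them. The following is a minimal sketch of that idea, not a definitive implementation: grid_argmax, toy_predict, the bounds and the grid resolution are all hypothetical names and values chosen for illustration, and toy_predict stands in for a fitted model so that the snippet runs on its own.

```python
import itertools
import numpy as np

def grid_argmax(predict_fn, bounds, points_per_axis=25):
    """Evaluate predict_fn on a regular grid; return (best_x, best_y).

    predict_fn: callable mapping an (n, n_features) array to n predictions.
    bounds: list of (low, high) pairs, one per feature.
    """
    axes = [np.linspace(lo, hi, points_per_axis) for lo, hi in bounds]
    # Cartesian product of the per-feature axes, shape (n_candidates, n_features)
    grid = np.array(list(itertools.product(*axes)))
    preds = np.asarray(predict_fn(grid))
    best = int(np.argmax(preds))
    return grid[best], float(preds[best])

# Hypothetical stand-in for a fitted model: a smooth surface peaking at (0.5, 2.0, 6.0)
def toy_predict(X):
    target = np.array([0.5, 2.0, 6.0])
    return 1.0 - np.sum((X - target) ** 2, axis=1)

# Assumed search ranges for the three features (illustrative values only)
bounds = [(0.0, 1.0), (0.0, 4.0), (4.0, 8.0)]
best_x, best_y = grid_argmax(toy_predict, bounds, points_per_axis=21)
print("Best candidate x:", best_x)
print("Predicted y at that x:", best_y)
```

With the random forest from this answer, predict_fn could be something like lambda X: rf_model.predict(scaler.transform(X)), assuming a scaler fitted on the training features is available, and the bounds could be taken from the minimum and maximum of each feature column in the training set. Staying inside the observed range matters because a random forest cannot extrapolate: its predictions are piecewise constant, so many neighbouring grid points may share the same maximum, and np.argmax simply returns the first of them.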