import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
df = pd.read_csv("/Users/aahan/Desktop/diabetes_prediction_dataset.csv")
df_e = pd.get_dummies(df, columns=["gender", "smoking_history"])
df_e.rename(columns={"smoking_history_No Info": "smoking_history_No_Info"}, inplace=True)
df_e.rename(columns={"smoking_history_not current": "smoking_history_not_current"}, inplace=True)
X = df_e.drop(["diabetes"], axis=1)
Y = df_e["diabetes"]
X_train, X_test, y_train, y_test = train_test_split(X, Y, train_size=0.8, random_state=42)
model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)
print(model.predict([[87, 0, 1, 26, 6.9, 100, 1, 0, 0, 0, 0, 0, 0, 1, 0]]))
These are the feature names of my X variable:
'age', 'hypertension', 'heart_disease', 'bmi', 'HbA1c_level',
'blood_glucose_level', 'gender_Female', 'gender_Male', 'gender_Other',
'smoking_history_No_Info', 'smoking_history_current',
'smoking_history_ever', 'smoking_history_former',
'smoking_history_never', 'smoking_history_not_current'
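You should check similar questions, for example: SKLearn warning "valid feature names" in version 1.0
You are training the model on a DataFrame, which has column headers, but predicting on a plain array, which has none.
To make the warning go away, you can train on an array, for example:
X = df_e.drop(["diabetes"], axis=1).values
Or pass the headers when predicting the sample, i.e. wrap it in a DataFrame with the same columns as X.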
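A minimal sketch of the second option, assuming a small made-up dataset so that it runs on its own (the names `X_toy`, `y_toy`, `toy_model` and the values are hypothetical, not taken from the question's CSV). The sample is built as a one-row DataFrame that reuses the training columns, so the feature names seen at predict time match those seen at fit time:

```python
import warnings

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data, only here to make the sketch self-contained.
X_toy = pd.DataFrame({
    "age": [25, 40, 60, 80, 35, 70],
    "bmi": [22.0, 27.5, 31.0, 26.0, 24.0, 33.0],
    "HbA1c_level": [5.0, 5.8, 6.9, 7.2, 5.2, 6.8],
})
y_toy = pd.Series([0, 0, 1, 1, 0, 1], name="diabetes")

toy_model = LogisticRegression(max_iter=2000)
toy_model.fit(X_toy, y_toy)

# Build the sample as a one-row DataFrame with the same columns as the training data.
sample = pd.DataFrame([[87, 26.0, 6.9]], columns=X_toy.columns)

# Record warnings raised during predict and keep only the feature-name ones.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    prediction = toy_model.predict(sample)

feature_name_warnings = [w for w in caught if "feature names" in str(w.message)]
print(prediction, len(feature_name_warnings))
```

With the code from the question, the equivalent would presumably be `model.predict(pd.DataFrame([[87, 0, 1, ...]], columns=X.columns))`, with the values listed in the same order as the columns of `X`.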