ID Ever_Married Graduated Gender Profession Spending_Score Segmentation Family_Size Age Work_Experience
0 462809 0 0 1 5 2 3 3 4 1
1 462643 1 1 0 2 0 0 2 18 15
2 466315 1 1 0 2 2 1 0 44 1
3 461735 1 1 1 7 1 1 1 44 0
4 462669 1 1 0 3 1 0 5 20 15
... ... ... ... ... ... ... ... ... ... ...
8063 464018 0 0 1 9 2 3 6 4 0
8064 464685 0 0 1 4 2 3 3 15 3
8065 465406 0 1 0 5 2 3 0 14 1
8066 467299 0 1 0 5 2 1 3 8 1
8067 461879 1 1 1 4 0 1 2 17 0
8068 rows × 10 columns
data1 = data.drop(["ID", "Segmentation"], axis=1)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data1, data.Segmentation, test_size=0.20, random_state=50)
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=17)
knn.fit(x_train, y_train)
tahmin = knn.predict(x_test)
knn.score(x_test, y_test)
#0.4838909541511772
knn.predict([[1,1,0,2,0,2,18,15]])
UserWarning: X does not have valid feature names, but KNeighborsClassifier was fitted with feature names
#warnings.warn(
array([1])
I did not expect this warning when I made the prediction.
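This warning appears because the feature names the model saw during training do not match the input passed to predict. When you fit the KNN model with knn.fit(x_train, y_train), x_train is a DataFrame with column names (for example, "Ever_Married", "Graduated", and so on), but the input to knn.predict([[1,1,0,2,0,2,18,15]]) is a plain nested list with no column names. Passing a DataFrame with the same columns avoids the warning: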
import pandas as pd
# Reuse the training column names so the input matches what the model was fitted on
prediction_data = pd.DataFrame([[1, 1, 0, 2, 0, 2, 18, 15]], columns=x_train.columns)
prediction = knn.predict(prediction_data)
print(prediction)