ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). K-means clustering


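I am working on a k-means clustering project. When I apply OneHotEncoder to the categorical columns and StandardScaler to the numerical columns, I get the following error: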

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

The data is clean: there are no null or missing values and no very large values, and the outliers have been removed.
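A check along the following lines can confirm whether that still holds for the exact frame passed to `KMeans.fit`, rather than for the raw data. This is a minimal sketch: the helper name `report_non_finite` and the toy frame `demo` are hypothetical and stand in for `df_encoded`.

```python
import numpy as np
import pandas as pd


def report_non_finite(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column counts of NaN and +/-inf values in the numeric columns."""
    numeric = df.select_dtypes(include=[np.number])
    nan_counts = numeric.isna().sum()
    inf_counts = np.isinf(numeric).sum()
    report = pd.DataFrame({"nan": nan_counts, "inf": inf_counts})
    # Keep only the columns that actually contain a problem value.
    return report[(report["nan"] > 0) | (report["inf"] > 0)]


# Toy frame standing in for df_encoded (hypothetical data).
demo = pd.DataFrame({"age": [25.0, np.nan, 40.0], "price": [10.0, np.inf, 12.5]})
print(report_non_finite(demo))
```

If the report is empty for the raw data but not for the frame built after encoding, the problem values were introduced by one of the preprocessing steps.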

How can I fix this?

My code is below:

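# Imports assumed by the snippet (not shown in the original post)
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Columns to be one-hot encoded
columns_to_onehot = ['gender', 'category', 'payment_method']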

# Columns to be scaled
columns_to_scale = ['age', 'quantity', 'price', 'total_amount']
# One Hot Encoding
encoder = OneHotEncoder(drop='first', sparse=False) # 'drop' parameter is set to 'first' to avoid multicollinearity

#encoder = LabelEncoder()
one_hot_encoded_columns = encoder.fit_transform(subset_df1[columns_to_onehot])
#getting the column names
column_names = encoder.get_feature_names(input_features=columns_to_onehot)

df_encoded = pd.concat([subset_df1.drop(columns_to_onehot, axis=1),
                       pd.DataFrame(one_hot_encoded_columns, columns=column_names)],
                       axis=1)

# Standard Scaling


scaler = StandardScaler()
df_encoded[columns_to_scale] = scaler.fit_transform(df_encoded[columns_to_scale])

#Finding the optimal K with Elbow Method and Silhouette score

Sum_of_squared_distances = []
silhouette_avg = []

K = range(1,10)
for k in K:
    model = KMeans(n_clusters=k, random_state=0)
    model.fit(df_encoded)
    Sum_of_squared_distances.append(model.inertia_)
    
    # The silhouette score is only defined for two or more clusters
    if k > 1:
        silhouette_avg.append(silhouette_score(df_encoded, model.labels_, metric='euclidean'))
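One possible cause, offered as an assumption since the post does not show how `subset_df1` was built: if rows were dropped earlier (for example during outlier removal) without resetting the index, `pd.concat(..., axis=1)` aligns the two frames on index labels, and the encoded frame, which gets a fresh 0..n-1 index, no longer lines up with it. The unmatched labels are filled with NaN. The sketch below reproduces that effect on hypothetical toy data and shows one way to avoid it by reusing the source index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for subset_df1: rows 1 and 3 were dropped earlier
# (e.g. as outliers), so the index is 0, 2, 4 rather than 0, 1, 2.
subset = pd.DataFrame(
    {"age": [25, 31, 47], "gender": ["F", "M", "F"]},
    index=[0, 2, 4],
)

# Stand-in for the encoder output: a plain array, so a frame built from it
# gets a fresh RangeIndex 0, 1, 2.
encoded = np.array([[0.0], [1.0], [0.0]])

# Same pattern as in the question: concat aligns on index labels.
misaligned = pd.concat(
    [subset.drop(columns=["gender"]), pd.DataFrame(encoded, columns=["gender_M"])],
    axis=1,
)
print(misaligned.isna().sum().sum())  # non-zero: NaN introduced by alignment

# Possible fix: give the encoded frame the same index as the source frame.
aligned = pd.concat(
    [
        subset.drop(columns=["gender"]),
        pd.DataFrame(encoded, columns=["gender_M"], index=subset.index),
    ],
    axis=1,
)
print(aligned.isna().sum().sum())  # no NaN once the indexes match
```

Calling `subset_df1.reset_index(drop=True)` before encoding would be an alternative with the same effect. If the indexes already match, the NaN values come from somewhere else and this sketch does not apply.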


nan valueerror infinity