Comparing a K-means scatter plot with a ground-truth scatter plot

Question

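I have a dataset with x and y variables. There is also a z column that specifies which group each x and y pair belongs to. There are 11 groups. I used K-means clustering to build a model that classifies the x and y variables into the correct groups. I then plotted these x and y variables on a scatter plot, with K-means assigning each point one of 11 distinct colors. I now want to compare this with the ground truth (in this case, the original x and y variables). I would like it shown on a third scatter plot that highlights in red the K-means data points that do not agree with the ground truth, and in green the data points that do.

How would I code this?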

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans  
import pandas as pd

df = pd.read_csv("FILENAME")
print(df)

x = df['height_mean']
y = df['weight_mean']

points = df[['height_mean', 'weight_mean']].values

#np.array([Values here])

# Number of clusters
n_clusters = 11

# Fit the KMeans model
kmeans = KMeans(n_clusters=n_clusters)
kmeans.fit(points)

# Get cluster assignments
labels = kmeans.labels_

# Get cluster centers
centers = kmeans.cluster_centers_

# Plot the clusters
plt.scatter(x, y, c=labels, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='x', s=100)  # Plot cluster centers as red X's
plt.xlabel('height')
plt.ylabel('weight')
plt.title("K-Means Clustering")

plt.show()
print(df)

This is the K-means scatter plot. How can I modify it to compare it with a ground-truth scatter plot?

Answer 1

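To compare the K-means cluster assignments with the ground truth (given in the 'z' column), you can create a new scatter plot in which the points are colored according to whether the K-means label matches the ground truth. Here is an example based on the code you provided: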

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans  
import pandas as pd

# Read data
df = pd.read_csv("FILENAME")
print(df)

# Extract data
x = df['height_mean']
y = df['weight_mean']
z = df['group_column_name']  # Replace 'group_column_name' with the actual name of the column that specifies the group

points = df[['height_mean', 'weight_mean']].values

# Number of clusters
n_clusters = 11

# Fit the KMeans model
kmeans = KMeans(n_clusters=n_clusters)
kmeans.fit(points)

# Get cluster assignments
labels = kmeans.labels_

# Create an array to store colors for each point
colors = np.zeros(labels.shape, dtype=str)

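# Compare with ground truth.
# Note: K-means cluster IDs are arbitrary, so this direct comparison is only
# meaningful once the cluster IDs have been mapped to the ground-truth group labels.
for i in range(len(labels)):
    if labels[i] == z.iloc[i]:  # .iloc compares by position, whatever the DataFrame index is
        colors[i] = 'g'  # Green for a match
    else:
        colors[i] = 'r'  # Red for a mismatch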

# Plot the clusters
plt.scatter(x, y, c=labels, cmap='viridis', alpha=0.5)
plt.scatter(x, y, c=colors, alpha=0.5)  # Overplot points with the color array to show matches/mismatches
plt.xlabel('height')
plt.ylabel('weight')
plt.title("K-Means Clustering vs Ground Truth")

plt.show()

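The `z` array holds the ground-truth group for each point.
We create a `colors` array that stores a color for each point ('g' for green, 'r' for red).
We then loop over the K-means labels and the ground-truth labels to fill the `colors` array. A point is colored green if its K-means label matches the ground truth, and red otherwise.
Finally, we plot the points using the colors stored in the `colors` array.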
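One caveat about the comparison described above: K-means numbers its clusters arbitrarily, so cluster 3 need not correspond to group 3 in the ground truth. A minimal sketch of one way to handle this is a majority vote, where each cluster is mapped to the ground-truth group that occurs most often among its points before matches are counted. The helper names `align_clusters_by_majority` and `match_colors`, and the toy data, are hypothetical and only for illustration. They are not part of the original answer.

```python
from collections import Counter


def align_clusters_by_majority(cluster_ids, true_groups):
    """Map each cluster ID to the most common ground-truth group among its points."""
    votes = {}
    for cluster_id, group in zip(cluster_ids, true_groups):
        votes.setdefault(cluster_id, Counter())[group] += 1
    return {cluster_id: counts.most_common(1)[0][0]
            for cluster_id, counts in votes.items()}


def match_colors(cluster_ids, true_groups):
    """Return 'g' where the mapped cluster agrees with the ground truth, 'r' otherwise."""
    mapping = align_clusters_by_majority(cluster_ids, true_groups)
    return ['g' if mapping[c] == g else 'r'
            for c, g in zip(cluster_ids, true_groups)]


# Toy data (made up for illustration): cluster 0 mostly covers group 'b',
# cluster 1 mostly covers group 'a'.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_groups = ['b', 'b', 'a', 'a', 'a', 'b']
toy_colors = match_colors(toy_clusters, toy_groups)
print(toy_colors)
```

In the script above, this would presumably be used as `colors = match_colors(labels, z)`, with the result passed to `plt.scatter(x, y, c=colors)` in its own figure. A majority vote can map two clusters to the same group. If a strict one-to-one pairing is needed, `scipy.optimize.linear_sum_assignment` applied to a cluster-by-group count table is a common alternative.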
