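I'm trying to work out how to get the SDs of my clusters from my k-means cluster analysis. I ran k-means and got several outputs, one of which is "centers", which I take to be the means. I need the standard deviation that goes with each of these centers to present my data, and I don't know how to obtain them.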
#kmeans
resultspoorT0t <- kmeans(poor_T0v, 3)
resultspoorT0t[["centers"]]
       ALH      BCF      LIN       VAP       VCL      VSL
1 5.130483 12.66909 40.14618  69.78680 146.97313 55.51221
2 3.098673 10.11618 34.38605  29.20927  69.74657 22.70321
3 7.212529 12.98836 41.71680 111.67745 229.73901 92.12502
I tried the plain sd() function, but that returns a single SD, and I need an SD for each variable within each cluster.
#SD
sd(resultspoorT0t$cluster, na.rm = FALSE)
[1] 0.758434
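Let's assume you want a simple circular SD, that is, the SD of each point's distance from its cluster center. For that you compute, for every point in a cluster, the distance to that cluster's center. This is the Euclidean distance sqrt((x_mean - x)^2 + (y_mean - y)^2 + ...). You can then take the SD of those distances within each cluster. The code below also reports the SD along each axis (one per variable), which is what the question asks for. The code is: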
# Some fake data
set.seed(2222)
df <- matrix(rnorm(6 * 50), 50)
colnames(df) <- letters[1:6]
df <- as.data.frame(df)
k_res <- kmeans(df, 3)
# SD = sd of points distances from cluster center
clusters <- k_res$cluster
centers <- k_res$centers
res_sd <- NULL
for (cl in sort(unique(clusters))) {
  df_part <- df[clusters == cl, , drop = FALSE]
  # Subtract the cluster center from every row. sweep() matches the center
  # to the columns; a plain `df_part - centers[cl, ]` would recycle the
  # vector down the columns and produce wrong deviations.
  dev <- sweep(as.matrix(df_part), 2, centers[cl, ])
  # Euclidean distance between each point (row) and the cluster center.
  dist_to_center <- sqrt(rowSums(dev^2))
  # SD for each column (i.e. SD along each axis)
  sd_s <- apply(dev, 2, sd)
  names(sd_s) <- paste0("sd_", colnames(df_part))
  res_part <- c(cluster = cl, total_sd = sd(dist_to_center), sd_s)
  res_sd <- rbind(res_sd, res_part)
}
res_sd <- as.data.frame(res_sd)
rownames(res_sd) <- res_sd$cluster
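If only the per-variable SD within each cluster is needed (the table that matches "centers"), a shorter route may be base R's aggregate(), grouping the rows by their cluster label. The following is a minimal sketch on the same kind of fake data; the names cluster_means and cluster_sds are made up for illustration, and the fake data is only a stand-in for poor_T0v.

```r
# Hypothetical stand-in data; the asker's poor_T0v is not available here
set.seed(2222)
df <- as.data.frame(matrix(rnorm(6 * 50), 50))
colnames(df) <- letters[1:6]
k_res <- kmeans(df, 3)

# Group rows by cluster label and summarise every column
cluster_means <- aggregate(df, by = list(cluster = k_res$cluster), FUN = mean)
cluster_sds   <- aggregate(df, by = list(cluster = k_res$cluster), FUN = sd)

# One row per cluster, one SD per variable
cluster_sds
```

On the data from the question this would presumably be aggregate(poor_T0v, by = list(cluster = resultspoorT0t$cluster), FUN = sd), assuming poor_T0v contains only numeric columns. A cluster with a single point gives NA, since sd() is undefined for one observation.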