The following code produces the figure shown.
library(MASS)        # provides mvrnorm()
library(ggplot2)
library(dendextend)
library(cowplot)
set.seed(1234)
N<-10
set1 <- mvrnorm(n = N, c(0,0), matrix(c(0.5,0,0,0.5),2))
df <- data.frame(set1,label=1:N)
# ?dist
# dist method options: "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"
set1.dist <- dist(x=df[1:2],method = "euclidean")
fit1 <- hclust(d=set1.dist, method = "complete")
df$cluster <- cutree(fit1,k = 3)
p1 <- ggplot(df) +
  geom_text(aes(x = X1, y = X2, label = label, color = as.factor(cluster))) +
  theme(legend.position = "none")
# p1
# ?hclust
# hclust method options "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
p2 <- hclust(d = set1.dist, method = "complete") %>%
  as.dendrogram() %>%
  color_labels(k = 3) %>%
  set("branches_k_color", k = 3) %>%
  as.ggdend() %>%
  ggplot(horiz = TRUE, theme = NULL) +
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
#plot(fit1,hang = -1, labels = df$label, main="Test",xlab = "")
#?rect.hclust()
plot_grid(p1,p2)
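I want to assign the same colors to the clusters in both plots (the scatter plot and the dendrogram), but none of my attempts have worked. I suspect the cluster order in the dendrogram is different, or something else is off.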
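The problem with your color pattern is that the clusters are ordered differently in the two plots. In the right-hand plot, hclust arranges the ids according to their distances; in the left-hand plot, they are ordered by label id. To get the same order in both, you need to apply the hclust order to your data frame.
You can find this order in the order element of the hclust object: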
> fit1$order
[1] 5 6 2 3 10 4 1 7 8 9
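To make the mismatch concrete, here is a minimal base R sketch on toy data (the points and the names `pts`, `fit`, `cl_by_leaf` and `leaf_sequence` are illustrative assumptions, not the data from the question). `cutree()` numbers clusters by their first appearance in the data rows, while the dendrogram lays the clusters out in leaf order, so the two sequences generally differ:

```r
# Toy data (hypothetical; not the data from the question)
set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)
fit <- hclust(dist(pts), method = "complete")

# cutree() returns one cluster id per row, in the original row order
cl <- cutree(fit, k = 3)

# Reading those ids in leaf order shows the sequence in which the
# clusters appear along the dendrogram
cl_by_leaf <- cl[fit$order]
leaf_sequence <- unique(cl_by_leaf)

print(fit$order)
print(cl_by_leaf)
print(leaf_sequence)
```

If `leaf_sequence` is not `1 2 3`, a plot that colors by cluster id and a plot that colors by position along the dendrogram will assign colors differently.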
So you can now apply this order to df (after assigning the cluster ids):
fit1 <- hclust(d=set1.dist, method = "complete")
df$cluster <- cutree(fit1,k = 3)
df <- df[order(match(df$label, fit1$order)),]
X1 X2 label cluster
5 -0.67846476 0.3034370 5 2
6 0.07798362 0.3578356 6 2
2 0.70596583 0.1961721 2 2
3 0.54889439 0.7668157 3 2
10 -1.70825344 -0.6293518 10 3
4 -0.04557927 -1.6586588 4 1
1 0.33742619 -0.8535244 1 1
7 0.36133829 -0.4064025 7 1
8 0.64431246 -0.3865271 8 1
9 0.59196977 -0.3991278 9 1
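As a side note, when the labels are simply the row numbers `1:N`, as they are here, the same reordering can be written by indexing with the order vector directly. A small sketch on toy data (the data and the names `by_match` and `by_index` are assumptions for illustration):

```r
# Toy data (hypothetical); labels equal the row numbers, as in the question
set.seed(1)
df <- data.frame(X1 = rnorm(10), X2 = rnorm(10), label = 1:10)
fit <- hclust(dist(df[1:2]), method = "complete")
df$cluster <- cutree(fit, k = 3)

# Reordering via match(), as above
by_match <- df[order(match(df$label, fit$order)), ]

# Direct indexing with the leaf order
by_index <- df[fit$order, ]

# Both approaches give the same data frame under this assumption
print(identical(by_match, by_index))
```

The `match()` form is the more general one: it keeps working if the rows of df have been shuffled, as long as label still holds the original observation index.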
Now, for the first plot, convert cluster to a factor whose levels follow this order:
library(dplyr)  # provides mutate()
p1 <- df %>% mutate(cluster = factor(cluster, unique(cluster))) %>%
  ggplot() +
  geom_text(aes(x = X1, y = X2, label = label, color = cluster)) +
  theme(legend.position = "none")
The second plot stays unchanged, and the clusters in both plots now follow the same order.
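One caveat: ggplot2 and dendextend choose their default palettes independently, so matching the cluster order aligns the hues but does not guarantee identical colors. A possible way around this is to pass one explicit palette to both plots. The sketch below is an illustration under stated assumptions, not part of the original answer: the palette `pal` and the toy data are made up, and it relies on my understanding that `color_branches()` and `color_labels()` assign the colors in `col` to the clusters in leaf order.

```r
library(ggplot2)
library(dendextend)

# Toy data (hypothetical; not the data from the question)
set.seed(1234)
df <- data.frame(X1 = rnorm(10), X2 = rnorm(10), label = 1:10)
fit <- hclust(dist(df[1:2]), method = "complete")
df$cluster <- cutree(fit, k = 3)

# Cluster ids in the order they appear along the dendrogram leaves
leaf_sequence <- unique(df$cluster[fit$order])

# Hypothetical palette: the i-th color is meant for the i-th cluster
# along the leaves; the names tie each color to a cutree() cluster id
pal <- c("#D55E00", "#009E73", "#0072B2")
names(pal) <- leaf_sequence

# Scatter plot: named values are matched to the factor levels "1", "2", "3"
p1 <- ggplot(df) +
  geom_text(aes(x = X1, y = X2, label = label, color = factor(cluster))) +
  scale_color_manual(values = pal) +
  theme(legend.position = "none")

# Dendrogram: same colors, passed in leaf order
dend <- as.dendrogram(fit)
dend <- color_labels(dend, k = 3, col = unname(pal))
dend <- color_branches(dend, k = 3, col = unname(pal))
p2 <- ggplot(as.ggdend(dend), horiz = TRUE, theme = NULL)
```

`cowplot::plot_grid(p1, p2)` then places the two plots side by side, as in the question.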
Does this answer your question?