Plotting an upper-triangular correlation matrix with similarity scores using ggplot

Question (votes: 0, answers: 1)

I have a data frame like the one shown below: [image]

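The table contains only the values from the upper triangle of the matrix. I want to draw a correlogram in which the color of each point shows the correlation and its size shows the similarity score. I tried plotting it with ggplot2: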

ggplot(df, aes(x = var1, y = var2, color = cor, size = jaccard)) +
  geom_point(alpha = 0.7) +
  scale_size_continuous(name = "Jaccard Index") + # size is mapped to jaccard
  scale_color_continuous(name = "Correlation", low = "blue", high = "red") # color is mapped to cor

When I run the code above, it plots the whole matrix and the points are scattered (example below). [image]

I want a tidy plot in which the values appear only in the upper triangle. How can I do this in R?

r ggplot2 correlation similarity
1 answer (0 votes)

I simulated your problem using the "mtcars" dataset. See the code below.

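install.packages(c("tidyverse", "foreach")) # run once; skip if the packages are already installed
data(mtcars)
data <- mtcars # work on a copy; without this line `data` still refers to the base function data()

colnames(data) <- paste0("var", 1:length(data)) # rename columns to var1, ..., var11 (A, ..., L in the pictured data frame)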

newdata <- data.frame(column_one = rep(colnames(data)[1:(length(data) - 1)], times = seq(from = length(data) - 1, to = 1, by = -1))) # create column 1 of the dataset (var1 in the pictured data frame); note the parentheses, since 1:length(data)-1 would evaluate as (1:length(data)) - 1

library(foreach)
newdata$column_two <- foreach(i = 2:length(data), .combine="c") %do% {rep(colnames(data)[i:length(data)], each=1)} # create column 2 of the dataset (var2 in the pictured data frame)

newdata$column_three <- foreach(i = newdata$column_one, j = newdata$column_two, .combine = "c") %do% {
  cor(data[[i]], data[[j]])
} # calculate correlations and create column 3 of the dataset (correlation in the pictured data frame)

newdata$column_four <- runif(nrow(newdata), 0, 1) # dummy values to simulate the Jaccard index (jaccard in the pictured data frame); 55 rows for 11 variables

lapply(newdata, class) # column_one, column_two should be character vectors
# column_three, column_four should be numeric vectors

# if the outcome of lapply() is otherwise, run the below four lines
newdata$column_one <- as.character(newdata$column_one)
newdata$column_two <- as.character(newdata$column_two)
newdata$column_three <- as.numeric(newdata$column_three)
newdata$column_four <- as.numeric(newdata$column_four)

newdata$column_one <- factor(newdata$column_one, levels = paste0("var", 1:11)) # convert column one into a factor with the levels in the desired order (var1, ..., var11)
newdata$column_two <- factor(newdata$column_two, levels = paste0("var", 2:11)) # convert column two into a factor with the levels in the desired order (var2, ..., var11)

library(tidyverse)
ggplot(newdata, aes(x = column_one, y = column_two, color = column_three, size = column_four)) +
  geom_point(aes(), alpha = 0.7) +
  scale_size_continuous(name = "Jaccard Index") +
  scale_color_continuous(name = "Correlation", low = "blue", high = "red")

The final bubble chart will look like this: [image: final bubble chart]
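For comparison, here is a more compact sketch of the same idea that builds the pair table directly from the correlation matrix with base R's `upper.tri()` instead of `foreach`. It is not part of the answer above: the helper name `upper_triangle_pairs` is made up for illustration, the column names `var1`, `var2`, `cor` and `jaccard` are taken from the question's `aes()` call, and the Jaccard values are again random placeholders. It also swaps in a diverging color scale (`scale_color_gradient2`) centred on zero, which is a design choice rather than something the question asked for.

```r
library(ggplot2)

# Hypothetical helper (an assumption, not from the original answer):
# returns one row per pair of columns lying above the diagonal of cor(df).
upper_triangle_pairs <- function(df) {
  cm   <- cor(df)                                # full symmetric correlation matrix
  idx  <- which(upper.tri(cm), arr.ind = TRUE)   # (row, col) positions with row < col
  vars <- colnames(cm)
  data.frame(
    # reversed levels put the first variable at the top of the y axis
    var1 = factor(vars[idx[, 1]], levels = rev(vars)),
    var2 = factor(vars[idx[, 2]], levels = vars),
    cor  = cm[idx]
  )
}

pairs_df <- upper_triangle_pairs(mtcars)

set.seed(1)                                      # reproducible placeholder values
pairs_df$jaccard <- runif(nrow(pairs_df))        # stand-in for a real Jaccard index

p <- ggplot(pairs_df, aes(x = var2, y = var1, color = cor, size = jaccard)) +
  geom_point(alpha = 0.7) +
  scale_size_continuous(name = "Jaccard Index") +
  scale_color_gradient2(name = "Correlation", low = "blue", mid = "white",
                        high = "red", midpoint = 0, limits = c(-1, 1)) +
  theme_minimal()
print(p)
```

If your own data frame already holds the pairs, the part that matters is the two `factor()` calls: giving both columns explicit, matching level orders is what keeps the points in a single triangle instead of letting the axes sort alphabetically (where, for example, "var10" comes before "var2").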
