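I am trying to run a combinations analysis and show the results as a plot. I have a data frame with 9 columns; each column contains a percentage, or NA if that value is not present in the sample.
The example code I am using for this can be found here: https://epirhandbook.com/en/combinations-analysis.html
The problem is that in one line the 1s turn into 0s and vice versa. The line is: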
data <- data %>%
mutate(across(all_of(columns), ~ as.integer(. %in% c("yes", NA))))
The full code I am using is:
library(tidyverse)
library(UpSetR)
library(ggupset)
data <- META_new[c("lengthpergram","countpergram","acrylrel",
"cottonrel","polyestrel","polyamiderel",
"elastaanrel","lyocellrel","viscoserel",
"nylonrel","wolrel")]
columns <- c("acrylrel", "cottonrel", "polyestrel", "polyamiderel",
"elastaanrel", "lyocellrel", "viscoserel", "nylonrel", "wolrel")
for (col in columns) {
data[[col]][data[[col]] > 0] <- "yes"
data[[col]][data[[col]] == 0] <- NA
}
data <- data %>%
mutate(acrylrel = ifelse(acrylrel == "yes", 1, 0),
cottonrel = ifelse(cottonrel == "yes", 1, 0),
polyestrel = ifelse(polyestrel == "yes", 1, 0),
polyamiderel = ifelse(polyamiderel == "yes", 1, 0),
elastaanrel = ifelse(elastaanrel == "yes", 1, 0),
lyocellrel = ifelse(lyocellrel == "yes", 1, 0),
viscoserel = ifelse(viscoserel == "yes", 1, 0),
nylonrel = ifelse(nylonrel == "yes", 1, 0),
wolrel = ifelse(wolrel== "yes", 1, 0),)
data <- data %>%
mutate(across(all_of(columns), ~ as.integer(. %in% c("yes", NA))))
data %>%
UpSetR::upset(
sets = columns,
order.by = "freq",
sets.bar.color = c("red", "orange", "yellow", "green", "cyan", "blue", "purple", "pink", "salmon"),
empty.intersections = "on",
number.angles = 0,
point.size = 2,
line.size = 1,
mainbar.y.label = "Fabric combinations by frequency",
sets.x.label = "Types of fabric present in samples")
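The code produces a nice plot, but it assigns the wrong column names to the values. For example, polyester should be the most common combination, but lyocellrel is shown in its place, even though lyocellrel is the least common.
Unfortunately I can't add the df because it is too large, but I hope someone can suggest how to fix this (if this line is even the problem).
I changed some of the original code from the website. The original was: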
mutate(across(c(fever, chills, cough, aches, vomit), .fns = ~+(.x == "yes")))
because when I tried it, I got this error:
Error in start_col:end_col : argument of length 0
First 5 rows:
data <- data.frame(
acrylrel = c(0.00000, 0.00000, 0.00000, 36.61972, 0.00000),
cottonrel = c(9.089974, 65.000000, 0.000000, 19.014085, 8.500000),
polyestrel = c(83.72237, 35.00000, 42.81081, 44.36620, 15.00000),
polyamiderel = c(5.583548, 0.000000, 53.594595, 0.000000, 40.000000),
elastaanrel = c(1.604113, 0.000000, 3.594595, 0.000000, 1.500000),
lyocellrel = c(0, 0, 0, 0, 0),
viscoserel = c(0, 0, 0, 0, 0),
nylonrel = c(0, 0, 0, 0, 0),
wolrel = c(0, 0, 0, 0, 0)
)
This seems to be what you want:
data %>%
mutate(across(everything(), ~ as.integer(. > 0))) %>%
UpSetR::upset(
sets = columns,
order.by = "freq",
sets.bar.color = c("red", "orange", "yellow", "green", "cyan", "blue", "purple", "pink", "salmon"),
empty.intersections = "on",
number.angles = 0,
point.size = 2,
line.size = 1,
mainbar.y.label = "Fabric combinations by frequency",
sets.x.label = "Types of fabric present in samples")
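Before plotting, it can help to check the per-fabric counts in the 0/1 table against what you expect. The following is a minimal base-R sketch using the five sample rows from the question; the names `sample_df`, `binary`, and `set_sizes` are illustrative and do not appear in the original code.

```r
# Five sample rows from the question (sample_df is an illustrative name)
sample_df <- data.frame(
  acrylrel     = c(0, 0, 0, 36.61972, 0),
  cottonrel    = c(9.089974, 65, 0, 19.014085, 8.5),
  polyestrel   = c(83.72237, 35, 42.81081, 44.3662, 15),
  polyamiderel = c(5.583548, 0, 53.594595, 0, 40),
  elastaanrel  = c(1.604113, 0, 3.594595, 0, 1.5),
  lyocellrel   = c(0, 0, 0, 0, 0),
  viscoserel   = c(0, 0, 0, 0, 0),
  nylonrel     = c(0, 0, 0, 0, 0),
  wolrel       = c(0, 0, 0, 0, 0)
)

# 1 where the fabric is present (> 0), 0 otherwise
binary <- as.data.frame(lapply(sample_df, function(x) as.integer(x > 0)))

# Number of samples that contain each fabric
set_sizes <- colSums(binary)
print(set_sizes)
```

On these rows polyestrel has the largest count (5) and lyocellrel has 0, which is the ordering the plot should reflect. If the real data stores absent fabrics as NA rather than 0, `x > 0` returns NA for those cells, so a test such as `!is.na(x) & x > 0` would be needed instead.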
Going through your code part by part:
# this turns every value into "yes" if positive, or NA if 0
for (col in columns) {
data[[col]][data[[col]] > 0] <- "yes"
data[[col]][data[[col]] == 0] <- NA
}
# this gives the same data as above, but with all of the "yes" values turned into 1s; the NA values stay NA instead of becoming 0. Note that (frustratingly!) NA == "yes" is NA, not FALSE, as you would think, so ifelse() returns NA as well. The way to check for NA values is with the function is.na()
data %>%
mutate(acrylrel = ifelse(acrylrel == "yes", 1, 0),
cottonrel = ifelse(cottonrel == "yes", 1, 0),
polyestrel = ifelse(polyestrel == "yes", 1, 0),
polyamiderel = ifelse(polyamiderel == "yes", 1, 0),
elastaanrel = ifelse(elastaanrel == "yes", 1, 0),
lyocellrel = ifelse(lyocellrel == "yes", 1, 0),
viscoserel = ifelse(viscoserel == "yes", 1, 0),
nylonrel = ifelse(nylonrel == "yes", 1, 0),
wolrel = ifelse(wolrel== "yes", 1, 0),)
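# with this line, because you've already turned the "yes" values into 1s, `. %in% c("yes", NA)` evaluates to FALSE for the 1s and TRUE for the NA values (unlike ==, %in% does match NA against NA). as.integer() then turns present fabrics into 0 and absent ones into 1, which is the flip you are seeing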
data <- data %>%
mutate(across(all_of(columns), ~ as.integer(. %in% c("yes", NA))))
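The flip described above can be reproduced on a single small vector. This is a minimal sketch, assuming a made-up column `x` that stands in for one fabric column after the for loop; `step2`, `step3`, and `fixed` are illustrative names, and the last line is just one possible way to avoid the problem.

```r
# Hypothetical column after the for loop: "yes" where present, NA where absent
x <- c("yes", NA, "yes")

# ifelse() step: NA == "yes" is NA, so the NA is kept instead of becoming 0
step2 <- ifelse(x == "yes", 1, 0)
print(step2)  # values: 1 NA 1

# %in% step: the 1s match nothing in c("yes", NA), but NA matches NA,
# so present becomes 0 and absent becomes 1
step3 <- as.integer(step2 %in% c("yes", NA))
print(step3)  # values: 0 1 0

# One way to avoid the flip: test for presence directly, handling NA explicitly
fixed <- as.integer(!is.na(x) & x == "yes")
print(fixed)  # values: 1 0 1
```

Applying either the `ifelse()` step or the `%in%` step alone, rather than both in sequence, is what keeps the 1s attached to the fabrics that are actually present.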