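I'm sure there is a way to do this in R. I want to create a new column, humidity_catog, based on the values in the humidity column. Specifically, I want to categorize humidity levels as follows:
If humidity is greater than 90, it should be categorized as ">90%".
If humidity is greater than 80 and at most 100, it should be categorized as ">80 & <=100%".
I can successfully categorize values as ">90%", but for the ">80 & <=100%" category R only keeps values between 80 and 90, because values greater than 90 have already been categorized as ">90%".
I want both ">90%" and ">80 & <=100%" in the single column humidity_catog so that I can visualize them together in one graph. Currently the second category only covers 80-90%, but I want it to cover everything from above 80 up to 100 so that both categories appear in the same graph for comparison.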
df <- df %>% dplyr::mutate(humidity_catog = as.factor(
  case_when(humidity > 90 ~ ">90%",
            humidity > 80 & humidity <= 100 ~ ">80 & <=100%",
            TRUE ~ "<80%")))
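The behaviour described above follows from how case_when() works: it checks the conditions in order and uses the first one that matches each element, so a value above 90 never reaches the second condition. A minimal sketch with made-up values (the vector x is hypothetical, not taken from df):

```r
library(dplyr)

# Hypothetical values chosen for illustration, not taken from df
x <- c(75, 85, 95, 100)

# Each element gets the label of the FIRST condition it satisfies
res <- case_when(
  x > 90            ~ ">90%",
  x > 80 & x <= 100 ~ ">80 & <=100%",
  TRUE              ~ "<80%"
)

print(res)
# [1] "<80%"         ">80 & <=100%" ">90%"         ">90%"
```

Because each row yields exactly one label, a single case_when() call cannot put the same row into two overlapping categories.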
Reproducible example:
df <- structure(list(humidity = c(95.1, 95.2, 95.3, 95.3, 95.3, 95.4,
95.6, 95.3, 95, 94.3, 92.6, 92.6, 91.9, 92.9, 93.4, 93.2, 94.1,
95.5, 94.2, 90.4, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 97.6, 97.9, 98, 98.3, 98.6, 98.6, 98.1, 97.2, 97.6, 97.8,
84.2, 92.1, 95.9, 97.5, 98.4, 98.9, 99.5, 100, 100, 100, 88.3,
88.7, 88.8, 88.9, 88.7, 88.6, 88.7, 88.4, 88, 87, 84.2, 84.6,
84.8, 84.8, 85.2, 85.7, 85.7, 86.3, 87.1, 87.8, 94.4, 94.3, 94.2,
94.2, 94.2, 94.2, 94.2, 94.2, 94.2, 94.3, 92.8, 93.2, 93.4, 93.5,
93.7, 94.3, 94.3, 94.4, 94.4, 94.3), humidity_catog = c(">90%",
">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%",
">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%",
">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%",
">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%",
">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">80",
">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%",
">90%", ">80", ">80", ">80", ">80", ">80", ">80", ">80", ">80",
">80", ">80", ">80", ">80", ">80", ">80", ">80", ">80", ">80",
">80", ">80", ">80", ">90%", ">90%", ">90%", ">90%", ">90%",
">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%",
">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%")), row.names = c(NA,
-90L), class = c("tbl_df", "tbl", "data.frame"))
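To have points belong to multiple groups (so that they are plotted more than once), you can join the data to a table of group assignments.
Here I make a table holding the categories (note that they overlap) and join to it. This makes the points between 90 and 100 appear more than once, because they fall into both groups. I use geom_jitter and some transparency to make this clearer; otherwise we would probably only see the copy of each point that comes last in the data (and is therefore drawn last).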
(lookup_tbl <- data.frame(low  = c(90, 80),
                          high = c(Inf, 100),
                          cat  = c(">90", "80-100")))
  low high    cat
1  90  Inf    >90
2  80  100 80-100
library(dplyr)
library(ggplot2)
df %>%
  mutate(day = row_number()) %>%
  left_join(lookup_tbl, join_by(humidity >= low, humidity <= high)) %>%
  ggplot(aes(day, humidity, color = cat)) +
  geom_jitter(alpha = 0.5)
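Inequality conditions in join_by() need dplyr 1.1.0 or later. If that is not available, one possible alternative is to stack one filtered copy of the data per category. This is only a sketch: toy and toy_long are hypothetical names, and the humidity values are made up rather than taken from df.

```r
library(dplyr)
library(ggplot2)

# Small stand-in for df; these humidity values are made up for illustration
toy <- tibble(humidity = c(84.2, 88.7, 90.4, 95.1, 100)) %>%
  mutate(day = row_number())

# One filtered copy per (overlapping) category, stacked into a single table
toy_long <- bind_rows(
  toy %>% filter(humidity > 90) %>% mutate(cat = ">90"),
  toy %>% filter(humidity > 80, humidity <= 100) %>% mutate(cat = "80-100")
)

# Rows per category: values above 90 are counted in both groups
print(count(toy_long, cat))

# Same plot as with the join: overlapping points are drawn once per group
ggplot(toy_long, aes(day, humidity, color = cat)) +
  geom_jitter(alpha = 0.5)
```

Here the five input rows become eight, because the three values above 90 appear once in each category. The lookup-table join scales better when there are many categories, while this version keeps each category's rule visible as an ordinary filter().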