Create a new column based on conditions in an existing column


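I'm sure there is a way to do this in R. I want to create a new column, humidity_catog, based on the values in the humidity column. Specifically, I want to categorize the humidity levels as follows: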

  1. If humidity is greater than 90, it should be categorized as ">90%".
  2. If humidity is greater than 80 and at most 100, it should be categorized as ">80 & <=100%".

I can successfully categorize values as ">90%", but for the ">80 & <=100%" category R only keeps values between 80 and 90, because values greater than 90 have already been categorized as ">90%".

I want both ">90%" and ">80 & <=100%" in a single column, humidity_catog, so that I can visualize them together in one chart. At the moment the second category only covers 80-90%, but I want it to cover >80 & <=100% so that both categories appear in the same graph for comparison.

I tried the following code:

df <- df %>% dplyr::mutate(humidity_catog = as.factor(
    case_when(humidity > 90  ~ ">90%",
              humidity > 80 & humidity <=100 ~ ">80 & <=100%",
              TRUE ~ "<80%")))
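
The behaviour described above follows from how `case_when()` works: conditions are evaluated in order and each row receives the label of the first condition that is TRUE, so a single row can never carry two labels. A minimal sketch on three made-up values (the vector `h` is illustrative and not part of the original data):

```r
library(dplyr)

# Hypothetical toy values, chosen only to illustrate the ordering effect
h <- c(95, 85, 70)

res <- case_when(
  h > 90            ~ ">90%",          # 95 stops here and is never tested again
  h > 80 & h <= 100 ~ ">80 & <=100%",  # so only 85 reaches this branch
  TRUE              ~ "<80%"
)
res
```

Because 95 is consumed by the first branch, the second label effectively means "80 to 90" in the resulting column, which matches what the question reports.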

Reproducible example:

df <- structure(list(humidity = c(95.1, 95.2, 95.3, 95.3, 95.3, 95.4, 
                                  95.6, 95.3, 95, 94.3, 92.6, 92.6, 91.9, 92.9, 93.4, 93.2, 94.1, 
                                  95.5, 94.2, 90.4, 100, 100, 100, 100, 100, 100, 100, 100, 100, 
                                  100, 97.6, 97.9, 98, 98.3, 98.6, 98.6, 98.1, 97.2, 97.6, 97.8, 
                                  84.2, 92.1, 95.9, 97.5, 98.4, 98.9, 99.5, 100, 100, 100, 88.3, 
                                  88.7, 88.8, 88.9, 88.7, 88.6, 88.7, 88.4, 88, 87, 84.2, 84.6, 
                                  84.8, 84.8, 85.2, 85.7, 85.7, 86.3, 87.1, 87.8, 94.4, 94.3, 94.2, 
                                  94.2, 94.2, 94.2, 94.2, 94.2, 94.2, 94.3, 92.8, 93.2, 93.4, 93.5, 
                                  93.7, 94.3, 94.3, 94.4, 94.4, 94.3), humidity_catog = c(">90%", 
                                                                                          ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", 
                                                                                          ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", 
                                                                                          ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", 
                                                                                          ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", 
                                                                                          ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">80", 
                                                                                          ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", 
                                                                                          ">90%", ">80", ">80", ">80", ">80", ">80", ">80", ">80", ">80", 
                                                                                          ">80", ">80", ">80", ">80", ">80", ">80", ">80", ">80", ">80", 
                                                                                          ">80", ">80", ">80", ">90%", ">90%", ">90%", ">90%", ">90%", 
                                                                                          ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", 
                                                                                          ">90%", ">90%", ">90%", ">90%", ">90%", ">90%", ">90%")), row.names = c(NA, 
                                                                                                                                                                  -90L), class = c("tbl_df", "tbl", "data.frame"))

(Image: a glimpse of the data)

r dataframe dplyr data-manipulation
1 Answer

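To make points belong to more than one group (so that they are plotted more than once), you can join the data to a table of group assignments.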

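Here I build a lookup table containing the categories (note that they overlap) and join to it. This makes the values between 90 and 100 appear more than once, because they fall into both groups. I use geom_jitter and some transparency to make this clearer; otherwise we would probably only see the copy of each point that is drawn last.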

(lookup_tbl <- data.frame(low = c(90, 80),
          high = c(Inf, 100),
          cat = c(">90", "80-100")))

  low high    cat
1  90  Inf    >90
2  80  100 80-100


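library(dplyr); library(ggplot2)
# join_by() with inequality conditions requires dplyr >= 1.1.0
df %>% 
  mutate(day = row_number()) %>%   # use the row index as the x-axis
  # the ranges overlap, so a row that matches both ranges is returned twice
  left_join(lookup_tbl, join_by(humidity >= low, humidity <= high)) %>%
  ggplot(aes(day, humidity, color = cat)) +
  geom_jitter(alpha = 0.5)   # jitter and transparency keep duplicated points visible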
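
To check that the overlapping join really duplicates the rows that fall into both ranges, the categories can be counted after joining. This is a small sketch on made-up values (the data frame `toy` is hypothetical); it assumes dplyr 1.1.0 or later, where `join_by()` accepts inequality conditions.

```r
library(dplyr)

lookup_tbl <- data.frame(low  = c(90, 80),
                         high = c(Inf, 100),
                         cat  = c(">90", "80-100"))

# Hypothetical toy data: two values of 90 or more, one between 80 and 90
toy <- data.frame(humidity = c(95, 85, 100))

joined <- toy %>%
  left_join(lookup_tbl, join_by(humidity >= low, humidity <= high))

# Values of 90 or more match both lookup rows, so they appear twice
count(joined, cat)
```

Three input rows become five joined rows here, which is the duplication that lets one point be drawn in both colours.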
