Paste column names with mutate if a condition matches

Question (votes: 0, answers: 4)

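Suppose I work in psychology and I want to know how many risk factors each patient has. After that, I want to list all the risks for each patient and then find the most common risk (the mode). I was thinking of using mutate and then paste0 to get the colname whenever the value in that row is "risk". However, I am struggling with this. Any help is appreciated.

Here is the code: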

library(tidyverse)

df <- data.frame(
  patient = 1:60,
  cancer = c("risk", "ok"),        # length-2 vectors are recycled to 60 rows
  blood_pres = c("risk", "ok"),
  low_education = c("risk", "ok")
)

# count the "risk" values in each row
df <- df %>%
  mutate(how_many_risks = rowSums(. == "risk"))

Tags: r dplyr tidyr rowwise

4 Answers

1 vote

Let's come up with some more interesting data.

set.seed(43)
df <- data.frame(
  patient = 1:10,
  cancer = sample(c("risk","ok"), size=10, replace=TRUE),
  blood_pres = sample(c("risk","ok"), size=10, replace=TRUE),
  low_education = sample(c("risk","ok"), size=10, replace=TRUE)
)
df
#    patient cancer blood_pres low_education
# 1        1     ok       risk          risk
# 2        2     ok       risk          risk
# 3        3     ok         ok            ok
# 4        4   risk       risk          risk
# 5        5     ok         ok          risk
# 6        6   risk       risk            ok
# 7        7     ok         ok            ok
# 8        8     ok       risk            ok
# 9        9     ok         ok            ok
# 10      10   risk       risk          risk

From here, we pivot to long format, summarize, and then join back onto the original data.

library(dplyr)
library(tidyr) # pivot_*
df %>%
  pivot_longer(cols = -patient, values_to = "risk") %>%
  filter(risk == "risk") %>%
  summarize(risks = toString(name), .by = patient) %>%
  left_join(df, ., by = "patient")
#    patient cancer blood_pres low_education                             risks
# 1        1     ok       risk          risk         blood_pres, low_education
# 2        2     ok       risk          risk         blood_pres, low_education
# 3        3     ok         ok            ok                              <NA>
# 4        4   risk       risk          risk cancer, blood_pres, low_education
# 5        5     ok         ok          risk                     low_education
# 6        6   risk       risk            ok                cancer, blood_pres
# 7        7     ok         ok            ok                              <NA>
# 8        8     ok       risk            ok                        blood_pres
# 9        9     ok         ok            ok                              <NA>
# 10      10   risk       risk          risk cancer, blood_pres, low_education

(Note that using .by= requires dplyr_1.1.0 or newer. If you have an older dplyr and cannot update, use group_by(patient) instead of .by=patient.)
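
For reference, here is a sketch of the same pipeline written for an older dplyr, with group_by(patient) in place of .by. The data frame toy is made up for illustration and is not the seeded data above.

```r
library(dplyr)
library(tidyr) # pivot_longer

# Illustrative data (an assumption, not the seeded data frame from the answer)
toy <- data.frame(
  patient = 1:3,
  cancer = c("ok", "risk", "ok"),
  blood_pres = c("risk", "risk", "ok"),
  low_education = c("risk", "ok", "ok")
)

res <- toy %>%
  pivot_longer(cols = -patient, values_to = "risk") %>%
  filter(risk == "risk") %>%
  group_by(patient) %>%                   # replaces .by = patient
  summarize(risks = toString(name)) %>%   # the patient grouping is dropped after this step
  left_join(toy, ., by = "patient")       # patients with no risks get NA
res
```

Because patient is the only grouping variable, summarize() returns an ungrouped result, so no explicit ungroup() is needed before the join.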

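Something you may want to consider: unless this is only for a presentation table, it is sometimes more advantageous to keep risks as a list column rather than a comma-separated string. To do that, simply replace toString with list. While it may render the same on the console, it allows things like set operations on the column (though normal column/vector operations may not work as you expect):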

out <- df %>%
  pivot_longer(cols = -patient, values_to = "risk") %>%
  filter(risk == "risk") %>%
  summarize(risks = list(name), .by = patient) %>%
  left_join(df, ., by = "patient")
out
#    patient cancer blood_pres low_education                             risks
# 1        1     ok       risk          risk         blood_pres, low_education
# 2        2     ok       risk          risk         blood_pres, low_education
# 3        3     ok         ok            ok                              NULL
# 4        4   risk       risk          risk cancer, blood_pres, low_education
# 5        5     ok         ok          risk                     low_education
# 6        6   risk       risk            ok                cancer, blood_pres
# 7        7     ok         ok            ok                              NULL
# 8        8     ok       risk            ok                        blood_pres
# 9        9     ok         ok            ok                              NULL
# 10      10   risk       risk          risk cancer, blood_pres, low_education

If this data were a tibble (tbl_df), the same data would render as:

tibble(out)
# # A tibble: 10 × 5
#    patient cancer blood_pres low_education risks    
#      <int> <chr>  <chr>      <chr>         <list>   
#  1       1 ok     risk       risk          <chr [2]>
#  2       2 ok     risk       risk          <chr [2]>
#  3       3 ok     ok         ok            <NULL>   
#  4       4 risk   risk       risk          <chr [3]>
#  5       5 ok     ok         risk          <chr [1]>
#  6       6 risk   risk       ok            <chr [2]>
#  7       7 ok     ok         ok            <NULL>   
#  8       8 ok     risk       ok            <chr [1]>
#  9       9 ok     ok         ok            <NULL>   
# 10      10 risk   risk       risk          <chr [3]>

We can work on the column directly, for example checking the length of each row's entry or doing a quick check for exact set membership:

lengths(out$risks)
#  [1] 2 2 0 3 1 2 0 1 0 3

sapply(out$risks, `%in%`, x = "cancer")
#  [1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE

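Granted, both of these could be done with regular expressions on the string version, but regex carries a bit of overhead, and it gets trickier if there is any ambiguity among the names.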
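
To illustrate the ambiguity concern, here is a small sketch. The risk names cancer and skin_cancer are hypothetical, chosen so that one name is a substring of the other. Neither vector comes from the data above.

```r
# Hypothetical risks for two patients, stored both ways
risks_string <- c("skin_cancer, blood_pres", "cancer")
risks_list   <- list(c("skin_cancer", "blood_pres"), "cancer")

# A naive pattern also matches inside "skin_cancer"
naive <- grepl("cancer", risks_string)

# Anchoring on the ", " separator gives an exact match, with a fussier pattern
anchored <- grepl("(^|, )cancer(,|$)", risks_string)

# Exact membership on the list version needs no pattern at all
exact <- sapply(risks_list, `%in%`, x = "cancer")
```

Here naive flags both patients, while anchored and exact flag only the second one.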


0 votes

risk_factors <- c('cancer', 'blood_pres', 'low_education')


0 votes

The c_across() function is what you are missing. Using your example data:


risk_factors <- c('cancer', 'blood_pres', 'low_education')

df <- df %>%
  rowwise() %>% 
  mutate(how_many_risks = sum(c_across(all_of(risk_factors)) == "risk"),
         what_risks = paste0(risk_factors[which(c_across(all_of(risk_factors)) == "risk")], collapse = ";")) %>% 
  ungroup()

You can add an extra line of logic to report the empty cases as "none" (as in your example):

df2 <- df %>% 
  mutate(what_risks = if_else(what_risks == "", "none", what_risks))

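0 votes

I think a single mutate call is enough to do this (data taken from @r2evans).

Here I do not use rowwise. Instead I use sapply to iterate over the rows and find the values that match "risk".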

library(dplyr)

set.seed(43)
df <- data.frame(
  patient = 1:10,
  cancer = sample(c("risk","ok"), size=10, replace=TRUE),
  blood_pres = sample(c("risk","ok"), size=10, replace=TRUE),
  low_education = sample(c("risk","ok"), size=10, replace=TRUE)
)

df %>%
  mutate(
    how_many_risks = rowSums(. == "risk"),
    # for each row, paste the names of the non-patient columns whose value is "risk"
    which_risks = ifelse(
      how_many_risks == 0,
      "no risk",
      sapply(1:nrow(df), \(x) paste(colnames(df[x, -1])[df[x, -1] == "risk"], collapse = ", "))
    )
  )

   patient cancer blood_pres low_education how_many_risks                       which_risks
1        1     ok       risk          risk              2         blood_pres, low_education
2        2     ok       risk          risk              2         blood_pres, low_education
3        3     ok         ok            ok              0                           no risk
4        4   risk       risk          risk              3 cancer, blood_pres, low_education
5        5     ok         ok          risk              1                     low_education
6        6   risk       risk            ok              2                cancer, blood_pres
7        7     ok         ok            ok              0                           no risk
8        8     ok       risk            ok              1                        blood_pres
9        9     ok         ok            ok              0                           no risk
10      10   risk       risk          risk              3 cancer, blood_pres, low_education
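
The question also asks for the most common risk, which none of the snippets above computes. Here is a minimal sketch of one way to get it, assuming the same wide layout with a patient column plus one column per risk. The data frame toy and the names risk_counts and most_common are made up for illustration.

```r
library(dplyr)
library(tidyr) # pivot_longer

# Illustrative data (an assumption, not the seeded data frame above)
toy <- data.frame(
  patient = 1:4,
  cancer = c("ok", "risk", "ok", "ok"),
  blood_pres = c("risk", "risk", "ok", "risk"),
  low_education = c("risk", "ok", "ok", "ok")
)

risk_counts <- toy %>%
  pivot_longer(cols = -patient, values_to = "risk") %>%
  filter(risk == "risk") %>%
  count(name, sort = TRUE)   # one row per risk, most frequent first

# Keep every risk tied for the highest count
most_common <- risk_counts %>%
  filter(n == max(n)) %>%
  pull(name)
most_common
```

Filtering on n == max(n) returns all risks tied for first place rather than picking one arbitrarily.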