Search for a string across an entire row of a tibble?

Question

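I'm trying to clean up a sample information sheet with contributions from many different groups, so the treatment information I care about may sit in any number of different columns. Here is an abstracted example: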

sample_info = tribble(
  ~id, ~could_be_here, ~or_here,    ~or_even_in_this_one,
  1,   NA,             "not_me",    "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

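I want to find "find_me" where 1) the match is case-insensitive, 2) it can be in any column, and 3) it can be part of a larger string. I want to create a column that is TRUE or FALSE depending on whether "find_me" was found in any of the columns. How can I do this? (I've thought about running unite on all the columns and then running str_detect on the resulting mess, but there must be a less hacky way, right?)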

To be clear, I want a final tibble equivalent to:

sample_info %>% mutate(find_me = c(TRUE, TRUE, TRUE, FALSE))

As in the similar cases linked below, I expect I'd use something like

stringr::str_detect(., regex('find_me', ignore_case = T))

and

pmap_lgl(any(c(...) <insert logic check>))

but I'm not sure how to combine them into a mutate-compatible statement.

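Things I've looked at:

Row-wise operation to see if any columns are in any other list

R: How to ignore case when using str_detect?

In R, check if a string appears in a row of a data frame (in any column)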

Answer 1

One option using dplyr and purrr could be:

sample_info %>%
 mutate(find_me = pmap_lgl(across(-id), ~ any(str_detect(c(...), regex("find_me", ignore_case = TRUE)), na.rm = TRUE)))

     id could_be_here or_here  or_even_in_this_one find_me
  <dbl> <chr>         <chr>    <chr>               <lgl>  
1     1 <NA>          not_me   find_me_other_stuff TRUE   
2     2 Extra_Find_Me <NA>     diff_stuff          TRUE   
3     3 <NA>          Find_me  <NA>                TRUE   
4     4 <NA>          not_here not_here_either     FALSE

Or with dplyr alone (no purrr):

sample_info %>%
 rowwise() %>%
 mutate(find_me = any(str_detect(c_across(-id), regex("find_me", ignore_case = TRUE)), na.rm = TRUE)) %>%
 ungroup()  # drop the rowwise grouping so later steps operate on whole columns again

Answer 2

I hope I've understood you correctly. This is how I would look for find_me across multiple columns:

library(dplyr)
library(purrr)
library(stringr)

sample_info = tribble(
  ~id, ~could_be_here, ~or_here,    ~or_even_in_this_one,
  1,   NA,             "not_me",    "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

sample_info %>%
  # coalesce() maps the NA that str_detect() returns for missing values to FALSE
  mutate(find_me_exist = if_any(-id, ~ coalesce(str_detect(.x, regex("find_me", ignore_case = TRUE)), FALSE)))

# A tibble: 4 x 5
     id could_be_here or_here  or_even_in_this_one find_me_exist
  <dbl> <chr>         <chr>    <chr>               <lgl>        
1     1 NA            not_me   find_me_other_stuff TRUE         
2     2 Extra_Find_Me NA       diff_stuff          TRUE         
3     3 NA            Find_me  NA                  TRUE         
4     4 NA            not_here not_here_either     FALSE

Sorry, I had to edit my code to make it case-insensitive.

Answer 3

If you really do want to try the hacky way, your idea of using unite does work:

library(tidyr)  # provides unite()

sample_info %>%
  unite(new, remove = FALSE) %>%  # paste every column into one helper column
  mutate(found = str_detect(new, regex("find_me", ignore_case = TRUE))) %>%
  select(-new)
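One caveat with this approach: unite() joins the columns with "_" by default, so a pattern that itself contains "_" could in principle match across a column boundary. The sketch below illustrates this on hypothetical data (split_example and its columns a and b are made up for illustration) and tries a separator, " | ", that is assumed never to occur in the pattern:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical data: in row 1 no single cell contains "find_me"
split_example <- tribble(
  ~id, ~a,     ~b,
  1,   "find", "me",
  2,   "x",    "find_me_here"
)

pattern <- regex("find_me", ignore_case = TRUE)

# Default separator "_": row 1 becomes "find_me" and matches by accident
default_sep <- split_example %>%
  unite(new, -id, remove = FALSE) %>%
  mutate(found = str_detect(new, pattern)) %>%
  select(-new)

# A separator that cannot be part of the pattern avoids the accidental match;
# na.rm = TRUE drops missing values instead of pasting the text "NA"
safe_sep <- split_example %>%
  unite(new, -id, sep = " | ", remove = FALSE, na.rm = TRUE) %>%
  mutate(found = str_detect(new, pattern)) %>%
  select(-new)

default_sep$found
#> [1] TRUE TRUE
safe_sep$found
#> [1] FALSE  TRUE
```

Whether this matters depends on the real data; for the example in the question both separators give the same answer.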

Answer 4

This is a typical use case for dplyr::if_any. If any of the selected columns matches, the new column is TRUE. Using regex() with the argument ignore_case = TRUE makes the match case-insensitive.

library(dplyr)
library(stringr)

sample_info |> 
    mutate(find_me = if_any(-id,\(x) str_detect(x, regex("find_me", ignore_case = TRUE))))

# A tibble: 4 × 5
     id could_be_here or_here  or_even_in_this_one find_me
  <dbl> <chr>         <chr>    <chr>               <lgl>  
1     1 NA            not_me   find_me_other_stuff TRUE   
2     2 Extra_Find_Me NA       diff_stuff          TRUE   
3     3 NA            Find_me  NA                  TRUE   
4     4 NA            not_here not_here_either     NA
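Row 4 comes back as NA rather than FALSE in the output above: str_detect() returns NA for a missing value, and no other column in that row matches, so the combined result stays NA. If a strict TRUE/FALSE column is needed, one possible tweak (a sketch, not part of the original answer) is to wrap the detection in dplyr::coalesce() so that missing values count as "not found":

```r
library(dplyr)
library(stringr)

sample_info <- tribble(
  ~id, ~could_be_here,  ~or_here,   ~or_even_in_this_one,
  1,   NA,              "not_me",   "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

result <- sample_info |>
  mutate(find_me = if_any(
    -id,
    # treat a missing value as "not found" instead of propagating NA
    \(x) coalesce(str_detect(x, regex("find_me", ignore_case = TRUE)), FALSE)
  ))

result$find_me
#> [1]  TRUE  TRUE  TRUE FALSE
```

This keeps the if_any() structure and only changes how missing cells are scored.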
