Filter multiple values on a string column in dplyr

Question

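I have a data.frame with character data in one of its columns. I want to filter that data.frame for multiple values in the same column. Is there a simple way to do this that I'm missing?

Example: a data.frame named dat: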

days      name
88        Lynn
11        Tom
2         Chris
5         Lisa
22        Kyla
1         Tom
222       Lynn
2         Lynn

I'd like to keep only the rows for Tom and Lynn, for example.
When I do this:

target <- c("Tom", "Lynn")
filt <- filter(dat, name == target)

I get this error:

longer object length is not a multiple of shorter object length

Answer 1

You need %in% instead of ==:

library(dplyr)
target <- c("Tom", "Lynn")
filter(dat, name %in% target)  # equivalently, dat %>% filter(name %in% target)

which produces:

  days name
1   88 Lynn
2   11  Tom
3    1  Tom
4  222 Lynn
5    2 Lynn

To understand why, consider what happens here:

dat$name == target
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Basically, we are recycling the length-two vector target four times to match the length of dat$name. In other words, we are doing:

 Lynn == Tom
  Tom == Lynn
Chris == Tom
 Lisa == Lynn
 ... continue repeating Tom and Lynn until end of data frame

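In this case we don't get an error, because the sample you provided has 8 rows, a multiple of the length of target, so recycling works. I suspect your real data frame has a number of rows that doesn't allow clean recycling; had the sample had an odd number of rows, I would have gotten the same message as you (base R reports it as a warning). But even when recycling works, this is clearly not what you want. Basically, the statement dat$name == target is equivalent to saying:

Return TRUE for every value at an odd position that equals "Tom", or every value at an even position that equals "Lynn".

It just so happens that the last value in your sample data frame is at an even position and equals "Lynn", hence the single TRUE above.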

By contrast, dat$name %in% target says:

For each value in dat$name, check whether it exists in target.

That is very different. Here is the result:

[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE

Note that your problem has nothing to do with dplyr; it is just a misuse of ==.
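To make the recycling explicit, here is a minimal base R sketch. It rebuilds the sample data from the question under the same name dat; the helper variables recycled, eq_result and in_result are illustrative names, not part of the original answer.

```r
# Rebuild the sample data from the question.
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
)
target <- c("Tom", "Lynn")

# `==` recycles target to the length of dat$name, so each name is
# compared against exactly one element of this repeated vector.
recycled <- rep(target, length.out = nrow(dat))
eq_result <- dat$name == target
stopifnot(identical(eq_result, dat$name == recycled))

# `%in%` tests each name for membership in target, regardless of position.
in_result <- dat$name %in% target

# Side-by-side view of both comparisons.
data.frame(name = dat$name, compared_to = recycled,
           eq = eq_result, is_in = in_result)
```

Printing the final data frame shows, row by row, which element of target each name was compared against under ==, next to the membership test that %in% performs.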

Answer 2

Using the base package:

df <- data.frame(days = c(88, 11, 2, 5, 22, 1, 222, 2), name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"))

# Three lines
target <- c("Tom", "Lynn")
index <- df$name %in% target
df[index, ]

# One line
df[df$name %in% c("Tom", "Lynn"), ]

Output:

  days name
1   88 Lynn
2   11  Tom
6    1  Tom
7  222 Lynn
8    2 Lynn

Using sqldf:

library(sqldf)
# Two alternatives (SQL string literals use single quotes):
sqldf("SELECT *
       FROM df
       WHERE name = 'Tom' OR name = 'Lynn'")
sqldf("SELECT *
       FROM df
       WHERE name IN ('Tom', 'Lynn')")

Answer 3

This can be achieved with the dplyr package, which is available on CRAN. A simple way to do it:

Install the dplyr package.
Run the code below.

library(dplyr)

# Keep rows where name is Tom or Lynn, then select the two columns.
df <- select(filter(dat, name == 'Tom' | name == 'Lynn'), c('days', 'name'))

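Explanation:

Once dplyr is installed, we create a new data frame using two different functions from the package:

filter: the first argument is the data frame; the second argument is the condition by which we want it subsetted. The result is the entire data frame with only the rows we want.

select: the first argument is the data frame; the second argument is the names of the columns we want to select from it. We don't have to use the names() function, and we don't even need quotation marks. We simply list the column names as objects.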
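As a sketch of the point about unquoted column names, the same two steps can be written as a pipeline. It assumes the sample data from the question, rebuilt here as dat; result is an illustrative name.

```r
library(dplyr)

# Sample data from the question.
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
)

# filter() keeps the matching rows; select() then picks columns by bare name.
result <- dat %>%
  filter(name == "Tom" | name == "Lynn") %>%
  select(days, name)

result
```

With more than a couple of values, chaining == with | gets unwieldy, which is where name %in% c("Tom", "Lynn") is the shorter equivalent.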

Answer 4

by_type_year_tag_filtered <- by_type_year_tag %>%
  dplyr::filter(tag_name %in% c("dplyr", "ggplot2"))

Answer 5

Write it like this. Example:

library(dplyr)

target <- YourData %>% filter(YourColumn %in% c("variable1", "variable2"))

With your example data:

target <- df %>% filter(name %in% c("Tom", "Lynn"))

Answer 6

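If the values in your string column are long strings, you can use this powerful approach with the stringr package. It matches on words inside the strings, which exact-match approaches such as filter(... %in% ...) and base R subsetting with %in% cannot do.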

library(dplyr)
library(stringr)

sentences_tb = as_tibble(sentences) %>%
                 mutate(row_number())
sentences_tb
# A tibble: 720 x 2
   value                                       `row_number()`
   <chr>                                                <int>
 1 The birch canoe slid on the smooth planks.               1
 2 Glue the sheet to the dark blue background.              2
 3 It's easy to tell the depth of a well.                   3
 4 These days a chicken leg is a rare dish.                 4
 5 Rice is often served in round bowls.                     5
 6 The juice of lemons makes fine punch.                    6
 7 The box was thrown beside the parked truck.              7
 8 The hogs were fed chopped corn and garbage.              8
 9 Four hours of steady work faced us.                      9
10 Large size in stockings is hard to sell.                10
# ... with 710 more rows                

matching_letters <- c(
  "canoe","dark","often","juice","hogs","hours","size"
)
matching_letters <- str_c(matching_letters, collapse = "|")
matching_letters
[1] "canoe|dark|often|juice|hogs|hours|size"

letters_found <- str_subset(sentences_tb$value,matching_letters)
letters_found_tb = as_tibble(letters_found)
inner_join(sentences_tb,letters_found_tb)

# A tibble: 16 x 2
   value                                          `row_number()`
   <chr>                                                   <int>
 1 The birch canoe slid on the smooth planks.                  1
 2 Glue the sheet to the dark blue background.                 2
 3 Rice is often served in round bowls.                        5
 4 The juice of lemons makes fine punch.                       6
 5 The hogs were fed chopped corn and garbage.                 8
 6 Four hours of steady work faced us.                         9
 7 Large size in stockings is hard to sell.                   10
 8 Note closely the size of the gas tank.                     33
 9 The bark of the pine tree was shiny and dark.             111
10 Both brothers wear the same size.                         253
11 The dark pot hung in the front closet.                    261
12 Grape juice and water mix well.                           383
13 The wall phone rang loud and often.                       454
14 The bright lanterns were gay on the dark lawn.            476
15 The pleasant hours fly by much too soon.                  516
16 A six comes up more often than a ten.                     609

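It is a bit verbose, but it is very handy and powerful if you have long strings and want to filter for the rows that contain particular words.

Compare with the approaches from the other answers: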

> target <- c("canoe","dark","often","juice","hogs","hours","size")
> filter(sentences_tb, value %in% target)
# A tibble: 0 x 2
# ... with 2 variables: value <chr>, row_number() <int>

> df<- select(filter(sentences_tb,value=='canoe'| value=='dark'), c('value','row_number()'))
> df
# A tibble: 0 x 2
# ... with 2 variables: value <chr>, row_number() <int>

> target <- c("canoe","dark","often","juice","hogs","hours","size")
> index <- sentences_tb$value %in% target
> sentences_tb[index, ]
# A tibble: 0 x 2
# ... with 2 variables: value <chr>, row_number() <int>

With those approaches you would need to write out every full sentence to get the desired result.
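A shorter variant of the same idea, offered as a sketch and not as this answer's own method: stringr's str_detect() returns one TRUE/FALSE per row, so it can go directly inside filter() without the intermediate join. The three-row tibble sentences_small and the names words, pattern, matched and matched_base are hypothetical, used only for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical three-row sample; the sentences are taken from the output above.
sentences_small <- tibble(
  value = c(
    "The birch canoe slid on the smooth planks.",
    "These days a chicken leg is a rare dish.",
    "Rice is often served in round bowls."
  )
)

# Collapse the words into a single regular expression: "canoe|dark|often".
words <- c("canoe", "dark", "often")
pattern <- str_c(words, collapse = "|")

# Keep rows whose value contains at least one of the words.
matched <- sentences_small %>%
  filter(str_detect(value, pattern))

# Base R performs the same regular-expression match with grepl().
matched_base <- sentences_small[grepl(pattern, sentences_small$value), ]
```

Note that this is substring matching, so "dark" would also match inside a longer word such as "darkness"; wrapping each word in \\b word boundaries is one way to restrict it to whole words.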

Answer 7

Another option is to use slice and which to get the indices of the rows to keep. Here is some reproducible code:

library(dplyr)
df %>%
  slice(which(name %in% c("Tom", "Lynn")))
#>   days name
#> 1   88 Lynn
#> 2   11  Tom
#> 3    1  Tom
#> 4  222 Lynn
#> 5    2 Lynn

Created on 2023-05-05 with reprex v2.0.2

Data used:

df = read.table(text = "days      name
88        Lynn
11        Tom
2         Chris
5         Lisa
22        Kyla
1         Tom
222       Lynn
2         Lynn", header = TRUE)
