Using grepl to create a column based on another column

Question (votes: 3, answers: 3)

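Consider a data frame df with two columns, word and stem. I want to create a new column that checks whether the value in stem is contained in word, and whether it is preceded or followed by other characters. The final result should look like this: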

WORD     STEM     NEW
rerun    run      prefixed
runner   run      suffixed
run      run      none
...      ...      ...

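Below is my code. It does not work, because grepl is not vectorized over its pattern argument: given a vector of patterns, it uses only the first one and applies it to every row of df. Still, I think it makes my idea clearer.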

df$new <- ifelse(grepl(paste0('.+', df$stem, '.+'), df$word), 'both',
             ifelse(grepl(paste0(df$stem, '.+'), df$word), 'suffixed',
                ifelse(grepl(paste0('.+', df$stem), df$word), 'prefixed','none')))
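One way to get a per-row match in base R is to pair each word with the stem from the same row via mapply. The following is only a sketch under assumptions: the sample data and the helper name classify_one are made up for illustration, and the stem is treated as a literal string rather than a regular expression.

```r
# Hypothetical sample data mirroring the table in the question.
df <- data.frame(word = c("rerun", "runner", "run"),
                 stem = c("run", "run", "run"),
                 stringsAsFactors = FALSE)

# Illustrative helper (not from the original post): classify one word/stem pair.
# fixed = TRUE makes regexpr() treat the stem as a literal string.
classify_one <- function(word, stem) {
  pos <- as.integer(regexpr(stem, word, fixed = TRUE))  # -1 if stem is absent
  if (pos == -1) return("none")
  before <- pos > 1                              # characters precede the stem
  after  <- pos + nchar(stem) - 1 < nchar(word)  # characters follow the stem
  if (before && after) {
    "both"
  } else if (before) {
    "prefixed"
  } else if (after) {
    "suffixed"
  } else {
    "none"
  }
}

# mapply() walks both columns in parallel, one row at a time.
df$new <- unname(mapply(classify_one, df$word, df$stem))
df$new
```

Because only the first occurrence of the stem is inspected, a word such as "runrun" would be classified from that first match alone.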
r dataframe grepl
3 Answers
Votes: 1

You can use startsWith and endsWith to index into a vector of labels, for example:

c("none", "suffixed", "prefixed", "both")[1 + startsWith(x$WORD, x$STEM) +
 2*endsWith(x$WORD, x$STEM)]
#[1] "prefixed" "suffixed" "both"    

Or, if none should be returned when WORD and STEM are equal:

c("none", "suffixed", "prefixed", "both")[1 + (startsWith(x$WORD, x$STEM) +
 2*endsWith(x$WORD, x$STEM)) * !(x$WORD == x$STEM)]
#[1] "prefixed" "suffixed" "none"    
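The index arithmetic above can be unpacked step by step. This is a minimal sketch assuming a small sample data frame named x (the answer does not show how x was built, so the data here is hypothetical); the intermediate names starts, ends, code and labels are introduced only for readability.

```r
# Hypothetical sample data; `x` is the name used in the answer above.
x <- data.frame(WORD = c("rerun", "runner", "run"),
                STEM = c("run", "run", "run"),
                stringsAsFactors = FALSE)

starts <- startsWith(x$WORD, x$STEM)  # stem at the start: the word carries a suffix
ends   <- endsWith(x$WORD, x$STEM)    # stem at the end: the word carries a prefix

# Logicals coerce to 0/1, so code is 0 (neither), 1 (starts only),
# 2 (ends only) or 3 (both).
code <- starts + 2 * ends

# Reset the code when WORD and STEM are identical, so exact matches give "none".
code[x$WORD == x$STEM] <- 0

labels <- c("none", "suffixed", "prefixed", "both")
x$NEW <- labels[code + 1]  # +1 because R vectors are 1-indexed
x$NEW
```

Note that "both" here means the stem sits at the start and at the end of the word, which is not the same as having extra characters on both sides of it.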

Votes: 0

You can create new like this:

df$new <- ifelse(startsWith(df$word, df$stem) & endsWith(df$word, df$stem), 'both',
                 ifelse(startsWith(df$word, df$stem), 'suffixed',
                        ifelse(endsWith(df$word, df$stem), 'prefixed',
                               'none')))

Or, if you are working in a dplyr pipeline and want to avoid all the repeated df$ prefixes:

library(dplyr)

df %>% 
  mutate(new = ifelse(startsWith(word, stem) & endsWith(word, stem), 'both',
                      ifelse(startsWith(word, stem), 'suffixed',
                             ifelse(endsWith(word, stem), 'prefixed',
                                    'none'))))

Output

#       word stem      new
# 1    rerun  run prefixed
# 2   runner  run suffixed
# 3      run  run     both
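In the output above, the row where word equals stem comes out as "both", whereas the question asks for "none" in that case. One possible adjustment, sketched here with hypothetical sample data, is to test for equality before the other conditions:

```r
# Hypothetical sample data mirroring the table in the question.
df <- data.frame(word = c("rerun", "runner", "run"),
                 stem = c("run", "run", "run"),
                 stringsAsFactors = FALSE)

# Check for an exact match first, then fall through to the original tests.
df$new <- ifelse(df$word == df$stem, "none",
          ifelse(startsWith(df$word, df$stem) & endsWith(df$word, df$stem), "both",
          ifelse(startsWith(df$word, df$stem), "suffixed",
          ifelse(endsWith(df$word, df$stem), "prefixed", "none"))))
df$new
```

The order of the tests matters: ifelse returns the value of the first condition that holds for each row, so the equality check has to come before the startsWith and endsWith checks.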

Votes: 0

Here is an approach using str_locate from stringr together with dplyr:

library(dplyr)
library(stringr)
data %>%
  mutate_at(vars(WORD,STEM), as.character) %>%
  mutate(NEW = 
         case_when(str_locate(WORD,STEM)[,"start"] > 1 &
                   str_locate(WORD,STEM)[,"end"] < nchar(WORD) ~ "both",
                   str_locate(WORD,STEM)[,"start"] > 1 ~ "prefixed",
                   str_locate(WORD,STEM)[,"end"] < nchar(WORD) ~ "suffixed",
                   TRUE ~ "none"))
    WORD STEM      NEW
1  rerun  run prefixed
2 runner  run suffixed
3    run  run     none

I added a line that converts WORD and STEM to character, in case they are factors.
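The same precaution matters for the base R startsWith and endsWith approach, since those functions accept only character vectors, and data.frame() created factor columns by default before R 4.0. A small base R sketch, using hypothetical sample data and made-up variable names (failed, is_fct), of what goes wrong and how to convert:

```r
# Hypothetical data with factor columns, as older R versions created by default.
data <- data.frame(WORD = c("rerun", "runner", "run"),
                   STEM = c("run", "run", "run"),
                   stringsAsFactors = TRUE)

# startsWith() rejects factors, so this call raises an error that try() captures.
failed <- inherits(try(startsWith(data$WORD, "run"), silent = TRUE), "try-error")

# Convert every factor column to character, leaving other columns untouched.
is_fct <- vapply(data, is.factor, logical(1))
data[is_fct] <- lapply(data[is_fct], as.character)

# After conversion the comparison works row by row.
startsWith(data$WORD, data$STEM)
```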
