I have a column of arbitrary text, like this:
dplyr::tibble(text = c("I have a (brown) clock", "surrounded by (red) walls", "inside of a (blue) building with (dirty) windows",
"where (magical) things (unexpectedly) occur (spontaneously)"))
# A tibble: 4 x 1
text
<chr>
1 I have a (brown) clock
2 surrounded by (red) walls
3 inside of a (blue) building with (dirty) windows
4 where (magical) things (unexpectedly) occur (spontaneously)
I want to extract the last string that appears inside parentheses into another column, so that the result looks like this:
dplyr::tibble(text = c("I have a (brown) clock", "surrounded by (red) walls", "inside of a (blue) building with (dirty) windows",
"where (magical) things (unexpectedly) occur (spontaneously)"),
extract = c("brown", "red", "dirty", "spontaneously"))
# A tibble: 4 x 2
  text                                                        extract
  <chr>                                                       <chr>
1 I have a (brown) clock                                      brown
2 surrounded by (red) walls                                   red
3 inside of a (blue) building with (dirty) windows            dirty
4 where (magical) things (unexpectedly) occur (spontaneously) spontaneously
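One option is stri_extract_last from stringi, which should also be fast. Here we use a regex lookbehind, (?<=\\(), to match the position right after an opening parenthesis, followed by one or more characters that are not a closing parenthesis, [^\\)]+. Because stri_extract_last returns only the final match in each string, this yields the last parenthesized string.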
library(dplyr)
# df1 is assumed to hold the tibble from the question (a single chr column named text)
df1 %>%
  mutate(extract = stringi::stri_extract_last(text, regex = "(?<=\\()[^\\)]+"))
# A tibble: 4 x 2
#  text                                                        extract
#  <chr>                                                       <chr>
#1 I have a (brown) clock                                      brown
#2 surrounded by (red) walls                                   red
#3 inside of a (blue) building with (dirty) windows            dirty
#4 where (magical) things (unexpectedly) occur (spontaneously) spontaneously
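If you would rather not depend on stringi, the same lookbehind pattern should also work in base R with perl = TRUE. This is only a rough sketch, not part of the answer above: the helper name extract_last_paren is made up for illustration, and it assumes that strings without any parentheses should come back as NA.

```r
# Hypothetical helper (name is an assumption, not from any package):
# returns the last parenthesized string in each element of x.
extract_last_paren <- function(x) {
  # gregexpr() finds every match; perl = TRUE is needed for the lookbehind
  matches <- regmatches(x, gregexpr("(?<=\\()[^)]+", x, perl = TRUE))
  # keep only the final match per string, or NA when there is none
  vapply(
    matches,
    function(m) if (length(m) == 0) NA_character_ else m[length(m)],
    character(1)
  )
}

text <- c("I have a (brown) clock",
          "inside of a (blue) building with (dirty) windows",
          "no parentheses here")

extract_last_paren(text)
```

It can be dropped into the same pipeline as mutate(extract = extract_last_paren(text)). I have not benchmarked it against the stringi version.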