I have a list of words, wordlist, as shown below:
wordlist <- data.frame(words = c("anywhere", "youll", "feel", "comfortable", "please", "dont"))
I have another data frame with a list of consonants:
consonants <- data.frame(consonants = c("b", "c", "d", "f", "g", "h"))
I want to create a new variable in wordlist called word_structure, in which every consonant is replaced with "C" and every vowel with "V":
wordlist$word_structure <- c("VCCCCVCV", "CVVCC", "CVVC", "CVCCVCCVCCV", "CCVVCV", "CVCC")
I don't know how to combine conditional formatting with gsub to get what I need.
This seems like a better job for chartr() than for gsub():
vowels <- c("a", "e", "i", "o", "u")
consonants <- letters[!letters %in% vowels]
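# chartr() maps the i-th character of old to the i-th character of new
wordlist$word_structure <- chartr(
  old = paste(c(vowels, consonants), collapse = ""),
  new = paste(c(rep("V", length(vowels)), rep("C", length(consonants))), collapse = ""),
  x = wordlist$words
)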
wordlist
#>         words word_structure
#> 1    anywhere       VCCCCVCV
#> 2       youll          CVVCC
#> 3        feel           CVVC
#> 4 comfortable    CVCCVCCVCCV
#> 5      please         CCVVCV
#> 6        dont           CVCC
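For comparison, since the question asked about gsub, here is a minimal sketch of the same result with two chained gsub() calls. It assumes the words are lowercase ASCII and, as in the answer above, treats "y" as a consonant. The consonant character class is written out explicitly so that the uppercase "V" inserted by the first call is not touched by the second.

```r
wordlist <- data.frame(
  words = c("anywhere", "youll", "feel", "comfortable", "please", "dont")
)

# Pass 1: every lowercase vowel becomes "V".
structure <- gsub("[aeiou]", "V", wordlist$words)

# Pass 2: every lowercase consonant becomes "C". gsub() is case-sensitive
# by default, so the uppercase "V" from pass 1 is left alone.
structure <- gsub("[b-df-hj-np-tv-z]", "C", structure)

wordlist$word_structure <- structure
```

Because both character classes list lowercase letters only, anything else in the input (digits, apostrophes, uppercase letters) would pass through unchanged.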
stringr::str_replace_all can replace several patterns in one call. Note that I used a vector of the five vowels instead of the incomplete consonant list:
wordlist <- data.frame(words = c("anywhere", "youll", "feel", "comfortable", "please", "dont"))
vowels <- c("a", "e", "i", "o", "u")
patterns <- setNames(
c("C", "V"),
c(
paste0(c("[^", vowels, "]"), collapse = ""),
paste0(c("[", vowels, "]"), collapse = "")
)
)
patterns
#> [^aeiou] [aeiou]
#> "C" "V"
stringr::str_replace_all(
wordlist$words,
patterns
)
#> [1] "VCCCCVCV" "CVVCC" "CVVC" "CVCCVCCVCCV" "CCVVCV"
#> [6] "CVCC"
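One thing to watch: as far as I understand, a named vector of patterns is applied one pattern after another, so the order of the two entries matters. The base-R sketch below mimics that behaviour with gsub() so it runs without stringr; apply_in_order is a hypothetical helper written for this illustration, not part of any package.

```r
# Hypothetical helper: apply each pattern/replacement pair in turn with gsub().
apply_in_order <- function(x, patterns) {
  for (p in names(patterns)) {
    x <- gsub(p, patterns[[p]], x)
  }
  x
}

words <- c("feel", "dont")

# Consonants first: the inserted "C" is uppercase, so "[aeiou]" leaves it alone.
good <- apply_in_order(words, c("[^aeiou]" = "C", "[aeiou]" = "V"))

# Vowels first: the inserted "V" is not a lowercase vowel, so "[^aeiou]"
# then overwrites it and every position ends up as "C".
bad <- apply_in_order(words, c("[aeiou]" = "V", "[^aeiou]" = "C"))
```

Here good is the expected structure for each word, while bad collapses to all "C". Note also that "[^aeiou]" matches any non-vowel character, so spaces or punctuation in the input would be turned into "C" as well.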