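I have an Excel sheet with a lot of rows, and I want to split the rows in a specific column by commas (this column describes ancestry and contains numbers and commas), then create a function that keeps only the words that start with a capital letter. Then I want to extract these words and put them in a loop, so that I can create a list of consecutive words that start with capital letters. After that I want to create a list where I can see the frequency of each of these words.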
I used the function str_extract_all(data$`INITIAL SAMPLE DESCRIPTION`, "\\b[A-Z]\\w*") |> unique(), where INITIAL SAMPLE DESCRIPTION is the name of the column I am interested in.
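Something like this? It extracts words that start with a capital letter followed by zero or more alphabetic characters. Apply the code below to each element of the column.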
To get the frequencies of the extracted words, unlist the result and table it.
x <- 'I have an excel sheet with a lot of rows and i want: to split the rows in a specific column by commas (this column describes ancestry and it has numbers and commas), then create a function where i only take words that start with capital letters. Then abstract these words and put them in a loop, so I can create a list of words that go together in a row that start with capital letters. After that i want to create a list where i can see the frequencies of each of these words.
I used the function str_extract_all(data$INITIAL SAMPLE DESCRIPTION, "\\b[A-Z]\\w*") |> unique() Where INITIAL SAMPLE DESCRIPTION is the name of the column of my interest.
'
# Extract every word made of a capital letter followed by zero or more letters
cap <- stringr::str_extract_all(x, "[A-Z][[:alpha:]]*")
cap
#> [[1]]
#>  [1] "I"           "Then"        "I"           "After"       "I"
#>  [6] "INITIAL"     "SAMPLE"      "DESCRIPTION" "A"           "Z"
#> [11] "Where"       "INITIAL"     "SAMPLE"      "DESCRIPTION"
cap |> unlist() |> table()
#>
#>           A       After DESCRIPTION           I     INITIAL      SAMPLE
#>           1           1           2           3           2           2
#>        Then       Where           Z
#>           1           1           1
Created on 2023-12-22 with reprex v2.0.2
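
For a whole column rather than a single string, the same idea might look like the sketch below. The data frame and its three rows are made up to stand in for the Excel sheet; only the column name comes from the question. Since `str_extract_all` is vectorised, no explicit loop or comma splitting should be needed (and splitting on commas would also break numbers such as "1,200"). The second pattern is one possible way to capture runs of consecutive capitalised words separated by single spaces, which is an assumption about what "words that go together in a row" means.

```r
library(stringr)

# Hypothetical data standing in for the Excel sheet; only the column name
# is taken from the question, the row contents are invented for illustration.
data <- data.frame(
  `INITIAL SAMPLE DESCRIPTION` = c(
    "1,200 European ancestry cases, 3,400 European ancestry controls",
    "500 East Asian ancestry individuals",
    "250 Hispanic or Latin American individuals, 100 European ancestry individuals"
  ),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One character vector of capitalised words per row (a list with one element per row)
words_per_row <- str_extract_all(
  data$`INITIAL SAMPLE DESCRIPTION`,
  "\\b[A-Z][[:alpha:]]*"
)

# Frequency of each capitalised word across the whole column
word_freq <- table(unlist(words_per_row))

# Runs of consecutive capitalised words separated by a single space,
# so that "East Asian" stays together as one entry
runs_per_row <- str_extract_all(
  data$`INITIAL SAMPLE DESCRIPTION`,
  "\\b[A-Z][[:alpha:]]*(?: [A-Z][[:alpha:]]*)*"
)

# Frequency of each run across the whole column
run_freq <- table(unlist(runs_per_row))

word_freq
run_freq
```

With real data, the `data.frame(...)` call would be replaced by whatever reads the sheet, and `sort(word_freq, decreasing = TRUE)` would put the most common terms first.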