How to count the years in a text in R?

Question

I want to count the years that appear between an opening and a closing parenthesis in the following text, named txt.

library(stringr)
txt <- "Text Mining exercise (2020) Mining, p. 628508; Computer Science text analysis (1998) Computer Science, p.345-355; Introduction to data mining (2015) J. Data Science, pp. 31-33"

lengths(strsplit(txt,"\\(\\d{4}\\)")) gives me 4, which is wrong. Any help?
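The 4 comes from splitting, not matching: three "(yyyy)" separators cut the string into four pieces, so the piece count is one more than the number of years. A minimal sketch of that reasoning follows; subtracting 1 only works under the stated assumption that the text does not end with a "(yyyy)" match, because strsplit() drops a trailing empty piece.

```r
# Same text as in the question
txt <- "Text Mining exercise (2020) Mining, p. 628508; Computer Science text analysis (1998) Computer Science, p.345-355; Introduction to data mining (2015) J. Data Science, pp. 31-33"

# Splitting on the three "(yyyy)" separators leaves four pieces of text
pieces <- strsplit(txt, "\\(\\d{4}\\)")[[1]]
length(pieces)
#[1] 4

# Number of separators = number of pieces - 1
# (assumption: txt does not end with "(yyyy)", since a trailing empty piece is dropped)
n_years <- length(pieces) - 1
n_years
#[1] 3
```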

Answer 1

You can use str_extract_all with a regular expression that uses a positive lookbehind and a positive lookahead.

stringr::str_extract_all(txt, '(?<=\\()\\d+(?=\\))')[[1]]
#[1] "2020" "1998" "2015"

If you want to count how many there are, use length on the result.

length(stringr::str_extract_all(txt, '(?<=\\()\\d+(?=\\))')[[1]])
#[1] 3
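Note that \\d+ accepts any run of digits between parentheses, not only four-digit years. The sketch below uses a hypothetical string containing "(12)" to show the difference; swapping \\d+ for \\d{4} keeps only four-digit values.

```r
library(stringr)

# Hypothetical text: one year and one short number in parentheses
txt2 <- "Some paper (2020) see note (12)"

# \d+ accepts any run of digits between the parentheses
any_digits <- str_extract_all(txt2, "(?<=\\()\\d+(?=\\))")[[1]]
any_digits
#[1] "2020" "12"

# \d{4} keeps only four-digit values
four_digits <- str_extract_all(txt2, "(?<=\\()\\d{4}(?=\\))")[[1]]
four_digits
#[1] "2020"
```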

Perhaps str_match_all with a capture group is easier:

stringr::str_match_all(txt, '\\((\\d+)\\)')[[1]][, 2]
#[1] "2020" "1998" "2015"
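str_match_all returns one matrix per input string, with the full match in column 1 and the capture group in column 2. A small sketch, assuming you want the years as numbers: convert column 2 with as.integer() and take nrow() for the count, since each match is one row.

```r
library(stringr)

txt <- "Text Mining exercise (2020) Mining, p. 628508; Computer Science text analysis (1998) Computer Science, p.345-355; Introduction to data mining (2015) J. Data Science, pp. 31-33"

# Column 1: full match such as "(2020)"; column 2: the captured digits
m <- str_match_all(txt, "\\((\\d+)\\)")[[1]]

years <- as.integer(m[, 2])
years
#[1] 2020 1998 2015

# One row per match, so nrow() gives the count
nrow(m)
#[1] 3
```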

Answer 2

If you prefer base R:

regmatches(txt, gregexpr("[^0-9]\\d{4}[^0-9]", txt))

gives

[[1]]
[1] "(2020)" "(1998)" "(2015)"

and if we wrap it in lengths( ... ), we get the correct answer.

Edit: or, if you really only want the count, we can shorten it to

lengths(gregexpr("[^0-9]\\d{4}[^0-9]", txt))
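One caveat with the shortened form: gregexpr() signals "no match" with -1, so lengths() reports 1 for a text that contains no year at all (lengths(regmatches(...)) does not have this problem, because regmatches() returns character(0)). The pattern also does not require parentheses, so a bare " 2020 " would be counted too. A minimal sketch of a zero-safe count, using a hypothetical helper named count_matches:

```r
pattern <- "[^0-9]\\d{4}[^0-9]"
txt <- "Text Mining exercise (2020) Mining, p. 628508; Computer Science text analysis (1998) Computer Science, p.345-355; Introduction to data mining (2015) J. Data Science, pp. 31-33"
no_years <- "A text without any year in parentheses"

# gregexpr() returns -1 when nothing matches, so lengths() still reports 1
lengths(gregexpr(pattern, no_years))
#[1] 1

# Hypothetical helper: count only positive match positions
count_matches <- function(x, pattern) {
  vapply(gregexpr(pattern, x), function(m) sum(m > 0), integer(1))
}
count_matches(no_years, pattern)
#[1] 0
count_matches(txt, pattern)
#[1] 3
```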

Answer 3

I think you are looking for stringr::str_count():

str_count(txt, "\\([0-9]{4}\\)")
[1] 3
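str_count is vectorized, so if the references are stored one per element instead of in a single string, it returns one count per element and sum() gives the total. The vector below is hypothetical and only illustrates the shape of the result.

```r
library(stringr)

# Hypothetical vector: one reference list per element
refs <- c("A (2020) B (1998)", "C (2015)", "no year here")

per_element <- str_count(refs, "\\([0-9]{4}\\)")
per_element
#[1] 2 1 0

sum(per_element)
#[1] 3
```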

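To count only four-digit numbers inside parentheses that start with 1 or 2, followed by 0 or 9 (inside a character class `|` is a literal character, so write [09] rather than [0|9]):

str_count(txt, "\\([12][09][0-9]{2}\\)")
[1] 3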
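A character class such as [12][09] also accepts prefixes like 10 and 29. If the intent is only 19xx and 20xx years, an alternation group is stricter. The text below is hypothetical and chosen to show the difference.

```r
library(stringr)

# Hypothetical text: two plausible years and one implausible "year"
txt3 <- "Old paper (1999), new paper (2021), typo (2999)"

# [12][09] accepts 1999, 2021 and 2999
loose <- str_count(txt3, "\\([12][09][0-9]{2}\\)")
loose
#[1] 3

# (19|20) accepts only 19xx and 20xx
strict <- str_count(txt3, "\\((19|20)[0-9]{2}\\)")
strict
#[1] 2
```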
