I have a list as follows:
words <- list(c("\n", "\n", "oesophagus graded and fine\n",
"\n", "\n", "\n", "stomach and antrum altough with some rfa response rfa\n",
"\n", "mucosa washed a lot\n", "\n", "treated with halo rfa ultra \n",
"\n", "total of 100 times\n", "\n", "duodenum looks ok"))
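For each term from one list that appears in this text, I want to extract the nearest term from a different list.
My expected output is: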
antrum:rfa
My first list:
EventList<-c("rfa", "apc", "dilat", "emr", "clip", "grasp", "probe", "iodine",
"acetic", "nac", "peg", "botox")
My second list is:
tofind<-"ascending|descending|sigmoid|rectum|transverse|caecum|splenic|ileum|rectosigmoid|ileocaecal|hepatic|colon|terminal|terminal ileum|ileoanal|prepouch|pouch|stomach|antrum|duodenum|oesophagus|goj|ogj|cardia|anastomosis"
The code I am using is:
library(purrr)
library(stringr)

EventList %>%
  map(
    ~words %>%
      str_which(paste0('^.*', .x)) %>%
      map_chr(
        ~words[1:.x] %>%
          str_c(collapse = ' ') %>%
          str_extract_all(regex(tofind, ignore_case = TRUE)) %>%
          map_if(is_empty, ~ NA_character_) %>%
          flatten_chr() %>%
          `[[`(1) %>%
          .[length(.)]
      ) %>%
      paste0(':', .x)
  ) %>%
  unlist() %>%
  str_subset('.+:')
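This gives me the event (in this case rfa), but instead of assigning it antrum, it assigns it oesophagus.
So it returns the first term found in the tofind list rather than the term closest to the event.
I suspect the lines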
`[[`(1) %>%
.[length(.)]
are the culprit, but I don't know how to change them so that I get the nearest term rather than the first one.
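The code below gives you, for every matched element of EventList, the last tofind match found in the text up to and including the matching element: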
map(EventList,
    function(event) {
      indices <- map(words, str_which, pattern = event)
      map(indices, function(i)
        map2_chr(words, i, ~ .x[seq_len(.y)] %>%
                   str_c(collapse = ' ') %>%
                   str_extract_all(regex(tofind, ignore_case = TRUE), simplify = TRUE) %>%
                   last()) %>%
          map_if(is_empty, ~ NA_character_)
      ) %>%
        unlist() %>%
        paste0(':', event)
    }) %>%
  unlist() %>%
  str_subset('.+:')
# [1] "antrum:rfa" "oesophagus:rfa"
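
If "closest" is read as closest by character position within the same line, the idea can also be sketched in base R alone. This is only an illustration under that assumption, not the method used above: nearest_term() is a hypothetical helper name, only the first occurrence of an event in a line is considered, and lines without any tofind term are skipped.

```r
# Sample data copied from the question
words <- list(c("\n", "\n", "oesophagus graded and fine\n",
                "\n", "\n", "\n", "stomach and antrum altough with some rfa response rfa\n",
                "\n", "mucosa washed a lot\n", "\n", "treated with halo rfa ultra \n",
                "\n", "total of 100 times\n", "\n", "duodenum looks ok"))
EventList <- c("rfa", "apc", "dilat", "emr", "clip", "grasp", "probe", "iodine",
               "acetic", "nac", "peg", "botox")
tofind <- "ascending|descending|sigmoid|rectum|transverse|caecum|splenic|ileum|rectosigmoid|ileocaecal|hepatic|colon|terminal|terminal ileum|ileoanal|prepouch|pouch|stomach|antrum|duodenum|oesophagus|goj|ogj|cardia|anastomosis"

# Hypothetical helper (not from any package): for each line, pair every
# event found in that line with the tofind term whose start position is
# nearest to the first occurrence of the event.
nearest_term <- function(lines, events, term_pattern) {
  out <- character(0)
  for (line in lines) {
    g <- gregexpr(term_pattern, line, ignore.case = TRUE, perl = TRUE)
    term_text <- regmatches(line, g)[[1]]
    if (length(term_text) == 0) next        # no location term in this line
    term_pos <- as.integer(g[[1]])          # start position of each term
    for (event in events) {
      e <- gregexpr(event, line, fixed = TRUE)[[1]]
      if (e[1] == -1) next                  # event not present in this line
      nearest <- which.min(abs(term_pos - e[1]))
      out <- c(out, paste0(term_text[nearest], ":", event))
    }
  }
  unique(out)
}

result <- nearest_term(words[[1]], EventList, tofind)
print(result)
# [1] "antrum:rfa"
```

Because the search is restricted to one line at a time, an event on a line with no location term (such as "treated with halo rfa ultra") yields nothing here; whether it should instead inherit a term from an earlier line is a design choice the question leaves open.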