Count 1 or 0 if list element label matches var2

Question

I have a data frame in R, mydf.

mydf <- data.frame(
  last_name = c("Jay", "Kelly", "Mark", "Lisa", "Jay", "Kelly", "Mark", "Lisa", "Lisa", "Lisa", "Kelly", "Kelly"),
  first_name = c("Lee", "Ty", "Ben", "Joe", "Lee", "Ty", "Ben", "Joe", "Joe", "Joe", "Ty", "Ty"),
  state_abbrevs = c("KY", "UT", "OH", "IA", "KY", "UT", "OH", "IA", "IA", "IA", "UT", "UT"),
  tw_year = c(1998, 2001, 2001, 2003, 1998, 2001, 2001, 2003, 2004, 2003, 2000, 2002),
  tw_month = c(1, 3, 4, 5, 12, 1, 3, 4, 5, 3, 1, 10),
  text = c("Thanks to everyone in Orange County!", "Several cities and townships in the state are flooded", "New Mexico's communities are suffering",
           "Today is the LAST day to register", "Ohio Senator Jay is great", "I love UT and KY - both are awesome",
           "On Monday, the prez will release a statement", "Ohio Senator Jay is great", "The villages in Iowa are nice",
           "The villages, cities and towns in iowa are nice", "Salt Late City is crazy this time of year", "I only drink S.Pellegrino"))

I also have a list of lists named terms:

terms <- list(
  NH = c("New Hampshire", "NH", "Village", "Villages", "villages", "County", "county", "City", "cities"),
  IA = c("iowa", "Iowa", "IA", "Village", "Villages", "villages", "County", "county", "City", "cities"),
  KY = c("Kentucy", "KY", "ky", "Village", "Villages", "villages", "County", "county", "City", "cities"),
  OH = c("Ohio", "OH", "oh", "Village", "Villages", "villages", "County", "county", "City", "cities"))

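I want to create a new dataset that counts whether each observation in mydf$text contains at least one of the strings in a list element (I am not sure "list element" is the correct term), but only if mydf$state_abbrev matches the label of that list element. For example, the strings under a label ("NH", "IA", "KY", or "OH") should only be matched against mydf$text when that label equals the observation's mydf$state_abbrev.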

After trying many code variants that did not work, I have settled on the dplyr code below. I have been trying to solve this for days and need a fresh pair of eyes.

library(dplyr)
library(stringr)  # str_detect() and regex() come from stringr

df_count <- mydf %>% 
  mutate(state_abbrev = tolower(state_abbrev)) %>% # convert state abbreviations to lowercase
  mutate(terms = terms[[state_abbrev]]) %>% # match st_terms to state_abbrev
  group_by(last_name, first_name, tw_year, tw_month) %>% # group by desired variables
  summarize(terms_count = sum(str_detect(text, regex(paste(terms, collapse = "|"), ignore_case = TRUE)) %>% as.integer())) # count the number of matches for each group

Error in `mutate()`:
ℹ In argument: `state_abbrev = tolower(state_abbrev)`.
Caused by error in `tolower()`:
! object 'state_abbrev' not found
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/dplyr:::mutate_error>
Error in `mutate()`:
ℹ In argument: `state_abbrev = tolower(state_abbrev)`.
Caused by error in `tolower()`:
! object 'state_abbrev' not found
---
Backtrace:
     ▆
  1. ├─... %>% ...
  2. ├─dplyr::summarize(...)
  3. ├─dplyr::group_by(., last_name, first_name, tw_year, tw_month)
  4. ├─dplyr::mutate(., terms = terms[[state_abbrev]])
  5. ├─dplyr::mutate(., state_abbrev = tolower(state_abbrev))
  6. ├─dplyr:::mutate.data.frame(., state_abbrev = tolower(state_abbrev))
  7. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
  8. │   ├─base::withCallingHandlers(...)
  9. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
 10. │     └─mask$eval_all_mutate(quo)
 11. │       └─dplyr (local) eval()
 12. └─base::tolower(state_abbrev)
Run rlang::last_trace(drop = FALSE) to see 3 hidden frames.
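
The error says that `state_abbrev` is not an object that `mutate()` can see. A quick way to check this is to compare the name used in the pipeline with the data frame's actual column names. The sketch below uses a two-row stand-in for mydf (an assumption made for brevity) that keeps the column name exactly as written in the `data.frame()` call above.

```r
# Two-row stand-in for mydf, keeping the column name from the data.frame() call.
mydf <- data.frame(
  state_abbrevs = c("KY", "UT"),
  text = c("Thanks to everyone in Orange County!",
           "Several cities and townships in the state are flooded")
)

names(mydf)                       # the columns mutate() can actually see
"state_abbrev" %in% names(mydf)   # FALSE: the name used in the pipeline
"state_abbrevs" %in% names(mydf)  # TRUE: the name used when building mydf
```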

The new dataset should look like this:

last_name	first_name	state_abbrev	tw_month	tw_year	count_text
Jay	Lee	KY	1	1998	2
Jay	Lee	KY	1	2000	0
Lisa	Joe	IA	5	2004	1
Lisa	Joe	IA	4	2003	0
Lisa	Joe	IA	3	2003	1
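
For reference, here is a minimal base R sketch of the row-wise lookup described above, not a confirmed solution. It makes several assumptions: a four-row subset of the data with shortened term lists, the column spelled `state_abbrev` as in the prose, a hypothetical helper named `has_state_term`, and case-sensitive substring matching (so a short term such as "ky" would also match inside a longer word).

```r
# Toy subset of the question's data; the column is spelled state_abbrev here.
mydf <- data.frame(
  state_abbrev = c("KY", "UT", "IA", "IA"),
  text = c("Thanks to everyone in Orange County!",
           "I love UT and KY - both are awesome",
           "Today is the LAST day to register",
           "The villages in Iowa are nice"),
  stringsAsFactors = FALSE
)

# Shortened term lists, for illustration only.
terms <- list(
  IA = c("iowa", "Iowa", "IA", "villages", "County"),
  KY = c("Kentucy", "KY", "ky", "villages", "County")
)

# Hypothetical helper: 1 if `text` contains any term stored under `state`,
# otherwise 0. States with no entry in `terms` (e.g. "UT") give 0.
has_state_term <- function(state, text, terms) {
  patterns <- terms[[state]]
  if (is.null(patterns)) return(0L)
  hits <- vapply(patterns, grepl, logical(1), x = text, fixed = TRUE)
  as.integer(any(hits))
}

# Apply the helper to each (state, text) pair.
mydf$count_text <- mapply(has_state_term, mydf$state_abbrev, mydf$text,
                          MoreArgs = list(terms = terms), USE.NAMES = FALSE)
mydf$count_text  # 1 0 0 1

# Sum the 0/1 flags per group; here grouped by state only.
aggregate(count_text ~ state_abbrev, data = mydf, FUN = sum)
```

With the full data, the grouping formula could be extended to last_name, first_name, tw_year, and tw_month to approach the layout shown in the table.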
