Count 1 or 0 if a list element's label matches var2


I have a data frame in R, mydf:

# Name the columns with `=`; `<-` inside data.frame() does not set column names
mydf <- data.frame(
  last_name = c("Jay", "Kelly", "Mark", "Lisa", "Jay", "Kelly", "Mark", "Lisa", "Lisa", "Lisa", "Kelly", "Kelly"),
  first_name = c("Lee", "Ty", "Ben", "Joe", "Lee", "Ty", "Ben", "Joe", "Joe", "Joe", "Ty", "Ty"),
  state_abbrevs = c("KY", "UT", "OH", "IA", "KY", "UT", "OH", "IA", "IA", "IA", "UT", "UT"),
  tw_year = c(1998, 2001, 2001, 2003, 1998, 2001, 2001, 2003, 2004, 2003, 2000, 2002),
  tw_month = c(1, 3, 4, 5, 12, 1, 3, 4, 5, 3, 1, 10),
  text = c("Thanks to everyone in Orange County!", "Several cities and townships in the state are flooded", "New Mexico's communities are suffering",
           "Today is the LAST day to register", "Ohio Senator Jay is great", "I love UT and KY - both are awesome",
           "On Monday, the prez will release a statement", "Ohio Senator Jay is great", "The villages in Iowa are nice",
           "The villages, cities and towns in iowa are nice", "Salt Late City is crazy this time of year", "I only drink S.Pellegrino"))

I also have a list called terms (a named list of character vectors):
terms <- list(
  NH = c("New Hampshire", "NH", "Village", "Villages", "villages", "County", "county", "City", "cities"),
  IA = c("iowa", "Iowa", "IA", "Village", "Villages", "villages", "County", "county", "City", "cities"),
  KY = c("Kentucy", "KY", "ky", "Village", "Villages", "villages", "County", "county", "City", "cities"),
  OH = c("Ohio", "OH", "oh", "Village", "Villages", "villages", "County", "county", "City", "cities"))

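I would like to create a new dataset that gives a count (1 or 0) of whether each observation in mydf$text contains at least one of the strings in a list element (not sure whether "list element" is the right term?), but only if mydf$state_abbrev matches the label of that list element. In other words, for a given mydf$text observation I only want to match the strings whose list label ("NH", "IA", "KY", or "OH") equals that observation's mydf$state_abbrev.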
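To make the rule above concrete, here is a minimal base R sketch on a small hypothetical data frame (`df`, `toy_terms`, and the helper `has_term()` are illustration names, not part of my real code). It assumes plain case-sensitive substring matching, which may be too loose for short strings such as "oh".

```r
# Hypothetical toy data: one row per text, with the state it belongs to.
df <- data.frame(
  state_abbrevs = c("KY", "IA", "UT"),
  text = c("Thanks to everyone in Orange County!",
           "The villages in Iowa are nice",
           "I only drink S.Pellegrino"),
  stringsAsFactors = FALSE
)

# Hypothetical, shortened term list; note there is no entry for "UT".
toy_terms <- list(
  KY = c("Kentucky", "KY", "County"),
  IA = c("Iowa", "IA", "villages")
)

# Return 1L if `txt` contains at least one term listed under `state`,
# and 0L otherwise (including when `state` has no entry in the list).
has_term <- function(state, txt, term_list) {
  pats <- term_list[[state]]          # NULL when the label is absent
  if (is.null(pats)) return(0L)
  hits <- vapply(pats, function(p) grepl(p, txt, fixed = TRUE), logical(1))
  as.integer(any(hits))
}

# Apply the helper row by row: each text is checked only against
# the terms stored under that row's own state label.
df$count_text <- mapply(has_term, df$state_abbrevs, df$text,
                        MoreArgs = list(term_list = toy_terms),
                        USE.NAMES = FALSE)
```

The row-wise `mapply()` keeps the lookup tied to each row's own state, which is the part I could not get right with a single `mutate()` call.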

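After trying several code variations that did not work, I have settled on the dplyr-based code below. I have been trying to solve this for several days now and need a fresh pair of eyes.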

library(dplyr)
library(stringr) # provides str_detect() and regex() used below

df_count <- mydf %>% 
  mutate(state_abbrev = tolower(state_abbrev)) %>% # convert state abbreviations to lowercase
  mutate(terms = terms[[state_abbrev]]) %>% # match st_terms to state_abbrev
  group_by(last_name, first_name, tw_year, tw_month) %>% # group by desired variables
  summarize(terms_count = sum(str_detect(text, regex(paste(terms, collapse = "|"), ignore_case = TRUE)) %>% as.integer())) # count the number of matches for each group

Error in `mutate()`:
ℹ In argument: `state_abbrev = tolower(state_abbrev)`.
Caused by error in `tolower()`:
! object 'state_abbrev' not found
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/dplyr:::mutate_error>
Error in `mutate()`:
ℹ In argument: `state_abbrev = tolower(state_abbrev)`.
Caused by error in `tolower()`:
! object 'state_abbrev' not found
---
Backtrace:
     ▆
  1. ├─... %>% ...
  2. ├─dplyr::summarize(...)
  3. ├─dplyr::group_by(., last_name, first_name, tw_year, tw_month)
  4. ├─dplyr::mutate(., terms = terms[[state_abbrev]])
  5. ├─dplyr::mutate(., state_abbrev = tolower(state_abbrev))
  6. ├─dplyr:::mutate.data.frame(., state_abbrev = tolower(state_abbrev))
  7. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
  8. │   ├─base::withCallingHandlers(...)
  9. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
 10. │     └─mask$eval_all_mutate(quo)
 11. │       └─dplyr (local) eval()
 12. └─base::tolower(state_abbrev)
Run rlang::last_trace(drop = FALSE) to see 3 hidden frames.
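
A quick check that may help narrow down the error above, sketched on a hypothetical stand-in (`df` and `toy_terms` are illustration names, not the real objects): the message says `state_abbrev` was not found, so it seems worth comparing the name used in `mutate()` with the names the data frame actually has. The sketch also checks whether a lowercased abbreviation would still find its label in the list, since `[[` matches names exactly.

```r
# Hypothetical stand-in for mydf; only the column name matters here.
df <- data.frame(state_abbrevs = c("KY", "UT"), stringsAsFactors = FALSE)
toy_terms <- list(KY = c("Kentucky", "KY"))

# Is the exact name used inside mutate() present among the columns?
names_seen <- names(df)
name_found <- "state_abbrev" %in% names_seen

# List labels are case-sensitive: `[[` returns NULL for an absent name.
upper_hit <- !is.null(toy_terms[["KY"]])
lower_hit <- !is.null(toy_terms[[tolower("KY")]])
```

If `name_found` is FALSE, the column name and the name used in the pipeline do not agree; if `lower_hit` is FALSE, the `tolower()` step would break the lookup even once the name is fixed.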

The new dataset should look like this:

last_name  first_name  state_abbrev  tw_month  tw_year  count_text
                       KY            1         1998     2
                       KY            1         2000     0
Lisa                   IA            5         2004     1
Lisa                   IA            4         2003     0
Lisa                   IA            3         2003     1
r dplyr mutate